Join Team CARFAX as a Senior Site Reliability Engineer!
Isn't it time you bragged about where you work? At CARFAX, we do, every day. We pride ourselves on our mission of growing a brand built on accuracy and integrity. We care deeply about our products and our customers. We’re more than just a company: we help millions of consumers make more informed decisions every day. We know that our teammates are our most valuable asset, and we value a balanced life while tackling challenging projects in a fast-paced environment. One last thing: our four-day week continues in Summer 2025!
This role has an expectation of two days per week in the London, ON office, subject to change based on future business needs.
What you'll be doing:
- Support DevOps at CARFAX as an engineer in our observability practice.
- Maintain the observability tool stack used by teams throughout CARFAX.
- Work in a dynamic, agile team environment, helping keep CARFAX’s applications up and running.
- Collaborate with engineering teams to design and build monitoring solutions.
- Respond to major incidents. Help teams troubleshoot their products and restore service.
- Collaborate closely with DevOps and engineering teams to implement observability best practices.
- Reduce toil by creating observability automation that can be reused across our teams.
- Continuously analyze and evaluate our systems, products, and processes for potential improvements.
What we're looking for:
- Five or more years of experience with observability solutions.
- Experience with the following:
  - Maintaining cloud infrastructure via IaC (Terraform preferred).
  - AWS EKS and monitoring solutions for K8s.
  - Prometheus and Grafana to collect and visualize metrics.
  - Platforms such as New Relic, Datadog, or Splunk to collect metric and event data.
  - Log management: operating and managing a large-scale ELK stack.
  - Monitoring and alerting: analyzing applications and infrastructure to determine the right type of monitoring and alerting.
- Experience with our tech stack: AWS (EKS), Prometheus / Grafana, Terraform / Consul / Vault, NodeJS / GoLang, Java.
- Strong believer in reducing toil for yourself and teammates.
- Ability to troubleshoot complex systems and help resolve major incidents.
- Strong communication skills for documenting best practices for teams to implement.
What’s in it for you:
- Competitive compensation, benefits and generous time-off policies
- Four-day summer work weeks and a winter holiday break
- 401(k)/DCPP matching
- Annual bonus program
- Casual, dog-friendly, and innovative office spaces
Don’t just take our word for it:
- 10X Virginia Business Best Places to Work
- 9X Washingtonian Great Places to Work
- 9X Washington Post Top Workplace
- St. Louis Post-Dispatch Best Places to Work