Site Reliability Engineer

The Hartford

Hartford (CT)

Hybrid

USD 90,000 - 136,000

Full time

2 days ago

Job summary

The Hartford's CARE - RE&A Organization is seeking a Senior Reliability Engineer to lead infrastructure resilience efforts, ensuring stability and performance in cloud and SaaS environments. This role requires a Bachelor's or Master's degree in Computer Science or Engineering and 5+ years of relevant experience. You'll work on innovative solutions and automate processes to enhance efficiency and reliability.

Qualifications

5+ years of experience in Infrastructure Engineering, SRE, or DevOps.
Hands-on experience with observability tools and IaC.
Expertise in cloud platforms and microservices environments.

Responsibilities

Lead infrastructure resilience efforts for cloud and SaaS environments.
Develop tooling for automation and problem resolution.
Collaborate with stakeholders for operational excellence.

Skills

Strong technical skills

Analytical skills

Problem-solving

Interpersonal skills

Education

Bachelor’s or Master’s degree in Computer Science, Engineering, or related field

Tools

Prometheus

Splunk

Dynatrace

Terraform

AWS

Azure

Kubernetes

We’re determined to make a difference and are proud to be an insurance company that goes well beyond coverages and policies. Working here means having every opportunity to achieve your goals – and to help others accomplish theirs, too. Join our team as we help shape the future.

The Hartford’s CARE - RE&A Organization is seeking a highly motivated, detail-oriented, and results-driven Reliability Engineer (Senior) to join our team. This position will play a crucial role in leading infrastructure resilience efforts, ensuring the stability and performance of our systems in cloud and SaaS environments.

Successful candidates will be expected to demonstrate strong technical skills, excellent partnership with stakeholders and partner teams, a willingness to understand existing processes and systems, solid technical acumen, experience delivering quality technical solutions, and the ability to ensure that systems are stable, performant, and secure.

Primary responsibilities of the position are the following:

Responsibilities:

Assist in the use of best-in-class software engineering standards and design practices for instrumenting code/application technology stack to enable the generation of relevant metrics on overall technology health - availability, performance, quality, currency and resiliency.

Assist the architecture and software engineering teams to influence the technical strategy for the organization, keeping in mind its cross-functional impacts, integration across the organization, and architecture rationalization.

Assist on a team as a technical leader for the applications supported, requiring depth and breadth of knowledge in technologies, applications, integration, interfaces and business domain.

DevSecOps Solution Responsibilities:

Assist in developing effective tooling, alerts, and response mechanisms to identify and address reliability risks leveraging automation to support problem prevention, detection, mitigation, and resolution.

Assist in enhancing the delivery flow by engineering the appropriate solutions to increase delivery speed while adhering to technology standards for sustained reliability.

Partner to implement preventative controls and drive increased automation and self-healing capabilities. Continue to improve cost efficiency baselines.

Promote and implement innovative solutions.

IT Ops Responsibilities:

Ensure operational excellence. Collaborate to drive the triaging and service restoration of all high-impact incidents in order to minimize the mean time to service restoration and impact to the business. Demonstrate end-to-end ownership.

Partner with infrastructure teams to design and implement intelligent incident routing, enhanced monitoring/alerting capabilities and automated service restoration processes. Take proactive measures to prevent high-impact incidents.

Achieve and maintain the continuity of Hartford and third-party assets that support a business function. Be accountable for keeping the IT application and infrastructure metadata repositories current.

AI-Driven Automation:

Research and implement AI-based anomaly detection to predict infrastructure failures and automate preventive measures.

Develop AI-powered troubleshooting copilots and LLM-driven operational assistants to accelerate incident resolution and root cause analysis.

Implement AI/ML-based runbooks to automate system recovery and optimize operational efficiency.

Qualifications:

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.

5+ years of experience in Infrastructure Engineering, Site Reliability Engineering (SRE), or DevOps.

Hands-on experience with observability tools: Prometheus, Splunk, Dynatrace, OpenTelemetry, CloudWatch.

Deep knowledge of Infrastructure as Code (IaC) with Terraform, CloudFormation, or CDK.

Proven ability to optimize CI/CD pipelines, automate deployments, and enforce DevSecOps best practices.

Expertise in cloud platforms (AWS, GCP, Azure) and Kubernetes-based microservices environments.

Strong proficiency in Python and Java for infrastructure automation and tooling development.

Experience in AI/ML frameworks for observability, predictive failure detection, and AI-driven troubleshooting.

Experience with Oracle and SQL Server relational database technologies. Knowledge of open-source database technologies is beneficial.

Demonstrated experience working within Agile frameworks and methodologies.

Excellent analytical, problem-solving, and interpersonal skills.

This role will have a hybrid work schedule, with the expectation of working in an office (Columbus, OH; Chicago, IL; Hartford, CT; or Charlotte, NC) 3 days a week (Tuesday through Thursday).

Candidates must be authorized to work in the US without company sponsorship. The company will not support the STEM OPT I-983 Training Plan endorsement for this position.

Compensation

The listed annualized base pay range is primarily based on analysis of similar positions in the external market. Actual base pay could vary and may be above or below the listed range based on factors including but not limited to performance, proficiency and demonstration of competencies required for the role. The base pay is just one component of The Hartford’s total compensation package for employees. Other rewards may include short-term or annual bonuses, long-term incentives, and on-the-spot recognition. The annualized base pay range for this role is:

$90,320 - $135,480

Reliability Engineer - IE08GE

Equal Opportunity Employer/Sex/Race/Color/Veterans/Disability/Sexual Orientation/Gender Identity or Expression/Religion/Age

The Hartford Financial Services Group, Inc., usually known as The Hartford, is a United States-based investment and insurance company.
