Site Reliability Engineer - Remote

Optum

Basking Ridge (NJ)

Remote

USD 110,000 - 115,000

Full time

Yesterday

Job summary

A leading healthcare organization is seeking a Site Reliability Engineer to enhance system reliability and performance. This remote role involves leading initiatives, collaborating with teams, and driving automation in a dynamic environment. Candidates should have extensive experience in cloud platforms, programming, and leadership within SRE teams. Join us to make a significant impact on health equity and organizational efficiency.

Qualifications

  • 6+ years of site reliability engineering experience.
  • 3+ years in a leadership or technical lead role.
  • 3+ years of programming experience in Python, Go, or Java.

Responsibilities

  • Design, implement, and maintain scalable infrastructure solutions.
  • Lead the Site Reliability Engineering team in automating processes.
  • Manage incident response efforts and conduct root cause analyses.

Skills

Leadership
Automation
Cloud Platforms
Programming

Tools

Docker
Kubernetes
Terraform
Ansible
Prometheus
Grafana
Jenkins
GitLab CI

Job description

Optum is a global organization that delivers care, aided by technology to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by diversity and inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health equity on a global scale. Join us to start Caring. Connecting. Growing together.

Software engineering is the application of engineering to the design, development, implementation, testing and maintenance of software in a systematic way. The roles in this function cover all primary development activity across all technology functions, ensuring that we deliver high-quality code for our applications, products and services, understand customer needs and develop product roadmaps.

These roles include, but are not limited to, analysis, design, coding, engineering, testing, debugging, standards, methods, tools analysis, documentation, research and development, maintenance, new development, operations and delivery. As with every role in the company, each position requires building quality into every output. The work also includes evaluating new tools, techniques and strategies; automating common tasks; building common utilities to drive organizational efficiency; and, with a passion for technology and solutions, influencing thought and leadership on future capabilities and opportunities to apply technology in new and innovative ways.

You’ll enjoy the flexibility to work remotely* from anywhere within the U.S. as you take on some tough challenges.

Primary Responsibilities

  • Lead digital-first initiatives for the UHC Provider Portal to improve customer experience
  • Design, implement, and maintain scalable, reliable, and secure infrastructure solutions to support application deployment and operational excellence
  • Develop and manage comprehensive monitoring, alerting, and incident response systems to ensure high availability and optimal performance of services
  • Lead the Site Reliability Engineering team in automating processes, reducing manual interventions, and enhancing system efficiencies through innovative tooling
  • Collaborate with development, product management, and architecture teams to integrate reliability and performance best practices into the software development lifecycle
  • Drive the creation and upkeep of documentation for system architectures, operational procedures, and SRE best practices to ensure knowledge sharing and consistency
  • Manage incident response efforts, conduct root cause analyses, and implement preventive measures to minimize downtime and enhance system resilience
  • Mentor and develop team members, fostering a culture of continuous improvement, learning, and professional growth
  • Align SRE initiatives with organizational goals, ensuring that reliability, security, and performance objectives support overall business strategies
  • Advocate for and implement security best practices within infrastructure and operational processes to safeguard systems and data

You’ll be rewarded and recognized for your performance in an environment that will challenge you and give you clear direction on what it takes to succeed in your role as well as provide development for other roles you may be interested in.

Required Qualifications

  • 6+ years of site reliability engineering experience, including hands-on management of large-scale, distributed systems
  • 6+ years of experience with public cloud platforms such as AWS, Azure, or Google Cloud, with proficiency in at least two major services within each platform
  • 3+ years in a leadership or technical lead role, overseeing SRE teams and driving reliability-focused initiatives
  • 3+ years of experience in containerization and orchestration technologies, including Docker and Kubernetes, with a minimum of 5 years of relevant experience
  • 3+ years of programming and scripting experience in languages such as Python, Go, or Java, with 5+ years of hands-on development experience

Preferred Qualifications

  • 4+ years of experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack) to ensure system health and performance
  • 4+ years of experience with infrastructure as code (IaC) tools, such as Terraform or Ansible
  • 4+ years of experience in incident management, including root cause analysis and post-incident reviews
  • 3+ years of experience with CI/CD pipelines and automation, utilizing tools like Jenkins, GitLab CI, or similar, with at least 5 years of experience
  • 3+ years of experience applying enterprise security best practices, including implementing security measures within SRE processes
  • 3+ years of experience with microservices architecture and deploying microservices at scale

*All employees working remotely will be required to adhere to UnitedHealth Group’s Telecommuter Policy.

At UnitedHealth Group, our mission is to help people live healthier lives and make the health system work better for everyone. We believe everyone, of every race, gender, sexuality, age, location and income, deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health, which are disproportionately experienced by people of color, historically marginalized groups and those with lower incomes. We are committed to mitigating our impact on the environment and to enabling and delivering equitable care that addresses health disparities and improves health outcomes, an enterprise priority reflected in our mission.

UnitedHealth Group is an Equal Employment Opportunity employer under applicable law and qualified applicants will receive consideration for employment without regard to race, national origin, religion, age, color, sex, sexual orientation, gender identity, disability, or protected veteran status, or any other characteristic protected by local, state, or federal laws, rules, or regulations.

UnitedHealth Group is a drug free workplace. Candidates are required to pass a drug test before beginning employment.

Seniority level
    Mid-Senior level
Employment type
    Full-time
Job function
    Engineering and Information Technology
Industries
    Hospitals and Health Care

Similar jobs

Site Reliability Engineer

FIS

New York

Remote

USD 84,000 - 143,000

Today

Site Reliability Engineer

Altimetrik

Austin

Remote

USD 85,000 - 300,000

Today

Senior Site Reliability Engineers

Centene Corporation

Clayton

Remote

USD 112,000 - 159,000

Today

Software Engineering Site Reliability Engineer Professional JERSEY CITY, US

Avature

New Jersey

Remote

USD 111,000 - 191,000

13 days ago

Site Reliability Engineer (Middle)

Agileengine

Remote

USD 90,000 - 130,000

Today

Lead Site Reliability Engineer (Remote)

Livepeer

New York

Remote

USD 90,000 - 150,000

21 days ago

System Safety Engineer

Leidos

Huntsville

Remote

USD 89,000 - 163,000

2 days ago

Site Reliability Engineer

Diverse Lynx

Los Angeles

Remote

USD 100,000 - 130,000

3 days ago

Site Reliability Engineer

Pythian

Remote

USD 90,000 - 150,000

4 days ago