Staff Software Engineer - Site Reliability

WellSky

United States

Remote

USD 120,000 - 170,000

Full time

Yesterday

Job summary

A leading healthcare company is seeking a Staff Software Engineer - Site Reliability to lead the design of complex software solutions that improve user experience and system performance. The candidate will collaborate across teams to ensure reliability and scalability, lead incident response, and maintain monitoring and alerting systems.

Benefits

  • Excellent medical, dental, and vision benefits
  • Mental health benefits through TelaDoc
  • Prescription drug coverage
  • Generous paid time off, plus 13 paid holidays
  • Paid parental leave
  • 100% vested 401(K) retirement plans
  • Educational assistance up to $2500 per year

Qualifications

  • 8-12 years of relevant work experience.
  • Proven experience as a Site Reliability Engineer.
  • Strong expertise in Kubernetes and container management.

Responsibilities

  • Lead design of scalable software solutions.
  • Maintain monitoring and alerting systems to ensure system health and performance.
  • Manage incidents and lead post-incident reviews.

Skills

  • Kubernetes
  • Cloud platforms
  • Infrastructure as code
  • CI/CD pipelines

Education

Bachelor's Degree in a related field

Tools

  • Terraform
  • New Relic
  • OpenTelemetry

Job description

The Staff Software Engineer - Site Reliability is responsible for all stages of the software development lifecycle using a variety of technologies and tools to build impactful software solutions. The scope of this job includes building and optimizing comprehensive solutions that prioritize end-user efficiency and experience.

This opening is with our Personal Care Engineering team. If the details below sound like you, we invite you to apply today and join us in shaping the future of healthcare!

Key Responsibilities:

  • Lead the design of complex software development features and ensure solutions are scalable, effective, and maintainable.
  • Collaborate with solution managers, designers, and other teams to gather requirements, translate them into technical specifications, and ensure alignment with priorities and project goals.
  • Analyze and solve complex technical problems, identify bottlenecks, and prepare technical documentation to optimize system performance.
  • Facilitate code reviews, provide constructive feedback, and lead by example in code quality, development best practices, and problem-solving approaches.
  • Ensure code meets functional and performance requirements, advocate for high-quality software, and maintain rigorous testing processes, including automated unit tests, integration tests, and other testing frameworks.
  • Leverage common GenAI tools for AI-assisted development and understand the basics of prompt engineering.
  • Ensure the reliability, availability, and performance of our systems and services.
  • Work closely with various teams to build and maintain scalable, efficient, and resilient infrastructure.
  • Incident management: lead the response to system outages and incidents, ensuring quick resolution and minimal impact on end-users. Conduct post-incident reviews and implement improvements to prevent recurrence.
  • Monitoring and alerting: design, implement, and maintain monitoring and alerting systems using tools like New Relic, Grafana, and the ELK stack to ensure system health and performance.
  • Perform other job duties as assigned.

Required Qualifications:

  • Bachelor's Degree in a related field, or equivalent work experience
  • 8-12 years of relevant work experience
  • Proven experience in a Site Reliability Engineer role
  • Strong expertise in Kubernetes and container management
  • Experience with cloud platforms such as Google Cloud Platform (GCP)
  • Familiarity with observability and APM tools (e.g., New Relic, OpenTelemetry)
  • Proficiency in infrastructure as code (e.g., Terraform)
  • Solid understanding of CI/CD pipelines and deployment automation

Preferred Qualifications:

  • Experience with Azure DevOps Pipelines and Argo CD
  • Strong networking fundamentals, including experience with Istio and service mesh technologies

Job Expectations:

  • Willing to work additional or irregular hours as needed
  • Must work in accordance with applicable security policies and procedures to safeguard company and client information
  • Must be able to sit and view a computer screen for extended periods of time

WellSky is where independent thinking and collaboration come together to create an authentic culture. We thrive on innovation, inclusiveness, and cohesive perspectives. At WellSky you can make a difference.

WellSky provides equal employment opportunities to all people without regard to race, color, national origin, ancestry, citizenship, age, religion, gender, sex, sexual orientation, gender identity, gender expression, marital status, pregnancy, physical or mental disability, protected medical condition, genetic information, military service, veteran status, or any other status or characteristic protected by law. WellSky is proud to be a drug-free workplace.

Applicants for U.S.-based positions with WellSky must be legally authorized to work in the United States. Verification of employment eligibility will be required at the time of hire. Certain client-facing positions may be required to comply with applicable requirements, such as immunizations and occupational health mandates.

Here are some of the exciting benefits full-time teammates are eligible to receive at WellSky:

  • Excellent medical, dental, and vision benefits
  • Mental health benefits through TelaDoc
  • Prescription drug coverage
  • Generous paid time off, plus 13 paid holidays
  • Paid parental leave
  • 100% vested 401(K) retirement plans
  • Educational assistance up to $2500 per year
