Site Reliability Engineer III

ZipRecruiter

Atlanta (GA)

Remote

USD 142,000 - 181,000

Full time

9 days ago

Job summary

Join a forward-thinking company as a Senior Site Reliability Engineer, where you'll be pivotal in shaping and optimizing the infrastructure that powers a leading email platform. This role offers the chance to solve complex challenges in distributed systems while collaborating with talented teams to drive improvements and ensure high reliability standards. With flexible remote work options and a commitment to employee well-being through comprehensive benefits, this position is perfect for those looking to make a significant impact in the tech industry. If you're passionate about innovation and reliability, we encourage you to apply!

Benefits

  • Comprehensive medical, dental, and vision plans
  • Virtual counseling resources
  • 401(k) plans with employer matching
  • Generous paid leave
  • Paid parental leave
  • Flexible remote work options
  • Paid volunteer time

Qualifications

  • Strong background in infrastructure and software engineering focused on reliability.
  • Experience with cloud platforms and configuration management tools.

Responsibilities

  • Collaborate to define and implement system requirements.
  • Design and maintain cloud-based microservices infrastructure.
  • Mentor junior engineers and enhance team growth.

Skills

  • Cloud Platforms (GCP, AWS)
  • Configuration Management (Terraform, Ansible)
  • Monitoring Tools (Prometheus, Grafana)
  • Distributed Databases (Cassandra, Elasticsearch)
  • Coding (Python, Go)
  • Production Linux Systems
  • CI/CD Automation
  • Containerization
  • Distributed Systems Problem Solving

Job description

At Sinch Mailgun, we're building the infrastructure that powers communication at internet scale. As one of the largest email providers in the world, our platform delivers billions of emails every day for developers, startups, and global enterprises alike.

We’re looking for a Senior Site Reliability Engineer to join our SRE team, responsible for keeping our systems fast, reliable, and secure. In this role, you will help shape, scale, and optimize the infrastructure that underpins each Mailgun service. You’ll collaborate with product engineering teams to drive improvements, automate workflows, and ensure our systems meet high reliability standards.

This role involves engineering the future of a trusted platform, solving complex distributed systems challenges, and innovating in email infrastructure design and operation.

Responsibilities
  1. Collaborate with teams to define and implement system requirements.
  2. Design, build, and maintain cloud-based microservices infrastructure.
  3. Automate routine operational tasks and remediation processes to enhance efficiency and reliability.
  4. Proactively identify and resolve issues with support and engineering teams, using monitoring tools to maintain system health.
  5. Ensure datastores operate efficiently, meeting performance and availability goals.
  6. Mentor junior engineers and share best practices to foster team growth.
  7. Plan and execute strategies for scaling systems and infrastructure as needed.

Requirements
  • Strong background in infrastructure, operations, or software engineering focused on reliability.
  • Experience with cloud platforms such as GCP or AWS.
  • Proficiency with configuration management tools like Terraform and Ansible.
  • Hands-on experience with monitoring and observability tools such as Prometheus and Grafana.
  • Experience with distributed databases (e.g., Cassandra, Elasticsearch) at scale.
  • Familiarity with distributed event stores and stream-processing platforms.
  • Strong coding skills in languages like Python or Go.
  • Experience maintaining production Linux systems and cloud infrastructure.
  • Ability to architect solutions for complex challenges and lead initiatives from conception to execution.
  • Excellent interpersonal and communication skills for cross-functional collaboration.
  • Ability to mentor junior engineers and promote a collaborative team environment.
  • Experience with container orchestration platforms.
  • Expertise in CI/CD automation and infrastructure as code.
  • Knowledge of network architecture and security in cloud environments.
  • Experience with containerization and microservices architectures.
  • Advanced problem-solving skills in distributed systems.

Our Hiring Process

We are committed to a fair, objective, and inclusive recruitment process, including pre-employment assessments to support diversity and high performance. Even if you don’t meet all requirements, consider applying to help us pioneer new ways of communication.

Benefits
  • Health: Comprehensive medical, dental, and vision plans, including telehealth.
  • Self-care: Virtual counseling resources through our Employee Assistance Program.
  • Future security: Roth and Pre-tax 401(k) plans with employer matching.
  • Time off: Generous paid leave to rest and rejuvenate.
  • Family: Paid parental leave and family support.
  • Remote work: Flexible options to work from anywhere.
  • Impact: Paid volunteer time to support your community.

We are an equal opportunity employer. All qualified applicants will be considered regardless of creed, veteran status, marital status, gender identity, citizenship, or other protected classes.

The starting salary range is $142,768 - $180,960, depending on factors like location, skills, and experience. Applications are accepted until 4/28/25, with possible flexibility for thorough candidate evaluation.
