Site Reliability Engineer

MacStadium

United States

Remote

USD 110,000 - 130,000

Full time

30+ days ago

Job summary

An innovative firm is seeking a Site Reliability Engineer to enhance their cloud solutions for Mac-based virtualization. This role involves supporting critical deployments, troubleshooting outages, and implementing monitoring systems. Join a passionate team dedicated to simplifying technology for businesses while enjoying a flexible work environment and competitive benefits. If you thrive in a dynamic setting and are eager to contribute to impactful projects, this opportunity is perfect for you.

Benefits

Competitive medical insurance
Generous paid time-off policies
Parental leave
Flexible work environment
401(k) matching
Continuing education
Wellness reimbursements
Free company swag

Qualifications

  • 5 years of experience required with a strong focus on Kubernetes and observability platforms.
  • Experience in troubleshooting production applications and monitoring systems.

Responsibilities

  • Support deployments at the core of data center and networking automation.
  • Define and implement monitoring and alerting for systems deployed with Kubernetes.

Skills

Kubernetes
Ansible + AWX
Terraform
AWS
Open-source observability tooling
Docker
JSON
SQL databases (MySQL, Postgres, MariaDB)
Linux (Ubuntu, Fedora, CentOS, CoreOS)
CI/CD Tools
Agile development tools (Jira)
Scripting (Bash, Python)
Networking

Education

BA/BS in Computer Science

Tools

Docker
Kubernetes
Ansible
Terraform
AWS
Grafana
Prometheus
OpenTelemetry

Job description

Meet MacStadium. We build cloud solutions to simplify Mac for business. We actively participate in and influence the Apple ecosystem in a cool way and have been a part of it since day one. Developers and end users at leading tech companies, big enterprises, and small teams rely on MacStadium’s innovative solutions every day. We have a passionate team of hard-working, hard-playing professionals with a big, shared vision. Come join us as we grow again!


What we need:

We are seeking a Site Reliability Engineer to join our growing Platform team. You will be responsible for supporting deployments at the core of our data center and networking automation capabilities, as well as infrastructure supporting Mac-based virtualization – all of which are run on Kubernetes.

The Site Reliability Engineer will help to define and implement monitoring and alerting for a variety of systems deployed with Kubernetes. They will work with counterparts in Ops to troubleshoot internal outages and diagnose problems with customer deployments.

This position is part of our Software Development team and reports to the VP of Engineering. An Atlanta, GA location and the Eastern Time zone are preferred.

MacStadium's current U.S. office locations are in Atlanta, GA and Las Vegas, NV. While it is ideal to have this position located in close proximity to one of our offices, we are open to filling the role remotely outside of the states of Georgia and Nevada (within the United States) for the right candidate.


What you will be working with:
  • Kubernetes
  • Ansible + AWX
  • Terraform
  • AWS
  • Open-source observability tooling (Loki, Grafana, Prometheus, OpenTelemetry)
  • Docker
  • JSON
  • SQL databases (MySQL, Postgres, MariaDB, etc.)
  • Linux (Ubuntu, Fedora, CentOS, CoreOS)
  • CI/CD Tools (GitHub Actions, Jenkins, Bamboo, etc.)
  • Agile development tools (Jira)
  • Code and Image repositories (Git, GitHub, DockerHub, ECR)
  • Automated testing tools (Pytest, etc.)
  • Scripting (Bash, Python)
  • Networking: understanding of DNS, TCP/IP, NAT, PAT, routing, and load balancing, as well as packet inspection tools (Wireshark, etc.)

What skills and experience you need to have:
  • BA/BS in Computer Science, Engineering or similar preferred
  • 5 years of overall experience, including at least 3-5 years of professional experience with the top 10 technical skills listed above
  • A background in tooling and supporting observability platforms for monitoring production applications and alerting on issues related to performance, stability, and reliability
  • Extensive troubleshooting experience with Kubernetes applications in production
  • Experience with preparation and organization of blameless postmortems
  • CKA or CKAD certification is a big plus

What you will get:

Day one benefits. Coverage starts on day one. We offer competitive medical insurance, health and dependent care spending accounts, a health savings account, disability insurance, and company-paid and voluntary life insurance.

Balanced life. We offer employees generous paid time-off policies, parental leave, holiday schedule, and a flexible work environment; MacStadium understands life also happens outside of work. Did we mention free company swag?

Solid future. Beyond competitive salary and 401(k) matching, MacStadium offers continuing education, professional development, and wellness reimbursements.

For California, Colorado, and Illinois applicants, the compensation range for this role is $110,000 to $130,000.


MacStadium has a defined Information Security Policy and all employees are required to adhere to this policy and sign an acknowledgment and receipt of this policy upon hire.

All offers of employment are conditioned upon successful completion of a background screening process, and all employees must comply with the immigration rules and laws in the jurisdiction in which they will provide MacStadium services.

MacStadium is an Equal Opportunity Employer. All applicants are considered without regard to race, color, ancestry, national origin, gender/gender identity, sexual orientation, marital and family status, religion and religious belief, age, disability, results of genetic information, and service in the military.

Studies have shown that women and people of color are less likely to apply for jobs unless they believe they can perform every task in the job description. We are most interested in finding the best candidate for the job, and that candidate may come from a less traditional background. MacStadium may consider an equivalent combination of knowledge, skills, education, and experience to meet minimum qualifications. If you are interested in applying, we encourage you to think broadly about your background and skill set for the role.


No recruiting agencies, please.

Seniority level
  • Mid-Senior level
Employment type
  • Full-time
Job function
  • Engineering and Information Technology
Industries
  • Software Development