[Hiring] Site Reliability Engineer @Platform.sh

Platform.sh

United States

Remote

USD 80,000 - 130,000

Full time

30+ days ago

Job summary

An innovative firm is seeking a Site Reliability Engineer to enhance system reliability and efficiency in a remote work setting. This role is pivotal in transitioning traditional operations to an automation-driven model, focusing on improving infrastructure and streamlining processes. You'll collaborate with cross-functional teams to ensure reliability throughout the application lifecycle while driving technical innovation. Join a diverse and inclusive global team that values your contributions and offers a flexible work environment. This is a unique opportunity to make a significant impact in a company dedicated to transforming how businesses manage web applications.

Benefits

Flexible PTO
Company stock options
Professional development budget
Office equipment budget
Wellness budget
Annual team gatherings
Internet reimbursement
Inclusive parental leave
Remote work travel program

Qualifications

  • Solid understanding of DevOps, Cloud Operations, or SRE principles.
  • Hands-on experience with Linux systems and performance tuning.
  • Proficiency in programming languages like Go or Python.

Responsibilities

  • Enhance system monitoring with tools like Prometheus and Grafana.
  • Automate deployments using IaC tools like Terraform and Ansible.
  • Collaborate with teams to integrate reliability practices.

Skills

DevOps principles
Cloud Operations
SRE principles
Linux systems
Programming (Go, Python)
Scripting (Python, Bash)
Cloud platforms (AWS, GCP, Azure)
Containerization (Docker, Kubernetes)
Problem-solving

Tools

Prometheus
Grafana
ELK Stack
Terraform
Ansible

Job description

Mar 28, 2025 - Platform.sh is hiring a remote Site Reliability Engineer. Location: USA, UK, Canada, Germany, France, Spain.

About Platform.sh

Platform.sh is a Platform-as-a-Service (PaaS) that removes the complexities of cloud infrastructure management and optimizes development-to-production workflows, reducing the time it takes to build and deploy applications. It delivers efficiency, reliability, and security, giving development teams both control and peace of mind. Built for developers, by developers.

Adopted and loved by 16,000+ developers and 7,000 customers, Platform.sh has spent nearly a decade providing innovative capabilities that serve as the launchpad for creative development teams’ out-of-the-box thinking.

We provide 24x7 support, managed cloud infrastructure, and automated security and compliance with an all-in-one PaaS. We give our customers complete control over their data by keeping applications secure and available around the clock.

Platformers are a remote, global workforce, and we thrive in a multicultural team. We are committed to open source and an open, welcoming environment. Our team spans the globe and the experience spectrum. What's our commonality, our cultural fabric? A curious spirit and a thirst for knowledge; an eagerness for innovative ideas and cultures. We believe we can build anything together in an environment that frees you to do your best work.

Bring your expertise and enthusiasm to our growing, global organization. Your contributions, collaboration, and unique point of view are recognized and valued here.

Impact of a Site Reliability Engineer

As a Site Reliability Engineer, you are a key part of our team’s transition to the Site Reliability Engineering (SRE) model, moving from traditional Cloud Operations to an automation-driven approach. This shift enhances system reliability, scalability, and efficiency, positioning SRE as a core function within the company.

In this role, you focus on improving infrastructure, automating operational tasks, and streamlining processes. You work closely with developers, engineers, and product teams to ensure reliability is embedded throughout the application lifecycle.

As part of this transition, you also help optimize cloud-based systems, reduce manual work, and drive continuous improvements, playing a vital role in the organization’s overall success and long-term stability.

What to expect

  • Refine Monitoring and Observability: Enhance system monitoring with tools like Prometheus, Grafana, and ELK Stack, ensuring visibility and alignment with business objectives.
  • Automate Deployments and Workflows: Transition manual processes to automated solutions using IaC tools (e.g., Terraform, Ansible) to streamline deployments and improve operational efficiency.
  • Optimize CI/CD Pipelines: Improve pipeline architecture for fast, reliable releases, ensuring scalability and resilience to handle high volumes of changes.
  • Cloud Infrastructure Management: Help scale cloud-based systems on platforms like AWS, GCP, and Azure while minimizing technical debt and operational complexity.
  • Incident Response and Post-Mortem: Support incident management and lead post-mortem analysis, ensuring continuous improvement and knowledge sharing.
  • Collaborate with Cross-Functional Teams: Work closely with engineering and product teams to integrate reliability practices into the development lifecycle and prioritize reliability efforts.
  • Drive Technical Innovation: Introduce and champion new tools, technologies, and practices that improve system reliability, performance, and scalability.

What you bring

  • DevOps, Cloud Operations, or SRE Expertise: A solid understanding of DevOps, Cloud Operations, or SRE principles, with a focus on reliability and scalability.
  • Advanced Linux Internals Expertise: Hands-on experience with Linux systems, including performance tuning, kernel configurations, and troubleshooting.
  • Programming Languages: Proficiency in programming languages such as Go (preferred) or Python, with a focus on building tools and automating processes.
  • Scripting Skills: Strong skills in scripting languages like Python, Bash, or Go to automate workflows, streamline tasks, and manage infrastructure.
  • Cloud Infrastructure Knowledge: Extensive experience with cloud platforms like AWS, GCP, and Azure, along with expertise in monitoring/logging frameworks and CI/CD pipelines.
  • Containerization and Orchestration (nice to have): Hands-on experience with Docker, Kubernetes, and other containerization technologies for building and deploying scalable applications.
  • Problem-Solving and Collaboration: Strong problem-solving skills, system design experience, and the ability to collaborate effectively across teams.

Where we hire

At Platform.sh, remote work isn't just a trend - it's our foundation. The freedom of remote work with the support of a diverse, global team has been our successful model for nearly a decade. Our culture celebrates flexibility and collaboration. While we have team members in over 30 countries around the globe, we are currently focused on hiring for this role in France, Germany, Spain, the United Kingdom, and the West Coast of the United States or Canada. Although we’re unable to provide visa sponsorship at this time, we welcome applications from all qualified candidates who are legally authorized to work in these countries.

How we hire

We know that a great hire won’t meet every requirement that we’ve outlined. If you can see yourself elevating the team, we want to hear your story. Few of us would be here had we not taken a chance.

You can expect 4 interviews on Google Meet, in the order below. Should you successfully move through the entire process, you will have the opportunity to meet with a variety of Platformers. Our goal is to ensure you can make the most informed decision on whether this role and our culture align with what you’re looking for in your future working environment.

  1. 45 Minutes with Talent Acquisition
  2. 60 Minutes with Hiring Manager (Director, Site Reliability Engineering)
  3. 60 Minutes with Team (Site Reliability Engineer, Director, Site Reliability Engineering)
  4. 60 Minutes with Executive (Senior Director, Site Reliability Engineering)

All roles require background checks.

What we offer

A product you can believe in - Join us in transforming how businesses build and manage web applications, driven by making a positive impact as a proud B Corp.

An Award-Winning Workplace - We’ve been recognized by Forbes’ Top 30 Companies for Remote Jobs and France’s Best Workplaces for Women.

A culture that values your voice - Join a flexible, open, and inclusive work environment where your voice is encouraged, and your ideas shape our growth and evolution.

A global team - Collaborate with colleagues from diverse backgrounds across the world, embracing different perspectives.

Benefits and perks - Make the most of what matters to you

Flexible PTO
Company stock options
Professional development budget
Office equipment budget
Wellness budget
Annual team gatherings
Internet reimbursement
Inclusive parental leave
Remote work travel program

You belong here

At Platform.sh, we celebrate diversity in all its forms and are committed to fostering an inclusive, equitable, and supportive workplace where everyone can thrive. We embrace and value different perspectives, backgrounds, and experiences, because they make us stronger as a team. Whoever you are, wherever you're from, and whatever path you've taken, you are welcome here. We encourage you to bring your whole self to work, connect with others, and share your passion.

If you need accommodations at any stage of our hiring process, please let us know. We're here to ensure an accessible and comfortable experience for you.