Site Reliability Engineer - Plex

Rockwell Automation

Milwaukee (WI)

On-site

USD 100,000 - 130,000

Full time

Yesterday

Job summary

A leading company in automation is seeking a Site Reliability Engineer focused on enhancing a Kubernetes-based platform. The role includes managing platform availability, improving automation with tools such as Terraform and Helm, and collaborating with developers to optimize workflows. Ideal candidates will have a background in Kubernetes management and infrastructure operations.

Benefits

Health Insurance including Medical, Dental and Vision
401k
Paid Time off
Parental and Caregiver Leave
Flexible Work Schedule

Qualifications

  • 5+ years of experience with Kubernetes in production.
  • Experience with Azure and vSphere as infrastructure providers.
  • Familiarity with GitOps practices.

Responsibilities

  • Manage and improve Kubernetes platform for high availability.
  • Implement infrastructure automation with Terraform and Helm.
  • Troubleshoot production incidents and perform root cause analysis.

Skills

Kubernetes management
Infrastructure as code
Networking
CI/CD optimization
Troubleshooting

Education

Bachelor's degree or equivalent work experience

Tools

Terraform
Helm
Docker
OpenTelemetry
Elastic Stack

Job description

Rockwell Automation is a global technology leader focused on helping the world’s manufacturers be more productive, sustainable, and agile. With more than 28,000 employees who make the world better every day, we know we have something special. Behind our customers - amazing companies that help feed the world, provide life-saving medicine on a global scale, and focus on clean water and green mobility - our people are energized problem solvers who take pride in how the work we do changes the world for the better.

We welcome all makers, forward thinkers, and problem solvers who are looking for a place to do their best work. And if that’s you, we would love to have you join us!

Position Overview:

We are looking for a Site Reliability Engineer to join our Container Platform Team. You will design, maintain, and scale our Kubernetes-based platform to ensure high availability, security, and performance. You will work closely with development, security, and infrastructure teams to automate operations, improve multi-cluster management, and enhance developer workflows. You will participate in an on-call rotation to support critical platform operations.

You will report to a Manager, Software Engineering.

Your Responsibilities

  • Manage, maintain, and improve our Kubernetes platform, ensuring high availability and scalability.
  • Implement infrastructure as code (Terraform, Helm, Flux, Kustomize) to automate platform operations.
  • Enhance observability and logging using OpenTelemetry and Elastic Stack.
  • Improve networking and security policies within Kubernetes (e.g., Istio, Cilium, and Network Policies).
  • Support developers by optimizing CI/CD pipelines and containerized application deployment workflows.
  • Troubleshoot production incidents, perform root cause analysis, and drive reliability improvements.
  • Evaluate and implement cloud-native technologies to enhance platform efficiency.
  • Collaborate with security teams to ensure best practices for container security and compliance.
  • Work with multi-cluster management solutions such as Rancher, Cluster API (CAPI), or other Kubernetes fleet management tools.
  • Manage Kubernetes infrastructure on Azure and vSphere.
  • Participate in an on-call rotation to support platform operations and respond to incidents.

The Essentials - You Will Have

  • Bachelor's degree or equivalent years of relevant work experience.
  • Legal authorization to work in the U.S. We will not sponsor individuals for employment visas, now or in the future, for this job opening.

The Preferred - You Might Also Have

  • 5+ years of experience working with Kubernetes in a production environment.
  • Proficiency in Terraform, Helm, and Kubernetes manifests for infrastructure automation.
  • Strong experience with networking (CNI, Istio, Ingress controllers, and multi-cluster networking).
  • Experience with Linux administration and container runtimes (Docker, containerd).
  • Familiarity with observability tools (OpenTelemetry, Elastic Stack).
  • Experience managing multi-cluster Kubernetes environments using Rancher or Cluster API (CAPI).
  • Experience with RBAC, security policies, and secrets management in Kubernetes.
  • Hands-on experience with Azure and vSphere as Kubernetes infrastructure providers.
  • Experience with GitOps practices (FluxCD, ArgoCD).
  • Prior experience in SRE or Platform Engineering roles.
  • Knowledge of database management in Kubernetes (e.g., PostgreSQL, MySQL, or distributed storage solutions like Ceph or Longhorn).

What We Offer

  • Health Insurance including Medical, Dental and Vision
  • 401k
  • Paid Time off
  • Parental and Caregiver Leave
  • Flexible Work Schedule: you will work with your manager to set a schedule that accommodates your personal life.
  • To learn more about our benefits package, please visit www.raquickfind.com.

At Rockwell Automation we are dedicated to building a diverse, inclusive and authentic workplace, so if you're excited about this role but your experience doesn't align perfectly with every qualification in the job description, we encourage you to apply anyway. You may be just the right person for this or other roles.



We are an Equal Opportunity Employer including disability and veterans.

If you are an individual with a disability and you need assistance or a reasonable accommodation during the application process, please contact our services team at +1 (844) 404-7247.

  • Seniority level: Mid-Senior level
  • Employment type: Full-time
  • Job function: Engineering and Information Technology
  • Industries: Automation Machinery Manufacturing
