Site Reliability Engineer - Plex

Rockwell Automation

Milwaukee (WI)

On-site

USD 100,000 - 130,000

Full time

Yesterday

Job summary

A leading company in automation is seeking a Site Reliability Engineer focused on enhancing a Kubernetes-based platform. The role includes managing platform availability, improving automation with tools such as Terraform and Helm, and collaborating with developers to optimize workflows. Ideal candidates will have a background in Kubernetes management and infrastructure operations.

Benefits

Health Insurance including Medical, Dental and Vision

401k

Paid Time off

Parental and Caregiver Leave

Flexible Work Schedule

Qualifications

5+ years of experience with Kubernetes in production.
Experience with Azure and vSphere as infrastructure providers.
Familiarity with GitOps practices.

Responsibilities

Manage and improve Kubernetes platform for high availability.
Implement infrastructure automation with Terraform and Helm.
Troubleshoot production incidents and perform root cause analysis.

Skills

Kubernetes management

Infrastructure as code

Networking

CI/CD optimization

Troubleshooting

Education

Bachelor's degree or equivalent work experience

Tools

Terraform

Helm

Docker

OpenTelemetry

Elastic Stack

Rockwell Automation is a global technology leader focused on helping the world’s manufacturers be more productive, sustainable, and agile. With more than 28,000 employees who make the world better every day, we know we have something special. Behind our customers - amazing companies that help feed the world, provide life-saving medicine on a global scale, and focus on clean water and green mobility - our people are energized problem solvers who take pride in how the work we do changes the world for the better.

We welcome all makers, forward thinkers, and problem solvers who are looking for a place to do their best work. And if that’s you, we would love to have you join us!

Job Description

Position Overview:

We are looking for a Site Reliability Engineer to join our Container Platform Team. You will design, maintain, and scale our Kubernetes-based platform to ensure high availability, security, and performance. You will work closely with development, security, and infrastructure teams to automate operations, improve multi-cluster management, and enhance developer workflows. You will participate in an on-call rotation to support critical platform operations.

You will report to a Manager, Software Engineering.

Your Responsibilities

Manage, maintain, and improve our Kubernetes platform, ensuring high availability and scalability.
Implement infrastructure as code (Terraform, Helm, Flux, Kustomize) to automate platform operations.
Enhance observability and logging using OpenTelemetry and Elastic Stack.
Improve networking and security policies within Kubernetes (e.g., Istio, Cilium, and Network Policies).
Support developers by optimizing CI/CD pipelines and containerized application deployment workflows.
Troubleshoot production incidents, perform root cause analysis, and drive reliability improvements.
Evaluate and implement cloud-native technologies to enhance platform efficiency.
Collaborate with security teams to ensure best practices for container security and compliance.
Work with multi-cluster management solutions such as Rancher, Cluster API (CAPI), or other Kubernetes fleet management tools.
Manage Kubernetes infrastructure on Azure and vSphere.
Participate in an on-call rotation to support platform operations and respond to incidents.

The Essentials - You Will Have

Bachelor's degree or equivalent years of relevant work experience.
Legal authorization to work in the U.S. We will not sponsor individuals for employment visas, now or in the future, for this job opening.

The Preferred - You Might Also Have

Typically requires 5+ years of experience working with Kubernetes in a production environment.
Proficiency in Terraform, Helm, and Kubernetes manifests for infrastructure automation.
Strong experience with networking (CNI, Istio, Ingress controllers, and multi-cluster networking).
Experience with Linux administration and container runtimes (Docker, containerd).
Familiarity with observability tools (OpenTelemetry, Elastic Stack).
Experience managing multi-cluster Kubernetes environments using Rancher or Cluster API (CAPI).
Experience with RBAC, security policies, and secrets management in Kubernetes.
Hands-on experience with Azure and vSphere as Kubernetes infrastructure providers.
Experience with GitOps practices (FluxCD, ArgoCD).
Prior experience in SRE or Platform Engineering roles.
Knowledge of database management in Kubernetes (e.g., PostgreSQL, MySQL, or distributed storage solutions like Ceph or Longhorn).

What We Offer

Health Insurance including Medical, Dental and Vision
401k
Paid Time off
Parental and Caregiver Leave
Flexible Work Schedule: you will work with your manager to arrange a schedule that fits your personal life.
To learn more about our benefits package, please visit www.raquickfind.com.

At Rockwell Automation we are dedicated to building a diverse, inclusive and authentic workplace, so if you're excited about this role but your experience doesn't align perfectly with every qualification in the job description, we encourage you to apply anyway. You may be just the right person for this or other roles.

We are an Equal Opportunity Employer including disability and veterans.

If you are an individual with a disability and you need assistance or a reasonable accommodation during the application process, please contact our services team at +1 (844) 404-7247.

Seniority level
Mid-Senior level

Employment type
Full-time

Job function
Engineering and Information Technology

Industries
Automation Machinery Manufacturing
