Senior Site Reliability Engineer

Talent Groups

McKinney (TX)

Hybrid

USD 120,000 - 160,000

Full time

3 days ago
Job summary

A leading company is looking for a Senior Site Reliability Engineer to oversee Kubernetes infrastructure in a high-performance multi-tenant SaaS environment. The role involves extensive hands-on work with Kubernetes internals, automation, and performance optimization. Candidates should have proven expertise in Kubernetes management, software development with Go, and building observability stacks.

Benefits

Medical insurance
Vision insurance
401(k)

Qualifications

  • Expertise in managing on-prem Kubernetes clusters.
  • Experience with Kubernetes internals and Go programming.
  • Background in Linux engineering.

Responsibilities

  • Architect and manage on-prem Kubernetes clusters.
  • Build observability stacks using monitoring tools.
  • Implement Kubernetes-native traffic enforcement.

Skills

Production-grade Kubernetes expertise
Go
Linux engineering
Observability stacks
Python
Bash
CNI plugins
OpenStack
Service mesh

Job description

Location: Hybrid - McKinney, TX 75070

Type: Full-Time, Direct Hire. Applicants must be authorized to work in the United States without the need for current or future visa sponsorship. At this time, we are unable to consider candidates who require sponsorship.

We’re looking for a Senior Site Reliability Engineer (SRE) to architect, build, and own a Kubernetes-based infrastructure platform powering a high-performance, real-time, multi-tenant SaaS environment. This role is centered around on-premises Kubernetes in data center environments, with a strong focus on traffic enforcement, observability, and reliability at scale.

You’ll join a forward-thinking engineering team responsible for developing control systems, building traffic-routing logic, and maintaining a resilient cloud-native platform from the metal up. You’ll be hands-on in Kubernetes internals, networking, and automation—playing a critical role in ensuring reliability, performance, and visibility across the platform.

What You'll Do

  • Own the architecture, operations, and lifecycle of on-prem Kubernetes clusters in high-scale production environments.
  • Build and maintain observability stacks using tools like Prometheus, Grafana, OpenTelemetry, Jaeger, and Loki to provide actionable insight and proactive alerting.
  • Implement and optimize Kubernetes-native traffic enforcement across multi-tenant SaaS workloads, including per-tenant fairness and routing enforcement.
  • Work directly in Go to extend Kubernetes functionality via CRDs, operators, or controllers.
  • Manage the care, feeding, and scaling of OpenStack clusters across global data centers.
  • Lead SRE best practices: from automated remediation and capacity planning to disaster recovery and performance optimization.
  • Design and manage advanced CNI configurations (Cilium, overlay networks, etc.) and modern SDN/NFV patterns.
  • Collaborate with software engineering teams to ensure seamless integration between infrastructure and application layers.

Must-Have Qualifications

  • Production-grade Kubernetes expertise—you’ve architected, deployed, and managed your own clusters on-prem (not just EKS/GKE/AKS).
  • Hands-on experience with Kubernetes internals, including CRDs, controllers, and cluster APIs.
  • Fluency in Go, with experience developing Kubernetes operators or integrations.
  • Strong Linux engineering background, with deep command-line and troubleshooting expertise.
  • Proven success building observability stacks (Prometheus, Grafana, OpenTelemetry, etc.).
  • Experience with CNI plugins (e.g., Cilium, Calico, overlay networks) and container networking.
  • Working knowledge of Python and Bash for scripting and automation.
  • Experience with OpenStack (Nova, Neutron) and Ceph storage, and their integration with Kubernetes environments.
  • Familiarity with Helm, Terraform, ArgoCD, or Flux for Kubernetes GitOps and infrastructure automation.
  • Exposure to service mesh tools like Istio or Linkerd.

Seniority level: Mid-Senior level

Employment type: Full-time

Job function: Information Technology

Industries: IT Services and IT Consulting, IT System Custom Software Development, and Computer and Network Security

