Principal Site Reliability Developer

TN United Kingdom

United Kingdom

On-site

GBP 60,000 - 100,000

Full time

20 days ago

Job summary

An innovative firm in the tech industry is seeking a Principal Site Reliability Developer to join their dynamic team. This role involves collaborating with the Site Reliability Engineering team to ensure the reliability and performance of critical services. You will design and deliver robust technology stacks, focusing on security and scalability while applying automation principles. With a strong emphasis on problem-solving and communication, you will thrive in a fast-paced environment, managing large-scale infrastructure and cloud services. If you have a passion for technology and a drive to enhance service architecture, this is the perfect opportunity for you.

Qualifications

5+ years of experience in Systems Engineering, DevOps, or SRE roles managing large-scale infrastructure.
Proficiency with Kubernetes, Terraform, Helm, and Docker is required.

Responsibilities

Work with the SRE team on full stack ownership of services and technology areas.
Design and deliver mission-critical stacks focusing on security and performance.

Skills

Kubernetes

Terraform

Helm

Docker

Python

Bash/Shell

AWS

Azure

Google Cloud

Agile methodologies

Problem-solving

Tools

JIRA

Principal Site Reliability Developer, United Kingdom

Client: Oracle

Location: United Kingdom

Job Category: Other

EU work permit required: Yes

Job Reference: f51a0ac612f8

Job Views: 4

Posted: 01.05.2025

Expiry Date: 15.06.2025

Job Description:

Work with the Site Reliability Engineering (SRE) team on shared full stack ownership of services and technology areas.
Understand configuration, dependencies, and behavioral characteristics of production services.
Design and deliver mission-critical stacks focusing on security, resiliency, scale, and performance.
Own end-to-end performance and operability, collaborate with development teams to improve service architecture, and enhance capabilities.
Communicate service attributes such as scale, capacity, security, and performance requirements.
Apply automation and orchestration principles; serve as escalation point for complex issues.
Utilize knowledge of service topology for troubleshooting and mitigation.
Explain the impact of architecture decisions on distributed systems.
Maintain professional curiosity and deepen understanding of services and technologies.
5+ years of experience in Systems Engineering, DevOps, or SRE roles managing large-scale infrastructure or cloud services.
Proficiency with Kubernetes, Terraform, Helm, and Docker.
Skills in programming languages such as Go, Python, and Bash/Shell.
Familiarity with IaaS platforms (AWS, Azure, Google Cloud), CI/CD tools, and proxies such as Envoy or Nginx.
Understanding of Linux, especially Oracle Linux, Red Hat, and CentOS.
Strong problem-solving, communication, ownership, and drive.
Solid knowledge of PKI, mTLS, SSL, SSH.
Experience with Agile methodologies (Scrum or Kanban).
Experience in production operations, deploying quality code, and troubleshooting.
Excellent teamwork, communication, organization, and interpersonal skills.
Comfortable in complex, fast-changing environments.
Experience managing large-scale customer-facing web services.
Proficiency with ticketing systems like JIRA.