Observability Site Reliability Engineer

Apple Inc.

Greater London

On-site

GBP 70,000 - 90,000

Full time

30+ days ago

Job summary

A leading technology company in London is seeking an Observability Site Reliability Engineer to enhance infrastructure and services for hundreds of millions of users. This role requires expertise in Linux, coding in Go or Python, and managing diverse systems with configuration management tools such as Puppet or Ansible. Candidates should have a strong understanding of networking protocols and experience with microservices and container orchestration. Join a collaborative team focused on innovative solutions and infrastructure observability.

Qualifications

Strong understanding of the Linux operating system and TCP/IP suite of networking protocols.
Ability to design, author, and release code in languages like Go or Python.
Hands-on experience managing large systems with configuration management tools.

Responsibilities

Solve problems using data, teamwork, and expertise.
Own the full infrastructure stack and improve existing tools.
Collaborate with development teams to deliver solutions.

Skills

Linux operating system

Go programming language

Python programming language

Configuration management

Kubernetes

Microservices architecture

Education

BS/MS in Computer Science or Equivalent

Tools

Puppet

Chef

Ansible

Prometheus

Observability Site Reliability Engineer

London, England, United Kingdom

Software and Services

People at Apple don’t just build products; they craft the kinds of experiences that have revolutionized entire industries. The diverse collection of our people and their ideas inspires innovation in everything we do. Imagine what you could do here! Join Apple, and help us leave the world better than we found it. The Apple Service Engineering (ASE) team builds and provides systems and infrastructure that fuel Apple’s services (such as iCloud, iTunes, Siri, and Maps). We are the foundation on which Apple’s software developers build the products that our customers love. We are looking for passionate and talented Site Reliability Engineers to continue our focus on providing our customers with the highest quality Apple Services experience. Our services have to scale globally, stay highly available, and "just work." If you love designing, engineering, and running systems and infrastructure that will help millions of customers, then this is the place for you! The Observability SRE organization is specifically tasked with enabling other teams to better understand their infrastructure and services by providing world-class observability capabilities.

Description

Apple Services Engineering infrastructure is BIG. Operating at our scale, across multiple geographically dispersed data centers and serving hundreds of millions of users, presents unique challenges. As an SRE at Apple, you'll need to solve these problems using data, teamwork, and your own expertise. SREs at Apple own the full infrastructure stack: from device driver performance debugging to content delivery network traffic management, our responsibilities are both broad and deep. ASE runs the majority of its systems on Linux. We run a mix of open source, vendor licensed, and internally developed tools to perform functions such as system configuration management, provisioning, software deployment, logging, and monitoring. You'll learn these tools and have opportunities to improve them. Our team is collaborative; we work closely with the development teams we support to deliver the best results for Apple. We think critically and strive to balance the best solution with the need to get things done for each engineering challenge we face. Good ideas are heard and results are rewarded.

Minimum Qualifications

Strong understanding of the Linux operating system and TCP/IP suite of networking protocols
Ability to design, author, and release code in languages like Go or Python
Hands‑on experience managing large numbers of diverse systems with configuration management or software delivery platforms (such as Puppet, Chef, Ansible)
Familiarity with microservices architecture and container orchestration with Kubernetes

Preferred Qualifications

Bare metal management experience, and experience deploying, supporting, and monitoring new and existing services, platforms, and application stacks
Acute drive to automate manual operations and to improve them through repeated iteration
Experience with scale testing, disaster recovery, and capacity planning, and with managing and scaling distributed systems in a public, private, or hybrid cloud environment
Experience with the Prometheus ecosystem and a good understanding of infrastructure observability principles

Education & Experience

BS/MS in Computer Science or equivalent (plus in-depth experience of software development or production operations in a large-scale environment)

At Apple, we’re not all the same. And that’s our greatest strength. We draw on the differences in who we are, what we’ve experienced and how we think. Because to create products that serve everyone, we believe in including everyone. Therefore, we are committed to treating all applicants fairly and equally. As a registered Disability Confident employer, we will work with applicants to make any reasonable accommodations. Apple will consider for employment all qualified applicants with criminal backgrounds in a manner consistent with applicable law.
