Site Reliability Engineering Manager

General Motors

United Kingdom

Remote

GBP 70,000 - 100,000

Full time

Yesterday

Job summary

A leading company is seeking a Site Reliability Engineering Manager to lead a team in enhancing system reliability and efficiency. The role involves direct technical contributions, mentoring engineers, and managing incident responses. Candidates should have strong programming skills and a solid understanding of systems architecture. This remote position allows for flexibility within the UK, with immigration sponsorship available.

Qualifications

Proficiency in at least one programming language (e.g., Python, Go, Java).
Experience handling production incidents and root cause analysis.
Strong skills in explaining technical concepts to diverse stakeholders.

Responsibilities

Lead the team in setting priorities and ensuring alignment with organizational goals.
Develop tools to automate operational processes and improve system reliability.
Participate in an on-call rotation to mitigate production incidents.

Skills

Programming Skills

Systems Knowledge

Incident Management

Communication and Collaboration

Automation Focus

Tools

Cloud platforms (AWS, GCP, Azure)

Container orchestration systems (Kubernetes)

Job Description

As a Site Reliability Engineering Manager, you will be expected to lead your team in setting priorities and ensuring alignment with organizational goals, while also being deeply technical. Our managers are expected to contribute directly through coding, reviewing code, and mentoring engineers. Although hands-on technical work is not the primary focus of the role, the ability and willingness to engage in technical details, solve problems hands-on, and support your team's technical decisions are crucial. You will serve as a mentor, guide, and partner, helping engineers grow and ensuring the reliability and efficiency of their systems. We set a high standard for engineering managers, who lead by example in both technical expertise and people leadership.

Required Experience:

Automation and Reliability Improvements: Develop tools and software to automate operational processes, improve system reliability, and reduce manual intervention.
Observability and Monitoring: Lead, implement, and improve monitoring and observability frameworks to enable proactive incident detection and resolution.
Incident Response: Participate in an on-call rotation to diagnose, troubleshoot, and mitigate production incidents, ensuring minimal downtime and swift resolution.
Collaboration with Development Teams: Work alongside developers to ensure the quality, scalability, and reliability of services, fostering a "You build it, you run it" culture.
Service Level Management: Manage SLIs, SLOs, and SLAs to set and meet reliability expectations.
Engineering for Reliability: Have a strong understanding of common application reliability patterns and experience implementing them.
Failure Analysis and Post-Incident Reviews: Conduct deep-dive analyses of incidents, collaborate on reviews, and champion a culture of continuous improvement.
Cost Efficiency: Evaluate system performance and advocate for optimizations that reduce infrastructure costs while maintaining reliability.

Skills and Qualifications:

Programming Skills: Proficiency in at least one language (e.g., Python, Go, Java) and familiarity with multiple ecosystems.
Systems Knowledge: Solid understanding of operating systems, networking, distributed systems, databases, and storage architectures.
System Fundamentals: Deep understanding of how code runs on hardware, including OS, algorithms, and data structures, with the ability to troubleshoot and optimize code.
Incident Management: Experience handling production incidents, root cause analysis, and complex system failures.
Communication and Collaboration: Strong skills in explaining technical concepts to diverse stakeholders and fostering shared ownership.
Automation Focus: Proven experience automating manual processes, building deployment pipelines, or managing configuration systems.

Preferred Experience:

Experience with cloud platforms (AWS, GCP, Azure).
Familiarity with container orchestration systems like Kubernetes.
Experience managing or developing distributed systems.
Prior experience with Java in production environments.

This role is remote, and the successful candidate may be based anywhere in the UK, without the need to report to a GM worksite unless directed. GM will provide immigration sponsorship for this role.
