Site Reliability Engineer with Python

Onyx-Conseil

Greater London

Hybrid

GBP 80,000 - 100,000

Full time

30+ days ago

Job summary

A leading consulting firm is seeking a Site Reliability Engineer to manage and enhance complex cloud-based tools and services. Candidates should have over 7 years of experience in software engineering, strong Python skills, and the ability to collaborate across teams. This role offers a salary of approximately £80K - £100K, with the option to work in London or New York. If you are passionate about automating workflows and improving system reliability, we encourage you to apply.

Qualifications

7+ years of experience with software engineering or system operations.
Experience debugging complex problems and implementing solutions.
Experience with container orchestration like Docker or Kubernetes.

Responsibilities

Ensure internal applications and services are operational.
Collaborate with internal teams to solve production issues.
Automate workflows and address infrastructure needs.

Skills

Python

Software engineering

DevOps engineering

Git

Docker

JavaScript

Problem solving

Tools

AWS

Kubernetes

MySQL

Postgres

CloudWatch

Site Reliability Engineer with Python

Our client is looking to bring on a Site Reliability Engineer to help deploy, manage, troubleshoot, and enhance its complex cloud-based set of internal tools and externally managed services for a variety of users across a wide-ranging organization.

You will have 7 to 10 years of hands‑on experience working as a Site Reliability Engineer.

You will work closely with IT, product, and engineering to extend and maintain this set of tools and services and to help debug and resolve problems.

In addition, the ideal candidate will use the monitoring and data we aggregate through various tools in our organization's IT & DevOps toolkit to proactively find system weaknesses and resolve them before they cause production issues.

Responsibilities

Keep our suite of internal apps and services up and running, and get it back up quickly if a failure occurs.
Be the technical point person with operational responsibility for two core platforms (one mobile and one web application), engaging as appropriate on escalations from the IT support group, whether for problem solving, addressing production issues, or enhancing features, and collaborating with engineers and others as needed.
Work closely with internal partners and teams as well as external vendors to ensure that we ship software that meets our code quality, security and performance requirements.
Write, update, and use our documentation, including runbooks and/or playbooks.
Help automate existing or build new internal workflows including ongoing infrastructure needs, testing, failover mitigations, and more.
Debug complex problems across our entire web and mobile application stack and advise key stakeholders on solutions, as well as implement said solutions if appropriate.
Further our internal CI/CD processes to improve release cadence and developer experience.
Participate in the daily / weekly software development process (standups, sprint planning, retros, issue tracking, etc.).
Actively lead any critical issue post‑mortem processes, including coordination of any meetings and further steps to take.

Qualifications

7+ years of experience with software engineering, software development, and/or system operations.
Experience debugging complex problems and implementing timely cost‑effective solutions.
Experience designing, building, and operating large‑scale production systems.
Deep knowledge of Python is preferred, though other languages like Java, Go, Rust, or similar will also be heavily considered.
Experience using source control (Git, GitHub) and feature branching strategies.
Experience with a variety of open‑source databases (MySQL, Postgres, Redis, etc.).
Experience with DevOps engineering and working with containers and container orchestration, such as Docker or Kubernetes.
Experience with log monitoring and observability via platforms like Sumo Logic or CloudWatch.
Experience automating infrastructure, testing, and deployments with tools like CircleCI; knowledge of configuration management tooling and infrastructure as code is preferred but not required.
Experience working with AWS services, with knowledge of Azure / Google ecosystems helpful but not required.
Strong familiarity with modern web and mobile application development, including hands‑on experience working with JavaScript (TypeScript preferred) and Python stacks.
Cross‑functional team collaboration experience, especially working with engineers and user experience / product designers, as well as external stakeholders.
Strong skills for weighing and managing scope, risk, quality and timelines.
Strong focus on quality, security, performance, and end user experience.

This is an exciting position with an exciting organisation based in Central London and New York.

The position can be London or New York based.

The salary for this position will be circa £80K - £100K.

Please send us your CV in Word format, along with your salary and notice period.
