A leading company is seeking a Senior Site Reliability Engineer in London to manage cloud infrastructure and improve deployment pipelines. You will work on ensuring system reliability and automated processes, collaborating with various teams for scalable solutions. The role requires solid DevOps skills with a focus on AWS services and incident management.
Avance Consulting
London, United Kingdom
25.06.2025
09.08.2025
The Role
As a DevOps Engineer, you will play a critical role in managing cloud infrastructure, ensuring the reliability of production systems, and improving end-to-end deployment pipelines. This role combines deep operational responsibilities with a strong focus on automation, observability, and continuous improvement. You will be responsible for maintaining high system availability, enabling rapid delivery through CI/CD, and supporting development teams with robust infrastructure and tooling. A key part of the role includes proactive monitoring using Prometheus, Grafana, and Splunk, as well as participating in on-call rotations to respond to live incidents. Collaboration across engineering, security, and product teams is essential to build scalable and resilient systems.
Your responsibilities:
1. Deploy, configure, and monitor AWS services ensuring high availability, scalability, and security.
2. Respond to and resolve infrastructure and service incidents with root cause analysis and preventive measures.
3. Handle change requests, track recurring issues, and work on long-term fixes to improve system stability.
4. Implement and maintain observability solutions using Prometheus, Grafana, and Splunk.
5. Write PromQL queries for custom monitoring dashboards, alerting, and diagnostics.
6. Manage and optimize CI/CD pipelines for automated testing, deployment, and rollback strategies.
7. Develop and maintain automation scripts in Python, Bash, Go, or SQL for routine infrastructure tasks.
8. Utilize Git-based workflows for infrastructure changes, version control, and automated deployments.
9. Operate, troubleshoot, and optimize Kubernetes clusters and containerized workloads.
10. Participate in a rotating on-call schedule to ensure 24/7 availability of production systems.
Your Profile
Essential skills/knowledge/experience:
1. Working knowledge and prior hands-on experience using AWS services at the DevOps Engineer level
2. Incident, change, and problem management experience. This role is heavily operations-oriented and includes on-call duties
3. Strong background in setup & operation of enterprise observability tooling, specifically Prometheus, Grafana and Splunk, including usage of PromQL
4. Proficient in one or more of Python, Go, Bash, or SQL
5. Familiar with GitHub / GitOps / container orchestration / Kubernetes operations
6. Working experience of configuration and deployment management with CI/CD
Desirable skills/knowledge/experience: (As applicable)
1. Hands-on experience with Terraform or CloudFormation for infrastructure provisioning and automation.
2. Strong knowledge of Splunk for log analysis and troubleshooting.
3. Strong problem-solving skills and analytical thinking.