Lead Site Reliability Engineer

Thomson Reuters

London

Hybrid

GBP 80,000 - 120,000

Full time

Yesterday

Job summary

A leading company in London is seeking a Lead – Site Reliability Engineer to mentor a growing team, manage production environments, and ensure system reliability on AWS. The ideal candidate will have strong cloud operations experience, excellent communication skills, and a strategic mindset. This role offers a hybrid work model and opportunities for career development.

Benefits

Flexible vacation
Mental Health Days
Tuition reimbursement
Employee incentive programs
Work from anywhere for up to 8 weeks

Qualifications

  • 5+ years' experience in Site Reliability Engineering or a related position.
  • At least 2 AWS Certifications required.

Responsibilities

  • Lead SRE team and mentor members on SRE principles.
  • Ensure uptime to meet customer SLA and manage incidents.
  • Automate processes and improve system monitoring.

Skills

Communication
Analytical
Troubleshooting
Automation

Education

Bachelor's in Computer Science
Master's in Computer Science

Tools

AWS
Docker
Kubernetes
SQL

Job Description

The Lead – Site Reliability Engineer is a hands-on role that provides mentorship to other team members on core SRE principles and tools. The Lead SRE will participate in the end-to-end operational aspects of the Production environment, work on cloud systems, networks and databases, and help drive incident lifecycle management. As a member of the SRE team, you will also work closely with the Architects, DevOps, Product and development teams to ensure we get the most out of the software on the AWS platform. This role requires a highly skilled technology professional with excellent communication skills, a strategic mindset, and strong analytical and troubleshooting skills on the AWS Cloud Platform.

Other responsibilities include working with internal business partners to gather requirements, prototyping, architecting, implementing/updating solutions, building and executing test plans, performing quality reviews, managing operations, and triaging and fixing operational issues. Site Reliability Engineers must be able to adjust to constant business change; common types of changes include new requirements, evolving goals and strategies, and emerging technologies.

About the Role:

  • Be hands-on and provide mentorship to a growing SRE team on core SRE principles and tools.
  • Foster an automation-first approach to issue resolution: automate everything possible, and involve people only when automation can’t resolve an issue.
  • Lead efforts to update production with new versions and infrastructure as they become available
  • Lead capacity planning efforts in collaboration with Architects and DevOps engineers to determine the infrastructure changes needed to support new load and performance characteristics
  • Lead engagement with software developers, DevOps and other infrastructure engineers to integrate software development and delivery from inception to full operation, ensuring robust released software and systems.
  • Ensure the highest level of uptime to meet the customer SLA by implementing system-wide corrections to prevent recurrence of issues.
  • Mentor other SRE team members to further develop their soft and hard skills
  • Triage, troubleshoot and resolve issues using golden signals
  • Go beyond golden signals with additional practices such as chaos engineering and synthetic monitoring to detect failure points, and lead game days to test the team’s resilience in incident response and remediation.
  • Lead SRE team members to create and maintain Recovery Procedures and RCAs in collaboration with other engineering teams.
  • Ensure Incidents assigned to the team are managed within agreed SLAs
  • Ensure alarms are documented in up-to-date Knowledge Base Articles.
  • Ensure Production infrastructure is up to date with server/security patches and certificates.
  • Continuously improve system and application monitoring and automation
  • Identify and automate manual workarounds and process improvements
  • Proactively monitor the availability, latency, scalability and efficiency of all services
  • Perform periodic on-call duty as part of the SRE team

About You:

  • Skilled with cloud operations/administration in AWS.
  • Tax/Accounting domain experience
  • Bachelor’s or Master’s degree in a Computer Science discipline.
  • 5+ years’ experience focussed on Site Reliability Engineering or a related position on the AWS Cloud Platform.
  • At least 2 AWS Certifications are a must (AWS SysOps Administrator and Solutions Architect certifications preferred).
  • Experience working with SQL, Windows Servers, Load balancers, Linux
  • Deep experience with AWS, Docker, Kubernetes, CloudFormation, CloudWatch, CodeDeploy, DynamoDB, Lambda, SQS, Amazon FSx, Elasticsearch and networking concepts is a must.
  • Ability to program at a high level in at least one language, such as Java, C#, JavaScript, Python or Ruby.
  • Integration experience with PagerDuty, ServiceNow, Datadog, CloudWatch.
  • Good understanding of Site Reliability Engineering (SRE) philosophies, technologies, platforms and tools, SLO management, incident resolution, and automation.
  • Ability to explain technical concepts in clear, non-technical language
  • Working knowledge of infrastructure components (e.g. routers, load balancers, cloud products, container systems, compute, storage, and networks)
  • Knowledge of security and compliance standards such as SOC/PCI is a plus

What’s in it For You?

  • Hybrid Work Model: We’ve adopted a flexible hybrid working environment (2-3 days a week in the office depending on the role) for our office-based roles while delivering a seamless experience that is digitally and physically connected.

  • Flexibility & Work-Life Balance: Flex My Way is a set of supportive workplace policies designed to help manage personal and professional responsibilities, whether caring for family, giving back to the community, or finding time to refresh and reset. This builds upon our flexible work arrangements, including work from anywhere for up to 8 weeks per year, empowering employees to achieve a better work-life balance.

  • Career Development and Growth: By fostering a culture of continuous learning and skill development, we prepare our talent to tackle tomorrow’s challenges and deliver real-world solutions. Our Grow My Way programming and skills-first approach ensures you have the tools and knowledge to grow, lead, and thrive in an AI-enabled future.

  • Industry Competitive Benefits: We offer comprehensive benefit plans to include flexible vacation, two company-wide Mental Health Days off, access to the Headspace app, retirement savings, tuition reimbursement, employee incentive programs, and resources for mental, physical, and financial wellbeing.

  • Culture: Globally recognized, award-winning reputation for inclusion and belonging, flexibility, work-life balance, and more. We live by our values: Obsess over our Customers, Compete to Win, Challenge (Y)our Thinking, Act Fast / Learn Fast, and Stronger Together.

  • Social Impact: Make an impact in your community with our Social Impact Institute. We offer employees two paid volunteer days off annually and opportunities to get involved with pro-bono consulting projects and Environmental, Social, and Governance (ESG) initiatives.

  • Making a Real-World Impact: We are one of the few companies globally that helps its customers pursue justice, truth, and transparency. Together, with the professionals and institutions we serve, we help uphold the rule of law, turn the wheels of commerce, catch bad actors, report the facts, and provide trusted, unbiased information to people all over the world.


About Us

Thomson Reuters informs the way forward by bringing together the trusted content and technology that people and organizations need to make the right decisions. We serve professionals across legal, tax, accounting, compliance, government, and media. Our products combine highly specialized software and insights to empower professionals with the data, intelligence, and solutions needed to make informed decisions, and to help institutions in their pursuit of justice, truth, and transparency. Reuters, part of Thomson Reuters, is a world leading provider of trusted journalism and news.

We are powered by the talents of 26,000 employees across more than 70 countries, where everyone has a chance to contribute and grow professionally in flexible work environments. At a time when objectivity, accuracy, fairness, and transparency are under attack, we consider it our duty to pursue them. Sound exciting? Join us and help shape the industries that move society forward.

As a global business, we rely on the unique backgrounds, perspectives, and experiences of all employees to deliver on our business goals. To ensure we can do that, we seek talented, qualified employees in all our operations around the world regardless of race, color, sex/gender, including pregnancy, gender identity and expression, national origin, religion, sexual orientation, disability, age, marital status, citizen status, veteran status, or any other protected classification under applicable law. Thomson Reuters is proud to be an Equal Employment Opportunity Employer providing a drug-free workplace.

We also make reasonable accommodations for qualified individuals with disabilities and for sincerely held religious beliefs in accordance with applicable law. More information on requesting an accommodation here.

Learn more on how to protect yourself from fraudulent job postings here.

More information about Thomson Reuters can be found on thomsonreuters.com.
