Site Reliability Engineer III, AI/ML Platform

TN United Kingdom

Glasgow

On-site

GBP 50,000 - 90,000

Full time

14 days ago

Job summary

An innovative firm is seeking a Site Reliability Engineer to join their AI/ML Data Platform Team in Glasgow. This role offers the opportunity to drive technological advancements by optimizing and maintaining complex systems. You'll work closely with software engineers to enhance the reliability and scalability of applications, while also mentoring junior engineers. If you're passionate about problem-solving and eager to contribute to cutting-edge projects, this is the perfect opportunity for you.

Qualifications

  • Expertise in SRE principles and applied experience.
  • Proficiency in Python and Terraform for automation.

Responsibilities

  • Collaborates with teams to design and implement reliable solutions.
  • Debugs and solves issues in production environments.

Skills

Python
Site Reliability Engineering
Terraform
AWS
Problem-solving
Communication

Education

Formal training in Site Reliability Engineering

Tools

Infrastructure as Code tools
Cloud-native architecture

Job description

Site Reliability Engineer III, AI/ML Platform, Glasgow

Client:

Location: Glasgow, United Kingdom

Job Category: -

EU work permit required: Yes

Job Reference: 6d686af05d15

Job Views: 5

Posted: 02.05.2025

Expiry Date: 16.06.2025

Job Description:

There’s nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems.

As a Site Reliability Engineer III at JPMorgan Chase within the AIML Data Platform Team, you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure, independently decomposing and iteratively improving existing solutions. You will be a significant contributor to your team, sharing your knowledge of the end-to-end operations, availability, reliability, and scalability of your application or platform.

Job responsibilities

  • Collaborates with other software engineers and teams to design, develop, test, and implement availability, reliability, and scalability solutions in their applications.
  • Implements infrastructure, configuration, and network as code for the applications and platforms in your remit.
  • Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers.
  • Designs and implements solutions to enhance the reliability and scalability of AI/ML platforms and applications to accommodate fast-growing demands.
  • Partners with product engineering teams to ensure the AI/ML systems are reliable and high performing.
  • Develops observability, security, automation, and FinOps tools and orchestration.
  • Builds strong cross-functional relationships that foster engagement across the organization and deliver solutions to user problems.
  • Debugs and solves issues in a production environment, identifies root causes, and remediates.
  • Participates in on-call rotations, incident management, and escalation workflows.
  • Takes full ownership of problems, develops solutions, and acquires new knowledge to complete tasks.
  • Mentors and guides junior engineers.

Required qualifications, capabilities, and skills

  • Formal training or certification in Site Reliability Engineering concepts and applied experience.
  • Expertise in SRE principles, reliability, scalability, and performance of applications and infrastructure.
  • Proficiency in programming with Python and Infrastructure as Code tools such as Terraform.
  • Experience working with distributed systems and cloud-native architecture in AWS.
  • Strong problem-solving and troubleshooting skills in complex systems.
  • Excellent communication skills and ability to present technical and business concepts to stakeholders.
  • Self-managed, motivated, with a strong sense of ownership, urgency, and drive.
