Director of SRE, Chief Technology Office - Global Technology Asset Management (CTO-GTAM)

J.P. Morgan

Greater London

On-site

GBP 100,000 - 130,000

Full time

Yesterday

Job summary

A leading financial services firm seeks a Director of Site Reliability Engineering to maintain the reliability of mission-critical applications. This role involves collaborating with engineering and operations teams, participating in incident management, and implementing automation solutions. Ideal candidates will have significant SRE/DevOps experience and expertise with cloud platforms, monitoring tools, and CI/CD pipelines. Strong communication skills and a focus on continuous improvement are essential.

Qualifications

  • Formal training or certification in SRE and Application Support.
  • Expert applied experience in SRE, DevOps, or application support roles.
  • Hands-on with monitoring and observability tools.

Responsibilities

  • Collaborates with engineering teams to maintain application reliability.
  • Participates in incident management and improvement initiatives.
  • Implements automation solutions for system reliability.

Skills

  • SRE and Application Support concepts
  • CI/CD pipelines
  • Incident response
  • Monitoring tools (Grafana, Prometheus)
  • Cloud platforms (AWS, GCP, Azure)
  • Containerization (Docker)
  • Orchestration (Kubernetes)
  • Programming languages (Java, Python, Shell scripting)

Job description

You have discovered the perfect setting to expand your skills and make a meaningful impact. Partner with an organization committed to defining the future of site reliability in the financial sector.

As a Director of Site Reliability Engineering at JPMorgan Chase within the Chief Technology Office Global Technology Asset Management (CTO-GTAM) team, you are constantly establishing new collaborative partnerships that allow your team to work across functions. Proactively engage team members, initiate career conversations, and delegate assignments and opportunities equitably.

Job responsibilities
  • Collaborates with engineering, support, and operations teams to maintain and improve the reliability of mission-critical applications.
  • Participates in incident management, troubleshooting, and continuous improvement initiatives.
  • Implements automation and monitoring solutions to enhance system reliability.
  • Joins an on-call rotation and responds effectively to production incidents.
  • Shares knowledge and follows best practices to foster a culture of learning and innovation.
  • Communicates clearly with stakeholders and proactively solves problems.
  • Focuses on customer needs and delivers high-quality support.
  • Documents solutions and incident responses for future reference.
  • Analyzes system performance and recommends improvements.
  • Contributes to post-incident reviews and drives process enhancements.
  • Supports the integration of new tools and technologies to improve operational efficiency.
Required qualifications, capabilities, and skills
  • Formal training or certification in SRE and Application Support concepts, and expert applied experience.
  • Demonstrable experience in SRE, DevOps, or application support roles, including knowledge of SLIs, SLOs, incident response, and troubleshooting.
  • Experience using monitoring and observability tools such as Grafana, Prometheus, Splunk, and OpenTelemetry.
  • Hands-on experience with CI/CD pipelines (Jenkins, including global libraries), infrastructure as code (Terraform), version control (Git), containerization (Docker), and orchestration (Kubernetes).
  • Experience with cloud platforms such as AWS, GCP, or Azure, and with automating infrastructure and deployments.
  • Able to break down complex issues, document solutions, and communicate effectively with team members and customers.
  • Experience implementing automation and monitoring solutions to support operational goals.
  • Experience collaborating with cross-functional teams to resolve incidents and improve reliability.
  • Experience contributing to continuous improvement of support processes and system performance.
Preferred qualifications, capabilities, and skills
  • Deep experience in building enterprise software and proficiency in multiple languages, preferably Java, Python, and Shell scripting.
  • Demonstrates experience in banking, fintech, or regulated environments.
  • Participates in resilience engineering activities such as game days or chaos engineering.
  • Mentors peers by sharing knowledge and best practices.
  • Contributes to the adoption of innovative tools and approaches in support operations.
  • Experience hiring, developing, and recognizing talent.
  • Draws upon leadership experience to engage team members and express complex ideas with an appropriate level of detail.