You have discovered the perfect setting to expand your skills and make a meaningful impact. Partner with an organization committed to defining the future of site reliability in the financial sector.
As a Director of Site Reliability Engineering at JPMorgan Chase within the Chief Technology Office Global Technology Asset Management (CTO-GTAM) team, you are constantly establishing new collaborative partnerships that allow your team to work across functions. Proactively engage team members, initiate career conversations, and delegate assignments and opportunities equitably.
Job responsibilities
- Collaborates with engineering, support, and operations teams to maintain and improve the reliability of mission-critical applications.
- Participates in incident management, troubleshooting, and continuous improvement initiatives.
- Implements automation and monitoring solutions to enhance system reliability.
- Joins an on-call rotation and responds effectively to production incidents.
- Shares knowledge and follows best practices to foster a culture of learning and innovation.
- Communicates clearly with stakeholders and proactively solves problems.
- Focuses on customer needs and delivers high-quality support.
- Documents solutions and incident responses for future reference.
- Analyzes system performance and recommends improvements.
- Contributes to post-incident reviews and drives process enhancements.
- Supports the integration of new tools and technologies to improve operational efficiency.
Required qualifications, capabilities, and skills
- Formal training or certification on SRE and Application Support concepts and expert applied experience.
- Demonstrable experience in SRE, DevOps, or application support roles, including knowledge of SLIs, SLOs, incident response, and troubleshooting.
- Experience utilizing monitoring and observability tools such as Grafana, Prometheus, Splunk, and OpenTelemetry.
- Hands-on experience with CI/CD pipelines (Jenkins, including global libraries), infrastructure as code (Terraform), version control (Git), containerization (Docker), and orchestration (Kubernetes).
- Experience with cloud platforms such as AWS, GCP, or Azure, and with automating infrastructure and deployments.
- Able to break down complex issues, document solutions, and communicate effectively with team members and customers.
- Experience implementing automation and monitoring solutions to support operational goals.
- Experience collaborating with cross-functional teams to resolve incidents and improve reliability.
- Experience contributing to continuous improvement of support processes and system performance.
Preferred qualifications, capabilities, and skills
- Deep experience in building enterprise software and proficiency in multiple languages, preferably Java, Python, and shell scripting.
- Demonstrates experience in banking, fintech, or regulated environments.
- Participates in resilience engineering activities such as game days or chaos engineering.
- Mentors peers by sharing knowledge and best practices.
- Contributes to the adoption of innovative tools and approaches in support operations.
- Experience hiring, developing, and recognizing talent.
- Draws upon leadership experience to engage team members and express complex ideas with an appropriate level of detail.