Site Reliability Engineer III - Market Risk, Glasgow
Location: Glasgow, United Kingdom
EU work permit required: Yes
Job Reference: 3090166c95d7
Posted: 14.05.2025
Expiry Date: 28.06.2025
Job Description
There’s nothing more exciting than being at the center of a rapidly growing field in technology and applying your skills to drive innovation and modernize the world's most complex and mission-critical systems.
Our team is globally located and focused on ensuring production stability, automation, reliability, and observability. We are looking for solution-oriented, commercially minded, customer-focused individuals, used to working in an agile environment, who want to be part of building something new from the ground up within a diverse and inclusive team.
Culture is important to us, and we seek intellectually curious individuals who are passionate about technology and eager to expand their skills while working on an exciting new venture. Your work will have a significant impact on our company, clients, and business partners worldwide.
Responsibilities
- Drive continuous improvement of reliability, monitoring, and alerting for mission-critical microservices.
- Reduce toil through automation, creating reliable infrastructure and tooling to expedite feature development.
- Develop metrics for microservices; define user journeys, SLOs, and error budgets; and set up dashboards and alerts.
- Facilitate blameless post-mortems and ensure incidents are permanently resolved.
- Collaborate with development teams throughout the software lifecycle to develop reliability and scalability solutions, designing self-healing and resiliency patterns.
- Work across the organization to influence and support application portfolios.
- Respond to incidents alongside developers and infrastructure engineers, providing support and insights.
- Design and implement deployment strategies using automated CI/CD pipelines.
- Implement infrastructure, configuration, and network as code for applications and platforms.
- Understand SLIs and SLOs to proactively resolve issues, supporting SRE best practices such as metrics, alerting, logging, automation, resiliency, capacity, and performance.
Minimum Qualifications and Skills
- Formal training or certification in site reliability engineering concepts.
- Proficiency in at least one programming language, such as Python.
- Experience with designing, coding, testing, and delivering software within a technology stack.
- Experience with Kubernetes.
- Experience with cloud computing platforms (AWS or others).
- Expertise in one or more technology domains, capable of solving complex, mission-critical problems.
- Strong debugging and troubleshooting skills.
- Ability to work collaboratively in large teams, proactively recognizing obstacles and learning new technologies.
- Experience with CI/CD and infrastructure-as-code tools such as Jenkins, GitLab, and Terraform.
- Experience with observability tools such as Dynatrace, Datadog, New Relic, CloudWatch, AppDynamics, Splunk, and Geneos.