Senior Site Reliability Engineer (SRE)

Axiom Software Solutions Limited

London

On-site

GBP 68,000 - 80,000

Full time

23 days ago

Job summary

A tech solutions company is seeking a Senior Site Reliability Engineer to ensure reliable software operations. The role involves creating monitoring dashboards and defining service level objectives, and requires 12+ years of experience and proficiency in tools such as Datadog and AWS. This is a full-time, on-site position based in London.

Qualifications

  • 12+ years of experience in site reliability engineering or a related field.
  • Proven track record with monitoring tools and cloud platforms.
  • Strong scripting skills in Python or shell.

Responsibilities

  • Create dashboards for monitoring infrastructure and applications.
  • Ensure reliability and stability of software operations.
  • Define and measure service level objectives and agreements.

Skills

Datadog
Cloud monitoring
Python
Automation
DevOps
Ansible
Terraform

Tools

Docker
AWS
Jenkins

Job description

Overview

Role: Senior Site Reliability Engineer (SRE)

Location: London (on-site, full-time, 5 days a week)

Salary: Permanent, up to GBP 80K gross

Minimum requirement: 12+ years of experience

Core Competencies / Responsibilities
  • Hands-on experience with tools such as Datadog, Splunk, Dynatrace, Grafana, Prometheus, ThousandEyes and Gremlin.
  • Proficiency in creating dashboards for infrastructure, APM and end-to-end (E2E) workflows.
  • Monitoring, logging, alerting and error budgets (SLA targets: 99.9%, 99.99%, 99.999%) for software, operations and business.
  • Define SLOs, SLIs and SLAs with business, operations and engineering teams.
  • Automation and auto-healing using Python, shell scripting and JavaScript; developing custom monitoring services.
  • Experience with logging, monitoring, and event detection on cloud or distributed platforms.
  • ITIL incident and change management; proficiency in problem management, including blameless postmortems, findings, permanent fixes and documentation of lessons learned.
  • Technical operations: application support, stability, reliability and resiliency experience.
  • DevOps, Ansible, Terraform, Docker, AWS (Atlas), Jenkins CI/CD pipelines.
  • Unix/Linux, Windows Server, Oracle, MSSQL, MongoDB.