
Senior Infrastructure Engineer - Observability - Remote from United Kingdom

Aircall

United Kingdom

Remote

GBP 60,000 - 80,000

Full time

Today


Job summary

A leading technology firm in the United Kingdom is seeking an Observability Engineer to enhance monitoring and observability practices. This role involves collaboration with engineering teams and the automation of monitoring setups. Ideal candidates will have 3-5 years of experience in observability, strong skills in Datadog and Terraform, and a solid understanding of Kubernetes and microservices. This position offers a competitive salary and a vibrant work environment.

Benefits

Competitive salary package

Work-life balance

Fast-learning environment

Entrepreneurial and strong team spirit

Qualifications

3-5 years of experience in observability within SRE, DevOps, or platform engineering roles.
Strong hands-on experience with Datadog.
Solid understanding of Kubernetes, microservices, and cloud infrastructure.

Responsibilities

Develop comprehensive observability best practices.
Collaborate strategically with engineering teams.
Automate monitoring setup and provisioning.

Skills

Datadog

Terraform

Kubernetes

Python

Bash

Communication skills

Aircall is a unicorn AI‑powered customer communications platform used by 22,000+ companies worldwide to drive revenue, deliver faster resolutions, and scale. We’re redefining what a customer communications platform can be by combining voice, SMS, WhatsApp, and AI into one seamless workspace.

Our momentum comes from a simple but powerful idea: help every customer‑facing team work smarter, not harder. Aircall’s AI Voice Agent automates routine calls, AI Assist streamlines post‑call tasks, and AI Assist Pro delivers real‑time guidance that helps people do their best work. The result—companies grow revenue, deliver faster resolutions, and scale service.

We’ve built a product customers love and a business that scales fast. Aircall operates in nine global offices (Paris, New York, San Francisco, Sydney, Madrid, London, Berlin, Seattle, and Mexico City), and is backed by world‑class investors. Our teams are shipping AI innovation faster than ever and expanding across new product lines and markets.

At Aircall, you’ll join a company in motion—ambitious, profitable, and product‑driven—where impact is visible, decisions are fast, and growth is real.

How We Work at Aircall: At Aircall, we believe in customer obsession, continuous learning, and delivering extraordinary outcomes. We value open collaboration, taking ownership, and making smart, informed decisions with speed and precision. If you thrive in a fast‑paced, team‑driven environment where curiosity, trust, and impact matter, you’ll fit right in.

We’re looking for an Observability Engineer to own and evolve Aircall’s monitoring, alerting, and observability stack. You’ll work cross‑functionally with backend, frontend, and infrastructure teams to ensure our systems are transparent, measurable, and continuously improving in reliability and performance.

This role is ideal for someone passionate about observability‑as‑code, metric design, and helping engineering teams gain meaningful visibility into their systems.

Key Responsibilities

Develop comprehensive observability best practices: Define and standardize guidelines for metrics, traces, and logs, ensuring consistent implementation and adoption across all engineering teams. This includes establishing naming conventions, data collection methodologies, and retention policies to ensure high‑quality and actionable observability data whilst optimising cost and reducing waste.
Collaborate strategically with engineering teams: Partner closely with various engineering teams to enhance overall system reliability and performance. This involves actively participating in architectural reviews, defining clear Service Level Indicators (SLIs) and Service Level Objectives (SLOs), and seamlessly integrating observability practices into continuous integration and continuous deployment (CI/CD) pipelines to promote a culture of "observability by design."
Automate monitoring setup and provisioning: Drive the automation of monitoring infrastructure through Infrastructure-as-Code (e.g., leveraging the Terraform Datadog provider) and develop intuitive self‑service observability tools. This empowers engineering teams to rapidly provision and manage their monitoring resources, reducing manual overhead and accelerating time to insight.
Improve alerting hygiene and effectiveness: Continuously refine and optimise alerting mechanisms by meticulously tuning thresholds, implementing intelligent noise reduction strategies, and ensuring all alerts are directly aligned with potential business impact. The goal is to deliver timely, relevant, and actionable alerts that enable proactive incident response and minimise service disruption.
Train and empower product teams: Provide comprehensive training and ongoing support to product teams, enabling them to effectively utilise observability tools. This includes guiding them in building insightful dashboards that visualise key performance indicators and creating robust alerts that proactively detect issues within their respective services.
Evaluate and integrate advanced observability tools: Proactively research, evaluate, and integrate new and emerging observability tools and technologies as needed. This may include exploring solutions for OpenTelemetry adoption, advanced log aggregation platforms, distributed tracing systems, and other tools that enhance our overall observability capabilities and support the evolving needs of our infrastructure and applications.
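
As a rough illustration of the SLI/SLO work mentioned above, the sketch below computes an availability SLI and the share of error budget left for a given SLO target. It is a minimal example under stated assumptions, not a description of Aircall's tooling: the names `SloReport` and `evaluate_slo`, the request counts, and the 99.9% target are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloReport:
    """Result of comparing an observed SLI against an SLO target (hypothetical type)."""
    sli: float               # observed fraction of good requests
    error_budget: float      # allowed fraction of bad requests (1 - target)
    budget_remaining: float  # share of the budget still unspent; negative means overspent


def evaluate_slo(good_requests: int, total_requests: int, target: float) -> SloReport:
    """Compute an availability SLI and the remaining error budget for one window."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be between 0 and 1, exclusive")

    sli = good_requests / total_requests
    error_budget = 1.0 - target
    # Fraction of the allowed failures that has already been used up.
    budget_spent = (1.0 - sli) / error_budget
    return SloReport(sli=sli, error_budget=error_budget, budget_remaining=1.0 - budget_spent)


# Illustrative numbers only: 500 failed requests out of one million, 99.9% target.
report = evaluate_slo(good_requests=999_500, total_requests=1_000_000, target=0.999)
print(f"SLI={report.sli:.4%}, budget remaining={report.budget_remaining:.0%}")
```

With these illustrative numbers, roughly half of the error budget is spent. In practice the counts would come from a metrics backend rather than being hard-coded.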

Qualifications

3‑5 years of experience in observability within SRE, DevOps, or platform engineering roles.
Strong hands‑on experience with Datadog (dashboards, monitors, synthetics, logs, APM, RUM).
Proficiency with Terraform or other Infrastructure‑as‑Code tools.
Solid understanding of Kubernetes, microservices, and cloud infrastructure (EKS, Lambda, RDS, S3, AWS networking).
Familiarity with distributed tracing and OpenTelemetry concepts.
Strong scripting skills (Python, Bash, or similar).
Experience defining and managing SLIs/SLOs and service‑level observability frameworks.
Excellent collaboration and communication skills; you can work with both engineers and non‑technical stakeholders.

Nice to Have

Experience with incident management and on‑call processes.
Exposure to data visualisation or analytics tools beyond Datadog.
Knowledge of logging pipelines (e.g., FluentBit, Logstash).
Experience working in high‑scale SaaS environments.
Previous experience in developer enablement or platform teams.

Why join us?

🚀 Key moment to join Aircall in terms of growth and opportunities

💆‍♀️ Our people matter: work‑life balance is important at Aircall

📚 Fast‑learning environment, entrepreneurial and strong team spirit

🌍 45+ nationalities: cosmopolitan and multicultural mindset

💶 Competitive salary package & benefits

DE&I Statement:

At Aircall, we believe diversity, equity and inclusion – irrespective of origins, identity, background and orientations – are core to our journey.

We pride ourselves on promoting active inclusion within our business to foster a strong sense of belonging for all. We’re working to create a place filled with diverse people who can enrich and learn from one another. We’re committed to ensuring that everyone not only has a seat at the table but is valued and respected at it by providing equal opportunities to develop and thrive.

We are strongly committed to hiring a diverse and multicultural team and we encourage applications from traditionally underrepresented backgrounds.
