Lead Observability Engineer – Sumo Logic

E-Solutions

United States

Remote

USD 150,000 - 210,000

Full time

Yesterday

Job summary

A leading IT services company is seeking a highly skilled Lead Observability Engineer to lead a Sumo Logic implementation for a client migrating from Dynatrace. The role involves designing and implementing observability strategies, ensuring service reliability, and applying expertise in AWS and Kubernetes. Ideal candidates will have substantial experience in observability practices and the leadership and communication skills to drive SRE maturity.

Qualifications

  • Expert-level experience with Sumo Logic, including dashboarding, alerting, and collector deployment.
  • Hands-on experience with OpenTelemetry for distributed tracing.
  • Strong scripting experience with tools like Terraform and Helm.

Responsibilities

  • Lead the end-to-end implementation of Sumo Logic for AWS and EKS.
  • Define and implement SLIs/SLOs for containerized services.
  • Collaborate with DevOps and SRE teams for complete service tracing.

Skills

Site Reliability Engineering
Sumo Logic
Kubernetes
OpenTelemetry
AWS services
Terraform
Helm

Job description

Lead Observability Engineer – Sumo Logic

Please share your resume: nitin.k@e-solutionsinc.com; mohit.k@e-solutionsinc.com; naveen.p@e-solutionsinc.com; pragati.s@e-solutionsinc.com;…

Role: Lead Observability Engineer – Sumo Logic & SRE

Location: Remote

JD:

Experience: 10+ years (with 3+ years in Sumo Logic & Cloud-native observability)

Job Summary:

We are seeking a highly skilled Lead Observability Engineer to lead a critical implementation of Sumo Logic for a client migrating from Dynatrace. This role requires deep expertise in Sumo Logic, Site Reliability Engineering (SRE) practices, and Kubernetes (EKS) observability. The ideal candidate will design and implement scalable dashboards, alerts, and tracing strategies, drive service-level reliability, and enable a steady-state SRE operations model.

Key Responsibilities:

• Lead the end-to-end implementation of the Sumo Logic observability platform for AWS and EKS environments.

• Migrate monitoring and alerting assets from Dynatrace to Sumo Logic.

• Define and implement SLIs/SLOs, error budgets, and reliability metrics for containerized services.

• Deploy and configure Sumo Logic collectors across AWS and Kubernetes workloads (EKS).

• Configure log, metric, and trace ingestion pipelines using OpenTelemetry and Sumo Logic apps.

• Design and maintain dashboards for service health, performance, and reliability insights.

• Implement intelligent alerting and notification workflows, using thresholds, baselines, and anomaly detection.

• Collaborate with DevOps, SRE, and development teams to ensure complete tracing coverage across services.

• Ensure best practices for alert noise reduction, escalation policies, and incident response are in place.

• Contribute to observability runbooks, operational handover, and training for the client SRE team.

Required Skills & Qualifications:

• Expert-level experience with Sumo Logic, including dashboarding, alerting, collector deployment, and ML features.

• Strong background in Site Reliability Engineering (SRE), including SLIs/SLOs, error budgets, and MTTR/MTTD metrics.

• Proficiency in AWS services (especially CloudWatch, CloudTrail, Lambda, RDS) and EKS (Amazon Elastic Kubernetes Service).

• Hands-on experience with OpenTelemetry for distributed tracing and service maps.

• Strong understanding of Kubernetes metrics, pod health, container resource usage, and cluster monitoring.

• Proven ability to define alert thresholds, configure notification routing (e.g. Slack, PagerDuty, ServiceNow), and manage alert fatigue.

• Strong scripting experience with tools like Terraform, Helm, YAML, and GitOps workflows.

• Experience with incident triage, RCA documentation, and building operational maturity in observability teams.

• Excellent communication and stakeholder engagement skills.

Preferred Qualifications:

• Sumo Logic certifications (Admin, Advanced Analytics) are a plus.

• Experience with Dynatrace (for migration purposes).

• Familiarity with integrating observability into CI/CD pipelines.

• Exposure to service mesh (Istio/Linkerd) and monitoring microservices in that context.

Deliverables This Role Will Drive:

• Sumo Logic observability reference architecture

• EKS and AWS observability configuration

• SLI/SLO documentation and tracking

• Alerting and tracing setup across services

• Production-ready dashboards and runbooks

• Knowledge transfer and enablement sessions for SRE/DevOps teams

  • Seniority level: Mid-Senior level
  • Employment type: Contract
  • Job function: Information Technology
  • Industries: IT Services and IT Consulting and Information Services
