Dev Ops Engineer – Lead

ICONMA

Richmond

On-site

GBP 74,000 - 97,000

Full time


Job summary

A financial services company in Richmond, VA is seeking a Dev Ops Engineer – Lead to oversee full-stack observability solutions using Datadog. Responsibilities include instrumenting applications and designing service monitoring. The ideal candidate has 2+ years of experience with observability tools, a strong background in programming, and automation skills. This role offers health benefits and excellent growth opportunities in a supportive work environment.

Benefits

Health Benefits
Referral Program
Excellent growth and advancement opportunities

Qualifications

  • 2+ years of experience in cloud-based observability solutions.
  • Mandatory certifications in Datadog Fundamentals and APM.
  • System operations and software development background required.

Responsibilities

  • Implement and manage full-stack observability using Datadog.
  • Design and deploy key service monitoring including dashboards.
  • Automate monitoring configurations and telemetry collection.

Skills

Proficiency in Prometheus and Grafana
Expertise in Python and Go
Experience with AWS, GCP, and Azure
Familiarity with Terraform and Ansible
Understanding of CI/CD pipelines
Strong understanding of observability concepts
Expertise in security & vulnerability management

Tools

Datadog
ELK Stack
Swagger
Splunk

Job description

Our client, a financial services company, is looking for a Dev Ops Engineer – Lead for its Richmond, VA location.

Responsibilities
  • Implement and manage full-stack observability using Datadog, ensuring seamless monitoring across infrastructure, applications, and services.
  • Deploy and instrument agents across on-premises, cloud, and hybrid environments to enable comprehensive monitoring.
  • Design and deploy key service monitoring, including dashboards, monitor creation, SLA/SLO definitions, and anomaly detection with alert notifications.
  • Configure Datadog integrations with third-party services such as ServiceNow and other ITSM tools, and enable SSO.
  • Design & Implement Solutions: Build and maintain comprehensive observability platforms that provide deep insights into complex systems, incorporating logs, metrics, and traces.
  • System Instrumentation: Instrument applications, infrastructure, and services to collect telemetry data using frameworks like OpenTelemetry.
  • Data Analysis & Visualization: Develop dashboards, reports, and alerts using tools like Prometheus, Grafana, and Splunk to visualize system performance and detect issues.
  • Collaboration: Work with development, SRE, and DevOps teams to integrate observability best practices and align monitoring with business and operational goals.
  • Automation: Develop scripts and use Infrastructure as Code (IaC) tools like Ansible and Terraform to automate monitoring configurations and telemetry collection.

Requirements
  • Observability Tools: Proficiency in monitoring, logging, and tracing tools, including Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, Datadog, New Relic, and cloud-native solutions like AWS CloudWatch.
  • Programming Languages: Expertise in languages such as Python and Go for scripting and automation.
  • Infrastructure & Cloud Platforms: Experience with cloud platforms (AWS, GCP, Azure) and container orchestration systems like Kubernetes.
  • Infrastructure as Code (IaC): Familiarity with Terraform and Ansible for managing infrastructure and configurations.
  • CI/CD & Automation: Experience with CI/CD pipelines and automation tools like Jenkins.
  • System & Software Engineering: A strong background in both system operations and software development.
  • Experience optimizing cloud agent instrumentation; cloud certifications are a plus.
  • Datadog Fundamentals, APM & Distributed Tracing Fundamentals, and Datadog Demo certifications (mandatory).
  • Strong understanding of Observability concepts (Logs, Metrics, Tracing)
  • Expertise in security & vulnerability management in observability
  • At least 2 years of experience in cloud-based observability solutions, specializing in monitoring, logging, and tracing across AWS, Azure, and GCP environments.

Why Should You Apply?
  • Health Benefits
  • Referral Program
  • Excellent growth and advancement opportunities

As an equal opportunity employer, ICONMA provides an employment environment that supports and encourages the abilities of all persons without regard to race, color, religion, gender, sexual orientation, gender identity or expression, ethnicity, national origin, age, disability status, political affiliation, genetics, marital status, protected veteran status, or any other characteristic protected by federal, state, or local laws.
