Senior Site Reliability Engineer

ECS

Fairfax (VA)

On-site

USD 120,000 - 180,000

Full time

21 days ago

Job summary

A leading technology services provider is looking for a Senior Site Reliability Engineer in their Fairfax, VA office. You will play a key role in enhancing the reliability and performance of critical production environments while working closely with cross-functional teams. The role requires strong technical skills, problem-solving capabilities, and a proactive approach to ensure system resilience and accessibility.

Qualifications

  • 6+ years of experience as a Site Reliability Engineer or equivalent.
  • 3+ years of hands-on programming or scripting (e.g., Python, Bash).
  • Strong knowledge of microservices and containerization.

Responsibilities

  • Define and implement SRE practices ensuring reliability and performance.
  • Set up logging, monitoring, and alerting solutions.
  • Collaborate with teams to integrate reliability into software development.

Skills

Problem-solving
Critical thinking
Analytical skills
Collaboration
Attention to detail

Education

Bachelor's degree in Computer Science or Engineering

Tools

Elastic
Prometheus
Grafana
Splunk
Docker
Kubernetes
AWS

Job description

ECS is seeking a Senior Site Reliability Engineer to work in our Fairfax, VA office.

ECS is seeking talented professionals to join our successful and growing team in building the next-generation Continuous Diagnostics and Mitigation (CDM) Cyber data solution. The CDM Program is the Cybersecurity and Infrastructure Security Agency’s (CISA) dynamic approach to strengthening the cybersecurity of Federal networks and systems through better awareness and visibility into their security posture and cyber threats. ECS is responsible for designing, building, deploying, operating, and maintaining a complete ‘Data Services’ solution which includes the collection, normalization, visualization, and sharing of cyber data from more than 100 Federal agencies. The CDM Data Services product is an integrated suite of multiple Commercial Off-the-Shelf (COTS) products, software configuration packages, and custom code that work together as an integrated solution tailored to meet Department of Homeland Security (DHS) requirements.

We are seeking professionals who thrive in a dynamic, fast-paced, and highly collaborative environment where problem-solving, critical thinking, and a holistic approach to serving the mission are key. Our program operates within the Scaled Agile Framework (SAFe). An aptitude and enthusiasm for continuous learning, improvement, and cybersecurity are a must!

Role & Responsibilities

ECS is seeking a talented Senior Site Reliability Engineer (SRE) to play a key role in defining, implementing, and growing our SRE practice to ensure the reliability, availability, and performance of our critical production environments.

The Senior SRE will contribute to a culture of continuous improvement by identifying areas for enhancement and driving initiatives to improve system reliability, scalability, and efficiency.

The successful candidate will have demonstrated hands-on experience designing, implementing, and maintaining solutions to ensure that systems, including infrastructure and applications, are resilient, highly available, and performant. The Senior SRE will also play a critical role in defining and measuring the Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for our solution.

The Senior SRE will be responsible for setting up comprehensive logging, monitoring, and alerting solutions using the Elastic stack and other tools as necessary to ensure the continuous performance of services. Additionally, they will respond to incidents, perform root cause analyses, and implement solutions to prevent recurrence. The Senior SRE will work in close collaboration with other SRE team members, developers, testers, infrastructure engineers, DevOps engineers, and other stakeholders to integrate reliability and observability into the software development lifecycle.

Required Skills

  • US citizenship with ability to obtain Public Trust Suitability
  • 6+ years of experience as a Site Reliability Engineer (SRE) or equivalent
  • 6+ years of demonstrated experience designing, implementing, and maintaining observability solutions to include logging, monitoring, and alerting
  • 6+ years of hands-on experience with SRE tools (e.g., Elastic, Prometheus, Grafana, Splunk, etc.)
  • 3+ years defining and measuring SLOs and SLIs
  • 3+ years of relevant experience using cloud platforms (AWS GovCloud preferred)
  • 3+ years of hands-on programming or scripting (e.g., Python, Bash, etc.)
  • Strong knowledge of microservices, containerization, and orchestration tools (Docker, Kubernetes)
  • Proven ability to collaborate with cross-functional teams (development, testing, and product) to integrate reliability and observability into the software development lifecycle
  • Strong problem-solving and analytical skills
  • Proactive, detail-oriented approach to identifying inefficiencies and implementing improvements

Desired Skills

  • Bachelor's degree in Computer Science, Engineering, or a related field (or 4 additional years of related experience)
  • Experience working in an Agile/SAFe environment using ALM tools (Jira, Confluence, or similar)
  • Strong understanding of CI/CD principles and platforms (Jenkins, CircleCI, GitLab, GitHub Actions, Argo, Travis CI, etc.)
  • Expertise in configuration management tools (Ansible, Puppet, Chef)
  • Experience with infrastructure as code (Terraform, CloudFormation)
  • In-depth understanding of networking, security, and system administration of Linux operating systems
  • Knowledge of version control platforms and branching strategies
  • Knowledge of disaster recovery planning, backup strategies, and data replication
  • Experience supporting large Federal programs ($200M+)

ECS is an equal opportunity employer and does not discriminate or allow discrimination on the basis of any characteristic protected by law. All qualified applicants will receive consideration for employment without regard to disability, status as a protected veteran, or any other status protected by applicable federal, state, or local jurisdiction law.

ECS is a leading mid-sized provider of technology services to the United States Federal Government. We are focused on people, values and purpose. Every day, our 3800+ employees focus on providing their technical talent to support the Federal Agencies and Departments of the US Government to serve, protect and defend the American People.

Seniority level
  • Mid-Senior level
Employment type
  • Full-time
Job function
  • Engineering and Information Technology
Industries
  • IT Services and IT Consulting
