SRE Technical Manager (Application Services)

Leidos

Arlington (VA)

Remote

USD 112,000 - 204,000

Full time

30+ days ago

Job summary

Join a forward-thinking company as a Site Reliability Engineering Technical Manager, where you will lead a talented team to enhance the reliability and scalability of critical systems. This role offers a unique opportunity to shape the future of IT services for the Navy, focusing on strategic and operational aspects of site reliability. You will collaborate with engineering and operations teams, implement automation best practices, and foster a culture of innovation. If you are passionate about technology and leadership, this position is designed for you to make a significant impact in a dynamic environment.

Qualifications

8-10 years of SRE or DevOps experience with at least 4 years in a leadership role.
US Citizen with an active DoD Secret Clearance required.

Responsibilities

Manage and mentor SRE teams, ensuring reliability and performance of critical systems.
Drive continuous improvement initiatives and implement best practices in automation.

Skills

Site Reliability Engineering (SRE)

DevOps

Incident Management

Agile Development

Leadership

Communication Skills

Education

BS Degree in Cybersecurity or related field

Master's Degree

Tools

AWS

Azure

Jira

Confluence

Terraform

GitLab

Ansible

Description

More About the Role:
Leidos currently has an opening on the Service Management Integration and Transport (SMIT) Contract for a Site Reliability Engineering (SRE) Technical Manager. This is an exciting opportunity to use your experience and leadership skills to successfully execute the mission of the Navy’s largest IT services program. Under the SMIT Contract, the Leidos team is responsible for the core backbone for the Navy-Marine Corps Intranet, including cybersecurity services, network operations, network engineering, service desk, seat support services, and data transport.

We are seeking a highly skilled and experienced SRE Technical Manager to lead our Application Services Site Reliability Engineering (SRE) team. In this role, you will manage a group of talented engineers responsible for ensuring the reliability, performance, and scalability of critical systems across 4-6 SRE Pods. You will work closely with engineering, product, and operations teams to implement best practices in automation, incident management, and system monitoring. This role will focus on both the strategic and operational aspects of site reliability, ensuring that the team meets performance objectives while fostering a culture of innovation and continuous improvement. The SRE Technical Manager will collaborate with the Director of Site Reliability Engineering and is responsible for supporting, migrating, automating, and optimizing software development and deployment processes and infrastructure as code, and for maturing the Site Reliability Engineering program. The manager will mentor and coach technical staff and perform collaborative code reviews to strengthen SRE skills across the teams.

What You'll Get to Do:

Manage and mentor 4-6 SRE teams (pods) and 40+ FTEs, providing guidance, setting performance expectations, and fostering professional development.
Work collaboratively with SRE Resource Managers to staff and maintain the engineering resources your SRE vertical teams need to meet their reliability and scalability goals.
Responsible for the P&L across the Data Center Services vertical. Manage the SRE team’s resources, including budget planning, tool selection, and infrastructure investments to meet reliability and scalability needs.
Meet regularly with your team members and participate in performance reviews, interviews, and development planning.
Oversee the reliability, availability, and performance of critical systems by leading the SRE teams within the Application Services vertical in implementing monitoring, incident response, and performance optimization strategies.
Ensure the team adheres to best practices for system reliability, automation, and operational efficiency.
Drive continuous improvement initiatives by analyzing performance metrics (e.g., SLOs, MTTR, MTBF) and identifying areas for enhancement.
Collaborate with operations, quality, cybersecurity and other SRE engineering teams to define and enforce Service Level Objectives (SLOs) and manage error budgets.
Act as a liaison between the SRE team and other departments to prioritize reliability and operational needs in the product development process.
Collaborate with senior leadership to define the SRE strategy, set long-term reliability goals, and ensure alignment with business objectives.
Lead efforts to reduce operational toil through automation. Work with the team to build or enhance automation tools that manage infrastructure, monitor systems, and respond to incidents.
Oversee the development and adoption of Infrastructure as Code (IaC) tools, CI/CD pipelines, and other automation processes.
Ensure that SRE practices align with organizational security policies and compliance requirements.
Ensure systems meet or exceed agreed-upon service levels by proactively addressing potential issues and working with stakeholders to align on reliability expectations.
Work within an SRE team, collaborating with other Developers, Security, and Operations, to continuously deliver products and increase the value stream for the organization and customers.
Embrace and champion Agile development processes and the adoption of modern Site Reliability Engineering workflows and practices while providing technical guidance to team members and coworkers on best practices.
Stay up to date on the latest Site Reliability Engineering practices and technologies.
Strive to provide internal and external customers with world-class customer service.
Resolve most conflicts between timeline, budget, and scope independently, but use sound judgment to escalate complex or consequential issues to senior management.

You'll Bring These Qualifications:

Requires a BS degree (or equivalent) in Cybersecurity, Information Security, IT, Network Engineering, Computer Science, or a related field with 8-10 years of SRE or DevOps experience, or a master’s degree with 6+ years of prior relevant experience, including at least 4 years in a leader or manager capacity.
US Citizen with an active DoD Secret Clearance.
Minimum of DoD 8570.01 IAT Level II Certification required prior to onboarding and must maintain certification while supporting the SMIT Contract.
Must be able to support program execution in classified environments and access SIPRNet from an NMCI location on short notice (local travel).
Exceptional written and oral communication skills, including producing technical analyses and reports, presentations, and executive-level briefings for internal and external stakeholders.
Ability to review and comprehend customer requirements and to design solutions that satisfy them.
Strong track record of managing incidents, conducting postmortems, and implementing reliability improvements.
Experience implementing and managing Agile or DevOps processes, with a focus on continuous improvement, efficiency, and team productivity.
Ability to lead teams through strategic initiatives such as reliability maturity assessments, process automation, and tooling selection.
Experience with commercial cloud infrastructure deployment environments such as AWS and Azure.
Strong knowledge of automation tools, CI/CD pipelines, and Infrastructure as Code (IaC).
Hands-on experience with Atlassian products (Jira, Confluence, Bitbucket, etc.).
Experience creating Jira and/or Azure DevOps workflows, projects, and custom configurations.
Solid experience integrating and maintaining third-party CI/CD tools such as Jenkins and GitLab.
Experience with automated provisioning and configuration tools like Terraform, CloudFormation, Ansible, or similar technologies.
Working knowledge of the Risk Management Framework (RMF) and DISA STIGs.

These Qualifications Would be Nice to Have:

Previous work experience providing support to the NGEN-NMCI program is highly desired.
Experience with microservices architecture and distributed systems.
Familiarity with serverless and event-driven architectures.
Certification in cloud platforms (e.g., Azure Certified DevOps Engineer).
Experience in high-growth environments or managing teams during significant scaling periods.
Agile SAFe certifications or applicable experience.

Original Posting Date:

2025-01-30

While subject to change based on business needs, Leidos reasonably anticipates that this job requisition will remain open for at least 3 days with an anticipated close date of no earlier than 3 days after the original posting date as listed above.

Pay Range:

$112,450.00 - $203,275.00

The Leidos pay range for this job level is a general guideline only and not a guarantee of compensation or salary. Additional factors considered in extending an offer include (but are not limited to) responsibilities of the job, education, experience, knowledge, skills, and abilities, as well as internal equity, alignment with market data, applicable bargaining agreement (if any), or other law.

#Remote
