Sr. Manager - Site Reliability Engineering (SRE)

Analog Devices

Wilmington (MA)

On-site

USD 120,000 - 180,000

Full time

12 days ago

Job summary

An established industry player is seeking a dynamic Sr. Manager for Site Reliability Engineering. This role involves hands-on leadership in enhancing the reliability and performance of cloud and on-premises applications. You will design and automate systems in a hybrid environment while collaborating with a global team to implement scalable solutions. Ideal candidates will have extensive experience in managing distributed systems, strong leadership skills, and a passion for continuous improvement. Join a forward-thinking company that values innovation and teamwork, and make a significant impact on operational excellence.

Qualifications

10+ years in SRE or DevOps managing hybrid and cloud infrastructure.
Strong leadership with experience managing globally distributed teams.

Responsibilities

Manage and improve performance, scalability, and reliability of systems.
Lead automation initiatives for cloud and on-prem environments.

Skills

Site Reliability Engineering

DevOps

Cloud Infrastructure Management

Leadership

Automation

Incident Management

Distributed Systems

Programming/Scripting (Python, Go, Ruby, Shell)

CI/CD Practices

Education

Bachelor’s degree in Computer Science

Tools

Terraform

Ansible

CloudFormation

Kubernetes

Docker

Prometheus

Grafana

ELK Stack

Datadog

Jenkins

About Analog Devices
Analog Devices, Inc. (NASDAQ: ADI) is a global semiconductor leader that bridges the physical and digital worlds to enable breakthroughs at the Intelligent Edge. ADI combines analog, digital, and software technologies into solutions that help drive advancements in digitized factories, mobility, and digital healthcare, combat climate change, and reliably connect humans and the world. With revenue of more than $9 billion in FY24 and approximately 24,000 people globally, ADI ensures today's innovators stay Ahead of What's Possible. Learn more at www.analog.com and on LinkedIn and Twitter (X).

Job Title:

Site Reliability Engineering (SRE) Sr. Manager

Location:

Wilmington, MA

Overview:

We are seeking an experienced Site Reliability Engineering Manager with strong leadership skills to maintain and improve the reliability, availability, and performance of our cloud-based and on-premises applications and infrastructure. You will be responsible for designing, automating, and managing systems in a hybrid environment, ensuring that both on-prem and cloud services run seamlessly. This role requires strong collaboration with a globally distributed team to implement scalable, automated solutions and foster continuous improvement in operational processes.

Responsibilities:

Hands-on Leadership: Take an active, hands-on role in managing and improving the performance, scalability, and reliability of both cloud and on-prem systems. Lead by example in resolving complex issues and troubleshooting system failures. This is a hands-on-keyboard role for at least 30-40% of your time.
Hybrid Infrastructure Management: Design, implement, and manage infrastructure across on-premises, cloud-based, and hybrid environments to ensure high availability, performance, and cost-effectiveness.
Automation & Optimization: Lead automation initiatives to streamline the management of on-prem and cloud environments, including infrastructure provisioning, deployment pipelines, monitoring, and incident resolution. Develop and maintain automation scripts using tools like Terraform, Ansible, and CloudFormation.
Reliability & Performance: Ensure systems are highly available, resilient, and scalable, with proactive monitoring and incident response mechanisms across on-prem, hybrid, and cloud infrastructures. Track and report KPIs to ensure continuous improvement.
Distributed Systems: Manage the challenges associated with globally distributed systems and teams, ensuring systems are synchronized and perform consistently across multiple regions.
Infrastructure as Code: Leverage Infrastructure as Code (IaC) tools like Terraform, Ansible, and CloudFormation to manage infrastructure efficiently and consistently.
CI/CD & DevOps Practices: Collaborate with development teams to refine and improve CI/CD pipelines, ensuring seamless integration and delivery across a hybrid infrastructure environment.
Incident Management & Postmortems: Oversee incident management, ensure quick resolution of issues, and lead root cause analysis and postmortem activities to improve future reliability.
Team Leadership & Collaboration: Lead and guide a team of engineers across different geographies, ensuring effective communication and collaboration with globally distributed teams.
Cross-Functional Collaboration: Work closely with product, development, and infrastructure teams to design and implement solutions for hybrid and global environments.
Mentorship & Coaching: Provide mentorship to engineers, fostering continuous learning and best practices in infrastructure management.
Project Management: Oversee and prioritize initiatives spanning multiple regions, ensuring infrastructure needs and timelines are met.
Process Improvement: Implement processes to enhance system stability, scalability, and efficiency across cloud and on-prem environments, including incident management and on-call rotations.

Qualifications:

Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent work experience).
10+ years of experience in SRE, DevOps, or Systems Engineering managing hybrid and cloud infrastructure.
Strong leadership skills, with at least 5 years of experience managing globally distributed teams.
Expertise in cloud platforms (AWS, GCP, Azure) and container orchestration (Kubernetes, Docker).
Proficiency in programming/scripting languages such as Python, Go, Ruby, or Shell.
5+ years hosting enterprise applications on various platforms (Windows, Linux, Cloud, On-Prem).
Experience with Infrastructure as Code tools (Terraform, Ansible, etc.).
Knowledge of monitoring/logging tools (Prometheus, Grafana, ELK Stack, Datadog, etc.).
Familiarity with CI/CD tools (Jenkins, GitLab CI).
Strong experience with SOA, microservices, and managing distributed systems across regions.
Excellent communication and leadership skills, with ability to work remotely.
Familiarity with VMs, containers, networking, VPNs, and load balancing in hybrid environments.

Additional Qualifications:

Experience in manufacturing or similar domain, familiarity with SAP Datasphere and Databricks, and cloud certifications are a plus.
