Sr. Manager - Site Reliability Engineering (SRE)

Analog Devices, Inc.

Wilmington (MA)

On-site

USD 120,000 - 180,000

Full time

26 days ago

Job summary

An established industry player is on the lookout for a seasoned Site Reliability Engineering Manager to enhance the reliability and performance of their hybrid cloud and on-premises applications. This role involves hands-on leadership, guiding a globally distributed team while implementing scalable solutions. You will design and automate systems, ensuring high availability and efficiency across diverse environments. If you are passionate about driving continuous improvement and fostering collaboration, this opportunity offers a chance to make a significant impact in a dynamic and innovative setting, where your expertise will help shape the future of technology.

Qualifications

10+ years of experience in SRE, DevOps, or Systems Engineering.
Strong leadership skills managing globally distributed teams.
Expertise in cloud platforms and container orchestration.

Responsibilities

Manage and improve performance of cloud and on-prem systems.
Lead automation initiatives for infrastructure management.
Oversee incident management and postmortem activities.

Skills

Site Reliability Engineering

DevOps

Cloud Platforms (AWS, GCP, Azure)

Kubernetes

Docker

Python

Ruby

Shell Scripting

Monitoring Tools (Prometheus, Grafana)

Education

Bachelor's degree in Computer Science or Engineering

Tools

Terraform

Ansible

CloudFormation

Jenkins

GitLab CI

ELK Stack

Datadog

Sr. Manager - Site Reliability Engineering (SRE)

Location: US, MA, Wilmington
Time type: Full time
Posted: 2 days ago
Job requisition ID: R251663

About Analog Devices

Analog Devices, Inc. (NASDAQ: ADI) is a global semiconductor leader that bridges the physical and digital worlds to enable breakthroughs at the Intelligent Edge. ADI combines analog, digital, and software technologies into solutions that help drive advancements in digitized factories, mobility, and digital healthcare, combat climate change, and reliably connect humans and the world. With revenue of more than $9 billion in FY24 and approximately 24,000 people globally, ADI ensures today's innovators stay Ahead of What's Possible.

Job Title: Site Reliability Engineering (SRE) Sr. Manager

Location: Wilmington, MA

Overview:

We are seeking an experienced Site Reliability Engineering Manager with leadership skills to maintain and improve the reliability, availability, and performance of our cloud and on-premises-based applications and infrastructure. You will be responsible for designing, automating, and managing systems in a hybrid environment, ensuring that both on-prem and cloud services run seamlessly. This role requires strong collaboration with a globally distributed team to implement scalable, automated solutions and foster continuous improvement in operational processes.

Responsibilities

Hands-on Leadership: Take an active, hands-on role in managing and improving the performance, scalability, and reliability of both cloud and on-prem systems. Lead by example in resolving complex issues and troubleshooting system failures. This will be a hands-to-keyboard role for at least 30-40% of your time.
Hybrid Infrastructure Management: Design, implement, and manage infrastructure across on-premises, cloud-based, and hybrid environments to ensure high availability, performance, and cost-effectiveness.
Automation & Optimization: Lead automation initiatives to streamline the management of on-prem and cloud environments, including infrastructure provisioning, deployment pipelines, monitoring, and incident resolution. Develop and maintain automation scripts using tools like Terraform, Ansible, and CloudFormation.
Reliability & Performance: Ensure systems are highly available, resilient, and scalable, with proactive monitoring and incident response mechanisms across on-prem, hybrid, and cloud infrastructures. Track and report KPIs so that the team keeps striving for continuous improvement.
Distributed Systems: Manage the challenges associated with globally distributed systems and teams, ensuring that systems are synchronized and can perform consistently across multiple regions.
Infrastructure as Code: Leverage Infrastructure as Code (IaC) tools like Terraform, Ansible, and CloudFormation to manage both cloud and on-prem infrastructure efficiently and consistently.
CI/CD & DevOps Practices: Collaborate with development teams to refine and improve CI/CD pipelines, ensuring seamless integration and delivery across a hybrid infrastructure environment.
Incident Management & Postmortems: Oversee incident management, ensuring that any issues across both cloud and on-prem infrastructure are resolved quickly. Lead root cause analysis and postmortem activities to improve future reliability.
Team Leadership & Collaboration: Lead and guide a team of App Hosting engineers across different geographies, ensuring effective communication, coordination, and collaboration with globally distributed teams.
Cross-Functional Collaboration: Work closely with product, development, and infrastructure teams across different locations to design and implement solutions that meet the needs of our hybrid and global environments.
Mentorship & Coaching: Provide mentorship to engineers, promoting a culture of continuous learning and fostering best practices in both hybrid and on-prem infrastructure management.
Project Management: Oversee and prioritize initiatives that span multiple regions, ensuring that global infrastructure needs and timelines are met.
Process Improvement: Define and implement processes to ensure the stability, scalability, and efficiency of systems across both cloud and on-prem environments. Improve incident management, postmortem processes, and on-call rotations across global teams.

Qualifications:

Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent work experience).
10+ years of experience in SRE, DevOps, or Systems Engineering with expertise in managing hybrid and on-prem infrastructure in addition to cloud-based systems.
Strong leadership skills and at least 5 years of experience managing globally distributed teams.
Expertise in cloud platforms (AWS, GCP, Azure) and container orchestration systems (Kubernetes, Docker).
Proficiency in programming/scripting languages such as Python, Go, Ruby, or Shell.
5+ years of experience hosting enterprise-level applications on platforms such as Windows, Linux, cloud, and on-prem.
Extensive experience with Infrastructure as Code (IaC) tools and automation frameworks (Terraform, Ansible, etc.).
Solid knowledge of monitoring and logging tools (Prometheus, Grafana, ELK Stack, Datadog, etc.).
Familiarity with CI/CD tools and processes (Jenkins, GitLab CI, etc.).
Strong experience with service-oriented architecture (SOA), microservices, and managing distributed systems across geographically diverse regions.
Excellent communication, interpersonal, and leadership skills with the ability to work effectively with remote and distributed teams.
Familiarity with VMs, containers, networking, VPNs, and load balancing in hybrid environments.

Additional Qualifications (good to have):

Experience in the manufacturing industry or a similar domain.
Experience with SAP Datasphere and/or Databricks platform support and management.
Certifications in cloud platforms.

For positions requiring access to technical data, Analog Devices, Inc. may have to obtain export licensing approval from the U.S. Department of Commerce - Bureau of Industry and Security and/or the U.S. Department of State - Directorate of Defense Trade Controls. As such, applicants for this position – except US Citizens, US Permanent Residents, and protected individuals as defined by 8 U.S.C. 1324b(a)(3) – may have to go through an export licensing review process.

Analog Devices is an equal opportunity employer. We foster a culture where everyone has an opportunity to succeed regardless of their race, color, religion, age, ancestry, national origin, social or ethnic origin, sex, sexual orientation, gender, gender identity, gender expression, marital status, pregnancy, parental status, disability, medical condition, genetic information, military or veteran status, union membership, and political affiliation, or any other legally protected group.

EEO is the Law: Notice of Applicant Rights Under the Law.

Job Req Type: Experienced

Required Travel: No

Shift Type: 1st Shift/Days
