Site Reliability Engineer Lead

IGT Solutions

United States

Remote

USD 90,000 - 150,000

Full time

Yesterday

Job summary

An established industry player is seeking a skilled IT Operations Manager to lead service management and infrastructure operations. This role involves overseeing high-availability systems, conducting thorough problem investigations, and ensuring operational reliability. You will collaborate with cross-functional teams to enhance service delivery and implement continuous improvements in deployment processes. The ideal candidate will have extensive experience with CI/CD pipelines and automation, making a significant impact in a fast-paced, dynamic environment. Join a forward-thinking company where your expertise will drive operational excellence.

Qualifications

8+ years in IT operations, service management, or infrastructure management.
Experience in managing high-availability systems and operational reliability.

Responsibilities

Conduct problem investigations and root cause analyses (RCA).
Define and maintain an event catalog for efficient incident handling.
Manage CI/CD pipelines and automate operational processes.

Skills

IT operations

Service management

Infrastructure management

Site Reliability Engineering

DevOps lead

Root cause analysis

Incident management

CI/CD pipelines

Automation

Cloud technologies

Education

Bachelor's degree in Computer Science

Advanced degree (Master's or equivalent)

Certifications in Linux Administration

Certified Kubernetes Administrator (CKA)

Certifications in cloud platforms (AWS, Azure, Google Cloud)

Certified DevOps Professional

Educational Background

Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.
Advanced degree (Master's or equivalent) is often preferred for senior positions.
Relevant certifications such as Linux Administration or Certified Kubernetes Administrator (CKA).
Certifications in cloud platforms (AWS, Azure, Google Cloud) or DevOps methodologies (e.g., Certified DevOps Professional).

Skills

8+ years of experience in IT operations, service management, or infrastructure management, including roles such as Site Reliability Engineer or DevOps Lead.
Proven experience in managing high-availability systems and ensuring operational reliability.
Extensive experience in root cause analysis (RCA), incident management, and developing permanent solutions for recurring service disruptions.
Hands-on experience with CI/CD pipelines, automation, system performance monitoring, and the implementation of infrastructure as code.
Strong background in collaborating with cross-functional teams (development, operations, engineering, etc.) to improve operational processes and service delivery.
Experience in managing deployments, risk assessments, and optimizing event and problem management processes.
Familiarity with cloud technologies, containerization, and scalable architecture, including experience with zero-downtime deployment strategies.

Responsibilities

Problem Management

Conduct thorough problem investigations and root cause analyses (RCA) to diagnose recurring incidents and service disruptions.
Coordinate with incident management teams and operations experts, and collaborate with Service Operations and Engineering teams, to develop and implement permanent solutions.
Monitor the effectiveness of problem resolution activities, provide regular reports on problem management activities, and ensure continuous improvement.

Event Management

Define and maintain an event catalog, specifying active events, thresholds, and relevant remediation, and optimize it for efficiency.
Develop event response protocols, provide training to teams, and ensure quick and efficient handling of incidents.
Collaborate with stakeholders to define events, ensure coverage across Service Operations, and drive improvements based on post-event reviews and feedback.

Deployment Management

Own the quality of new release deployments for Service Operations, ensuring that a clear process is defined and responsibilities are assigned for smooth implementation.
Develop and maintain deployment schedules, conduct operational readiness assessments, and manage deployment risk assessments to ensure service stability.
Oversee the execution of deployment plans, coordinate resources and processes with delivery and lifecycle engineering, keep stakeholders informed, and work with them continuously to improve deployment processes based on feedback.

DevOps/NetOps Management

Manage continuous integration and deployment (CI/CD) pipelines, ensuring smooth integration between development and operational teams.
Automate operational processes, monitor system performance, and resolve issues related to automation scripts to increase efficiency.
Implement and manage infrastructure as code, provide ongoing support for automation tools, and continuously improve DevOps practices.
