Site Reliability Engineer

Rangam Infotech Pvt. Ltd.

Bengaluru

On-site

INR 12,00,000 - 18,00,000

Full time

7 days ago

Job summary

A leading tech company in Bengaluru is seeking a Senior Associate to improve service reliability and operations through Site Reliability Engineering. The role involves driving initiatives for service stability, building partnerships across teams, and ensuring adherence to ITSM best practices. Ideal candidates have strong technical knowledge of observability and incident management. A Bachelor’s degree in IT or a related field is preferred. The position offers a dynamic working environment with opportunities for continuous improvement.

Qualifications

  • Bachelor’s degree or equivalent experience in an IT-related discipline preferred.
  • Technical knowledge of SRE focus areas, including observability with Datadog.
  • Excellent communication and influencing skills.
  • Experience with industry best practices and process improvement.
  • Initiative-driven, resilient, and positive attitude.

Responsibilities

  • Drive high stability and availability of services through Site Reliability Engineering practices.
  • Build partnerships with Product Engineering teams and drive beneficial initiatives.
  • Be available 24/7 as an escalation point for operational teams.
  • Reduce MTTR and service impact.
  • Implement ITSM best practices.

Skills

Site Reliability Engineering
ITSM methodologies
Incident Management
Problem Management
Change Management
Capacity planning
Communication skills
Monitoring tools (Datadog)

Education

Bachelor’s degree in IT or equivalent experience

Tools

Datadog
ITRS

Job description

Infrastructure Platform Engineering (IPE), part of the client’s Infrastructure & Cloud organization, is searching for a Senior Associate to drive Site Reliability Engineering (SRE) and a professional, best-in-class approach to service operations across the Production infrastructure environment.

IPE operates globally with around 600 people in functionally aligned teams across Data Centres, Storage, Platforms, Database, Middleware, and the virtualized Private Cloud.

The Senior Associate collaborates with teams across IPE and promotes a Site Reliability culture during APAC hours. The role involves partnering with regional squads to improve infrastructure and service centricity across all teams.

As an Infrastructure SRE, the candidate should have a sound understanding of ITSM methodologies, specifically Service Operations, including Incident, Problem, and Change Management. The role champions continuous service improvement, using policies as a framework and reporting on service performance with a focus on SRE areas.

The role also entails driving Site Reliability principles, enhancing service resilience, scalability, and performance across our critical infrastructure in collaboration with IPE teams. Responsibilities include ensuring service data quality, policy compliance, hygiene/metrics, SLAs, best practices in infrastructure management, proactive monitoring, capacity planning, security collaboration, vendor engagement, scenario testing, and continuous training/upskilling.

KEY RESPONSIBILITIES:

  • Drive high stability and availability of services through Site Reliability Engineering practices.
  • Build partnerships with Product Engineering teams and drive beneficial initiatives.
  • Be available 24/7 as an escalation point for operational teams.
  • Reduce MTTR and service impact.
  • Address technical debt to mitigate risks.
  • Reduce incident recovery times.
  • Assist in major incidents owned by IPE.
  • Validate service communications during major incidents from a technical perspective.
  • Standardize and improve incident recovery, problem management, resilience, and availability processes.
  • Implement ITSM best practices, create knowledge articles, runbooks, and process documents.
  • Manage IPE Technical Recovery and Problem Management responses, ensuring cross-team coordination.
  • Oversee and govern key resilience requirements for applications within IPE.
  • Identify trends and opportunities for Service Improvement Programs and drive them to completion.

MINIMUM REQUIREMENTS:

  • Bachelor’s degree or equivalent experience in an IT-related discipline preferred.
  • Technical knowledge of SRE focus areas, including observability with Datadog and capacity management.
  • Excellent communication and influencing skills.
  • Experience with industry best practices and process improvement.
  • Initiative-driven, resilient, and positive attitude.
  • Strong negotiation and influencing skills to overcome resistance.
  • Ability to manage time-critical incident and recovery situations, liaising with stakeholders.
  • Extensive experience with monitoring tools such as Datadog and ITRS.