
DET-TT-Resilience and Reliability Engineer

EY

Coimbatore District

On-site

INR 15,00,000 - 25,00,000

Full time

Today

Job summary

A global professional services firm is seeking a Senior Reliability Engineer to enhance IT solutions by applying software engineering principles. Responsibilities include defining SLAs, engineering resilient designs, and automating processes to optimize operational efficiency. Ideal candidates will have over 7 years of experience in software product engineering, with proficiency in Java, cloud technologies, and observability tools. This role is based in Coimbatore, Tamil Nadu and offers opportunities for innovation in IT risk management.

Qualifications

  • 7+ years of experience applying software product engineering principles.
  • Hands-on experience with Java / J2EE and web servers.
  • Experience with at least one CI/CD tool and at least one IaC tool.
  • Experience in cloud technologies and reliability tools.
  • Strong knowledge of performance monitoring and troubleshooting.

Responsibilities

  • Define SLA/SLO/SLI for products/services.
  • Engineer resilient designs into solutions.
  • Develop automated processes to reduce manual effort.
  • Create observability solutions to track SLA adherence.
  • Manage critical situations effectively.

Skills

Java / J2EE
CI-CD tools
Cloud technologies
Linux (RHEL)
Observability tools
Microservices
Automation scripting (e.g., Python)
Performance tuning

Tools

Docker
Terraform
Azure DevOps
Dynatrace
Splunk

Job description

Senior Reliability Engineer

  • Site Reliability Engineering (SRE) is a modern way of delivering IT solutions that applies software engineering principles to service delivery in order to reduce IT risk to the business, improve business resilience, achieve predictability and reliability, and optimize the cost of IT infrastructure and operations.
  • A Reliability Engineer typically has deep software engineering experience spanning the design, build, deployment, and management/maintenance of IT solutions, ensuring their resilience, reliability, and performance.
  • A Reliability Engineer bridges development and operations by applying a software engineering mindset to the development, deployment, and maintenance of applications, maximizing system reliability and automation while improving efficiency through better use of resources.
Key Responsibilities
  • Defining SLAs/SLOs/SLIs for a product or service
  • Engineering resilient design and implementation practices into solutions throughout the product life cycle
  • Engineering out manual effort (toil) by developing automated processes and services (e.g., automated systems management, CI/CD improvements)
  • Developing observability solutions to track, report on, and measure SLA adherence
  • Helping optimize the cost of IT infrastructure and operations (FinOps)
  • Managing critical situations
  • Automating SOPs and runbooks to reduce toil
  • Performing data analytics and system trend analysis
Typical Skills and Background
  • 7+ years of experience with software product engineering principles, processes, and systems
  • Hands-on experience with Java / J2EE, at least one web server (Apache Tomcat or IBM HTTP Server), at least one application server (Tomcat/WebSphere), and a major RDBMS such as Oracle
  • Hands-on experience with at least one CI/CD tool (Azure DevOps, GitLab CI/CD, Jenkins) and at least one IaC tool (Terraform, AWS CloudFormation, Ansible, etc.)
  • Experience with at least one cloud technology (AWS, Azure, GCP, etc., and Docker, Pivotal, Kubernetes, OpenShift, etc.) and its reliability tools (Azure Application Insights, Amazon CloudWatch, Azure Monitor, etc.)
  • Experience with Linux (RHEL) performance monitoring: the key OS parameters, how to interpret them, and the commands used to monitor them
  • Experience in observability: APM tools (Dynatrace, AppDynamics, etc.), metrics/log consolidation (Splunk), and the ELK Stack
  • Experience defining non-functional requirements (NFRs) and SLAs/SLOs/SLIs for a product, platform, or service
  • Knowledge of queuing models, thread pools, request-servicing processes, etc.
  • Knowledge of web services, SOA, ESB (DataPower), and RESTful services
  • Knowledge of application design patterns, J2EE application architectures, microservices, Spring Boot, and cloud-native architectures
  • Proficiency in Java runtimes, Core Java, garbage collection, and JVM parameter tuning
  • Experience tuning application server performance (Tomcat/WAS)
  • Experience troubleshooting performance, scalability, and availability issues
  • Experience generating and analyzing thread dumps and heap dumps
  • Knowledge of query tuning and database design and modeling
  • Knowledge of at least one automation scripting language, such as Python
  • Mastery of collaborative software development using Git, Jira, Confluence, etc.
  • AI/ML and data analytics knowledge and experience are desirable