Site Reliability Engineer (SRE) (Remote)

RemoteStar

Cambourne

Remote

GBP 55,000 - 80,000

Full time

3 days ago

Job summary

RemoteStar is seeking a Site Reliability Engineer (SRE) with 5 to 9 years of experience in a healthcare-focused environment. The role involves ensuring the performance and scalability of applications, troubleshooting operational issues, and monitoring systems. The position is initially remote and will later transition to a hybrid model.

Qualifications

5 to 9 years of experience in SRE.
Must have a healthcare background.
Experience with cloud-native applications.

Responsibilities

Ensuring the availability, performance, and scalability of applications.
Handling operational issues like production failures and infrastructure problems.
Monitoring systems and creating plans for incident response.

Skills

Monitoring tools

Configuration management tools

Automation tools

Azure

GCP

Distributed systems

About the Company:

At RemoteStar, we’re hiring for one of our clients, a leading multinational IT services and consulting company specializing in digital transformation, cloud solutions, and AI-driven innovation. With a strong global presence, the company partners with enterprises across various industries to deliver cutting-edge technology solutions.

Job Title: Site Reliability Engineer (SRE)

Experience: 5 to 9 years

Location: Pan India (Remote)

Work Mode: Initially remote for this project. Later, the client will transition to a hybrid model (3 days from office per week).

Working Hours: 1 PM to 10 PM and 2 PM to 11 PM, 5 days a week.

Industry Preference: A healthcare background is a must-have.

Deal with operational issues such as production failures, infrastructure problems, security, and monitoring.
Ensure the availability, performance, and scalability of a website or application.
Work closely with developers to identify and fix potential issues before they cause problems for users.

Monitor systems and create plans for responding to incidents.
Take part in capacity planning and performance tuning so that the site can handle increased traffic without issue.
Have a deep understanding of how distributed systems work in order to troubleshoot and optimize them.

Be familiar with monitoring tools such as AppDynamics, Splunk, and the GCP operations suite.
Have a deep understanding of how different types of databases work in order to troubleshoot issues effectively.
Have experience working with cloud-native applications in order to manage them effectively.

Communicate clearly and concisely about system alerts or outages to other members of the team.
Deal with unexpected outages or performance issues.
Have experience with monitoring tools, configuration management tools, and automation tools.
Have good experience with Azure and GCP.

Please note: we are looking for a candidate who can mature the SRE practice across the division and who is comfortable being a champion and leader in the SRE space.
