Site Reliability Engineer (Node.js)

Thrive Learning Limited

Dubai

On-site

AED 120,000 - 180,000

Full time

7 days ago

Job summary

A leading technology company in Dubai is seeking a Site Reliability Engineer (Node.js) to enhance AWS application performance and stability. The successful candidate will collaborate with a skilled team, troubleshoot complex issues, and document operational procedures while driving continuous improvements in service reliability.

Qualifications

  • Proven expertise in debugging Node.js applications.
  • Experience supporting AWS solutions and MongoDB applications.
  • Strong problem-solving skills in incident response.

Responsibilities

  • Diagnose and fix bugs in Node.js microservices.
  • Manage AWS environments and services.
  • Monitor system health and respond to incidents.

Skills

Debugging
Problem-solving
Cloud solutions
Node.js
Microservices
Automated deployment
Monitoring tools

Tools

AWS
MongoDB
Docker
NewRelic
DataDog
Prometheus
Grafana

Job description

As a Site Reliability Engineer (Node.js) within the SRE team, you’ll be focused on monitoring and supporting our AWS environments for platforms and tools utilised by our customers.

The SRE team specialises in giving delivery squads visibility of how their services perform in production, and in supporting them as they investigate and contain potential problems.

You’ll have freedom to help research and recommend solutions for hosting applications at scale.

You’ll play a fundamental role in incident response, troubleshooting and containing issues.

You’ll collaborate with a highly experienced technical team to drive forward best practice as we implement and enhance our tools and services using cutting-edge technology.

Key responsibilities

  • Diagnose and fix bugs in Node.js microservices.
  • Collaborate closely with backend engineers to improve code quality, resilience, and observability.
  • Configure and manage environments and services on AWS.
  • Enhance tools and processes for monitoring scalable applications on AWS.
  • Troubleshoot and resolve complex technical issues.
  • Document and automate Standard Operating Procedures and Run Books as applicable.
  • Monitor system health through automated alerts, investigate issues, and take appropriate action to resolve them.
  • Respond to issues outside of working hours as per the on-call rota.

Basic Qualifications

  • Proven expertise in debugging, profiling, and fixing issues in live Node.js applications.
  • Experience implementing environments for web-based microservices.
  • Experience of supporting MongoDB-based web applications.
  • Experience of engineering, architecting, or supporting AWS solutions.
  • Familiarity with cloud virtualisation tools such as ECS and/or Docker containers.
  • Experience working with automated deployment systems (e.g. CloudFormation, CodeBuild).
  • Familiarity with at least one monitoring tool, e.g. NewRelic, DataDog, Prometheus, Grafana.
  • Strong problem-solving skills and the ability to troubleshoot complex issues.
  • Good understanding of incident response best practices, post-incident reviews, and continuous improvement.
  • Ability and willingness to proactively improve ways of working and processes.
  • Desire to continually grow, develop and improve.

Useful Skills

  • Understanding of REST, GraphQL and asynchronous messaging.
  • Experience of using Git for version control.
  • Experience of Continuous Integration and Deployment is advantageous.
  • Familiarity with core SRE principles such as monitoring, alerting, error budgets, and fault analysis.
  • Excellent written and verbal communication skills.
  • Familiarity with IT compliance and risk management requirements (e.g. security, privacy, GDPR).
