Site Reliability Engineer (SRE / DevOps)

Professional.me

Dubai

On-site

AED 200,000 - 300,000

Full time

4 days ago

Job summary

Professional.me is seeking a Site Reliability Engineer (SRE) to enhance infrastructure reliability and scalability within our AI-driven hiring platform. This mid-senior level role combines software engineering with systems thinking, guiding cross-functional teams while utilizing advanced tools to optimize performance and cost-efficiency. Your expertise will directly influence system resilience and operational processes as we scale our services in a dynamic environment.

Qualifications

5+ years of experience in SRE/DevOps roles.
Extensive hands-on experience with AWS and cloud management.
Advanced proficiency in tools like Terraform and GitHub Actions.

Responsibilities

Architect, implement, and maintain scalable infrastructure on AWS.
Manage and optimize databases ensuring high availability.
Develop and maintain CI/CD pipelines for software delivery.

Skills

AWS

Database Management

CI/CD

Infrastructure as Code

Monitoring

Scripting

Cost Optimization

Problem Solving

Education

Bachelor’s degree in Computer Science or Engineering

Tools

Terraform

GitHub Actions

Grafana

Prometheus

Python

Bash

Site Reliability Engineer (SRE / DevOps)

At Professional.me, the Site Reliability Engineer (SRE) will play a mission-critical role in scaling and securing the core infrastructure that powers our AI-driven hiring platform. As an internal hire, you’ll bring deep familiarity with our systems, products, and priorities, and now take ownership of the underlying reliability, performance, and cost efficiency across environments.

This is a senior-level position that blends software engineering with systems thinking. You’ll work closely with engineering, product, and data teams to architect and maintain the infrastructure that keeps Professional.me fast, secure, and reliable as we scale.

Key Responsibilities

Architect, implement, and maintain highly available and scalable infrastructure on AWS, leveraging advanced services and best practices for security, reliability, and cost optimization.
Manage, monitor, and tune databases including PostgreSQL, Redis, ClickHouse, and OpenSearch/Elasticsearch, ensuring optimal performance, data integrity, and high availability.
Design, deploy, and maintain robust queueing and messaging systems such as Kafka and NATS, supporting high-throughput, low-latency distributed applications.
Develop and maintain Infrastructure as Code (IaC) using Terraform, ensuring reproducibility, version control, and automated provisioning of cloud resources.
Set up, configure, and optimize CI/CD pipelines using GitHub Actions, automating build, test, and deployment workflows for rapid and reliable software delivery.
Create, manage, and enhance monitoring and observability solutions with Grafana, including the development of comprehensive dashboards and alerting systems for proactive incident response.
Conduct regular cost analysis and optimization of cloud resources, identifying opportunities to reduce spend while maintaining performance and reliability.
Collaborate closely with development, QA, and product teams to ensure seamless integration of reliability practices throughout the software lifecycle.
Lead incident response, root cause analysis, and post-mortem processes, driving continuous improvement in system resilience and operational processes.
Document infrastructure, processes, and best practices to ensure knowledge sharing and operational transparency across teams.
Stay current with industry trends, emerging technologies, and best practices in SRE, DevOps, and cloud infrastructure.

Required Experience & Skills

Extensive hands-on experience with AWS, including advanced services (EC2, RDS, S3, Lambda, VPC, IAM, CloudWatch, ECS/EKS, etc.), with a proven track record of architecting and managing large-scale cloud environments.
Deep expertise in managing, tuning, and troubleshooting databases such as PostgreSQL, Redis, ClickHouse, and OpenSearch/Elasticsearch, including backup, replication, and disaster recovery strategies.
Advanced proficiency in Infrastructure as Code using Terraform, including module development, state management, and integration with CI/CD workflows.
Strong experience with queueing and messaging systems like Kafka and NATS, including setup, scaling, monitoring, and troubleshooting in production environments.
Demonstrated ability to design, implement, and optimize CI/CD pipelines using GitHub Actions, with a focus on automation, reliability, and security.
Expert-level skills in monitoring, observability, and alerting using Grafana, Prometheus, and related tools, including dashboard creation and metric analysis.
Proven experience in cost optimization strategies for cloud infrastructure, including resource right-sizing, reserved instances, and usage monitoring.
Solid scripting and automation skills in languages such as Python, Bash, or Go, enabling efficient operations and process automation.
Strong understanding of networking, security best practices, and compliance requirements in cloud environments.
Excellent problem-solving, analytical, and troubleshooting abilities, especially in high-pressure, production-critical situations.
Effective communication and collaboration skills, with experience working in cross-functional teams and fast-paced startup environments as well as large organizations.

Qualifications

Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
5+ years of experience in Site Reliability Engineering, DevOps, or related roles, with demonstrated impact in both startup and enterprise settings.
Relevant certifications such as AWS Certified Solutions Architect, AWS Certified DevOps Engineer, or Terraform Associate are highly desirable.
Experience with agile methodologies and modern software development practices.
Familiarity with incident management frameworks and ITIL processes is a plus.

Tools & Technologies

Infrastructure as Code: Terraform
CI/CD: GitHub Actions, Jenkins, CircleCI
Monitoring & Observability: Grafana, Prometheus, ELK Stack, CloudWatch
Scripting: Python, Bash, Go
Version Control: Git, GitHub
Configuration Management: Ansible, Chef, or Puppet

This role offers the opportunity to shape and optimize mission-critical infrastructure in a dynamic, technology-driven environment. The SRE will have a direct impact on system reliability, scalability, and cost efficiency, while working with cutting-edge tools and collaborating with talented teams across the organization. Success in this position will be measured by improvements in uptime, deployment velocity, cost savings, and the overall resilience of the platform.

By applying to this position, you are granting us permission to process your CV and keep your profile on file for consideration for this and future opportunities.

Seniority level

Mid-Senior level

Employment type

Full-time

Job function

IT Services and IT Consulting, Software Development
