Site Reliability Engineer (SRE Engineer)

Sestek

Çankaya

Hybrid

TRY 60,000 - 90,000

Full time

30+ days ago

Job summary

A leading company specializing in artificial intelligence seeks a Site Reliability Engineer to ensure the health and reliability of their production environments. This role requires a proactive mindset and strong technical skills, particularly in cloud-native architectures and Kubernetes. The ideal candidate will have substantial experience in maintaining production environments, incident response, and operational metrics management. Competitive salary and benefits are offered, including flexible working arrangements and opportunities for professional growth.

Benefits

Private health insurance
Meal card
Transportation allowance
Monthly budget for activities
Training opportunities
Birthday celebrations

Qualifications

  • 3+ years of experience in SRE, Cloud Engineering or DevOps roles.
  • Strong understanding of cloud-native architectures and Kubernetes.
  • Proficient in analyzing resource usage and troubleshooting.

Responsibilities

  • Maintain stable and highly available production environments.
  • Manage cloud monitoring and resource usage tracking.
  • Design and execute reliability and resilience tests.

Skills

Cloud-native architectures
Kubernetes
System-level troubleshooting
Monitoring tools
Linux systems administration
Network fundamentals
Scripting (Bash, Python)
Incident response
Communication skills

Tools

Prometheus
Grafana
ELK
OpenTelemetry

Job description

Location: İstanbul/Ankara

Working Style: Hybrid

Overview

We are looking for a “Site Reliability Engineer” who will take ownership of the health and reliability of our production environments. This role requires a proactive mindset, attention to detail, and a strong sense of responsibility in real-time operations. You will be the primary owner of our cloud infrastructure’s monitoring, resource management, and incident response processes. Your work will directly contribute to the stability and performance of mission-critical services. If you are passionate about building reliable systems and thrive in dynamic environments, this opportunity might be just for you.

What We Expect

  • Own the goal of maintaining stable and highly available production environments.
  • Take responsibility for SBC (Session Border Controller) and network configurations, including troubleshooting and tuning.
  • Be the primary point of accountability for cloud monitoring, resource usage tracking, and cost optimization.
  • Regularly analyze system resources (CPU, memory, disk) and implement request/limit optimizations, particularly in Kubernetes environments.
  • Design and execute reliability and resilience tests to improve system robustness.
  • Manage operational metrics and alerting systems, ensuring timely responses to incidents.
  • Act as the go-to person for real-time operational attention, helping reduce response times and increase system resilience.

Who We Are Looking For

  • 3+ years of experience in SRE, Cloud Engineering, or DevOps roles.
  • Strong understanding of cloud-native architectures and Kubernetes (including request/limit tuning, autoscaling, and Helm deployments).
  • Experience with infrastructure and application monitoring tools (Prometheus, Grafana, ELK, OpenTelemetry).
  • Familiarity with incident response, on-call support, and SLA/SLO practices.
  • Proficient in analyzing resource usage (CPU, memory, disk) and performing system-level troubleshooting.
  • Experience applying cloud best practices (preferably AWS) in real-world environments.
  • Experience in designing and implementing Business Continuity and Disaster Recovery strategies.
  • Good understanding of cloud security, preferably with experience in a PCI DSS compliance program.
  • Strong proficiency in Linux systems administration.
  • Solid understanding of network fundamentals, including TCP/IP, DNS, NAT, routing, firewall rules, and load balancing.
  • Ability to document operational procedures through clear and actionable SOPs or runbooks, enabling faster incident resolution, knowledge sharing, and improved on-call efficiency.
  • Experience with network troubleshooting tools.
  • Excellent communication and cross-team collaboration skills.
  • Scripting ability (e.g., Bash, Python) is a plus.
  • SBC configuration experience (e.g., Kamailio, AudioCodes) is a plus.

What We Can Offer You

  • A chance to be part of a company specialized in artificial intelligence.
  • Flat organizational structure and an energetic team.
  • Flexible/hybrid working style, with the option to work from our Ankara or İstanbul office.
  • Private health insurance, meal card, and transportation allowance.
  • Monthly budget for external activities with your colleagues.
  • Incentive for graduate and postgraduate studies.
  • Training opportunities for technical and personal development, as well as support for certificate programs related to your profession.
  • Birthday celebrations, parties, happy hours, and “Welcome to Spring/Fall” events.
  • Breakfast and healthy snacks at the office all day long.