
Site Reliability Engineer

Iqtalent

EMEA

Remote

IDR 1.334.222.000 - 2.001.335.000

Full time

Job summary

A leading technology company is seeking an experienced Site Reliability Engineer (SRE) to optimize and automate performance using insights from real-time data. The role offers total flexibility, default remote working, and unlimited paid holidays. Candidates should be proficient in Kubernetes and cloud infrastructure and have strong analytical and collaboration skills.

Benefits

Unlimited paid holidays
Employee share scheme
Generous maternity and paternity leave
Volunteering Days
Company retreats
Employee Wellbeing platform

Qualifications

  • Strong collaboration skills.
  • Launching and operating production Kubernetes clusters.
  • Designing and operating infrastructure on AWS.
  • Operating MongoDB and Redis clusters.
  • Administering Linux servers.
  • Maintaining distributed software.
  • Operating Prometheus and Grafana.
  • Operating log collection and analysis systems.

Responsibilities

  • Proactive monitoring and issue resolution.
  • Collaborating on alerting and monitoring systems.
  • Contributing to key performance metrics.
  • Developing solutions for performance enhancement.
  • Gathering and analyzing performance metrics.
  • Driving innovation for system and infrastructure optimization.
  • Ensuring scalability for customer demands.
  • Automating cloud operations tasks.
  • Designing and delivering software solutions.
  • Participating in root cause analysis sessions.
  • Creating documentation for operational processes.
  • Providing on-call support for Cloud services.
  • Planning and executing software upgrades.

Skills

Kubernetes & containers
Go and/or Python
AWS
Linux
Terraform and IaC
Helm
MongoDB
Redis
Monitoring & logging
Networking concepts
Networking protocols

Job description

Overview

The role is for an experienced SRE at Tyk who is focused on optimising, automating, and improving performance using insights from real-time, massive-scale data. We’re seeking an original thinker, a challenger, a technical legend, and an opinionated collaborator who wants to make things better.

At Tyk, we’re obsessed with building software that solves problems. We count on our Site Reliability Engineers (SREs) to give users a rich feature set, high availability, and stellar performance so they can pursue their missions.

Total flexibility, default remote, radical responsibility

We offer unlimited paid holidays and remote working from anywhere in the world, for everyone. Why? Tyk was founded on the principle of offering flexibility and autonomy to our employees; we believe this allows our employees to achieve their best results. It also means we can build the best possible team, because location and working hours are no barrier.

If this sounds like an environment that could work for you, then read on to find out more.

Responsibilities
  • Proactive Monitoring: Ensure our production Cloud environment operates within defined SLAs through vigilant monitoring and proactive issue resolution.
  • Alerting and Monitoring: Collaborate with senior SREs to identify opportunities for building proactive alerting and monitoring systems; implement solutions to enhance system reliability.
  • Performance Metrics: Contribute to defining key performance metrics for Cloud services, enabling performance improvements and success measurement.
  • Solutions Development: Propose and develop solutions to maintain and enhance key performance indicators (KPIs) across our Cloud infrastructure.
  • Data Analysis: Gather and analyse metrics from operating systems and applications to optimise system performance and expedite fault resolution.
  • Innovation: Drive innovation by optimising system and infrastructure performance, anticipating customer needs, and proactively addressing scaling demands.
  • Scalability: Work closely with commercial functions to optimise our platform for scalability and meet growing customer demands.
  • Cloud Infrastructure: Analyse and ensure the automation, scalability, and efficient management of our Cloud infrastructure.
  • Automation: Execute automation for known cloud operations tasks and create new automation solutions to streamline processes.
  • Software Development: Design, write, and deliver software and automation solutions to enhance the availability, scalability, latency, and efficiency of our PaaS services.
  • Root Cause Analysis: Participate in blame-free root cause analysis meetings to promote learning and continuous system improvement in the event of production system incidents.
  • Documentation: Create and contribute to policies and runbooks to ensure that operational processes are well-documented and consistently followed.
  • On-call Support: Provide on-call support, ensuring our Cloud services follow a 24/7 model by promptly responding to alerts, meeting SLAs, and automating root cause analysis.
  • Upgrades and Migrations: Plan and execute software upgrades, including Kubernetes versions. Manage and communicate migrations from Classic Cloud to the new Cloud platform.

Qualifications
  • Strong collaboration skills
  • Launching and operating production Kubernetes clusters
  • Designing and operating infrastructure on AWS and other providers
  • Operating MongoDB (or other document database) clusters
  • Operating Redis (or other key-value storage) clusters
  • Administering Linux servers
  • Maintaining distributed software
  • Operating Prometheus and Grafana
  • Operating log collection and analysis systems

Skills
  • Kubernetes & containers (proficient)
  • Go and/or Python (advanced)
  • AWS (proficient)
  • Linux (proficient)
  • Terraform and IaC in general (proficient)
  • Helm (familiar)
  • MongoDB (or similar)
  • Redis (or similar)
  • Monitoring & logging
  • Grasp of networking concepts (subnets, routing, peering, load balancing, NAT, etc.)
  • Common networking protocols (DNS, TCP/IP, HTTP, TLS, UDP)

Benefits
  • Everyone has unlimited paid holidays.
  • We have total flexibility in hours, as we believe creativity flows better when our people are given freedom to decide when they are most productive. Everyone is unique, after all.
  • Employee share scheme
  • Generous maternity and paternity leave
  • Volunteering Days
  • Company retreats
  • Employee Wellbeing platform

We all share the same vision: we value authenticity, respect, responsibility, independence, honesty, diversity and inclusion and, most importantly, treating others how you wish to be treated. We look for like-minded people who bring their personalities to work every day, strive to achieve their personal goals, and are willing to challenge the way we do things. Why? To make what we do even better!

Our values
  • It’s ok to screw up!

We’ve found that it’s often the ‘stupid’ or unexpected ideas that turn out to be the successful ones, so try it; at least we can say we have!

  • The only stupid idea is the untested one!

It’s in our DNA – starting a business with founders 12 hours apart, giving our gateway away for free – sure, we did that, and we’d do it again!

  • Trust starts with you – make it count!

Trust is a two-way street – instil it from day one!

  • Assume best intent!

We have each other’s back – we’re all on the same team. Think before you speak or act.

  • Make things better!

Always try to leave things better than when you found them – change is constant, inevitable and embraced! Be that change we want to see.

What’s it like to work here? Check it out: tyk.io/worklife/

Tyk is an equal opportunities employer and we are determined to ensure that no applicant or employee receives less favourable treatment on the grounds of gender, age, disability, religion, belief, sexual orientation, marital status, or race, or is disadvantaged by conditions or requirements which cannot be shown to be justifiable.
