Site Reliability Engineer (SRE II) (Kubernetes/Python)

k0deHut

Remote

ZAR 500 000 - 700 000

Full time

30+ days ago

Job summary

A leading South African fintech is seeking an Intermediate Site Reliability Engineer (SRE II) to maintain and improve its infrastructure, primarily on Google Cloud and AWS. This remote role offers flexibility and an opportunity to work with a strong technical team. Candidates should be skilled in Kubernetes, Python, and Terraform, with a commitment to operational excellence and continuous learning.

Benefits

Flexible working hours

Remote work most days

Opportunity to learn and grow

Qualifications

Intermediate Site Reliability Engineer role.
Experience with public clouds, primarily Google Cloud and AWS.
Good working understanding of system operations.

Responsibilities

Improving and maintaining infrastructure using Terraform.
Owning infrastructure projects from start to finish.
Documenting infrastructure design and tooling usage.
Participating in the on-call rotation for production support.

Skills

Kubernetes

Python

Terraform

CI/CD with Jenkins

Golang

Bash

OAuth2

Tools

MongoDB Atlas

LogDNA

Kong API Gateway

Microservice Architecture

About the job

Intermediate Site Reliability Engineer (SRE II)

Our client is offering the right candidate a great opportunity to join a fast-growing South African fintech whose seamless, innovative end-to-end customer onboarding services drive conversion rates, prevent fraud, and reduce risk and costs. The company provides automated, easy-to-implement solutions that fully onboard a new customer in under two minutes.

You'll work in a small, senior team that operates on trust and high collaboration. The team works remotely most of the time and occasionally comes into the office when more direct collaboration is required. You should be motivated to achieve operational excellence using automation tooling (e.g. Terraform) and enjoy keeping your technical skills current so that you can contribute to architectural discussions. Naturally, you'll be exposed to many aspects of the business from day one. The company will ensure that you have the tools and support to do great work, but you'll also have the freedom to try new things and learn.

Infrastructure & Software Stack

CI/CD with Jenkins
Kong API Gateway
LogDNA
Falco
MongoDB Atlas
Microservice Architecture with Event Sourcing and CQRS

Your responsibilities will include:

Improving and maintaining our infrastructure using Terraform, which includes making effective use of public clouds (primarily Google Cloud and AWS) while considering security, maintainability and scalability
Ensuring our infrastructure is automated and reproducible across environments
Using Kubernetes effectively to host our applications
Owning infrastructure projects from start to finish and driving them to completion within agreed timeframes
Documenting infrastructure design and how tooling should be used
Regularly considering the long-term vision for our infrastructure and our alignment to it
Making well-considered tradeoffs between short-term infrastructure requirements and long-term objectives
Identifying potential improvements that could enable us to deliver faster without compromising operational objectives
Managing our identity platform and enabling enterprise user and system authentication and authorization using OAuth2
Writing, testing and executing change control plans for production changes with an eye for detail to spot potential issues
Having a good working understanding of how our systems operate and being able to debug production issues
Being part of our on-call rotation. When on-call, you will work on repaying technical debt and deal with operational incidents as and when they occur. This will require you to have or acquire a good general knowledge of production operations for technical support.
Being part of our security incident response team
Writing operational tooling to automate otherwise manual processes (e.g. Golang, Bash)
Performing high-quality, ego-free code reviews for your colleagues, submitting your own code for review by others, and accepting their feedback graciously
Taking ownership of our operational metrics and driving visibility, testing and improvement initiatives
Working effectively with the development team to plan and deploy required infrastructure changes or new capabilities ahead of time and unblocking the development team when unforeseen infrastructure blockers arise
Accepting feedback willingly and sharing your knowledge freely

Benefits

Flexible working hours and leave (no clock watching)
Strong values that are practised
Remote work for most days of the week
Opportunity to learn and grow, surrounded by a strong technical team
