Platform Engineer

Collier Recruitment

Johannesburg

On-site

ZAR 1 185 000 - 1 525 000

Full time

Today

Job summary

A tech recruitment firm in Johannesburg is hiring a Platform Engineer on behalf of a client to manage cloud infrastructure and ensure security and compliance. You will own uptime and performance for production applications, manage CI/CD pipelines, and handle incident response. The ideal candidate has strong experience with cloud platforms, CI/CD, and security practices. The role offers a competitive compensation package and opportunities for growth.

Benefits

Competitive compensation package

Autonomy to choose tools and design processes

Opportunities for growth into leadership roles

Qualifications

3+ years operating production apps on Heroku, AWS, DigitalOcean, or similar.
Hands-on experience with GitHub Actions, Heroku CI, or equivalent.
Experience with Sentry, Papertrail (or similar), and logs.
Understanding of IAM, encryption in transit/at rest, MFA/SSO.
Experience implementing automated backups and maintaining incident runbooks.
Ability to document processes clearly and work closely with developers.

Responsibilities

Own uptime, performance, and monitoring for all production applications.
Manage Heroku pipelines, CI/CD, review apps, and production environments.
Operate Celery workers, monitor health, and handle missed task check-ins.
Define and track service level objectives (SLOs) for availability.
Run periodic disaster recovery drills and lead post-mortems.
Manage daily backups and ensure restore tests are in place.

Skills

Cloud infrastructure management

CI/CD pipelines

Monitoring & incident response

Security fundamentals

Disaster recovery & backups

Communication & collaboration

Tools

Terraform

Docker

Sentry

GitHub Actions

Overview

Johannesburg, South Africa | Posted on 09/11/2025

Our client offers a groundbreaking web-based platform designed to simplify the management of share incentive plans for companies. By making the understanding, administration, and accounting of share plans more accessible, the platform caters to both listed and unlisted businesses globally.

This role is an exciting opportunity for a highly motivated individual to join a dynamic team. You will work closely with top-tier clients, assisting with share plan-related technical queries while driving client success and satisfaction.

Responsibilities

Reliability & Operations
Own uptime, performance, and monitoring for all production applications.
Manage Heroku pipelines, CI/CD, review apps, and production environments.
Operate Celery workers and queues, monitor health, and handle missed task check-ins.
Define and track service level objectives (SLOs) for availability, latency, and task success rate.
Maintain a runbook (a centralized wiki for incident response) and lead post-mortems.
Run periodic disaster recovery drills and coordinate penetration tests.
Keep environments current (Heroku stacks, Postgres/Redis versions, DO/AWS base images).
Manage daily backups and ensure restore tests and disaster recovery runbooks are in place.
Standardize infrastructure (Terraform or scripts for DO/AWS; app.json for Heroku).
Manage Cloudflare for DNS, edge security, and performance optimization.
Optimise infrastructure costs across Heroku, DigitalOcean, and AWS.

Developer Experience & CI/CD
Maintain CI pipelines with type checking, linting, and security scanning.
Enforce test coverage and automate deploy checks (smoke tests, migration health).
Support developers with tooling for local/staging environments and build self-service.
Collaborate with developers to streamline workflows and educate on secure coding practices.

Security & Compliance
Own vulnerability management and dependency patching cadence.
Manage access reviews, secrets, MFA/SSO, and enforce least-privilege IAM policies.
Implement encryption for data at rest and in transit (e.g., S3 server-side encryption).
Contribute evidence and responses for security questionnaires and SOC 2 audits.
Maintain a “security pack” with architecture, sub-processors, and DR/backup processes.

Monitoring & Alerting
Configure Sentry ownership rules, Cron Monitors, and release health.
Set up alerts on golden signals (latency, errors, traffic, saturation) and avoid alert fatigue.
Conduct capacity planning and track resource usage trends.
Evaluate and manage vendor relationships (e.g., Mailgun, Twilio) to ensure SLAs and performance.
Assess new tools/services to enhance platform capabilities (observability, security).
Track costs, security posture, and integration quality for all third-party services.

Requirements

Must-Have
- Cloud infrastructure management: 3+ years operating production apps on Heroku, AWS, DigitalOcean, or similar.
- CI/CD pipelines: Hands-on experience with GitHub Actions, Heroku CI, or equivalent.
- Monitoring & incident response: Experience with Sentry, Papertrail (or similar), and logs.
- Security fundamentals: Understanding of IAM, encryption in transit/at rest, MFA/SSO, and secure configuration practices.
- Disaster recovery & backups: Experience implementing automated backups, restore testing, and maintaining incident runbooks.
- Communication & collaboration: Ability to document processes clearly and work closely with developers in a small team.
Strong Plus
- Infrastructure as Code & automation: Experience with Terraform, Docker, or equivalent tooling.
- Asynchronous workloads: Familiarity with Celery, Redis, or other task queues and message brokers.
- Scaling & cost optimization: Capacity planning, performance tuning, and managing infra spend.
- Compliance frameworks: Exposure to SOC 2, GDPR, or supporting client security questionnaires.
- Incident management: Participation in on-call rotations, leading post-mortems, or serving as incident commander.
Nice-to-Have
- Proficiency in Python; familiarity with Django/Flask.
- Experience with DNS/CDN/edge security (e.g., Cloudflare).
- Observability platforms (Prometheus, Grafana, New Relic).
- Static analysis and code quality tools (mypy, Bandit, SonarQube).
- Prior exposure to multi-tenant SaaS environments.
- Certifications (AWS Certified DevOps Engineer, CKS, or equivalent).

What we offer

Be our first Platform Engineer with real ownership to shape reliability, security, and scalability from the ground up.
Enjoy autonomy to choose tools, design processes, and drive automation across the platform.
Grow into a leadership role as our Platform/SRE/Security function expands.
Work with a diverse technical stack including Heroku, AWS, DigitalOcean, S3, and Cloudflare.
Collaborate closely with experienced developers and founders who value reliability and security.
Gain hands-on experience with SOC 2 compliance and enterprise-grade security practices.
Receive a competitive compensation package with room to grow as the company grows.
Work that matters - powering financial equity for hundreds of thousands of employees.
