
Platform Engineer (AWS, GitHub Actions, Heroku CI) (JHB) (26559)

Datafin IT Recruitment

Johannesburg

On-site

ZAR 600 000 - 800 000

Full time

Today

Job summary

A provider of financial tools in Johannesburg is seeking a Platform Engineer to manage Heroku pipelines, CI/CD processes, and performance optimisation. The successful candidate will have a strong background in cloud infrastructure, monitoring, and incident response, along with the communication skills to collaborate closely with Developers. Candidates should have 3+ years of experience operating production applications in cloud environments.

Qualifications

  • 3+ years’ experience operating production apps on Heroku, AWS, DigitalOcean, or similar.
  • Hands-on experience with GitHub Actions, Heroku CI, or equivalent.
  • Experience with Sentry, Papertrail (or similar), logs, and dashboards.

Responsibilities

  • Manage Heroku pipelines, CI/CD, review apps, and production environments.
  • Operate Celery workers and queues, monitor health, and handle missed task check-ins.
  • Define and track service level objectives (SLOs).

Skills

Operating production apps on Heroku
CI/CD pipelines with GitHub Actions
Monitoring and incident response
Security fundamentals
Disaster recovery and backups
Communication and collaboration

Tools

Terraform
Docker
Celery
Job description
ENVIRONMENT :

A provider of cutting-edge financial tools in Johannesburg seeks a Platform Engineer to manage Heroku pipelines, CI / CD, review apps, and production environments. You will also operate Celery workers and queues, monitor their health, and handle missed task check-ins; manage Cloudflare for DNS, edge security, and performance optimisation; and collaborate with Developers to streamline workflows and educate them on secure coding practices. The ideal candidate has 3+ years’ experience operating production apps on Heroku, AWS, DigitalOcean, or similar; hands-on experience with CI / CD pipelines using GitHub Actions, Heroku CI, or equivalent; solid Git fundamentals; and monitoring and incident response experience with Sentry, Papertrail (or similar), logs, and uptime / performance dashboards.

DUTIES :
Reliability & Operations -
  • Own uptime, performance, and monitoring for all production applications.
  • Manage Heroku pipelines, CI / CD, review apps, and production environments.
  • Operate Celery workers and queues, monitor health, and handle missed task check‑ins.
  • Define and track service level objectives (SLOs) (availability, latency, task success rate).
  • Maintain runbooks and a centralised wiki for incident response, and lead post‑mortems.
  • Run periodic disaster recovery drills and coordinate Penetration Tests.
Platform Engineering -
  • Keep environments current (Heroku stacks, Postgres / Redis versions, DO / AWS base images).
  • Manage daily backups and ensure restore tests and disaster recovery runbooks are in place.
  • Standardise infrastructure (Terraform or scripts for DO / AWS; app.json for Heroku).
  • Manage Cloudflare for DNS, edge security, and performance optimisation.
  • Tune performance (DB indices, query optimisation, cache usage, Celery queue design).
  • Optimise infrastructure costs across Heroku, DigitalOcean, and AWS.
Developer Experience & CI / CD -
  • Maintain CI pipelines with type checking, linting, and security scanning.
  • Enforce test coverage and automate deploy checks (smoke tests, migration health, error budgets).
  • Support Developers with tooling for local / staging environments and build self‑service dashboards (e.g., Celery queue status).
  • Collaborate with Developers to streamline workflows and educate on secure coding practices.
Security & Compliance -
  • Own vulnerability management and dependency patching cadence.
  • Manage access reviews, secrets, MFA / SSO, and enforce least‑privilege IAM policies.
  • Implement encryption for data at rest and in transit (e.g., S3 server‑side encryption).
  • Contribute evidence and responses for security questionnaires and SOC 2 audits.
  • Maintain a “security pack” with architecture, sub‑processors, and DR / backup processes.
Monitoring & Alerting -
  • Configure Sentry ownership rules, Cron Monitors, and release health.
  • Centralise metrics / logs (Heroku metrics, Papertrail, Sentry, APM, Prometheus / New Relic).
  • Set up alerts on golden signals (latency, errors, traffic, saturation) and avoid alert fatigue.
  • Conduct capacity planning and track resource usage trends.
Vendor & External Services -
  • Evaluate and manage vendor relationships (e.g., Mailgun, Twilio) to ensure service level agreements (SLAs) and performance targets are met.
  • Assess new tools / services to enhance platform capabilities (e.g., observability, security).
  • Track costs, security posture, and integration quality for all third‑party services.
REQUIREMENTS :
Must‑Haves -
  • Cloud Infrastructure Management : 3+ years’ experience operating production apps on Heroku, AWS, DigitalOcean, or similar.
  • CI / CD pipelines : Hands‑on experience with GitHub Actions, Heroku CI, or equivalent; solid Git fundamentals.
  • Monitoring & incident response : Experience with Sentry, Papertrail (or similar), logs, and uptime / performance dashboards.
  • Security Fundamentals : Understanding of IAM, encryption in transit / at rest, MFA / SSO, and secure configuration practices.
  • Disaster recovery & backups : Experience implementing and operating automated backups, restore testing, and writing / maintaining incident runbooks.
  • Communication & collaboration : Ability to document processes clearly and work closely with Developers in a small team.
Strong Plus -
  • Infrastructure as Code & automation : Experience with Terraform, Docker, or equivalent tooling.
  • Asynchronous workloads : Familiarity with Celery, Redis, or other task queues and message brokers.
  • Scaling & cost optimisation : Capacity planning, performance tuning, and managing infra spend.
  • Compliance frameworks : Exposure to SOC 2, GDPR, or supporting client security questionnaires.
  • Incident management : Participation in on‑call rotations, leading post‑mortems, or serving as incident commander.
Nice-to-Haves -
  • Certifications (AWS Certified DevOps Engineer, CKS, or equivalent).
  • Proficiency in Python; familiarity with Django / Flask.
  • Experience with DNS / CDN / edge security (e.g., Cloudflare).
  • Observability platforms (Prometheus, Grafana, New Relic).
  • Static analysis and code quality tools (mypy, Bandit, SonarQube).
  • Prior exposure to multi‑tenant SaaS environments.