Software Engineering & Reliability Engineer

Electrum Payments

Cape Town

Hybrid

ZAR 600 000 - 800 000

Full time

5 days ago

Job summary

A next-generation payment technology company is seeking a Site Reliability Engineer to enhance the reliability and performance of its services. You will collaborate with development teams, troubleshoot incidents, and implement monitoring solutions in a cloud environment. Ideal candidates hold a Bachelor's degree in Computer Science and have 2+ years of experience in DevOps-related roles. The position offers a people-first culture, flexible work hours, and generous leave starting at 20 days per year.

Benefits

Flexible work hours
20 days leave per year
Daily catered lunch
Regular team activities

Qualifications

  • 2+ years of experience in an SRE, DevOps, Platforms or similar role.
  • Familiarity with cloud services: Compute, Object Storage, Databases.
  • Hands-on software engineering experience.

Responsibilities

  • Monitor and improve reliability, scalability, and performance of services.
  • Collaborate with teams to develop scalable applications.
  • Respond to and investigate incidents relating to infrastructure.

Skills

Troubleshooting
Problem-solving
Cloud services
Scripting experience
Attention to detail
Time management

Education

Bachelor's degree in Computer Science

Tools

DataDog
Elastic/ELK Stack
Grafana

Job description

Electrum is a next-generation payment software technology company.

Since 2012, we've delivered trusted, enterprise-grade, cloud-native software to optimise financial transaction processing. Our deep expertise has established us as a respected partner in high-volume, low-value payment schemes, enabling clients to deliver services to millions of South Africans daily.

At Electrum, we are grounded in impact – designing solutions that matter, acting with urgency, and continuously learning as we scale. We believe in creating together – working side by side with our clients and teams to build meaningful, lasting solutions. We prioritise making it safe – encouraging open communication, smart risk-taking, and trust so that creativity and alignment thrive. And we back empowered strong teams – hiring brilliant people, collaborating hard, and holding each other to high standards while leading with empathy and kindness.

The Role

Site Reliability Engineers (SREs) are responsible for monitoring, automating, and improving the reliability, scalability, performance and availability of our services. SREs work on tasks such as preventing incidents, managing infrastructure reliability, building effective monitoring systems and ensuring smooth operations of cloud production systems.

Service Reliability and Availability
  • Collaborate with teams to develop reliable, available, and scalable applications.
  • Work closely with the development team to understand, address, and prevent technical issues.
  • Participate in on-call rotations and manage critical incidents.
  • Develop and maintain incident response processes and alerting mechanisms.
  • Develop and maintain tools to monitor application and service SLIs and SLOs.
System Troubleshooting and Problem Resolution
  • Diagnose and resolve infrastructure and system-level issues, ensuring minimal downtime and swift problem resolution.
  • Respond to and investigate incidents related to infrastructure and applications, utilising diagnostic tools to track down and remediate issues.
  • Participate in on-call rotations to provide 24/7 operational support as necessary.
Observability and Automation
  • Utilise technologies to develop and maintain effective log management and monitoring solutions for internal and external customers.
  • Evaluate system health, identify performance bottlenecks and proactively optimise performance and cost-effectiveness.
  • Implement automation tools and frameworks for deployment, configuration, and monitoring processes.
  • Manage and plan system capacity to ensure continued reliability.
Process Improvements
  • Offer recommendations and improvements to enhance performance, security, and scalability.
  • Evaluate and integrate emerging technologies, cloud services and automation tools to improve operational efficiency.
  • Drive cost-optimization initiatives by identifying opportunities for resource right-sizing, efficiency and other cost-saving measures.
Disaster Recovery
  • Design and implement disaster recovery strategies, including backup and restoration processes, to ensure business continuity.
  • Develop and update incident management procedures, ensuring effective incident response by providing technical solutions and implementing preventative measures.
  • Regularly assess system performance, identify irregularities, troubleshoot issues, and ensure high system availability. This includes performing or facilitating Disaster Recovery tests.
Requirements
  • Bachelor's degree in Computer Science, Information Technology, or related field.
  • 2+ years of experience in an SRE, DevOps, Platforms or similar role.
  • Familiarity with cloud services covering Compute, Object Storage, Databases, Serverless Compute, and Monitoring & Observability.
  • Demonstrable experience with observability tooling and pipelines, e.g. DataDog, Elastic/ELK Stack or Grafana.
  • Hands-on software engineering and scripting experience.
  • Proficient troubleshooting and problem-solving skills.
  • Excellent prioritisation and time management skills.
  • Attention to detail and ability to work effectively in a team environment.
Why Join Electrum?
  • We believe in a People First approach, ensuring a culture where you can thrive and make a real difference.
Your Career & Culture
  • Career Growth: Delivering world-class financial software is challenging, but your effort will earn you hands-on experience with products used by millions, accelerating your career.
  • Strong Teams: We keep teams small, focused, and collaborative to maximize impact.
  • Transparency: We openly discuss strategy, finances, and salaries. Mistakes are viewed as learning opportunities that we actively discuss.
  • Autonomy: We trust you. You're expected to seek out the data needed for informed decisions and manage your own time, knowing when to focus and when to recharge.
  • Shared Vision: You'll have the power to shape the vision of how we build the future of financial services.
Practical Perks
  • Flexible Work: Office-first environment with flexible hours.
  • Generous Leave: Starting at 20 days per year.
  • Office Perks (Cape Town): Fully-stocked kitchen and daily catered lunch.
  • Social Life: Regular team activities like hikes, getaways, and dinners.