Senior Infrastructure Operation Engineer

iSanqa Resourcing

Midrand

Hybrid

ZAR 800 000 - 1 200 000

Full time

2 days ago

Job summary

A technology resourcing agency is seeking, on behalf of a client, a High Performance Computing (HPC) Engineer to manage and optimize the HPC infrastructure behind advanced automotive design and engineering. The role requires 6-8 years of hands-on experience in related fields and a background in Ansible, Kubernetes, and license management. It is a hybrid position on a contract running from December 2025 to December 2028, and it calls for skills in automation, leadership, and support of critical engineering workflows.

Qualifications

  • Minimum of 6 years of IT work experience.
  • ITIL process knowledge and work experience.
  • Minimum of 4 years of experience in an operations environment.

Responsibilities

  • Manage and optimize the HPC infrastructure.
  • Lead license management automation.
  • Ensure high system reliability and performance.

Skills

Ansible Automation Platform
Kubernetes
License Management
Windows OS
Linux OS
Terraform
Python
Bash
PowerShell

Education

Degree in Information Systems

Tools

Dynatrace
OpenLM
Matlab
FlexLM
ServiceNow (ITSM)
Docker
Confluence
Jira

Job description

Overview

Our client is looking for a High Performance Computing (HPC) Engineer to manage and optimize the HPC infrastructure that powers advanced automotive design and engineering. In this role you will lead license management, automation and operational support for enterprise-scale HPC environments supporting CAD, CAE and PDM applications, leveraging tools such as Ansible, Kubernetes and Dynatrace.

Your infrastructure expertise will enable high system reliability, performance and automation across global platforms ensuring seamless operation of critical engineering workflows with robust 24/7 support.

  • Senior infrastructure engineering with Ansible, Kubernetes and license management
  • Hybrid and remote working flexibility with 1960 flexible annual hours
  • Leadership role with operations meetings, stakeholder management and incident coordination

POSITION: Contract: 01 December 2025 – 31 December 2028

EXPERIENCE: 6-8 years related experience

COMMENCEMENT: 01 December 2025

LOCATION: Hybrid: Midrand/Menlyn/Rosslyn/Home Office rotation

TEAM: High Performance Computing (HPC)

HPC provides a robust and scalable foundation for CAD, CAE and PDM applications. It supports complex workflows and delivers high-performance computing systems and job flow operation to enable efficient management of users, data, code, monitoring, deployment and middleware.

HPC aims to drive innovation, optimize design and engineering processes, and streamline product development workflows through integrated applications and infrastructure.

Qualifications / Experience

Minimum mandatory qualifications:

  • Degree in Information Systems or equivalent experience
  • Minimum of 6 years of IT work experience
  • ITIL process knowledge and work experience (required)
  • Minimum of 4 years of experience in an operations environment

Advantageous qualifications:

  • ITIL certification

Advantageous experience:

  • Proven track record of successful infrastructure projects
  • Leadership experience in operations teams
  • Agile working experience

Essential Skills & Requirements

Infrastructure & Automation:

  • System management experience in Ansible Automation Platform/Ansible Tower
  • Experience with Ansible Tower or AWX for managing and scaling Ansible automation
  • Advanced experience in Kubernetes and Dynatrace
  • Experience in administering Windows and Linux OS (client / server)
  • Thorough knowledge of Linux and Linux commands

License Management:

  • Experience in administering and managing license services such as OpenLM, Matlab and FlexLM
  • Software license management: installation, updates, etc.

Operations & Support:

  • Experience in IT-Operations standby support and ticket management
  • Agile project management knowledge and PIC processes

Scripting & Automation:

  • Proficiency in scripting and infrastructure-as-code tools such as Terraform, Python, Bash or PowerShell
  • Supervision: Works fully independently and escalates only tasks that are complex and outside their span of control. Expected to lead operational meetings and coordinate teams
  • Problem solving: Improves existing products or systems through conceptual changes and enhancements. Can manage solutions to complex problems that may have simple fixes but affect multiple systems. Leads incident resolution and post-mortem analysis
  • Communication: Able to influence others, with strong interpersonal and communication skills. Excellent organizational and presentation skills. Willingness to engage with international customers
  • Delivery: Specify new products, processes, standards based on organization strategy; set short- to mid-term operational plans. May need to guide lower-level employees
  • Knowledge: Functional expert with mastery of a specific professional discipline

Soft Skills

  • Ability to work interdependently and submit deliverables on time with high quality
  • Self-starter with leadership capabilities
  • Good interpersonal and organizational skills with the ability to communicate effectively (both verbally and written) on technical and non-technical levels
  • Proactive problem solving and critical thinking
  • Strong ethics and compliance mindset
  • Flexibility to take up different tasks and multi-task

Advantageous Skills

  • Experience with configuration management practices and tools ensuring consistent configuration
  • Confluence / Jira
  • Leadership and stakeholder coordination
  • Experience in ServiceNow (ITSM)
  • Experience with containerisation (e.g., Docker)
  • Experience with public and private cloud services (e.g., Azure, AWS, Google Cloud)

Role Requirements

License Management & Administration:

  • Administration and monitoring of license services, including single servers and redundant three-server triads, on Windows and Linux
  • License service upgrades
  • Client license tracking
  • Support and consulting with license suppliers (logs, test new releases, etc.)
  • Investigating new platforms for license management

Operations & Incident Management:

  • Lead operational meetings
  • Incident, problem and change management
  • Handling and fixing IT security issues
  • On-call duty 24/7
  • Leading post-incident reviews and major incident management
  • Supporting Infrastructure Feature Teams in post-processing of major incidents

System Monitoring & Performance:

  • Monitor and maintain health and performance of applications
  • Troubleshoot and resolve issues to minimize downtime
  • Analyze logs and metrics to address potential issues
  • Implement and maintain monitoring and alerting solutions

Documentation & Compliance:

  • Document processes and configurations, and write incident reports
  • Ensure adherence to security and compliance standards

Software & Migration:

  • Software migration and agent distribution
  • Security lifecycle measures

Coordination & Leadership:

  • Coordinate external contractors
  • Measure and communicate operation KPIs and drive quality improvements
  • Prioritize operational scope for business-critical processes
  • Support system design with run-ready requirements
  • Steer IT Service Continuity Management for critical processes
  • Incorporate requirements from regulated business units
  • Request measures in PRIME processes for operational/compliance needs
  • Plan and manage overarching operation budget

NB:

  • South African citizens/residents preferred; valid work permits considered
  • By applying you consent to be added to the database and to receive updates
  • If you do not receive a response within 2 weeks, please consider your application unsuccessful

Note: This job description is provided for information purposes and may be subject to change.
