Site Reliability Engineer

Pforzheim
EUR 55,000 - 90,000
Posted 4 days ago
Job Description

Our client is one of the world’s leading manufacturers of semiconductor chip-making equipment. A majority of the world’s microchips receive their critical lithographic patterning in machines made by this organisation. They also produce metrology tools and advanced applications to analyze and optimize customer production processes.

Job Mission

Troubleshoot short-term problems and develop structural improvements on our distributed data and compute platform infrastructure. Ensure accuracy and precision to increase the availability of distributed computing systems across Korea, Taiwan, Israel, China, and the US. Be part of the computing platform that supports the production of next-generation microchips for companies like Apple, Samsung, and others.

Responsibilities:

  • Create awareness among other teams about methods and procedures to help prevent repetitive help requests.
  • Assist application developers in understanding the infrastructure, clusters, and systems.
  • Understand and explain how the system integrates into the customer’s ecosystem.
  • Share knowledge and mindset with other teams (development and infrastructure engineers).
  • Contribute to building VCP as a product that meets quality standards.
  • Increase stability and reliability of VCP through automation and testing.
  • Enhance customer satisfaction and product reliability.
  • Improve VCP functionality and reliability.
  • Translate customer ecosystem needs into engineering deliverables.
  • Identify and resolve system/cluster-level issues.
  • Combine individual tasks into comprehensive solutions.
  • Improve system resilience to make VCP reliable, including bug fixes and structural improvements.
  • Handle bug resolution sustainably, including regression testing and structural fixes.
  • Act as an ambassador for predictable component lifecycle management.
  • Maintain the technical roadmap (application lifecycle management).
  • Support feature and service requests from the field.
  • Suggest and implement improvements to technical solutions and workflows, aligning with team and stakeholder needs.

Highly valued qualifications & experiences:

  • Experience with DC/OS.
  • Experience with zero-downtime technology introduction, including data migration.
  • Passion for automated testing and CI/CD pipelines.
  • Deep understanding of networking issues.
  • Availability to work remotely outside regular hours when necessary to build fail-safe systems.

Required qualifications & experiences:

  • Practical experience with distributed computing systems.
  • Experience with build and release tools such as Maven, Nexus, Bamboo, and GitHub.
  • Proficiency in at least one scripting language (Python).
  • Experience with Ansible.
  • Expertise in Linux.