Site Reliability Engineer

Stuttgart

Remote

EUR 60,000 - 100,000

Job Description

Our client is one of the world’s leading manufacturers of semiconductor chip-making equipment. A majority of the world’s microchips receive their critical lithographic patterning in machines made by this organisation. In addition, they produce metrology tools and advanced applications to analyze and optimize the performance of the customer production process.

Job Mission

Troubleshoot short-term problems and develop them into structural improvements to our distributed data and compute platform infrastructure. Be accurate and precise, and help drive up the aggregate availability of the installations of these distributed computing systems in Korea, Taiwan, Israel, China, the US and elsewhere. Be part of the computing platform that is one of the main pillars of the production of next-generation microchips for Apple, Samsung and many others.

Responsibilities:

Raise awareness in other teams of the methods and procedures we use, to help prevent repetitive help requests
Help application developers to understand the infrastructure / cluster / system
Understand and explain how the system fits into the customer’s ecosystem
Share knowledge and mindset with other teams (dev / infra engineers)
Contribute towards building VCP as a product that meets our standards of quality
Increase stability and reliability of VCP through automated testing and automation
Enhance customer satisfaction and product reliability
Improve the functionality and reliability of VCP
Translate customer ecosystem needs into engineering deliverables
Identify and resolve system or cluster-level issues
Combine individual ‘stories’ into a comprehensive solution
Make VCP reliable by improving system resilience (bug-fixing and beyond)
Resolve bugs sustainably (implement regression tests, design structural fixes)
Promote predictable component lifecycle management
Maintain the technical roadmap (application lifecycle management)
Support feature and service requests from the field
Suggest and implement improvements to our technical solutions and workflows in collaboration with your team and stakeholders

Highly Valued Qualifications & Experience:

Experience with data centers and operating systems
Experience with zero-downtime technology introduction, including data migration
Passion for automated testing and qualification, ideally as part of CI/CD pipelines
Deep understanding of networking issues
Willingness to work remotely outside regular hours when necessary to build fail-safe systems (preferably an exception, not the rule)

Required Qualifications & Experience:

Practical experience with distributed computing systems (a must)
Experience with build and release infrastructure (Maven, Nexus, Bamboo, GitHub)
Familiarity with at least one scripting language (e.g. Python)