Site Reliability Engineer

ZipRecruiter

Erfurt

Remote

EUR 60,000 - 90,000

Full-time

Posted 6 days ago

Summary

A leading company in semiconductor manufacturing seeks an expert for distributed computing systems. The role focuses on troubleshooting, automation, and collaborative problem-solving to enhance reliability and customer satisfaction across global deployments in the tech industry.

Qualifications

Experience with distributed computing systems.
Proficiency in scripting languages, preferably Python.
Expertise in Linux systems.

Responsibilities

Troubleshoot and develop improvements for the distributed data platform.
Maintain technical roadmap for application lifecycle management.
Conduct regression tests and structural fixes for bugs.

Skills

Automation

Networking

Tools

Maven

Nexus

Bamboo

GitHub

Ansible

Job Description

Our client is one of the world’s leading manufacturers of semiconductor chip-making equipment. A majority of the world’s microchips receive their critical lithographic patterning in machines made by this organization. In addition, they produce metrology tools and advanced applications to analyze and optimize the performance of the customer production process.

Job Mission

Troubleshoot short-term problems and develop structural improvements on our distributed data and compute platform infrastructure. Ensure accuracy and precision to increase the availability of these distributed computing systems across Korea, Taiwan, Israel, China, and the US. Be part of the computing platform that is a key component in the production of microchips for companies like Apple, Samsung, and others.

Responsibilities:

Create awareness in other teams about methods and procedures to help prevent repetitive help requests.
Assist application developers in understanding the infrastructure, clusters, and systems.
Understand and explain how the system integrates into the customer’s ecosystem.
Share knowledge and mindset with other teams (developers and infrastructure engineers).
Contribute to building VCP as a quality product.
Increase stability and reliability of VCP through automation and automated testing.
Enhance customer satisfaction and product reliability.
Improve the functionality and reliability of VCP.
Translate customer ecosystem needs into engineering deliverables.
Identify and resolve system/cluster-level issues.
Combine individual tasks into comprehensive solutions.
Improve system resilience to make VCP reliable, including bug fixing and structural improvements.
Implement regression tests and structural fixes for bugs.
Manage component lifecycle predictably.
Maintain the technical roadmap (application lifecycle management).
Support field feature and service requests.
Propose and implement improvements to technical solutions and workflows, aligned with team and stakeholder needs.

Highly valued qualifications & experiences:

Experience with DC/OS.
Experience with zero-downtime technology introduction and data migration.
Passion for automated testing, qualification, and CI/CD pipelines.
Strong interest in networking issues.
Willingness to work remotely outside regular hours on the rare occasions this is needed to build fail-safe systems.

Required qualifications & experiences: