Site Reliability Engineer

ZipRecruiter

Bielefeld

Remote

EUR 65,000 - 85,000

Full-time

Posted 12 days ago

Summary

A leading company specializing in semiconductor chip-making equipment is seeking a professional to troubleshoot and enhance their distributed computing systems. The successful candidate will improve system reliability, contribute knowledge across teams, and work with advanced automation tools to deliver high-performance solutions. This role emphasizes a deep understanding of intricate networking issues and requires availability for out-of-hours support in exceptional situations.

Qualifications

  • Experience with distributed computing systems required.
  • Knowledge of build and release infrastructure is necessary.
  • Familiarity with CI/CD processes is a plus.

Responsibilities

  • Create awareness in teams to prevent repetitive help requests.
  • Contribute to improving the reliability of the VCP system.
  • Resolve bugs and implement regression tests.

Skills

Networking issues
Automated testing
Continuous Integration/Continuous Deployment (CI/CD)
Knowledge of distributed computing systems
Scripting (Python)

Tools

Maven
Nexus
Bamboo
GitHub
Ansible
Linux

Job Description

Our client is one of the world’s leading manufacturers of semiconductor chip-making equipment. A majority of the world’s microchips receive their critical lithographic patterning in machines made by this organisation. In addition, they produce metrology tools and advanced applications to analyze and optimize the performance of the customer production process.

Job Mission

Troubleshoot short-term problems and translate them into structural improvements on our distributed data and compute platform infrastructure. Be accurate, be precise, and help drive up the aggregate availability of the installations of these distributed computing systems in Korea, Taiwan, Israel, China, the US and elsewhere. Be part of the computing platform that is one of the main pillars under the production of the next-generation microchips of Apple, Samsung and many others.

Responsibilities:

  • Create awareness in other teams of the methods and procedures we use, so that they can avoid repetitive help requests.
  • Help application developers to understand the infrastructure / cluster / system
  • “We are the team that is in charge of understanding & explaining how the system fits into the customer’s ecosystem”
  • Share knowledge / mindset with other teams (dev/infra engineers)
  • Work cross-functionally and share knowledge between infra engineers
  • Contribute towards building VCP as a product that meets our standards of quality
  • Increase the stability and reliability of VCP through automated testing and automation
  • Customer satisfaction and product reliability
  • Improve the functionality and reliability of VCP
  • Translate customer ecosystem needs to engineering deliverables
  • Find the broken pieces of the puzzle at system/cluster level
  • Combine individual ‘stories’ into a complete book
  • Make the VCP reliable by improving system resilience (bug-fixing and beyond)
  • Resolve bugs in a sustainable way (implement regression tests, design structural fixes)
  • Ambassador of predictable component lifecycle management
  • Technical roadmap maintenance (App life cycle management)
  • Support feature and service requests from the field
  • Suggest improvements to our technical solutions and way of working, and implement them in alignment with your team and their stakeholders

Highly valued qualifications & experiences:

  • Experience with DC/OS
  • Experience with introducing new technology at zero downtime, including data migration
  • Fan of automated testing and qualification, ideally as part of a CI/CD pipeline
  • Affinity for digging deep into the details of networking issues
  • Available to work (remotely) outside regular office hours when it turns out that an attempt to build a fail-safe system has not yet succeeded. We really want this to be an exception, not a rule.

Required qualifications & experiences:

  • Knowledge of and practical experience with distributed computing systems (a must!)
  • Experienced in build and release infrastructure: Maven, Nexus, Bamboo, GitHub
  • Familiar with at least one scripting language (Python)
  • Experience with Ansible
  • Linux expert