Job Description
Our client is one of the world’s leading manufacturers of semiconductor chip-making equipment. A majority of the world’s microchips receive their critical lithographic patterning in machines made by this organization. In addition, they produce metrology tools and advanced applications to analyze and optimize the performance of the customer production process.
Job Mission
Troubleshoot short-term problems and develop structural improvements on our distributed data and compute platform infrastructure. Ensure accuracy and precision to increase the availability of these distributed computing systems across Korea, Taiwan, Israel, China, and the US. Be part of the computing platform that is a key component in the production of microchips for companies such as Apple and Samsung.
Responsibilities:
- Raise awareness in other teams of methods and procedures that help prevent repeated help requests.
- Assist application developers in understanding the infrastructure, clusters, and systems.
- Understand and explain how the system integrates into the customer’s ecosystem.
- Share knowledge and mindset with other teams (developers and infrastructure engineers).
- Contribute to building VCP as a quality product.
- Increase stability and reliability of VCP through automation and automated testing.
- Enhance customer satisfaction and product reliability.
- Improve the functionality and reliability of VCP.
- Translate customer ecosystem needs into engineering deliverables.
- Identify and resolve system/cluster-level issues.
- Combine individual tasks into comprehensive solutions.
- Improve system resilience to make VCP reliable, including bug fixing and structural improvements.
- Implement regression tests and structural fixes for bugs.
- Manage component lifecycle predictably.
- Maintain the technical roadmap (application lifecycle management).
- Support field feature and service requests.
- Propose and implement improvements to technical solutions and workflows, aligned with team and stakeholder needs.
Highly valued qualifications & experiences:
- Experience with DC/OS.
- Experience with zero-downtime technology introduction and data migration.
- Passion for automated testing, qualification, and CI/CD pipelines.
- Strong interest in networking issues.
- Willingness to work remotely outside regular hours on rare occasions, when necessary to build fail-safe systems.
Required qualifications & experiences:
- Practical experience with distributed computing systems.
- Experience with build and release tools: Maven, Nexus, Bamboo, GitHub.
- Proficiency in at least one scripting language (e.g., Python).
- Experience with Ansible.
- Expertise in Linux.