Role Overview
As a High‑Performance Computing (HPC) Principal Engineer, you will serve as the technical authority and strategic leader for Elanco’s entire HPC ecosystem. Reporting to the global TechOps team, you will shape the architectural vision, design, and long‑term roadmap for the computational platforms that power our most critical research and development efforts. This role calls for a seasoned expert who will mentor a team, influence stakeholders, and build the next generation of scientific computing at Elanco.
Responsibilities
- Shape the design, architecture, and strategic evolution of Elanco’s HPC, storage, and networking infrastructure to meet future research demands.
- Evaluate emerging technologies, conduct proof‑of‑concept projects, and build business cases for new investments to keep Elanco at the cutting edge of scientific computing.
- Act as a senior mentor and technical escalation point for other engineers and support staff, fostering technical excellence and knowledge sharing within the team.
- Design, deploy, configure, and maintain Elanco’s HPC clusters and associated storage and networking infrastructure.
- Proactively monitor system performance, troubleshoot bottlenecks, and tune the environment to ensure optimal efficiency and resource utilization.
- Act as the primary technical contact for our research and scientific user base, providing support, training, and guidance on how to best leverage HPC resources.
- Develop and maintain scripts and automation tools to streamline system administration, job scheduling, and monitoring tasks.
- Manage and configure job scheduling systems to ensure fair and efficient allocation of computational resources.
- Implement and maintain security best practices to protect sensitive data and ensure the integrity of the HPC environment.
- Collaborate with stakeholders to forecast future computing needs and contribute to the strategic planning and evolution of Elanco’s HPC environment.
Required Qualifications
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
- Ability to align technical strategy with business goals, develop multi‑year roadmaps, and justify major technology investments.
- Deep expertise in Linux/Unix system administration in a large‑scale environment.
- Broad experience with HPC cluster management, including job schedulers and parallel file systems.
- Exceptional scripting skills for automation, particularly in Python and Bash.
- Solid understanding of high‑speed networking fabrics such as InfiniBand or Omni‑Path.
- Familiarity with public cloud services, specifically Microsoft Azure and Google Cloud Platform (GCP), as well as with the server, storage, and networking hardware common in HPC environments.
- Proven experience with DevSecOps concepts and tooling, including Continuous Integration/Continuous Delivery (CI/CD), Git SCM, containerisation (Docker, Kubernetes), and Infrastructure‑as‑Code (HashiCorp Terraform).
- Excellent analytical and troubleshooting skills, with the ability to diagnose and resolve complex technical issues efficiently.
- Strong interpersonal and communication skills, with a customer‑centric approach to supporting a diverse scientific user community.
- Proven experience leading complex technical projects and mentoring junior and senior engineers.
Additional Information
- Travel: 0–10%
- Location: Hook, UK – Hybrid Work Environment