Senior HPC AI Engineer

TN France

France

Remote

EUR 60,000 - 100,000

Full-time

Posted 4 days ago

Job summary

An established industry player is seeking a skilled HPC Engineer to join their innovative team focused on cutting-edge AI and HPC technologies. In this pivotal role, you will design and implement large-scale HPC clusters, collaborating with experts to enhance workflows and develop groundbreaking solutions. Your expertise in job scheduling, automation, and cloud platforms will be crucial in advancing the company's mission in artificial intelligence and GPU computing. If you're passionate about pushing the boundaries of technology and thrive in a dynamic environment, this opportunity is perfect for you.

Qualifications

  • 5+ years of experience in HPC and AI technologies.
  • Proficiency in job scheduling tools and Linux networking.

Responsibilities

  • Design and maintain large-scale HPC/AI clusters.
  • Develop CI/CD pipelines and automate infrastructure management.

Skills

HPC and AI solution technologies
Job scheduling and orchestration tools (Slurm, Kubernetes)
Python scripting
Bash scripting
Linux networking and security
Deep knowledge of networking protocols (InfiniBand, Ethernet)
Automation tools (Jenkins, Ansible, Puppet, Chef)
Storage solutions (Lustre, GPFS, ZFS, XFS)
Cloud platforms (AWS, Azure, Google Cloud)

Education

Degree in Computer Science or Engineering

Tools

Kubernetes
Slurm
Jenkins
Ansible
Puppet
Chef
VMware
Hyper-V
KVM
Citrix

Job description

Job Description: HPC Engineer at NVIDIA

NVIDIA is seeking an experienced HPC Engineer to join the E2E software verification HPC/AI Infrastructure team. The team focuses on building supercomputers and HPC clusters utilizing groundbreaking technologies. We are looking for an outstanding architect for a senior HPC role, who will be a key contributor to cutting-edge computing hardware and software, advancing breakthroughs in artificial intelligence and GPU computing. The engineer will provide insights on system design and tuning for large-scale compute runs, working with the latest accelerated computing and deep learning platforms, collaborating with scientific researchers, developers, and customers to improve workflows and develop innovative solutions. The role involves interaction with HPC, OS, GPU compute, and system specialists to architect, develop, and deploy large-scale performance platforms.

Responsibilities:
  1. Design, implement, and maintain large-scale HPC/AI clusters with monitoring, logging, and alerting systems.
  2. Manage Linux job/workload scheduling and orchestration tools.
  3. Develop and maintain continuous integration and delivery pipelines.
  4. Create tooling for automating deployment and management of infrastructure, operational monitoring, alerting, and enabling self-service resource consumption.
  5. Deploy monitoring solutions for servers, network, and storage.
  6. Perform troubleshooting from hardware to application levels.
  7. Develop, redefine, and document standard methodologies for internal teams.
  8. Support R&D activities and participate in POCs/POVs for future enhancements.

Minimum Requirements:
  • A degree in Computer Science, Engineering, or a related field, with 5+ years of experience.
  • Knowledge of HPC and AI solution technologies, including CPU, GPU, high-speed interconnects, and supporting software.
  • Experience with job scheduling and orchestration tools like Slurm and Kubernetes.
  • Proficiency in Windows and Linux networking, security, and protocols.
  • Experience with storage solutions such as Lustre, GPFS, ZFS, and XFS, and familiarity with emerging storage technologies.
  • Python and Bash scripting skills.
  • Experience with automation and configuration tools like Jenkins, Ansible, Puppet, or Chef.
  • Deep knowledge of networking protocols such as InfiniBand and Ethernet.
  • Experience with virtual systems (VMware, Hyper-V, KVM, Citrix).
  • Familiarity with cloud platforms like AWS, Azure, or Google Cloud.

Preferred Qualifications:
  • Knowledge of CPU and GPU architectures.
  • Experience with Kubernetes and container microservices.
  • Experience with GPU hardware/software (e.g., DGX, CUDA).
  • Background with RDMA fabrics (InfiniBand or RoCE).

We are an equal opportunity employer committed to diversity. We do not discriminate based on race, religion, color, national origin, sex, gender, sexual orientation, age, marital status, veteran status, or disability. Reasonable accommodations are provided for individuals with disabilities during the application and interview process. Please contact us to request accommodations.
