A dynamic company specializing in cloud solutions is recruiting an HPC technician to deploy and maintain clusters built on Nvidia hardware. This key role involves automation, tool optimization, and management of the HPC cluster lifecycle, while fostering collaboration within the team. If you are passionate about technology and SRE practices, join us in Paris or Lille.
The position
Our mission
Founded in 1999, Scaleway, the cloud of choice, helps developers and businesses to build, deploy, and scale applications to any infrastructure. Located in Paris, Amsterdam, and Warsaw, Scaleway’s complete cloud ecosystem is used by 25,000+ businesses, including European startups, who choose Scaleway for its multi-AZ redundancy, smooth developer experience, carbon-neutral data centers, and native tools for managing multi-cloud architectures.
With fully managed offerings for bare metal, containerization, and serverless architectures, Scaleway provides options for cloud computing, allowing customers to decide where their data resides, which architecture suits their needs, and to scale responsibly.
Our journey
We aim for all our actions to bring us closer to our vision: building and scaling technologies that matter to us, our customers, and end users. Scaleway is a challenger in the cloud space, and as we grow, serving a diverse, global customer base is key. We believe that incorporating diverse perspectives, knowledge, skills, and cross-cultural understanding is vital for delivering high value and performance.
Our values
About the job
We are deploying multiple HPC clusters with Nvidia hardware, aiming to be among the top 15 in the Top500 list. Reporting to our Engineering Manager Emerick Mounoury, you will ensure the deployment and health of HPC components. A strong background in HPC environments, system administration, DevOps, and SRE best practices is required. Our systems evolve, and so must our monitoring and resilience tools.
Desired profile
Additional responsibilities include creating and optimizing tools, troubleshooting high-impact issues, taking part in on-call duties, ensuring high service quality, managing the HPC cluster lifecycle, and implementing best practices for stability, resiliency, scalability, security, and performance.
Required skills include Python/Bash, MySQL, S3 API, Lustre, NAS, monitoring and logging tools (Sentry, Prometheus, Grafana, Elasticsearch, Fluentd, Kibana), configuration management (Ansible, Salt), version control and artifact management (GitLab, Nexus), Linux distributions (Ubuntu, Debian, CentOS), Nvidia hardware/software, MPI, Modules, AI software, Slurm, Kubernetes, and collaboration tools (Jira, Confluence, Slack, GSuite).
Location: Based in Paris or Lille, with partial remote work possible.
Recruitment Process: Screening call, technical interviews, HR interview, then an offer within 2-3 weeks. Don’t hesitate to apply even if you don’t meet all the criteria.