
(Freelance) Pre-Training Engineer

Kog AI

Paris

Hybrid

EUR 60,000 - 80,000

Part-time

Posted today


Job summary

A European AI startup in Paris is looking for a hands-on engineer with recent experience in training neural networks on NVIDIA clusters with H100 GPUs. The role involves profiling pretraining runs, optimizing scheduling, and implementing practices to enhance performance on large GPU jobs. This freelance mission can lead to a full-time offer based on strong performance, with remote-friendly options and competitive compensation. Join a dynamic team and influence the engineering culture in a prime tech location.

Benefits

  • Top-tier compensation
  • Equity participation (BSPCE)
  • Access to high-end AI hardware
  • High-density talent team
  • Hybrid work model with team bonding

Qualifications

  • Recent hands-on experience with NVIDIA H100 GPUs and NCCL.
  • Strong focus on optimizing real training environments.
  • Not a research role; practical execution emphasized.

Responsibilities

  • Profile current LLM pretraining runs using NCCL and SLURM.
  • Critique and improve scheduling and monitoring practices.
  • Implement fixes to enhance stability and throughput.
  • Produce documentation enabling team self-sufficiency.
  • Leverage LLM expertise to boost training efficiency.

Skills

  • Hands-on experience training neural networks
  • Deep knowledge of SLURM
  • Expertise in profiling training workloads
  • Track record with LLM pre-training
  • Pragmatic engineering mindset

Job description
KOG

Kog is a European VC-funded startup and real-time AI frontier lab building the world’s fastest AI execution layer, part of the 2030 French Tech cohort.

We are not just optimizing existing libraries; we are bypassing inefficient abstraction layers to rewrite the rules of AI inference. By coding at the Assembly level on high-end GPUs (starting with the AMD MI300X), we unlock raw performance that standard stacks leave on the table.

Our Mission: To enable true real-time AI. We are targeting 10x performance gains through a combination of low-level GPU mastery and novel model architecture. Our goal is to build the sovereign infrastructure that will power the next generation of collaborative AI agents.

Why join now? We have already achieved a 3x to 10x speedup compared to state-of-the-art alternatives (vLLM, TensorRT-LLM) by making breakthroughs in:

  • Inter-GPU communication & Grid synchronization
  • Aggressive Kernel fusion
  • Low-level Memory Access Optimization

Context

We have access to a large NVIDIA H100 training cluster with 200+ GPUs. Our immediate priority is to optimize LLM pretraining efficiency on this cluster. We are competent with our current setup, but not yet at the level we need. We want a hands‑on engineer who already runs pretraining at scale elsewhere and can quickly profile, correct, and document best practices so our team can execute independently.
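To make the profiling goal concrete: a first pass on a cluster like this often starts with NCCL's built-in logging before reaching for heavier tools. A minimal sketch (the environment variables are standard NCCL settings; `train.py` is a placeholder for the actual entry point):

```shell
# Turn up NCCL logging to see collective operations and the detected topology.
# NCCL_DEBUG, NCCL_DEBUG_SUBSYS, and NCCL_TOPO_DUMP_FILE are standard NCCL
# environment variables; the training command below is illustrative only.
export NCCL_DEBUG=INFO
export NCCL_DEBUG_SUBSYS=INIT,COLL             # init handshakes + collectives
export NCCL_TOPO_DUMP_FILE=/tmp/nccl_topo.xml  # dump the topology NCCL detected
srun --ntasks-per-node=8 python train.py
```

The resulting logs show which transport (NVLink, InfiniBand, PCIe) each ring or tree uses, which is usually the first thing to check when throughput falls short of expectations.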

Missions
  • Profile current LLM pretraining runs on an NVIDIA H100 cluster using NCCL and SLURM.
  • Critique and improve launch, scheduling, data and model parallelism, checkpointing, fault tolerance, and monitoring practices.
  • Implement targeted fixes to improve stability, throughput, and cost efficiency on 200+ GPU jobs.
  • Produce clear documentation and runbooks enabling the team to sustain improvements.
  • Bonus: bring practical LLM expertise that improves pretraining efficiency or quality through better model or training architecture choices.
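As one illustration of the SLURM side of this work, a multi-node H100 pretraining job is typically launched from an sbatch script along these lines (node counts, resource numbers, and the training entry point are placeholders, not Kog's actual setup):

```shell
#!/bin/bash
# Hypothetical sbatch sketch for a 200+ GPU pretraining run (8x H100 per node).
# Job name, resource requests, and train.py are illustrative placeholders.
#SBATCH --job-name=llm-pretrain
#SBATCH --nodes=26
#SBATCH --gpus-per-node=8
#SBATCH --ntasks-per-node=8        # one rank per GPU
#SBATCH --cpus-per-task=12
#SBATCH --time=48:00:00
#SBATCH --output=%x-%j.out

# Rendezvous info for torch.distributed-style launchers.
export MASTER_ADDR=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
export MASTER_PORT=29500
export NCCL_DEBUG=WARN             # raise to INFO when profiling a run

srun python train.py
```

Critiquing scheduling at this level usually covers topology-aware node selection, requeue and checkpoint behavior on node failure, and whether CPU and network resources per rank match the dataloader's needs.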

Profile
  • Recent hands‑on experience training neural networks on NVIDIA clusters with H100 GPUs and NCCL.
  • Deep knowledge of SLURM for large multi-node jobs at scale.
  • Strong expertise in end‑to‑end profiling of training workloads and removing bottlenecks.
  • Proven track record with LLM pre‑training on large clusters of several hundred GPUs.
  • Pragmatic engineering mindset focused on launching, optimizing, and monitoring real training.
  • Not a research role. We value practical, production‑grade execution over theory.
  • You have done this elsewhere and can step in immediately.

Contract
  • Freelance mission from 1 day up to 2 weeks.
  • Remote‑friendly.
  • Attractive day rate.
  • Strong performance can lead to a full‑time offer.

What we offer
  • Top‑Tier Compensation: We offer a highly competitive salary package (top of the market) tailored to match your expertise.
  • Real Ownership (BSPCE): You aren't just an employee; you are a partner. We offer significant equity to ensure you share in the startup's success.
  • Unrivaled Technical Playground: Work on the bleeding edge of AI hardware. You will have access to the compute power you need (high‑end clusters) to perform your magic.
  • A world‑class Environment: Join a high‑density talent team of 12 engineers (including 5 PhDs). We value peer‑to‑peer learning, high autonomy, and zero bureaucracy.
  • Impact & Autonomy: You will have a direct seat at the table to shape our engineering culture and roadmap alongside the CEO.
  • Prime Location & Flexibility: WeWork offices in the 13th district (near Station F), the heart of Paris' tech scene. We operate with a hybrid model, punctuated by our "Paris Weeks" for deep work and team bonding (and great afterworks!).

Feel free to apply if you think you're up to the task!