
A European technology startup is seeking experienced individuals to implement and optimize AI models at the assembly level on GPUs. The role requires expertise in C++, a focus on performance optimization, and a collaborative mindset. Competitive compensation is offered along with equity participation, and the position is based in Paris with a hybrid working model, allowing for flexibility and collaboration in a fast-paced environment.
KOG:
Kog is a European VC-funded startup and real-time AI frontier lab, part of the French Tech 2030 cohort, building the world’s fastest AI execution layer.
We are not just optimizing existing libraries; we are bypassing inefficient abstraction layers to rewrite the rules of AI inference. By coding at the Assembly level on high-end GPUs (starting with the AMD MI300X), we unlock raw performance that standard stacks leave on the table.
Our Mission: To enable true real-time AI. We are targeting 10x performance gains through a combination of low-level GPU mastery and novel model architecture. Our goal is to build the sovereign infrastructure that will power the next generation of collaborative AI agents.
Why join now? We have already achieved a 3x to 10x speedup compared to state-of-the-art alternatives (vLLM, TensorRT-LLM) by making breakthroughs in:
Inter-GPU communication & Grid synchronization
Aggressive Kernel fusion
Low-level Memory Access Optimization
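For readers unfamiliar with kernel fusion: here is a minimal CPU-side sketch (illustrative only, not Kog’s actual engine code) of why fusing two elementwise kernels into a single pass helps. The unfused version writes a full intermediate array to memory and reads it back; the fused version keeps the intermediate value in a register.

```cpp
// Illustrative analogue of kernel fusion on the CPU (hypothetical example).
#include <algorithm>
#include <cstddef>
#include <vector>

// Unfused: two passes, materializing `tmp` in memory between "kernels".
std::vector<float> scale_then_relu_unfused(const std::vector<float>& x, float a) {
    std::vector<float> tmp(x.size()), out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) tmp[i] = a * x[i];              // kernel 1
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = std::max(0.0f, tmp[i]); // kernel 2
    return out;
}

// Fused: one pass; the intermediate never touches memory.
std::vector<float> scale_then_relu_fused(const std::vector<float>& x, float a) {
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = std::max(0.0f, a * x[i]);
    return out;
}
```

On a GPU the same idea removes an entire round-trip of the intermediate tensor through HBM, which is often the dominant cost for memory-bound inference kernels.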
We wish to strengthen our world-class team with technically brilliant individuals who want to take on this challenge. Your missions will include:
Implementing cutting-edge AI models in low-level C++ code and Assembly on high-end AMD and NVIDIA GPUs
Reverse‑engineering subtle GPU features (such as memory page mappings, memory channels, hash functions, cache behaviors, credit assignment logic, etc.)
Leveraging this knowledge to find and implement creative optimization ideas
Optimizing the Kog inference engine to make AI inference incredibly fast (10x compared to vLLM, SGLang, or TensorRT‑LLM—we are already at 3x!)
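To give a flavor of one primitive mentioned above, grid-wide synchronization: a minimal CPU-thread sketch (hypothetical code, not Kog’s engine) of the contract that every “block” must reach a sync point before any block starts the next phase, as in a grid-level sync between dependent phases of a GPU kernel.

```cpp
// Hypothetical CPU analogue of a GPU grid-wide sync (illustrative only).
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Minimal reusable barrier standing in for a grid-wide sync point.
class Barrier {
    std::mutex m;
    std::condition_variable cv;
    int count;
    int waiting = 0;
    int phase = 0;
public:
    explicit Barrier(int n) : count(n) {}
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lk(m);
        int p = phase;
        if (++waiting == count) {   // last "block" to arrive...
            waiting = 0;
            ++phase;
            cv.notify_all();        // ...releases everyone
        } else {
            cv.wait(lk, [&] { return phase != p; });
        }
    }
};

// Phase 1: each block reduces its chunk. Grid sync. Phase 2: each block
// reads all partial sums, which is only safe after the barrier.
std::vector<long> two_phase_sum(int nblocks, const std::vector<long>& data) {
    std::vector<long> partial(nblocks, 0), result(nblocks, 0);
    Barrier grid_sync(nblocks);
    std::vector<std::thread> blocks;
    const std::size_t chunk = data.size() / nblocks;
    for (int b = 0; b < nblocks; ++b) {
        blocks.emplace_back([&, b] {
            partial[b] = std::accumulate(data.begin() + b * chunk,
                                         data.begin() + (b + 1) * chunk, 0L);
            grid_sync.arrive_and_wait();
            result[b] = std::accumulate(partial.begin(), partial.end(), 0L);
        });
    }
    for (auto& t : blocks) t.join();
    return result;
}
```

On real GPUs this corresponds to grid-level synchronization (e.g. cooperative-group-style sync), where doing it efficiently across thousands of threads is exactly the kind of low-level problem this role tackles.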
What we are looking for:
World-class talent with 5+ years of experience
Proficiency in CUDA or ROCm
A start-up mindset
A team-player attitude
A PhD or a degree from a top engineering school
Side projects, or demonstrated passion and interest in the field
What we offer:
Top-Tier Compensation: A highly competitive salary package (top of the market) tailored to your expertise and leadership level.
Real Ownership (BSPCE): You aren’t just an employee; you are a partner. We offer significant equity so you share in the startup’s success.
Unrivaled Technical Playground: Work on the bleeding edge of AI hardware, with access to the compute power you need (high-end clusters) to perform your magic.
A World-Class Environment: Join a high-density talent team of 12 engineers (including 5 PhDs). We value peer-to-peer learning, high autonomy, and zero bureaucracy.
Impact & Autonomy: As a Lead, you will have a direct seat at the table to shape our engineering culture and roadmap alongside the CEO.
Prime Location & Flexibility: WeWork offices in Paris’ 13th arrondissement (near Station F), the heart of the city’s tech scene. We operate a hybrid model, punctuated by our "Paris Weeks" for deep work and team bonding (and great afterworks!).
Feel free to apply if you feel like you’re up to the task!