
A European technology startup is seeking experienced individuals to implement and optimize AI models at the assembly level on GPUs. The role requires expertise in C++, a focus on performance optimization, and a collaborative mindset. Competitive compensation is offered along with equity participation, and the position is based in Paris with a hybrid working model, allowing for flexibility and collaboration in a fast-paced environment.
KOG:
Kog is a European VC-funded startup and real-time AI frontier lab, part of the French Tech 2030 cohort, building the world’s fastest AI execution layer.
We are not just optimizing existing libraries; we are bypassing inefficient abstraction layers to rewrite the rules of AI inference. By coding at the Assembly level on high-end GPUs (starting with the AMD MI300X), we unlock raw performance that standard stacks leave on the table.
Our Mission: To enable true real-time AI. We are targeting 10x performance gains through a combination of low-level GPU mastery and novel model architecture. Our goal is to build the sovereign infrastructure that will power the next generation of collaborative AI agents.
Why join now? We have already achieved a 3x to 10x speedup compared to state-of-the-art alternatives (vLLM, TensorRT-LLM) by making breakthroughs in:
Inter-GPU communication & Grid synchronization
Aggressive Kernel fusion
Low-level Memory Access Optimization
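For readers unfamiliar with kernel fusion: here is a minimal CPU-side sketch (illustrative only, not Kog’s actual engine code) of why fusing two elementwise kernels into a single pass helps. The unfused version writes a full intermediate array to memory and reads it back; the fused version keeps the intermediate value in a register.

```cpp
// Illustrative analogue of kernel fusion on the CPU (hypothetical example).
#include <algorithm>
#include <cstddef>
#include <vector>

// Unfused: two passes, materializing `tmp` in memory between "kernels".
std::vector<float> scale_then_relu_unfused(const std::vector<float>& x, float a) {
    std::vector<float> tmp(x.size()), out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) tmp[i] = a * x[i];              // kernel 1
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = std::max(0.0f, tmp[i]); // kernel 2
    return out;
}

// Fused: one pass; the intermediate never touches memory.
std::vector<float> scale_then_relu_fused(const std::vector<float>& x, float a) {
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = std::max(0.0f, a * x[i]);
    return out;
}
```

On a GPU the same idea removes an entire round-trip of the intermediate tensor through HBM, which is often the dominant cost for memory-bound inference kernels.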
We wish to strengthen our world-class team with technically brilliant individuals who want to take on this challenge. Your missions will include:
Implementing cutting-edge AI models in low-level C++ code and Assembly on high-end AMD and NVIDIA GPUs
Reverse‑engineering subtle GPU features (such as memory page mappings, memory channels, hash functions, cache behaviors, credit assignment logic, etc.)
Leveraging this knowledge to find and implement creative optimization ideas
Optimizing the Kog inference engine to make AI inference incredibly fast (10x compared to vLLM, SGLang, or TensorRT‑LLM—we are already at 3x!)
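To give a flavor of one primitive mentioned above, grid-wide synchronization: a minimal CPU-thread sketch (hypothetical code, not Kog’s engine) of the contract that every “block” must reach a sync point before any block starts the next phase, as in a grid-level sync between dependent phases of a GPU kernel.

```cpp
// Hypothetical CPU analogue of a GPU grid-wide sync (illustrative only).
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Minimal reusable barrier standing in for a grid-wide sync point.
class Barrier {
    std::mutex m;
    std::condition_variable cv;
    int count;
    int waiting = 0;
    int phase = 0;
public:
    explicit Barrier(int n) : count(n) {}
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lk(m);
        int p = phase;
        if (++waiting == count) {   // last "block" to arrive...
            waiting = 0;
            ++phase;
            cv.notify_all();        // ...releases everyone
        } else {
            cv.wait(lk, [&] { return phase != p; });
        }
    }
};

// Phase 1: each block reduces its chunk. Grid sync. Phase 2: each block
// reads all partial sums, which is only safe after the barrier.
std::vector<long> two_phase_sum(int nblocks, const std::vector<long>& data) {
    std::vector<long> partial(nblocks, 0), result(nblocks, 0);
    Barrier grid_sync(nblocks);
    std::vector<std::thread> blocks;
    const std::size_t chunk = data.size() / nblocks;
    for (int b = 0; b < nblocks; ++b) {
        blocks.emplace_back([&, b] {
            partial[b] = std::accumulate(data.begin() + b * chunk,
                                         data.begin() + (b + 1) * chunk, 0L);
            grid_sync.arrive_and_wait();
            result[b] = std::accumulate(partial.begin(), partial.end(), 0L);
        });
    }
    for (auto& t : blocks) t.join();
    return result;
}
```

On real GPUs this corresponds to grid-level synchronization (e.g. cooperative-group-style sync), where doing it efficiently across thousands of threads is exactly the kind of low-level problem this role tackles.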
What we are looking for:
World-class talent with 5+ years of experience
Proficiency in CUDA or ROCm
A start-up mindset
A team-player attitude
A PhD or a degree from a top engineering school
Side projects, or demonstrated passion and interest in the field
What we offer:
Top-Tier Compensation: A highly competitive salary package (top of the market) tailored to your expertise and leadership level.
Real Ownership (BSPCE): You aren’t just an employee; you are a partner. We offer significant equity so you share in the startup’s success.
Unrivaled Technical Playground: Work on the bleeding edge of AI hardware, with access to the compute power you need (high-end clusters) to perform your magic.
A World-Class Environment: Join a high-density talent team of 12 engineers (including 5 PhDs). We value peer-to-peer learning, high autonomy, and zero bureaucracy.
Impact & Autonomy: As a Lead, you will have a direct seat at the table to shape our engineering culture and roadmap alongside the CEO.
Prime Location & Flexibility: WeWork offices in Paris’ 13th arrondissement (near Station F), the heart of the city’s tech scene. We operate a hybrid model, punctuated by our "Paris Weeks" for deep work and team bonding (and great afterworks!).
Feel free to apply if you feel like you’re up to the task!