
A leading engineering school in Écully seeks a postdoctoral researcher in computer science to research heterogeneous quantization of AI models. This role requires proficiency in deep learning and frameworks like PyTorch. Candidates should have a PhD in computer science or a related field and be able to apply advanced techniques to real-world problems. The position starts in April or May 2026 for a duration of 12 months with a focus on designing custom hardware architecture for efficient execution of quantized models.
Heterogeneous compression of AI models
Efficient modern AI approaches heavily rely on Deep Neural Networks (DNNs), also known as deep learning. They are used in many application domains (industrial production, entertainment, security, etc.) to solve complex problems in computer vision, natural language processing, and beyond. Recent AI models are very powerful, but they comprise millions or even billions of parameters, making them costly both to train and to use at inference time. This is why several techniques have been designed to reduce this cost, such as pruning part of the model weights [1] or lowering value precision through quantization [2].
Quantization is a widely‑used technique to reduce the memory footprint, computational cost and power consumption of deep neural networks by lowering the precision of weights and activations (e.g., from 32‑bit floating point to 8‑bit integer or even fewer bits). Traditional quantization methods tend to apply a uniform precision (bit‑width) and uniform quantization scheme across all layers or all parameters of the network, such as the GPTQ algorithm [3]. In contrast, heterogeneous quantization (also called mixed‑precision) means that different parts of the network (different layers, different channels, even individual parameters) can be assigned different precisions or different quantization schemes according to their sensitivity, distribution of values, or hardware needs [4]. This more fine‑grained approach enables more aggressive compression (lower bits where tolerable) while preserving accuracy where it matters.
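To make the contrast above concrete, here is a minimal sketch (plain NumPy; the function name and the per-tensor symmetric scheme are our illustrative choices, not the project's method) showing how a heterogeneous scheme might keep 8 bits for a sensitive tensor while dropping a tolerant one to 4 bits:

```python
import numpy as np

def fake_quantize(w, bits):
    """Symmetric per-tensor uniform quantization to `bits`,
    returned in float ("fake quantization") for easy comparison."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits, 7 for 4 bits
    scale = np.max(np.abs(w)) / qmax    # per-tensor scale factor
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                    # dequantized approximation of w

rng = np.random.default_rng(0)
sensitive = rng.normal(0.0, 1.0, 1000).astype(np.float32)  # e.g. a sensitive layer
tolerant = rng.normal(0.0, 1.0, 1000).astype(np.float32)   # e.g. a tolerant layer

# Uniform scheme: everything at 8 bits.
# Heterogeneous scheme: 8 bits where accuracy matters, 4 bits elsewhere.
err_8 = np.mean((sensitive - fake_quantize(sensitive, 8)) ** 2)
err_4 = np.mean((tolerant - fake_quantize(tolerant, 4)) ** 2)
print(err_8, err_4)  # the 4-bit error is larger, but storage is halved
```

The 4-bit tensor trades a larger reconstruction error for a 2x memory saving over int8; a mixed-precision method assigns bit-widths per tensor (or per channel) to keep that error only where the model can absorb it.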
Despite the theoretical benefits of the heterogeneous quantization framework, a key limitation stems from the hardware (HW) architecture used to deploy the quantized model. Indeed, HW architectures are designed to handle a set of pre‑defined data precisions and types (e.g., 4‑bit integer, 8‑bit integer, etc.). This constraint applies both to memory–processing unit data transfers and to arithmetic circuits, meaning that, for custom precisions (e.g., arbitrary n‑bit data), conversion and cast operations have to be added, eventually increasing the overhead of the overall implementation [5].
The goal of this post‑doc position is to investigate heterogeneous quantization of a given AI model, with respect to both efficiency and trustworthiness, in order to derive requirements for the design of a custom hardware architecture.
We are seeking a postdoctoral researcher with a PhD in computer science or a closely related field, and a strong background in machine learning and deep learning. The ideal candidate should be proficient with modern frameworks and methodologies in computer vision and/or natural language processing, and capable of applying these techniques to complex, real‑world problems. A solid understanding of model architectures, training strategies, and evaluation methods is expected. The candidate should also be able to understand how the computations are done at the matrix level. Experience or familiarity with model compression and optimization techniques—such as pruning, quantization, or knowledge distillation—would be a significant advantage.
The post‑doc is expected to start in April or May 2026, for a duration of 12 months.
The postdoctoral researcher will be supervised by the LIRIS (expertise in machine learning) and INL (expertise in hardware architecture) teams in Lyon (Ecole Centrale campus). The salary will follow standard French rates.
[1] Hassibi, Babak, David G. Stork, and Gregory J. Wolff (1993). "Optimal brain surgeon and general network pruning." In: IEEE International Conference on Neural Networks. IEEE, pp. 293–299.
[2] Gupta, Suyog, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan (2015). "Deep learning with limited numerical precision." In: International Conference on Machine Learning. PMLR, pp. 1737–1746.
[3] Frantar, Elias, Saleh Ashkboos, Torsten Hoefler, and Dan Alistarh (2023). "GPTQ: Accurate post-training quantization for generative pre-trained transformers." In: ICLR.
[5] Ali, S. B., S.-I. Filip, O. Sentieys, and G. Lemieux (2025). "MPTorch-FPGA: A custom mixed-precision framework for FPGA-based DNN training." In: Design, Automation & Test in Europe Conference (DATE), Lyon, France, pp. 1–7. doi: 10.23919/DATE64628.2025.10993010.
[6] Jiao, Xiaoqi, et al. (2020). "TinyBERT: Distilling BERT for natural language understanding." In: Findings of the Association for Computational Linguistics: EMNLP 2020.
The recruited postdoc is expected to be physically present in the lab on a daily basis. They will have their own desk and access to the computation facilities of the lab.