A PhD opportunity at the Institut Montpelliérain Alexander Grothendieck focuses on cancer evolution through high-throughput sequencing data analysis. Candidates should possess strong programming skills in R, C/C++, or Python, with a strong background in probability and statistics, particularly as applied to biological data.
Organisation/Company: CNRS
Department: Institut Montpelliérain Alexander Grothendieck
Research Field: Biological sciences; Computer science; Mathematics
Researcher Profile: First Stage Researcher (R1)
Country: France
Application Deadline: 10 Jul 2025 - 23:59 (UTC)
Type of Contract: Temporary
Job Status: Full-time
Hours Per Week: 35
Offer Starting Date: 1 Oct 2025
Is the job funded through the EU Research Framework Programme? Not funded by an EU programme
Is the job related to a staff position within a Research Infrastructure? No
The PhD will take place at the Institut Montpelliérain Alexander Grothendieck (IMAG) in Montpellier, in collaboration with MAP5 in Paris. It will be supervised by Gilles Didier (IMAG) and Paul Bastide (MAP5), with collaboration from Alice Cleynen (IMAG) and Sophie Lèbre (IMAG).
The project is part of the ANR IdenTHiC (Identification of Tumor History at the Clone level) program, which focuses on the study of clinical data from cancer patients to support diagnosis.
Developing the simulation tool requires strong programming skills, particularly in R, C/C++, or Python. Studying the model requires expertise in probability and statistics, and an interest in biological applications is a plus.
During cancer progression, various mutations accumulate in cancer cells, generating multiple cellular lineages that coexist within a given tumor. The objective of this project is to study the evolutionary history of a tumor from "bulk" high-throughput sequencing data, that is, data obtained by sequencing a mixture of cells from the tumor together.
These data are complex due to both biological and technical reasons. Biologically, cancer evolution involves numerous processes that induce mutations, structural alterations in certain genomic regions in some cells, as well as changes in tumor size. Technically, high-throughput sequencing does not provide complete genome sequences but rather a very large number of small fragments, called "reads," which are mapped onto a reference sequence for analysis. In bulk sequencing, where many cells are sequenced together, it is not possible to directly assign each read to its originating cell.
The main goal of the thesis is to reconstruct the history of the cellular composition of a patient's tumor from sequenced longitudinal biopsy samples taken at multiple time points. The proposed approach relies on developing a stochastic model of bulk sequencing data from a tumor. This model naturally decomposes into two main parts.
First, a birth-and-death process (modeling cell division and death), coupled with a Poisson process (modeling mutations), will be used to describe the evolution of the number of cells in each lineage and the emergence of new lineages. Conditionally on the size of these lineages, the second part models the sampling of tumor cells and their high-throughput sequencing, which generates the observed set of reads.
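As an illustration only, the two parts of such a model could be sketched in Python as follows. This is a minimal sketch under simplifying assumptions that are not part of the project description: constant per-cell birth, death and mutation rates, every mutation founding a new lineage of one cell, heterozygous mutations in a diploid genome with no copy-number change, and a fixed read depth at every mutated site. All function and parameter names are hypothetical.

```python
import random


def simulate_lineages(birth, death, mutation, t_max, rng, max_cells=10_000):
    """Gillespie-style simulation of lineage sizes (part one of the model).

    Assumption: each cell divides, dies or mutates at constant rates.
    """
    sizes = [1]       # sizes[i]: number of cells in lineage i
    parents = [None]  # parents[i]: lineage in which lineage i arose
    t = 0.0
    per_cell_rate = birth + death + mutation
    while True:
        n = sum(sizes)
        if n == 0 or n >= max_cells:
            break
        # Waiting time to the next event is exponential in the total rate.
        t += rng.expovariate(n * per_cell_rate)
        if t > t_max:
            break
        # The affected cell is uniform among cells, so its lineage is
        # drawn proportionally to lineage size.
        i = rng.choices(range(len(sizes)), weights=sizes)[0]
        u = rng.random() * per_cell_rate
        if u < birth:
            sizes[i] += 1
        elif u < birth + death:
            sizes[i] -= 1
        else:
            # One cell of lineage i mutates and founds a new lineage.
            sizes[i] -= 1
            sizes.append(1)
            parents.append(i)
    return sizes, parents


def clade_sizes(sizes, parents):
    """Number of cells carrying the founding mutation of each lineage."""
    totals = list(sizes)
    # A child always has a larger index than its parent, so a reverse
    # sweep adds every descendant to its ancestors.
    for i in range(len(sizes) - 1, 0, -1):
        totals[parents[i]] += totals[i]
    return totals


def simulate_read_counts(sizes, parents, depth, rng):
    """Variant read counts given lineage sizes (part two of the model).

    Assumption: heterozygous diploid mutations, so the expected variant
    allele frequency is half the fraction of cells carrying the mutation.
    """
    n = sum(sizes)
    if n == 0:
        return []
    totals = clade_sizes(sizes, parents)
    counts = []
    for i in range(1, len(sizes)):
        vaf = 0.5 * totals[i] / n
        variant_reads = sum(rng.random() < vaf for _ in range(depth))
        counts.append((i, variant_reads, depth))
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    sizes, parents = simulate_lineages(1.0, 0.5, 0.05, t_max=10.0, rng=rng)
    for lineage, variant_reads, depth in simulate_read_counts(sizes, parents, 100, rng):
        print(lineage, variant_reads, depth)
```

The read-sampling step only uses the lineage sizes at sampling time, which mirrors the conditional structure described above. A model for longitudinal data would repeat that step at each biopsy time point.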
Initially, this model can be used to simulate sequencing data under various biological hypotheses to test the robustness and accuracy of existing reconstruction methods such as Pairtree [3] or CALDER [2].
Subsequently, the objective will be to compute the likelihood of bulk sequencing data under this model, in order to propose a new statistical inference method, for instance by adapting the approach of [1] for the first part of the model.
[1] Didier, Laurin. 2020. Systematic Biology. 69:1068–1087.
[2] Myers, Satas, Raphael. 2019. Cell Systems. 8:514–522.
[3] Wintersinger, Dobson, Kulman, et al. 2022. Blood Cancer Discovery. 3:208–219.