
A computer science research institution is seeking a candidate for a PhD thesis focused on knowledge graph mining and completion within the MetaboLinkAI project. The candidate must hold a Master's degree in Computer Science, have skills in Semantic Web standards, and be motivated by research. The position includes benefits such as subsidized meals, generous annual leave, and the possibility of remote work after 6 months.
This PhD thesis takes place within the MetaboLinkAI ANR‑SNF project, which aspires to revolutionize the analysis and interpretation of metabolomics data through a multidisciplinary approach that combines a comprehensive knowledge graph hub (MetaKH) with cutting‑edge artificial intelligence (AI) and machine learning (ML) techniques. The project's main goals are to enhance the querying and ease of use of metabolomics data, improve research efficiency, and stimulate creativity in the field. These objectives are set to surpass current standards by creating an encyclopedic and expandable knowledge base, integrating advanced AI to handle the uncertainties of experimental data, and enabling a broader range of hypothesis testing and evaluation.
Within this project, we will focus on developing innovative methodologies and tools, such as graph mining methods, to enhance data interaction, analysis capabilities, and representation of uncertainty.
A distinctive feature of metabolomics data (and thus of MetaKH) is its incompleteness, variable confidence and inherent uncertainty. Here, we use AI to enhance the completeness and reliability of the knowledge graph (KG) and to account correctly for uncertainty.
Because of the uncertain nature of metabolomics data and the associated knowledge, MetaKH will be largely incomplete and partly incorrect. It will therefore be crucial to develop a comprehensive computational framework that improves the quality, completeness and validity of MetaKH, and in turn the quality of any processing that relies on it. We propose to adapt heuristic methods and algorithms to discover or induce topological motifs, axioms (OWL), rules (SWRL or SPARQL) or shapes (SHACL) from knowledge graphs (TBox construction and refinement). These will account for the possible uncertainty of the knowledge represented in the ABox (WP3.2). Expert-in-the-loop techniques will also be considered. We will design algorithms and data structures that allow KG queries at different levels of data granularity. The methods will exploit heuristics derived from expert knowledge in combination with semi-succinct and, where needed, approximate data structures.
In parallel, we will work on methods for knowledge graph completion, correction and enrichment, to enhance quality and content (ABox refinement). The developed methods will combine deductive reasoning (including analogical reasoning), SHACL validation, and link prediction and retraction based on KG embeddings. They will take into account the uncertainty of knowledge as defined in WP3.2.
Evaluation will measure the improvement in KG completeness and validity, and the effectiveness of reasoning. To do so, we will corrupt the KG by adding, removing or perturbing some edges, apply completion, inference or querying, and assess the impact in comparison with the original KG.
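The corruption-based evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's protocol: the KG is a plain set of (subject, predicate, object) triples, only edge removal is simulated, and the completion step is a toy symmetry rule standing in for the real methods (embedding-based link prediction, SHACL validation, reasoning). The names `corrupt`, `complete_by_symmetry` and `recovery_scores` are hypothetical.

```python
import random

Triple = tuple[str, str, str]  # (subject, predicate, object)


def corrupt(kg: set[Triple], fraction: float,
            rng: random.Random) -> tuple[set[Triple], set[Triple]]:
    """Remove a random fraction of edges; return (corrupted KG, removed edges)."""
    n_removed = int(len(kg) * fraction)
    # Sort first so that sampling is reproducible for a given seed.
    removed = set(rng.sample(sorted(kg), n_removed))
    return kg - removed, removed


def complete_by_symmetry(kg: set[Triple],
                         symmetric_predicates: set[str]) -> set[Triple]:
    """Toy completion step: add (o, p, s) for every (s, p, o) with symmetric p."""
    inferred = {(o, p, s) for (s, p, o) in kg if p in symmetric_predicates}
    return kg | inferred


def recovery_scores(original: set[Triple], corrupted: set[Triple],
                    completed: set[Triple],
                    removed: set[Triple]) -> tuple[float, float]:
    """Precision of the added edges and recall over the removed edges."""
    added = completed - corrupted
    precision = len(added & original) / len(added) if added else 1.0
    recall = len(added & removed) / len(removed) if removed else 1.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy graph; entity and predicate names are illustrative only.
    kg = {
        ("m1", "interacts_with", "m2"), ("m2", "interacts_with", "m1"),
        ("m2", "interacts_with", "m3"), ("m3", "interacts_with", "m2"),
        ("m1", "has_class", "lipid"), ("m3", "has_class", "sugar"),
    }
    corrupted, removed = corrupt(kg, 0.5, random.Random(0))
    completed = complete_by_symmetry(corrupted, {"interacts_with"})
    print(recovery_scores(kg, corrupted, completed, removed))
```

Adding or perturbing edges would follow the same pattern, with the scores then also measuring how many injected errors a correction method retracts.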
The objective is to develop a sophisticated framework, integrated with Semantic Web standards, for the formal representation of and reasoning about uncertainty (both ontic and epistemic) in MetaKH, improving data confidence and decision-making processes. Initially, we will review the literature to identify suitable models for ontic uncertainty (certainly probability theory) and epistemic uncertainty (e.g. possibility theory, Dempster-Shafer theory) that can represent mass spectrometry observations and metabolomic knowledge. Based on these models, we will propose extensions to Semantic Web standards to express uncertainty, provenance, and temporality metadata, facilitating richer data interpretation and trustworthiness. We will develop algorithms to integrate uncertainty into querying, deduction and embedding in KGs. We will establish criteria for using KGs based on uncertainty and provenance metadata, as well as other types of metadata, enabling users and agents to make informed decisions regarding trust and data application. Algorithms developed in WP3.1 will be extended to integrate uncertainty. Finally, we plan to implement mechanisms for evaluating KG completeness, validity, and reasoning under uncertainty, incorporating expert feedback and adapting methodologies based on provenance and other types of metaknowledge.
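As an illustration of one of the epistemic models mentioned above, the sketch below implements Dempster's rule of combination from Dempster-Shafer theory for two mass functions over the same frame of discernment. It is only an example of the kind of model under review, not a choice made by the project. The function name `combine_dempster`, the candidate annotations and the mass values are all hypothetical.

```python
from itertools import product

Mass = dict[frozenset, float]  # maps each focal set to its belief mass


def combine_dempster(m1: Mass, m2: Mass) -> Mass:
    """Combine two mass functions with Dempster's rule.

    Products of masses whose focal sets intersect are accumulated on the
    intersection; products with an empty intersection count as conflict K,
    and the result is renormalised by 1 - K.
    """
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if 1.0 - conflict < 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


if __name__ == "__main__":
    # Hypothetical example: two sources of evidence about which metabolite
    # a spectral peak corresponds to. Values are made up for illustration.
    glucose = frozenset({"glucose"})
    fructose = frozenset({"fructose"})
    frame = glucose | fructose  # mass on the whole frame expresses ignorance

    source_1 = {glucose: 0.6, frame: 0.4}
    source_2 = {glucose: 0.5, fructose: 0.3, frame: 0.2}

    for focal_set, mass in combine_dempster(source_1, source_2).items():
        print(sorted(focal_set), round(mass, 4))
```

Mass assigned to the whole frame is what distinguishes this representation from a plain probability distribution: it encodes "no information" rather than an even split between candidates.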
The thesis will start with a state-of-the-art review of the domains involved, in particular graph-based knowledge representation, KG mining, and uncertainty representation and management in KGs.
The PhD student is expected to first address computational approaches for MetaKH mining and completion, and then extend these approaches considering the inherent uncertainty of some knowledge in MetaKH, and of the mining approaches and their results.
Expected deliverables are:
The candidate must hold a Master's degree in Informatics / Computer Science and should match most of the following criteria:
Gross salary: €2,300 per month