Research

Publications

Publications in peer-reviewed journals

J5. Definition and Exploration of Realistic Chemical Spaces Using the Connectivity and Cyclic Features of ChEMBL and ZINC

Thomas Cauchy, Jules Leguy, Benoit Da Mota
2023 Digital Discovery
DOI : 10.1039/D2DD00092J   scimago : Rank Q1 (chemistry)   github : BenoitDamota/gcf   figshare : data

J4. Scalable estimator of the diversity for de novo molecular generation resulting in a more robust QM dataset (OD9) and a more efficient molecular optimization

Jules Leguy, Marta Glavatskikh, Thomas Cauchy, Benoit Da Mota
2021 Journal of Cheminformatics
DOI : 10.1186/s13321-021-00554-8   scimago : Rank Q1 (CS applications)   github : jules-leguy/EvoMol   figshare : data

J3. EvoMol: a flexible and interpretable evolutionary algorithm for unbiased de novo molecular generation

Jules Leguy, Thomas Cauchy, Marta Glavatskikh, Béatrice Duval, Benoit Da Mota
2020 Journal of Cheminformatics
DOI : 10.1186/s13321-020-00458-z   scimago : Rank Q1 (CS applications)   github : jules-leguy/EvoMol

J2. MixONat, a Software for the Dereplication of Mixtures Based on 13C NMR Spectroscopy

Antoine Bruguière, Séverine Derbré, Joël Dietsch, Jules Leguy, Valentine Rahier, Quentin Pottier, Dimitri Bréard, Sorphon Suor‑Cherer, Guillaume Viault, Anne‑Marie Le Ray, Frédéric Saubion, Pascal Richomme
2020 Analytical Chemistry
DOI : 10.1021/acs.analchem.0c00193   scimago : Rank Q1 (analytical chemistry)   sourceforge : mixonat

J1. Dataset’s chemical diversity limits the generalizability of machine learning prediction

Marta Glavatskikh, Jules Leguy, Gilles Hunault, Thomas Cauchy, Benoit Da Mota
2019 Journal of Cheminformatics
DOI : 10.1186/s13321-019-0391-2   scimago : Rank Q1 (CS applications)   figshare : data

Publications in peer-reviewed conferences

C2. Surrogate‑Based Black‑Box Optimization Method for Costly Molecular Properties

Jules Leguy, Béatrice Duval, Benoit Da Mota, Thomas Cauchy
2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI)
DOI : 10.1109/ICTAI52525.2021.00124   ↕️ Conference Ranks : Rank A2 Qualis/B ERA   github : jules-leguy/BBOMol

C1. Des réseaux de neurones pour prédire des distances interatomiques extraites d’une base de données ouverte de calculs en chimie quantique

Jules Leguy, Thomas Cauchy, Béatrice Duval, Benoit Da Mota
2019 Extraction et Gestion des connaissances, EGC 2019, Metz, France (National-level conference)
🥇 Best application article award
🔗 URL : RNTI   ↕️ Conference Ranks : Rank C ERA

Peer-reviewed book chapters

B2. Goal‑directed generation of new molecules by AI methods (Chapter 2)

Jules Leguy, Thomas Cauchy, Béatrice Duval, Benoit Da Mota
2022 Computational and Data‑Driven Chemistry Using Artificial Intelligence
DOI : 10.1016/B978-0-12-822249-2.00004-9

B1. Predicting Interatomic Distances of Molecular Quantum Chemistry Calculations (long version of [C1])

Jules Leguy, Thomas Cauchy, Béatrice Duval, Benoit Da Mota
2022 Advances in Knowledge Discovery and Management: Volume 9. Studies in Computational Intelligence. Springer International Publishing
DOI : 10.1007/978-3-030-90287-2_8

Publications in lightly peer-reviewed conference workshops

W1. Génération d’explications contre‑factuelles pour la chimie moléculaire

Jules Leguy, Bryan Garreau, Thomas Cauchy, Benoit Da Mota, Béatrice Duval
2022 Workshop EXPLAIN’AI hosted at EGC 2022, Blois, France (National-level conference)
HAL open archive: hal-04034150   ↕️ Conference Ranks : EGC Rank C ERA

Posters

Po1. Surrogate-based black-box framework to optimize electronic properties for de novo organic molecular materials

Jules Leguy, Thomas Cauchy, Béatrice Duval, Benoit Da Mota
2021 4th RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry Symposium (Virtual international event)
Also presented at 2022 Symposium GDR Madics (National research event in France)
🔗 URL : poster

Preprints

P1. WebXAII: An Open-Source Web Framework to Study Human-XAI Interaction

Jules Leguy, Pierre-Antoine Jean, Felipe Torres Figueroa, Sébastien Harispe
2025 arXiv:2506.14777 [cs.HC]
DOI : 10.48550/arXiv.2506.14777   github : PAJEAN/WebXAII

PhD thesis

My PhD thesis manuscript, entitled "Combinatorial search lead by machine learning for molecular chemistry" (2022), is available on HAL (in French). The defense slides can be found here.

Contributions to research projects

XAI and Human-XAI interaction (postdoctoral work, current)

I am studying the interaction between human operators and XAI techniques in order to assess the impact of XAI on human-machine collaboration. There has been a lot of work in this domain over the last decade, but it is still unclear how XAI techniques compare to each other, and very few of them have been shown to help a human end user solve a task. I am working on the evaluation of various techniques in a controlled setting, in experiments involving human participants. As a first step, I have led the development of a web interface that can embody experimental protocols and collect participants' answers for human-XAI studies [P1].

Optimization and machine learning for molecular discovery (PhD work, 2019-2022)

The purpose of this research project is to develop methods for the automatic generation of molecules that satisfy desired properties, with a focus on the chemistry of organic molecular materials. I proposed an evolutionary algorithm named EvoMol, which is designed to be interpretable and generic so that it can be used in various subdomains of chemistry [J3]. As evolutionary algorithms tend to generate unrealistic molecules, I also worked on a filter-based approach that favors the generation of realistic molecules [J5].

Most properties of interest in the field of organic molecular materials depend on costly quantum chemistry computations (DFT calculations). This motivates the use of machine learning algorithms as fast estimators of these properties. I worked on machine learning methods that predict the DFT-optimized geometry of molecules, which is closely related to the electronic properties of interest [C1, B1]. I also worked with a postdoctoral researcher to measure the importance of chemical diversity in the training datasets of machine learning models. We demonstrated that a lack of chemical diversity in the training data (including in a state-of-the-art dataset) can significantly impair model performance [J1]. We further proposed an efficient method based on EvoMol to maximize various measures of chemical diversity, which we used to obtain a large and diverse dataset of molecules [J4].

I also proposed combining an optimization method with a machine learning model, in the form of a surrogate-based black-box optimization method. I showed that this approach is more efficient than an evolutionary search for the optimization of a costly electronic property [C2, Po1]. The use of ML models for molecular chemistry raises questions about their interpretability. I proposed an approach based on EvoMol to generate counterfactual explanations for any binary classification model of molecules [W1].

I also published a review of the state of the art in de novo molecular generation [B2].

Dereplication in vegetal-based chemistry (2019)

I worked with a group of scientists in vegetal-based chemistry during my MSc studies. My role was to improve a pre-existing tool that uses NMR spectra to identify compounds in a mixture. I refactored the source code, formalized the matching algorithm, and improved its efficiency [J2].