Building a molecule from scratch is one of chemistry’s most formidable challenges. It takes more than connecting atoms: the chemist must choose the right sequence of reactions, protect sensitive parts of the molecule, and avoid pitfalls that can derail months of lab work.
Traditionally, this expertise resides in the minds of seasoned chemists. A new initiative at EPFL aims to codify it within a language model.
Researchers led by Philippe Schwaller published a paper this week in Matter describing Synthegy, a framework that uses large language models as reasoning engines for planning chemical syntheses. The core idea is subtle but significant: rather than having AI generate molecules, the team uses it to evaluate synthesis routes proposed by conventional software.
The process begins with chemists inputting their objectives in plain English, like “construct the pyrimidine ring early.” Retrosynthesis software then generates numerous potential synthesis routes. Synthegy transforms each route into text and presents it to an LLM, which evaluates how well each aligns with the chemist’s instructions. The most suitable options are highlighted with explanations.
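The workflow described above can be sketched in a few lines of Python. This is an illustration only, not Synthegy's actual code: the `Route` class, the prompt wording, and every function name are assumptions, and `toy_scorer` is a keyword heuristic standing in for the real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Route:
    """A candidate synthesis route; steps are listed in forward (lab) order."""
    name: str
    steps: List[str]


def route_to_text(route: Route) -> str:
    """Render a route as numbered plain text that a language model could read."""
    return "\n".join(f"Step {i}: {s}" for i, s in enumerate(route.steps, start=1))


def build_prompt(instruction: str, route: Route) -> str:
    """Combine the chemist's instruction with the text form of the route."""
    return (
        f"Instruction: {instruction}\n"
        f"Route:\n{route_to_text(route)}\n"
        "Rate from 0 to 10 how well the route follows the instruction."
    )


def toy_scorer(prompt: str) -> Tuple[float, str]:
    """Hypothetical placeholder for an LLM call: rewards routes that mention
    'pyrimidine' in an early step. A real system would query a model here."""
    step_lines = [ln for ln in prompt.splitlines() if ln.startswith("Step ")]
    for position, line in enumerate(step_lines):
        if "pyrimidine" in line.lower():
            score = 10.0 * (1 - position / len(step_lines))
            return score, f"pyrimidine ring formed at step {position + 1}"
    return 0.0, "no pyrimidine-forming step found"


def rank_routes(
    instruction: str,
    routes: List[Route],
    scorer: Callable[[str], Tuple[float, str]],
    top_k: int = 3,
) -> List[Tuple[float, str, str]]:
    """Score every route against the instruction and return the best few
    as (score, route name, explanation) tuples, highest score first."""
    scored = []
    for route in routes:
        score, reason = scorer(build_prompt(instruction, route))
        scored.append((score, route.name, reason))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Two made-up routes that differ only in when the ring is formed.
routes = [
    Route("late", ["Suzuki coupling of two aryl fragments",
                   "Nitro reduction",
                   "Condensation to close the pyrimidine ring"]),
    Route("early", ["Condensation to close the pyrimidine ring",
                    "Bromination",
                    "Suzuki coupling"]),
]
ranking = rank_routes("construct the pyrimidine ring early", routes, toy_scorer)
for score, name, reason in ranking:
    print(f"{name}: {score:.1f} ({reason})")
```

Because the scorer is passed in as a plain function, the toy heuristic could in principle be replaced by a call to any capable model without touching the ranking logic.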
“The user interface is crucial in tools for chemists; previous tools depended on cumbersome filters and rules,” said lead author Andres M. Bran in comments released by EPFL.
In a double-blind study in which 36 independent chemists reviewed 368 route pairs, Synthegy’s selections matched those of the human experts 71.2% of the time, a rate comparable to the level of agreement among the experts themselves. Senior researchers (professors and research scientists) agreed with Synthegy more often than PhD students did, suggesting that the system captures the strategic judgment that comes with experience.
The team tested several AI models, including GPT-4o, Claude, and DeepSeek-r1. Gemini-2.5-pro topped the benchmark, while DeepSeek-r1 emerged as a strong open-source alternative that can run locally. AI has long been making inroads into drug discovery, but most approaches rely on narrowly trained models built for specific tasks. Synthegy’s modular design, by contrast, lets it work with any retrosynthesis engine on the back end and any capable LLM on the reasoning side.
Synthegy also tackles reaction mechanism elucidation: the analysis of how and why a reaction proceeds, step by step, through the movement of electrons. It breaks reactions down into elementary moves and has LLMs evaluate each one for plausibility. On straightforward reactions such as nucleophilic substitutions, the top models achieved near-perfect accuracy.
Potential applications are broad, with drug discovery the most obvious. AI has already shown promise in predicting cancer treatment outcomes, and this method could apply wherever chemists need to design new materials or refine industrial processes. Evaluating 60 candidate routes with Synthegy takes about 12 minutes and costs roughly $2–3 in API fees.
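Taken at face value, those figures work out to roughly 12 seconds and 3 to 5 cents per route. The short Python calculation below shows the arithmetic. It assumes the routes are evaluated one after another and cost about the same each, which the reported numbers do not confirm, and the function name is made up for illustration.

```python
def per_route_estimate(n_routes: int, total_minutes: float, total_cost_usd: float):
    """Return (seconds per route, dollars per route), assuming every route
    takes the same time and costs the same amount."""
    if n_routes <= 0:
        raise ValueError("n_routes must be positive")
    return total_minutes * 60 / n_routes, total_cost_usd / n_routes


# Reported figures: 60 routes, about 12 minutes, about $2 to $3 in API fees.
seconds, low = per_route_estimate(60, 12, 2.0)
_, high = per_route_estimate(60, 12, 3.0)
print(f"about {seconds:.0f} s and ${low:.3f} to ${high:.3f} per route")
```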
The paper acknowledges several limitations. LLMs occasionally misread the direction of a reaction in its text representation, which leads to incorrect feasibility assessments. Smaller models perform no better than random guessing. Routes longer than 20 steps become difficult for the models to follow coherently.
The code and benchmarks are accessible on GitHub at github.com/schwallergroup/steer.