Enhancing molecular machine learning with quantum-chemical insight

May 29, 2025

Layered illustration of different types of molecular representations. The prominent image shows red, gray, and blue spheres of different sizes connected by black rods.

Source: Gabe Gomes

Variety of molecular representations explored in this work

Molecular machine learning (ML) underpins critical workflows in drug discovery, materials science, and catalyst optimization by rapidly predicting molecular interactions and properties. For instance, in drug discovery, ML models forecast interactions between drug candidates and biological targets, making the search for promising compounds faster and more efficient.

Despite their utility, traditional molecular representations, including simplified graphs, three-dimensional coordinates, textual formats, and global descriptors, have inherent limitations. These methods frequently overlook crucial quantum-mechanical details essential for accurately capturing molecular properties and behaviors. As prediction tasks become more complex, developing representations that explicitly incorporate quantum-level molecular information is increasingly important.

In Nature Machine Intelligence, Gabe Gomes, Daniil Boiko, and their collaborators propose a new type of molecular ML representation that includes quantum-chemical interactions.

Boiko, a Ph.D. student in chemical engineering, and Gomes, an assistant professor of chemical engineering and chemistry, show a path to improving models with less data through an interpretable, chemistry-infused approach. Their representation, which incorporates additional information about orbitals (specifically, natural bond orbitals) and their interactions, performs better than standard molecular graphs.

Computational chemists use orbitals to describe the location and behavior of electrons in a molecule. Stereoelectronic effects arise from the spatial relationships between a molecule's orbitals and their electronic interactions, directly influencing molecular geometry, reactivity, stability, and various other physical and chemical properties. Gomes has been studying the relationship between molecular structure and reactivity for the past decade, with particular focus on the development and applications of stereoelectronic effects. His latest work with Boiko encodes stereoelectronic information into a molecular ML model to create stereoelectronics-infused molecular graphs (SIMGs).

Calculating interactions between orbitals can be computationally expensive, making these methods slow for moderately sized molecules and intractable for larger molecules. To address this limitation, Boiko and Gomes developed an additional model that can quickly generate the extended representation based on a standard molecular graph. Compared with methods that take hours or days, the new model works in seconds. It is trained on small molecules and can accurately predict the extended graphs for larger molecules.

This model can be applied when regular quantum chemistry calculations are not possible, like for entire peptides and proteins.

Daniil Boiko, Ph.D. student, Chemical Engineering

By approximating the outputs of quantum chemistry calculations with a much faster learned model, Boiko and Gomes hope to unlock previously inaccessible chemical insight.

In developing the models, it was important to Boiko and Gomes that their new representation be easily interpretable by the molecular ML and general chemistry communities. They created a web application to quickly analyze the stereoelectronic interactions of molecules, and the tool also makes their methods more accessible. The application extends a simple molecular graph with known information about bonds; calculates different targets, including atom charges and lone pairs; provides a description of bond orbitals; and outputs a map of orbital interactions.
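To make the idea of an extended graph concrete, here is a minimal Python sketch of what such a structure could look like. It is not the authors' implementation or the web application's data model: the `ExtendedGraph` class, the `extend` function, and the `"BD"`/`"LP"` labels are hypothetical names chosen for illustration, lone-pair counts are supplied by hand rather than predicted, and each bond node stands in for both its bonding and antibonding orbital. The example uses methanol, where an oxygen lone pair interacting with a neighboring C-H bond orbital is a textbook stereoelectronic interaction.

```python
from dataclasses import dataclass, field


@dataclass
class ExtendedGraph:
    """Toy container: an ordinary molecular graph plus orbital nodes and orbital-orbital edges."""
    atoms: list[str]
    bonds: list[tuple[int, int]]
    # Orbital nodes: ("BD", bond_index) for a bond orbital, ("LP", atom_index) for a lone pair
    orbitals: list[tuple[str, int]] = field(default_factory=list)
    # Directed donor -> acceptor edges between indices into `orbitals`
    interactions: list[tuple[int, int]] = field(default_factory=list)


def extend(atoms, bonds, lone_pairs):
    """Add one orbital node per bond and one per lone pair (counts are given by the caller)."""
    graph = ExtendedGraph(list(atoms), list(bonds))
    for bond_index in range(len(bonds)):
        graph.orbitals.append(("BD", bond_index))
    for atom_index, count in sorted(lone_pairs.items()):
        for _ in range(count):
            graph.orbitals.append(("LP", atom_index))
    return graph


# Methanol, CH3OH: atom 0 is C, atom 1 is O, atoms 2-4 are H on C, atom 5 is H on O.
atoms = ["C", "O", "H", "H", "H", "H"]
bonds = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)]
graph = extend(atoms, bonds, lone_pairs={1: 2})  # oxygen carries two lone pairs

# Illustrative donor -> acceptor edge: an oxygen lone pair and the first C-H bond orbital.
donor = graph.orbitals.index(("LP", 1))
acceptor = graph.orbitals.index(("BD", 1))
graph.interactions.append((donor, acceptor))

print(len(graph.orbitals), graph.interactions)
```

In this sketch the standard graph (atoms and bonds) is left untouched and the orbital information sits alongside it, which mirrors the article's description of extending a simple molecular graph rather than replacing it.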

"In chemistry, we have very small data sets," says Boiko. "On this scale of data, more explicit representation of what's going on in the molecule is very important."

By enhancing existing molecular representations and enabling rapid generation of new quantum-informed graphs, Boiko and Gomes have significantly advanced the capabilities of molecular machine learning. The team is working on expanding the scope of the representation to the entire periodic table and showing myriad applications from spectroscopy to catalysis.

