New “AI Scientist” Combines Theory and Data to Discover Scientific Equations

The numerical data, background theory, and a discovered model are depicted for Kepler’s third law of planetary motion, which gives the orbital period of a planet in the solar system. The data consist of measurements (m1, m2, d, p): the mass of the sun m1 and, for each planet, its mass m2, its distance d from the sun, and its orbital period p. The background theory amounts to Newton’s laws of motion, i.e., the formulae for centrifugal force, gravitational force, and equilibrium conditions. The 4-tuples (m1, m2, d, p) are projected into (m1 + m2, d, p). The blue manifold represents solutions of 𝑓, which is the function derivable from the background-theory axioms that represents the variable of interest. The gray manifold represents solutions of the discovered model f. The double arrows indicate the distances β(f) and ε(f).
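The caption names the ingredients of the background theory (centrifugal force, gravitational force, and an equilibrium condition). As a sketch of how those ingredients lead to Kepler’s third law, here is one standard textbook derivation for circular orbits about the common center of mass. It is not taken from the paper’s formal axioms, and the symbols G, ω, and d_1 are assumptions introduced here for illustration.

```latex
% Assumed symbols not in the caption: G (gravitational constant),
% \omega (angular velocity), d_1 (distance of m_1 from the center of mass).
\begin{align}
d_1 &= \frac{m_2}{m_1 + m_2}\, d
  && \text{(center of mass)} \\
m_1 d_1 \omega^2 &= \frac{G m_1 m_2}{d^2}
  && \text{(centrifugal force balances gravity)} \\
\omega^2 &= \frac{G\,(m_1 + m_2)}{d^3}
  && \text{(substitute } d_1 \text{ and simplify)} \\
p = \frac{2\pi}{\omega} &= \sqrt{\frac{4\pi^2 d^3}{G\,(m_1 + m_2)}}
  && \text{(Kepler's third law)}
\end{align}
```

The masses enter the final expression only through the sum m1 + m2, which is consistent with the caption’s projection of the 4-tuples (m1, m2, d, p) into (m1 + m2, d, p).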

In 1918, the American chemist Irving Langmuir published a paper examining the behavior of gas molecules sticking to a solid surface. Guided by the results of careful experiments, as well as his theory that solids offer discrete sites for the gas molecules to fill, he worked out a series of equations that describe how much gas will stick, given the pressure.

An interpretation of the scientific method as implemented by our system.

Now, about a hundred years later, an “AI scientist” developed by researchers at IBM Research, Samsung AI, and the University of Maryland, Baltimore County (UMBC) has reproduced a key part of Langmuir’s Nobel Prize-winning work. The system, an artificial intelligence (AI) functioning as a scientist, also rediscovered Kepler’s third law of planetary motion, which gives the time it takes one object in space to orbit another, given the distance separating them. In addition, it produced a good approximation of Einstein’s relativistic time-dilation law, which shows that time slows down for fast-moving objects.

The research was supported by the Defense Advanced Research Projects Agency (DARPA). A paper describing the results will be published in the journal Nature Communications on April 12.

A machine-learning tool that reasons

The new AI scientist—dubbed “AI-Descartes” by the researchers—joins the likes of AI Feynman and other recently developed computing tools that aim to speed up scientific discovery. At the core of these systems is a concept called symbolic regression, which finds equations to fit data. Given basic operators, such as addition, multiplication, and division, the systems can generate hundreds to millions of candidate equations, searching for the ones that most accurately describe the relationships in the data.
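To make the idea of symbolic regression concrete, here is a minimal toy sketch in Python. It is not the search used by AI-Descartes or AI Feynman. It enumerates a few dozen candidate expressions built from a handful of operators, fits one scale constant to each, and keeps the candidate with the lowest squared error. The data, the operator set, and all function names (`candidates`, `fit_scale`, `best_candidate`) are assumptions made up for illustration.

```python
import itertools
import math

# Synthetic toy data following y = 2 * x**1.5 (the same power-law shape as
# Kepler's third law, period versus distance). These are not measurements.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x ** 1.5 for x in xs]

# Building blocks: simple functions of x, combined by basic binary operators.
UNARY = {
    "x": lambda x: x,
    "sqrt(x)": lambda x: math.sqrt(x),
    "x^2": lambda x: x * x,
}
BINARY = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def candidates():
    """Yield (name, function) pairs for every expression in the toy grammar."""
    for name, f in UNARY.items():
        yield name, f
    for (n1, f1), (n2, f2) in itertools.product(UNARY.items(), repeat=2):
        for op, g in BINARY.items():
            # Default arguments bind the current f1, f2, g for this lambda.
            yield f"({n1} {op} {n2})", (
                lambda x, f1=f1, f2=f2, g=g: g(f1(x), f2(x))
            )


def fit_scale(f):
    """Least-squares fit of c in y = c * f(x); returns (c, squared error)."""
    fx = [f(x) for x in xs]
    c = sum(a * b for a, b in zip(fx, ys)) / sum(a * a for a in fx)
    err = sum((c * a - b) ** 2 for a, b in zip(fx, ys))
    return c, err


def best_candidate():
    """Return (error, name, constant) for the best-fitting candidate."""
    scored = []
    for name, f in candidates():
        c, err = fit_scale(f)
        scored.append((err, name, c))
    return min(scored)


if __name__ == "__main__":
    err, name, c = best_candidate()
    print(f"best: y = {c:.3f} * {name}  (squared error {err:.2e})")
```

Several expressions in this grammar are algebraically the same as x^1.5 (for example x * sqrt(x) and x^2 / sqrt(x)), so more than one candidate fits the data essentially perfectly. Pure curve fitting cannot choose among such ties, which is the gap the reasoning step is meant to fill.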

AI-Descartes offers a few advantages over other systems, but its most distinctive feature is its ability to reason logically, says Cristina Cornelio, a research scientist at Samsung AI in Cambridge, England, who is first author on the paper. If there are multiple candidate equations that fit the data well, the system identifies which equations fit best with background scientific theory. The ability to reason also distinguishes the system from “generative AI” programs such as ChatGPT, whose large language model has limited logical skills and sometimes gets basic math wrong.
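The selection step described above can be illustrated with a small numerical stand-in. According to the article, AI-Descartes uses logical reasoning over background theory; the sketch below does not do that. Instead, it compares each candidate with a theory-derived function on a grid of points outside the data range. The two candidates, the grid, the tolerance, and the helper names are hypothetical, and the "data" are synthetic periods computed from the theory rather than real measurements.

```python
import math


def theory(d, total_mass):
    """Newtonian period in units of AU, years, and solar masses:
    p = sqrt(d^3 / M). In these units 4*pi^2 / G equals 1."""
    return math.sqrt(d ** 3 / total_mass)


# Two hypothetical candidates that a data-only search might return.
CANDIDATES = {
    "sqrt(d^3 / M)": lambda d, m: math.sqrt(d ** 3 / m),
    "d^1.5": lambda d, m: d ** 1.5,  # ignores the total mass entirely
}

# Synthetic "solar system" data: approximate distances in AU, total mass
# taken as 1 solar mass, periods generated from the theory (not measured).
DATA = [(d, 1.0, theory(d, 1.0)) for d in (0.39, 0.72, 1.0, 1.52, 5.2)]

# Grid that also varies the total mass, which the data never do.
GRID = [
    (d, m, theory(d, m))
    for d in (0.5, 1.0, 5.0, 30.0)
    for m in (0.5, 1.0, 2.0)
]


def max_rel_error(f, points):
    """Largest relative deviation of f from the reference periods."""
    return max(abs(f(d, m) - p) / p for d, m, p in points)


def select(candidates, tol=1e-6):
    """Keep candidates that fit the data, then pick the one closest to theory."""
    fits = {n: f for n, f in candidates.items() if max_rel_error(f, DATA) <= tol}
    chosen = min(fits, key=lambda n: max_rel_error(fits[n], GRID))
    return chosen, sorted(fits)


if __name__ == "__main__":
    chosen, fits = select(CANDIDATES)
    print("fit the data:", fits)
    print("agrees with theory:", chosen)
```

Both candidates fit the synthetic data, because every data point has the same total mass. Only the candidate that depends on mass stays close to the theory once the mass varies, so it is the one selected.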

“In our work, we are merging a first-principles approach, which has been used by scientists for centuries to derive new formulas from existing background theories, with a data-driven approach that is more common in the machine learning era,” Cornelio says. “This combination allows us to take advantage of both approaches and create more accurate and meaningful models for a wide range of applications.”

The name AI-Descartes is a nod to 17th-century mathematician and philosopher René Descartes, who argued that the natural world could be described by a few fundamental physical laws and that logical deduction played a key role in scientific discovery.

Colored components correspond to our system, and gray components indicate standard techniques for scientific discovery (human-driven or artificial) that have not been integrated into the current system. The colors match the respective components of the discovery cycle of Fig. 2. The present system uses symbolic regression to generate hypotheses from data. These hypotheses are posed as conjectures to an automated deductive reasoning system, which either proves or disproves them based on background theory or provides reasoning-based quality measures.

Suited for real-world data

The system works particularly well on noisy, real-world data, which can trip up traditional symbolic regression programs that might overlook the real signal in an effort to find formulas that capture every errant zig and zag of the data. It also handles small data sets well, even finding reliable equations when fed as few as ten data points.

One factor that might slow down the adoption of a tool like AI-Descartes for frontier science is the need to identify and code associated background theory for open scientific questions. The team is working to create new datasets that contain both real measurement data and an associated background theory to refine their system and test it on new terrain.

They would also like to eventually train computers to read scientific papers and construct the background theory themselves.

“In this work, we needed human experts to write down, in formal, computer-readable terms, what the axioms of the background theory are, and if the human missed any or got any of those wrong, the system won’t work,” says co-author Tyler Josephson, assistant professor of Chemical, Biochemical and Environmental Engineering at UMBC. “In the future,” he says, “we’d like to automate this part of the work as well, so we can explore many more areas of science and engineering.” 

This goal motivates Josephson’s research on AI tools to advance chemical engineering. 

Ultimately, the team hopes that AI-Descartes, like its namesake, may inspire a productive new approach to science. “One of the most exciting aspects of our work is the potential to make significant advances in scientific research,” Cornelio says.

