AI-Descartes: A Scientific Renaissance in the World of Artificial Intelligence

AI-Descartes, an AI scientist developed by researchers at IBM Research, Samsung AI, and the University of Maryland, Baltimore County, has reproduced a key part of Langmuir’s Nobel Prize-winning work on gas adsorption and rediscovered Kepler’s third law of planetary motion. Supported by the Defense Advanced Research Projects Agency (DARPA), the system uses symbolic regression to find equations that fit data, and its most distinctive feature is its logical reasoning ability, which lets it determine which equations best agree with background scientific theory. The system is particularly effective with noisy, real-world data and small data sets. To refine and expand its capabilities, the team is creating new datasets and hopes eventually to train computers to read scientific papers and construct background theories themselves.

The system demonstrated its chops on Kepler’s third law of planetary motion, Einstein’s relativistic time-dilation law, and Langmuir’s equation of gas adsorption.

AI-Descartes, a new AI scientist, has successfully reproduced Nobel Prize-winning work using logical reasoning and symbolic regression to find accurate equations. The system is effective with real-world data and small datasets, with future goals including automating the construction of background theories.

In 1918, the American chemist Irving Langmuir published a paper examining the behavior of gas molecules sticking to a solid surface. Guided by the results of careful experiments, as well as his theory that solids offer discrete sites for the gas molecules to fill, he worked out a series of equations that describe how much gas will stick, given the pressure.

Now, about a hundred years later, an “AI scientist” developed by researchers at IBM Research, Samsung AI, and the University of Maryland, Baltimore County (UMBC) has reproduced a key part of Langmuir’s Nobel Prize-winning work. The system—artificial intelligence (AI) functioning as a scientist—also rediscovered Kepler’s third law of planetary motion, which can calculate the time it takes one space object to orbit another given the distance separating them, and produced a good approximation of Einstein’s relativistic time-dilation law, which shows that time slows down for fast-moving objects.

The research was supported by the Defense Advanced Research Projects Agency (DARPA). A paper describing the results will be published today (April 12) in the journal Nature Communications.

A machine-learning tool that reasons

The new AI scientist—dubbed “AI-Descartes” by the researchers—joins the likes of AI Feynman and other recently developed computing tools that aim to speed up scientific discovery. At the core of these systems is a concept called symbolic regression, which finds equations to fit data. Given basic operators, such as addition, multiplication, and division, the systems can generate hundreds to millions of candidate equations, searching for the ones that most accurately describe the relationships in the data.
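To make the idea concrete, here is a minimal Python sketch of symbolic regression on Kepler-style data. It is not the AI-Descartes implementation: the candidate list, the helper names (`fit_scale`, `rank_candidates`), and the choice to fit a single scale constant `c` are all assumptions made for illustration, and the planetary values are approximate textbook figures. Real systems search a far larger space of expressions.

```python
import math

# Approximate semi-major axis d (AU) and orbital period p (years).
PLANETS = [
    (0.387, 0.241),   # Mercury
    (0.723, 0.615),   # Venus
    (1.000, 1.000),   # Earth
    (1.524, 1.881),   # Mars
    (5.203, 11.862),  # Jupiter
    (9.537, 29.457),  # Saturn
]

# Hypothetical candidate forms p = c * f(d), built from a few basic operators.
CANDIDATES = {
    "c * d": lambda d: d,
    "c * d**2": lambda d: d * d,
    "c * sqrt(d)": math.sqrt,
    "c * sqrt(d**3)": lambda d: math.sqrt(d ** 3),
    "c * d**3": lambda d: d ** 3,
}


def fit_scale(f, data):
    """Closed-form least-squares estimate of c in p = c * f(d)."""
    num = sum(f(d) * p for d, p in data)
    den = sum(f(d) ** 2 for d, _ in data)
    return num / den


def mean_squared_error(f, c, data):
    """Average squared gap between the candidate's prediction and the data."""
    return sum((c * f(d) - p) ** 2 for d, p in data) / len(data)


def rank_candidates(data):
    """Return (error, name, c) tuples sorted from best fit to worst."""
    results = []
    for name, f in CANDIDATES.items():
        c = fit_scale(f, data)
        results.append((mean_squared_error(f, c, data), name, c))
    return sorted(results)


if __name__ == "__main__":
    for err, name, c in rank_candidates(PLANETS):
        print(f"{name:16s} c={c:.4f} mse={err:.6f}")
```

On this data the best-ranked form is the one in which the period scales as the square root of the distance cubed, which is Kepler's third law. With only five candidates the search is trivial; the hard part in practice is the size of the candidate space.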

AI-Descartes offers a few advantages over other systems, but its most distinctive feature is its ability to reason logically, says Cristina Cornelio, a research scientist at Samsung AI in Cambridge, England, who is first author on the paper. If multiple candidate equations fit the data well, the system identifies which of them agree best with background scientific theory. The ability to reason also distinguishes the system from “generative AI” programs such as ChatGPT, whose large language model has limited logical skills and sometimes gets basic math wrong.
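The filtering step can be illustrated with a rough Python sketch. This is only a numerical stand-in and not how AI-Descartes reasons: the article describes formal logical deduction from axioms, whereas the code below merely spot-checks two properties. Both properties are assumptions drawn loosely from Langmuir's picture of a surface with a finite number of sites: no gas is adsorbed at zero pressure, and the adsorbed amount levels off at high pressure. The candidate equations, their constants, and all function names are hypothetical.

```python
import math

# Hypothetical candidates, imagined as already fitted to adsorption data.
# q is the amount of gas adsorbed, p is the pressure; constants are made up.
CANDIDATES = {
    "q = 2.0*p": lambda p: 2.0 * p,
    "q = 1.5*sqrt(p)": lambda p: 1.5 * math.sqrt(p),
    "q = 2.0*p / (1 + 0.5*p)": lambda p: 2.0 * p / (1.0 + 0.5 * p),
}


def vanishes_at_zero_pressure(f, eps=1e-9, tol=1e-3):
    """Assumed property: almost no gas sticks when the pressure is almost zero."""
    return abs(f(eps)) < tol


def saturates_at_high_pressure(f, tol=1e-3):
    """Assumed property: with finitely many sites, q stops growing at high p.

    Checked crudely: raising p from 1e6 to 1e8 barely changes q.
    """
    low, high = f(1e6), f(1e8)
    return abs(high - low) <= tol * abs(high)


def consistent_with_theory(f):
    """True only if the candidate passes both spot checks."""
    return vanishes_at_zero_pressure(f) and saturates_at_high_pressure(f)


def filter_candidates(candidates):
    """Keep only the names of candidates that pass the checks."""
    return [name for name, f in candidates.items() if consistent_with_theory(f)]


if __name__ == "__main__":
    print(filter_candidates(CANDIDATES))
```

Only the third candidate, which has the shape of Langmuir's equation, passes both checks; the linear and square-root forms keep growing with pressure. A numerical spot check like this can be fooled by functions that misbehave between the sampled points, which is one reason a system built on formal deduction is a stronger approach.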

“In our work, we are merging a first-principles approach, which has been used by scientists for centuries to derive new formulas from existing background theories, with a data-driven approach that is more common in the machine learning era,” Cornelio says. “This combination allows us to take advantage of both approaches and create more accurate and meaningful models for a wide range of applications.”

The name AI-Descartes is a nod to 17th-century mathematician and philosopher René Descartes, who argued that the natural world could be described by a few fundamental physical laws and that logical deduction played a key role in scientific discovery.

Suited for real-world data

The system works particularly well on noisy, real-world data, which can trip up traditional symbolic regression programs that might overlook the real signal in an effort to find formulas that capture every errant zig and zag of the data. It also handles small data sets well, even finding reliable equations when fed as few as ten data points.

One factor that might slow down the adoption of a tool like AI-Descartes for frontier science is the need to identify and code associated background theory for open scientific questions. The team is working to create new datasets that contain both real measurement data and an associated background theory to refine their system and test it on new terrain.

They would also like to eventually train computers to read scientific papers and construct the background theory themselves.

“In this work, we needed human experts to write down, in formal, computer-readable terms, what the axioms of the background theory are, and if the human missed any or got any of those wrong, the system won’t work,” says co-author Tyler Josephson, assistant professor of Chemical, Biochemical and Environmental Engineering at UMBC. “In the future,” he says, “we’d like to automate this part of the work as well, so we can explore many more areas of science and engineering.”

This goal motivates Josephson’s research on AI tools to advance chemical engineering.

Ultimately, the team hopes their AI-Descartes, like the real person, may inspire a productive new approach to science. “One of the most exciting aspects of our work is the potential to make significant advances in scientific research,” Cornelio says.

Reference: “Combining Data and Theory for Derivable Scientific Discovery with AI-Descartes” 12 April 2023, Nature Communications.
DOI: 10.1038/s41467-023-37236-y

Funding: Defense Advanced Research Projects Agency

Source: SciTechDaily