Machine-learning models can speed up the discovery of new materials by making predictions and suggesting experiments. But most models today only consider a few specific types of data or variables. Compare that with human scientists, who work in a collaborative environment and consider experimental results, the broader scientific literature, imaging and structural analysis, personal experience or intuition, and input from colleagues and peer reviewers.
Now, MIT researchers have developed a method for optimizing materials recipes and planning experiments that incorporates information from diverse sources like insights from the literature, chemical compositions, microstructural images, and more. The approach is part of a new platform, named Copilot for Real-world Experimental Scientists (CRESt), that also uses robotic equipment for high-throughput materials testing, the results of which are fed back into large multimodal models to further optimize materials recipes.
Human researchers can converse with the system in natural language, with no coding required, and the system makes its own observations and hypotheses along the way. Cameras and visual language models also allow the system to monitor experiments, detect issues, and suggest corrections.
“In the field of AI for science, the key is designing new experiments,” says Ju Li, School of Engineering Carl Richard Soderberg Professor of Power Engineering. “We use multimodal feedback — for example information from previous literature on how palladium behaved in fuel cells at this temperature, and human feedback — to complement experimental data and design new experiments. We also use robots to synthesize and characterize the material’s structure and to test performance.”
The system is described in a paper published in Nature. The researchers used CRESt to explore more than 900 chemistries and conduct 3,500 electrochemical tests, leading to the discovery of a catalyst material that delivered record power density in a fuel cell that runs on formate salt to produce electricity.
Joining Li on the paper as first authors are PhD student Zhen Zhang, Zhichu Ren PhD ’24, PhD student Chia-Wei Hsu, and postdoc Weibin Chen. Their coauthors are MIT Assistant Professor Iwnetim Abate; Associate Professor Pulkit Agrawal; JR East Professor of Engineering Yang Shao-Horn; MIT.nano researcher Aubrey Penn; Zhang-Wei Hong PhD ’25, Hongbin Xu PhD ’25; Daniel Zheng PhD ’25; MIT graduate students Shuhan Miao and Hugh Smith; MIT postdocs Yimeng Huang, Weiyin Chen, Yungsheng Tian, Yifan Gao, and Yaoshen Niu; former MIT postdoc Sipei Li; and collaborators including Chi-Feng Lee, Yu-Cheng Shao, Hsiao-Tsu Wang, and Ying-Rui Lu.
A smarter system
Materials science experiments can be time-consuming and expensive. They require researchers to carefully design workflows, make new material, and run a series of tests and analysis to understand what happened. Those results are then used to decide how to improve the material.
To improve the process, some researchers have turned to a machine-learning strategy known as active learning to make efficient use of previous experimental data points and explore or exploit those data. When paired with a statistical technique known as Bayesian optimization (BO), active learning has helped researchers identify new materials for things like batteries and advanced semiconductors.
“Bayesian optimization is like Netflix recommending the next movie to watch based on your viewing history, except instead it recommends the next experiment to do,” Li explains. “But basic Bayesian optimization is too simplistic. It uses a boxed-in design space, so if I say I’m going to use platinum, palladium, and iron, it only changes the ratio of those elements in this small space. But real materials have a lot more dependencies, and BO often gets lost.”
Most active learning approaches also rely on single data streams that don’t capture everything that goes on in an experiment. To equip computational systems with more human-like knowledge, while still taking advantage of the speed and control of automated systems, Li and his collaborators built CRESt.
CRESt’s robotic equipment includes a liquid-handling robot, a carbothermal shock system to rapidly synthesize materials, an automated electrochemical workstation for testing, characterization equipment including automated electron microscopy and optical microscopy, and auxiliary devices such as pumps and gas valves, which can also be remotely controlled. Many processing parameters can also be tuned.
With the user interface, researchers can chat with CRESt and tell it to use active learning to find promising materials recipes for different projects. CRESt can include up to 20 precursor molecules and substrates into its recipe. To guide material designs, CRESt’s models search through scientific papers for descriptions of elements or precursor molecules that might be useful. When human researchers tell CRESt to pursue new recipes, it kicks off a robotic symphony of sample preparation, characterization, and testing. The researcher can also ask CRESt to perform image analysis from scanning electron microscopy imaging, X-ray diffraction, and other sources.
Information from those processes is used to train the active learning models, which use both literature knowledge and current experimental results to suggest further experiments and accelerate materials discovery.
“For each recipe we use previous literature text or databases, and it creates these huge representations of every recipe based on the previous knowledge base before even doing the experiment,” says Li. “We perform principal component analysis in this knowledge embedding space to get a reduced search space that captures most of the performance variability. Then we use Bayesian optimization in this reduced space to design the new experiment. After the new experiment, we feed newly acquired multimodal experimental data and human feedback into a large language model to augment the knowledgebase and redefine the reduced search space, which gives us a big boost in active learning efficiency.”
Materials science experiments can also face reproducibility challenges. To address the problem, CRESt monitors its experiments with cameras, looking for potential problems and suggesting solutions via text and voice to human researchers.
The researchers used CRESt to develop an electrode material for an advanced type of high-density fuel cell known as a direct formate fuel cell. After exploring more than 900 chemistries over three months, CRESt discovered a catalyst material made from eight elements that achieved a 9.3-fold improvement in power density per dollar over pure palladium, an expensive precious metal. In further tests, CRESTs material was used to deliver a record power density to a working direct formate fuel cell even though the cell contained just one-fourth of the precious metals of previous devices.
The results show the potential for CRESt to find solutions to real-world energy problems that have plagued the materials science and engineering community for decades.
“A significant challenge for fuel-cell catalysts is the use of precious metal,” says Zhang. “For fuel cells, researchers have used various precious metals like palladium and platinum. We used a multielement catalyst that also incorporates many other cheap elements to create the optimal coordination environment for catalytic activity and resistance to poisoning species such as carbon monoxide and adsorbed hydrogen atom. People have been searching low-cost options for many years. This system greatly accelerated our search for these catalysts.”
A helpful assistant
Early on, poor reproducibility emerged as a major problem that limited the researchers’ ability to perform their new active learning technique on experimental datasets. Material properties can be influenced by the way the precursors are mixed and processed, and any number of problems can subtly alter experimental conditions, requiring careful inspection to correct.
To partially automate the process, the researchers coupled computer vision and vision language models with domain knowledge from the scientific literature, which allowed the system to hypothesize sources of irreproducibility and propose solutions. For example, the models can notice when there’s a millimeter-sized deviation in a sample’s shape or when a pipette moves something out of place. The researchers incorporated some of the model’s suggestions, leading to improved consistency, suggesting the models already make good experimental assistants.
The researchers noted that humans still performed most of the debugging in their experiments.
“CREST is an assistant, not a replacement, for human researchers,” Li says. “Human researchers are still indispensable. In fact, we use natural language so the system can explain what it is doing and present observations and hypotheses. But this is a step toward more flexible, self-driving labs.”




