Rightsizing AI Models and Datasets for Materials Design

In silico materials design has long faced a fundamental tradeoff between accuracy, universality, and efficiency. In 2022, we pioneered the concept of a universal machine learning interatomic potential (UMLIP) [Chen & Ong, Nat. Comput. Sci., 2022, 2, 718–728], a foundational materials model (FMM) with comprehensive coverage of the periodic table. FMMs enable accurate, large-scale simulations across a broad spectrum of materials, offering transformative potential for materials discovery and design. More recently, the field has trended toward increasingly complex FMM architectures, often with over 10 million parameters, trained on datasets exceeding 100 million structures, driven largely by major tech companies such as Google DeepMind and Microsoft, as well as various startups. In this talk, I challenge the prevailing “bigger is better” paradigm in FMM development. I will present MatPES, a foundational, community-curated potential energy surface (PES) dataset of ~400,000 structures. Leveraging MatPES, we demonstrate that gains in FMM performance are primarily driven by data quality and that there is no “accuracy moat” in FMM architectures. Models trained on MatPES match or exceed the accuracy of previous FMMs across a diverse set of equilibrium, near-equilibrium, and dynamic benchmarks. Finally, I will argue that the key priorities in architectural and algorithmic development should be the parallelization and scaling of FMMs on high-performance computing systems and their integration into high-throughput materials workflows.

Publication Info

DOI: 10.1149/ma2025-027989mtgabs
Year: 2025

Article Details

Volume: MA2025-02
Issue: 7
Link Of The Paper: https://doi.org/10.1149/ma2025-027989mtgabs

Timeline

Created: June 19, 2026
