Quantifying the performance of machine learning models in materials discovery

Marco Musto

Citrine Informatics, USA


Dr. Marco Musto is a Senior Research Scientist at Citrine Informatics, where he focuses on publicly funded materials informatics and digitalisation projects.
He holds a doctorate in Computational Mechanics from Brunel University London.
Over a career of more than 15 years, he has held various R&D roles covering diverse topics, such as long-term ageing characterisation, the development of material constitutive laws for fracture, and machine-learning modelling of failure phenomena.



The performance of machine learning (ML) models is typically assessed with simple statistics such as the root-mean-square error (RMSE) or the squared correlation coefficient (r²) between predicted and actual material property values. Intuitively appealing as these metrics are, it is not clear whether they are effective in evaluating ML models used to drive iterative materials discovery. The question is investigated by simulating a sequential learning (SL)-guided materials discovery workflow on several datasets. No clear connection is observed between statistics such as RMSE and an ML model's ability to drive the iterative (and possibly extrapolative) discovery of new materials. Critical factors affecting ML model performance are identified and discussed. Furthermore, new measures such as Discovery Yield (DY) and Discovery Probability (DP) are formulated to overcome the limitations of "static" metrics.
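
As an illustration of what simulating an SL-guided discovery workflow can look like, the following is a minimal sketch and not the procedure used in the talk. The synthetic dataset, the linear least-squares model, the greedy "measure the best-predicted candidate" rule, and the name `simulate_sequential_learning` are all assumptions made for this example. The DY and DP measures are not implemented, since their definitions are not given in this abstract; the sketch only counts how many iterations are needed to reach the best candidate.

```python
import numpy as np


def simulate_sequential_learning(X, y, n_initial=10, n_iterations=20, seed=0):
    """Hypothetical greedy SL loop; returns dataset indices in measurement order."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Assumption: start from poorly performing materials only, so that
    # reaching the best candidate requires extrapolation.
    worst_half = np.argsort(y)[: n // 2]
    known = [int(i) for i in rng.choice(worst_half, size=n_initial, replace=False)]
    for _ in range(n_iterations):
        candidates = np.setdiff1d(np.arange(n), known)
        if candidates.size == 0:
            break
        # Fit a linear model with intercept on everything measured so far.
        A_train = np.column_stack([X[known], np.ones(len(known))])
        coef, *_ = np.linalg.lstsq(A_train, y[known], rcond=None)
        # "Measure" the unmeasured candidate with the highest predicted value.
        A_cand = np.column_stack([X[candidates], np.ones(candidates.size)])
        best = candidates[np.argmax(A_cand @ coef)]
        known.append(int(best))
    return known


# Synthetic stand-in for a materials dataset (assumed, not from the talk).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -0.5, 2.0]) + 0.1 * rng.normal(size=200)

n_initial = 10
measured = simulate_sequential_learning(X, y, n_initial=n_initial)
target = int(np.argmax(y))
# Iteration (1-based) at which the best material was measured, if at all.
found_at = measured.index(target) - n_initial + 1 if target in measured else None
print("iterations needed to find the best candidate:", found_at)
```

A count of this kind describes the discovery process itself, whereas RMSE or r² computed on a fixed test split describes only a single, static model fit.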

European Materials Modelling Council
Silversquare Stéphanie
Avenue Louise 54
1050 Brussels
CBE no: 0731 621 312