Chinese Science Bulletin, Volume 64, Issue 32: 3270-3275 (2019) https://doi.org/10.1360/TB-2019-0456

Research progress and perspective of machine learning in material design

  • Received: Aug 3, 2019
  • Accepted: Oct 8, 2019
  • Published: Oct 11, 2019


With the rapid development of the economy and society, excessive demand for resources has upset the balance of the ecological environment. It is therefore urgent to develop new functional materials, especially energy conversion materials, to solve scientific and engineering problems in the field of resources and the environment. However, materials research and development has traditionally relied on inefficient trial-and-error experiments. Although state-of-the-art approaches such as density functional theory (DFT) can simulate materials properties, calculations under high temperature, high pressure and strong magnetic fields remain unsatisfactory, as do the treatment of strongly and weakly correlated electron systems and the choice of interatomic interaction potentials.

The huge amount of data produced by experiments and simulations provides databases for machine learning. By combining probability theory with statistical algorithms, machine learning has recently made considerable progress in the discovery and design of new materials, in the prediction of material performance and applications, and in other tasks ranging from the macroscopic to the microscopic scale. Examples include the statistical classification of perovskite materials, the stability prediction of perovskite materials based on high-throughput computing, and the design and selection of intermetallic compound electrocatalysts. Meanwhile, developing physically interpretable descriptors that capture trends in materials properties is a critical goal of data-driven science. Machine learning has been applied in materials science and engineering, where it offers a perspective different from that of traditional approaches. In this paper, recent progress in materials design for photovoltaics and electrocatalysis, and in the performance evaluation of energy storage batteries, is reviewed. In these efforts, machine learning aims to discover the relationships between compositional and structural features and functionality in complex materials systems.
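To make the idea of a physically interpretable descriptor concrete, the sketch below computes two classical geometric descriptors for ABX3 perovskites: the Goldschmidt tolerance factor t = (r_A + r_X) / (sqrt(2) (r_B + r_X)) and the octahedral factor mu = r_B / r_X. These are standard textbook quantities chosen here for illustration only; they are not necessarily the descriptors used in the reviewed work. The function names are hypothetical, and the ionic radii are approximate Shannon values in angstroms, included as assumptions.

```python
import math


def tolerance_factor(r_a: float, r_b: float, r_x: float) -> float:
    """Goldschmidt tolerance factor for an ABX3 perovskite."""
    return (r_a + r_x) / (math.sqrt(2) * (r_b + r_x))


def octahedral_factor(r_b: float, r_x: float) -> float:
    """Ratio of the B-site cation radius to the anion radius."""
    return r_b / r_x


# Approximate Shannon ionic radii in angstroms (illustrative assumptions):
# Sr2+ (12-coordinate), Ti4+ (6-coordinate), O2- (6-coordinate).
radii = {"Sr": 1.44, "Ti": 0.605, "O": 1.40}

t = tolerance_factor(radii["Sr"], radii["Ti"], radii["O"])
mu = octahedral_factor(radii["Ti"], radii["O"])
print(f"SrTiO3: t = {t:.3f}, mu = {mu:.3f}")
```

A value of t close to 1 is conventionally associated with the ideal cubic perovskite structure, which is why such a descriptor is easy to interpret physically: it depends only on how well the ions fit together geometrically.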

Machine learning is a data-driven approach that relies heavily on data. Unlike image recognition and other fields, where datasets typically contain millions of samples, materials science usually offers limited training data, so machine learning models tend to over-fit, which greatly reduces their generalization ability. To increase the amount of materials data, researchers can, on the one hand, generate theoretical data through high-throughput computing and, on the other hand, develop methods for automated reading of the literature to extract large amounts of relevant experimental and theoretical data from publications. Another promising way to address the problem of limited data sets is meta-learning, that is, learning knowledge within or across problems. New techniques such as neural Turing machines and imitation learning make this possible. Recently, it has been reported that Bayesian program learning can reach human-level judgment through one-shot learning under limited data, which may greatly benefit materials science, where data are scarce and slow and expensive to acquire. Although machine learning methods have greatly improved predictive accuracy in materials discovery, design, performance and application, their transferability remains limited. Active learning methods provide consistent and automated improvements in accuracy and transferability, contributing significantly to the success of universal models. In addition, one promising direction for machine learning in materials science is the development of new, physically interpretable descriptors that make the black-box models of statistical machine learning explainable. In short, the development of intelligent computer algorithms should promote innovation in the discovery of new materials.
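The over-fitting problem described above can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption made for illustration: the data follow y = 2x plus Gaussian noise, the training set has only 8 points, and the two models (a least-squares line and a 1-nearest-neighbour memorizer) are stand-ins for "simple" and "flexible" models, not methods taken from the reviewed work. All function names are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_nearest(train_xs, train_ys, x):
    """1-nearest-neighbour: return the target of the closest training point."""
    i = min(range(len(train_xs)), key=lambda j: abs(train_xs[j] - x))
    return train_ys[i]


def mse(preds, ys):
    """Mean squared error between predictions and targets."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def sample(rng, n):
    """Draw n noisy points from the assumed ground truth y = 2x."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


rng = random.Random(0)                  # fixed seed for reproducibility
train_xs, train_ys = sample(rng, 8)     # deliberately tiny training set
test_xs, test_ys = sample(rng, 200)     # larger held-out set

slope, intercept = fit_line(train_xs, train_ys)
line_train = mse([slope * x + intercept for x in train_xs], train_ys)
line_test = mse([slope * x + intercept for x in test_xs], test_ys)

nn_train = mse([predict_nearest(train_xs, train_ys, x) for x in train_xs], train_ys)
nn_test = mse([predict_nearest(train_xs, train_ys, x) for x in test_xs], test_ys)

print(f"line: train MSE = {line_train:.3f}, test MSE = {line_test:.3f}")
print(f"1-NN: train MSE = {nn_train:.3f}, test MSE = {nn_test:.3f}")
```

The nearest-neighbour model reproduces every training point exactly, so its training error is zero, yet its error on held-out data is not. That gap between training and test error is the signature of over-fitting, and it is the reason held-out validation matters most when data are scarce.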





