npj: Classifying small X-ray diffraction datasets with data augmentation and deep neural networks
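Rapid materials characterization is essential for high-throughput exploration of new materials. X-ray diffraction (XRD) is a key tool for characterizing and screening materials, but acquiring XRD patterns and classifying them is usually time-consuming, which makes it one of the bottlenecks of high-throughput characterization. Acquiring XRD data at high angular resolution typically takes about 1 hour, after which a trained crystallographer generally needs another 1-2 hours for Rietveld refinement. That covers only data collected on known crystalline phases; unknown phases take even longer.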
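A research team from the Massachusetts Institute of Technology and Singapore has developed a supervised machine learning framework for rapidly acquiring and identifying the XRD patterns of novel thin-film materials. They first built a database from the XRD patterns of 164 halide compounds in the Inorganic Crystal Structure Database (ICSD) and of 115 experimentally synthesized thin films. Starting from this small library, they developed a model-agnostic, physics-informed data augmentation method to construct the training set. They then used that set to train a convolutional neural network for XRD pattern classification, reaching accuracies of 93% for crystallographic dimensionality and 89% for space group. The approach addresses the data scarcity inherent in new-materials exploration and allows the XRD pattern of a new material to be acquired and classified quickly, in 5.5 minutes or less.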
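To make the idea of physics-informed augmentation concrete, here is a minimal Python sketch. This summary does not list the exact transformations the authors use, so the three applied here are assumptions chosen for illustration: block-wise intensity scaling (a stand-in for preferred orientation in thin films), block-wise elimination (peaks that vanish from the pattern), and a rigid shift along 2θ (a rough stand-in for strain or sample-height offsets). The function name `augment_pattern` and all parameter values are hypothetical.

```python
import random

def augment_pattern(intensities, max_shift=5, scale_range=(0.8, 1.2),
                    drop_prob=0.1, block=10, rng=None):
    """Return one augmented copy of a 1-D XRD intensity list.

    Illustrative only: the transformations and defaults are assumptions,
    not the settings used in the paper. Assumes max_shift < len(intensities).
    """
    rng = rng or random.Random()
    n = len(intensities)
    out = list(intensities)

    # Scale or eliminate the pattern block by block along the 2-theta axis.
    # Scaling mimics texture-induced intensity changes; a zero factor
    # mimics a peak that is absent in the measured film.
    for start in range(0, n, block):
        if rng.random() < drop_prob:
            factor = 0.0
        else:
            factor = rng.uniform(*scale_range)
        for i in range(start, min(start + block, n)):
            out[i] *= factor

    # Rigidly shift the whole pattern by a few steps, padding with zeros
    # so the output keeps the same length as the input.
    shift = rng.randint(-max_shift, max_shift)
    if shift > 0:
        out = [0.0] * shift + out[:n - shift]
    elif shift < 0:
        out = out[-shift:] + [0.0] * (-shift)
    return out

# Usage: expand one pattern into several training samples.
pattern = [0.0] * 50
pattern[20] = 1.0  # a single synthetic peak
augmented = [augment_pattern(pattern, rng=random.Random(seed))
             for seed in range(5)]
```

Because the augmentation acts only on the input patterns, it can be paired with any classifier, which is what makes such a scheme model-agnostic.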
The paper was recently published in npj Computational Materials 5: 60 (2019). Its English title and abstract are reproduced below, and the paper PDF is freely accessible.
Fast and interpretable classification of small X-ray diffraction datasets using data augmentation and deep neural networks
Felipe Oviedo, Zekun Ren, Shijing Sun, Charles Settens, Zhe Liu, Noor Titan Putri Hartono, Savitha Ramasamy, Brian L. De Cost, Siyu I. P. Tian, Giuseppe Romano, Aaron Gilad Kusne & Tonio Buonassisi
X-ray diffraction (XRD) data acquisition and analysis is among the most time-consuming steps in the development cycle of novel thin-film materials. We propose a machine learning-enabled approach to predict crystallographic dimensionality and space group from a limited number of thin-film XRD patterns. We overcome the scarce data problem intrinsic to novel materials development by coupling a supervised machine learning approach with a model-agnostic, physics-informed data augmentation strategy using simulated data from the Inorganic Crystal Structure Database (ICSD) and experimental data. As a test case, 115 thin-film metal-halides spanning three dimensionalities and seven space groups are synthesized and classified. After testing various algorithms, we develop and implement an all convolutional neural network, with cross-validated accuracies for dimensionality and space group classification of 93 and 89%, respectively. We propose average class activation maps, computed from a global average pooling layer, to allow high model interpretability by human experimentalists, elucidating the root causes of misclassification. Finally, we systematically evaluate the maximum XRD pattern step size (data acquisition rate) before loss of predictive accuracy occurs, and determine it to be 0.16° 2θ, which enables an XRD pattern to be obtained and classified in 5.5 min or less.
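The class activation maps mentioned in the abstract can be illustrated with a short sketch. In a network whose last convolutional layer feeds a global average pooling layer and then a linear classifier, the class score is a weighted sum of the pooled feature maps, so applying the same weights position by position shows which 2θ regions drive the prediction. The code below follows that standard construction. The function names, the omission of a bias term, and the reading of "average" as a mean over all patterns of a class are assumptions, not details taken from the paper.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of feature maps at each position along 2-theta.

    feature_maps: K lists of equal length L (last conv layer outputs).
    class_weights: K weights linking the pooled features to one class.
    """
    length = len(feature_maps[0])
    return [sum(w * fmap[x] for w, fmap in zip(class_weights, feature_maps))
            for x in range(length)]

def class_score(feature_maps, class_weights):
    """Class score via global average pooling, without a bias term."""
    pooled = [sum(fmap) / len(fmap) for fmap in feature_maps]
    return sum(w * p for w, p in zip(class_weights, pooled))

def average_cam(cams):
    """Position-wise mean of several activation maps of equal length."""
    n = len(cams)
    return [sum(column) / n for column in zip(*cams)]

# Usage with two toy feature maps of length 4.
fmaps = [[0.0, 1.0, 2.0, 1.0],
         [1.0, 0.0, 0.0, 3.0]]
weights = [0.5, -1.0]
cam = class_activation_map(fmaps, weights)
score = class_score(fmaps, weights)
```

The mean of `cam` over all positions equals `score`, which is why the map can be read as a per-position breakdown of the class score.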