Abstract:
In this work, machine learning regression techniques are applied to a large dataset from the Materials Project database to develop models capable of accurately predicting the properties of sodium-ion battery (SIB) cathode materials. Six regression models, namely Bayesian ridge, gradient boosting regressor, light gradient boosting machine, extra trees regressor, random forest, and orthogonal matching pursuit, are developed and validated using SIB material properties calculated from density functional theory (DFT) as the input dataset, with the models’ efficiency assessed on feature vectors derived from the elemental properties of the materials’ constituents.
The target properties in this work are formation energy, final energy, Fermi energy, energy above hull, density, and band gap. The importance of feature vectors derived from the properties of the chemical compounds and from the elemental properties of their constituents is evaluated. The average covalent radius and the average single-bond covalent radius were found to be the most important descriptors for predicting the formation and final energies, whilst the estimated face-centred cubic lattice parameter, the average electronegativity, and the average density were found to be the most important descriptors for predicting the Fermi energy. The optimal features for predicting the energy above hull were found to be the sum of sound velocity, the sum of total unfilled electrons, and the average ground state energy. Furthermore, the results show that the maximum mass specific heat capacity and the variance of the density functional theory energy per atom are the most important descriptors for accurately predicting the materials’ density, whereas the number of valence electrons in the d shell, the average radius, and the average electronegativity are the most important features for predicting the band gap.
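As an illustration of how composition-based descriptors such as an "average" or "variance" of an elemental property can be computed, the following is a minimal sketch. It assumes a stoichiometry-weighted mean and variance, which is a common convention but is not confirmed here as the exact featurization used in this work. The helper names `average_property` and `property_variance` and the example compound NaCoO2 are hypothetical, chosen only for illustration; the electronegativities are standard Pauling values.

```python
# Pauling electronegativities for the elements in the illustrative compound.
ELECTRONEGATIVITY = {"Na": 0.93, "Co": 1.88, "O": 3.44}


def average_property(composition, table):
    """Stoichiometry-weighted mean of an elemental property.

    Assumption: each element is weighted by its atom count in the formula.
    """
    total_atoms = sum(composition.values())
    weighted_sum = sum(table[el] * n for el, n in composition.items())
    return weighted_sum / total_atoms


def property_variance(composition, table):
    """Stoichiometry-weighted variance of an elemental property."""
    mean = average_property(composition, table)
    total_atoms = sum(composition.values())
    squared_dev = sum(n * (table[el] - mean) ** 2 for el, n in composition.items())
    return squared_dev / total_atoms


# Hypothetical example: NaCoO2 written as element -> atom count.
na_co_o2 = {"Na": 1, "Co": 1, "O": 2}
avg_en = average_property(na_co_o2, ELECTRONEGATIVITY)
var_en = property_variance(na_co_o2, ELECTRONEGATIVITY)
print(f"average electronegativity: {avg_en:.4f}")
print(f"electronegativity variance: {var_en:.4f}")
```

The same two helpers could be applied to any tabulated elemental property (covalent radius, density, and so on) to build one entry of a feature vector per property and statistic.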
Amongst the algorithms evaluated, the Bayesian ridge model is found to be the best for predicting the formation energy, with a coefficient of determination of 0.99 and a mean square error of 0.01 eV, and the final energy, with a coefficient of determination of 0.98 and a mean square error of 0.03 eV. The light gradient boosting machine model is found to be the best for predicting the Fermi energy, with a coefficient of determination of 0.82 and a mean square error of 0.52 eV, and the energy above hull, with a coefficient of determination of 0.67 and a mean square error of 0.01 eV. The extra trees regressor is found to be the best for predicting the density, with a coefficient of determination of 0.95 and a mean square error of 0.09 g/cm³, and the band gap, with a coefficient of determination of 0.78 and a mean square error of 0.66 eV. The regression scores show that the models achieve improved accuracy in predicting the properties of sodium-ion battery materials.
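The two regression scores reported above can be sketched as follows. This is a minimal, dependency-free illustration of the standard definitions of the coefficient of determination and the mean square error, together with a simple "best model by highest coefficient of determination" selection. The target values, the predictions, and the names `model_a` and `model_b` are made up for illustration and are not results from this work; selecting by the coefficient of determination alone is also an assumption.

```python
def mean_squared_error(y_true, y_pred):
    """Mean of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up target values and predictions from two hypothetical models.
y_true = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "model_a": [1.1, 1.9, 3.2, 3.8],
    "model_b": [1.5, 2.5, 2.5, 3.5],
}

# Score every model, then pick the one with the highest R^2.
scores = {
    name: (r2_score(y_true, y_pred), mean_squared_error(y_true, y_pred))
    for name, y_pred in predictions.items()
}
best_model = max(scores, key=lambda name: scores[name][0])

for name, (r2, mse) in scores.items():
    print(f"{name}: R^2 = {r2:.3f}, MSE = {mse:.3f}")
print("best model:", best_model)
```

Under these definitions a mean square error carries the squared unit of the target, whereas a root mean square error carries the unit of the target itself.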