Biophys. J. 118 (3) 765-780 [2020-02-04; online 2019-12-21]
Biomolecular simulations are intrinsically high dimensional and generate noisy data sets of ever-increasing size. Extracting important features from these data is crucial for understanding the biophysical properties of molecular processes but remains a major challenge. Machine learning (ML) provides powerful dimensionality reduction tools. However, such methods are often criticized as black boxes that offer limited human-interpretable insight. We use methods from supervised and unsupervised ML to efficiently create interpretable maps of important features from molecular simulations. We benchmark the performance of several methods, including neural networks, random forests, and principal component analysis, using a toy model with properties reminiscent of macromolecular behavior. We then analyze three diverse biological processes: conformational changes within the soluble protein calmodulin, ligand binding to a G protein-coupled receptor, and activation of an ion channel voltage-sensor domain, unraveling features critical for signal transduction, ligand binding, and voltage sensing. This work demonstrates the usefulness of ML for understanding biomolecular states and demystifying complex simulations.
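To illustrate what an interpretable "map of important features" from an unsupervised method can look like, here is a minimal sketch using NumPy only. It assumes a synthetic two-state toy data set in which only a few input features distinguish the states, and it scores each feature by its absolute principal component loadings weighted by each component's explained variance ratio. This weighting is one common choice and is not necessarily the scheme used in this work; `make_toy_states` and `pca_feature_importance` are hypothetical helpers introduced for illustration.

```python
import numpy as np


def make_toy_states(n_per_state=200, n_features=6, informative=(0, 1), shift=5.0, seed=0):
    """Hypothetical toy data: two 'states' that differ only in the informative features."""
    rng = np.random.default_rng(seed)
    state_a = rng.normal(0.0, 1.0, size=(n_per_state, n_features))
    state_b = rng.normal(0.0, 1.0, size=(n_per_state, n_features))
    # Shift only the informative features, so they alone separate the two states
    state_b[:, list(informative)] += shift
    return np.vstack([state_a, state_b])


def pca_feature_importance(X, n_components=2):
    """Assumed scheme: explained-variance-weighted |loadings|, normalized to a maximum of 1."""
    Xc = X - X.mean(axis=0)  # PCA requires mean-centered data
    # Rows of Vt are the principal axes; s**2 is proportional to each axis' variance
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = s**2 / np.sum(s**2)  # explained variance ratio per component
    # Weight each component's absolute loadings by its explained variance ratio
    importance = (ratio[:n_components, None] * np.abs(Vt[:n_components])).sum(axis=0)
    return importance / importance.max()


X = make_toy_states()
importance = pca_feature_importance(X)
print(np.round(importance, 2))
```

In this toy setting the two shifted features dominate the first principal component, so they receive the highest importance scores while the pure-noise features stay near zero. A supervised method such as a random forest would instead use the state labels directly, which matters when the state-defining features are not the ones with the largest variance.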