Get Started
This software package provides highly efficient tools for robust and stable clustering of molecular dynamics (MD) trajectories, including border correction methods. With this package one can obtain well-defined microstates from low-dimensional (\(d\lesssim10\)) input coordinates. The required dimensionality reduction can be achieved, e.g., via principal component analysis (PCA) using either interatomic distances or backbone dihedral angles as input coordinates.
In this tutorial, the shell commands invoking the clustering program are split over multiple lines to highlight the individual parameters and options given to the program. This is for readability only. All commands should be executed in a single line, without the additional line breaks.
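If you prefer to keep a multi-line layout when pasting a command, POSIX shells treat a backslash placed directly before the line break as a line continuation. The following minimal sketch illustrates this with `printf` as a stand-in for the clustering program; the arguments `first`, `second` and `third` are placeholders, not real options.

```shell
# printf stands in for the clustering program; "first", "second" and "third"
# are placeholder arguments, not real options.
one_line=$(printf '%s\n' first second third)

# the same call split over several lines with trailing backslashes
multi_line=$(printf '%s\n' \
    first \
    second \
    third)

# both forms produce the same output
if [ "$one_line" = "$multi_line" ]; then
    echo "identical"
fi
```

Note that nothing, not even a space, may follow the backslash on a continued line.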
Advantages
The main advantage of our density-based clustering algorithm (Sittel and Stock, 2016) over the widely used k-means is that, by design, it cuts the microstates at the energy barriers. This ensures that there are no intrastate barriers, which leads to an optimal assignment in the sense of Markovianity. Hence, it is typically sufficient to split the MD data into approximately 50-100 states. To achieve similar Markovianity, k-means may need several thousand states, in which case most transitions are poorly sampled even for \(10^7\) frames.
Scaling asymptotically with \(N \log{N}\) and applying GPU acceleration, the algorithm manages to cluster \(\ge10^7\) points in six dimensions in a couple of hours on a desktop computer.
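To get a feeling for this scaling, it can be compared with a naive pairwise approach scaling as \(N^2\). Taking the base-2 logarithm and ignoring constant prefactors (both are assumptions made purely for illustration), one finds for \(N = 10^7\):

\[
N \log_2 N = 10^7 \cdot 7 \log_2 10 \approx 10^7 \cdot 23.3 \approx 2.3 \times 10^{8},
\qquad
N^2 = 10^{14},
\qquad
\frac{N^2}{N \log_2 N} \approx 4.3 \times 10^{5}.
\]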
Installation
This package can be installed with conda via
$ conda install moldyn-clustering -c conda-forge
If conda is not available, the package can also be compiled from source.
Compilation
Requirements
- BOOST >= 1.49
- cmake >= 2.8
- a recent C++ compiler (e.g. GNU g++ >= 4.9; must support the C++11 standard)
- CUDA >= 9.1 (optional)
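Before compiling, it can help to check which of the tools listed above are installed. The following is a minimal sketch: the helper `check_tool` is a hypothetical name introduced here, and `cmake`, `g++` and `nvcc` are assumed to be the executable names on your system. BOOST is a library and is not covered by this check.

```shell
# Print where a tool is installed and its --version output, or report it missing.
# $1: executable name (cmake, g++ and nvcc are the usual names; adjust if needed)
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $1 found at $(command -v "$1")"
        "$1" --version 2>/dev/null || true
    else
        echo "== $1 not found"
        return 1
    fi
}

check_tool cmake || true
check_tool g++ || true
check_tool nvcc || true   # optional, only needed for CUDA support
```

Compare the reported versions with the minimum versions given in the list.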
Compilation
Both vectorization and parallelization are supported. The cmake-flags are listed below the code block.
# download the package and enter directory
$ git clone https://github.com/moldyn/Clustering.git
$ cd Clustering
# generate and enter build directory
$ mkdir build
$ cd build
# setup compilation and compile
$ cmake .. -DCMAKE_INSTALL_PREFIX=/my/installation/path
$ make
# optional installation
$ make install
# optional activation of bash completion
# add the following line to your .bashrc file
# source /my/installation/path/bash_completion_clustering.sh
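The bash completion line can also be added from the command line. The sketch below appends a line to a file only if it is not already present, so running it twice does not create duplicates. The helper name `add_line_once` is hypothetical, and the path is the placeholder used above.

```shell
# Append a line to a file unless an identical line is already present.
# $1: file to edit, $2: line to add
add_line_once() {
    if ! grep -qxF -e "$2" "$1" 2>/dev/null; then
        printf '%s\n' "$2" >> "$1"
    fi
}

# Example call (the path is the placeholder used in the text above):
# add_line_once "$HOME/.bashrc" "source /my/installation/path/bash_completion_clustering.sh"
```

The example call is left commented out so that the snippet does not modify your .bashrc unless you enable it explicitly.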
CUDA
The density-based clustering step (the clustering density command) can be parallelized with CUDA 9 (and OpenMP). Note that CUDA 9.1 (9.2) supports only gcc5 (gcc7). Therefore, the following flags need to be set with cmake
-DUSE_CUDA=1 -DCMAKE_C_COMPILER=/path/to/gcc-5 -DCMAKE_CXX_COMPILER=/path/to/g++-5
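Combined with the installation prefix from the compilation step, the full cmake call might be assembled as sketched below. The script only prints the command so that it can be inspected before running it. The helper `build_cmake_command` is a hypothetical name, and all three paths are the placeholders used in this document.

```shell
# Assemble the cmake call from the flags described above and print it for review.
# PREFIX, GCC and GXX are the placeholder paths from the text; replace them.
PREFIX=/my/installation/path
GCC=/path/to/gcc-5
GXX=/path/to/g++-5

# $1: install prefix, $2: C compiler, $3: C++ compiler
build_cmake_command() {
    printf 'cmake .. -DCMAKE_INSTALL_PREFIX=%s -DUSE_CUDA=1 -DCMAKE_C_COMPILER=%s -DCMAKE_CXX_COMPILER=%s\n' "$1" "$2" "$3"
}

build_cmake_command "$PREFIX" "$GCC" "$GXX"
```

The printed line is meant to be run from within the build directory.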
Vectorization
The clustering package supports modern vectorization technologies. If you have a modern computer with vectorizing instruction sets (SSE2, SSE4_2, AVX, ...), set the following cmake-option
-DCPU_ACCELERATION=VEC
where VEC is one of: SSE2, SSE4_1, SSE4_2 or AVX. If an unsupported option is set, a segmentation fault will occur.
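To reduce the risk of choosing an option the CPU does not provide, the available instruction sets can be queried first. The following is a minimal sketch, assuming a Linux system where /proc/cpuinfo contains a "flags" line, and assuming that "unsupported" refers to instruction sets missing on the CPU. The helper name `supported_vec_options` is hypothetical.

```shell
# Print the CPU_ACCELERATION values (as listed above) whose instruction-set
# flag appears in a space-separated flags string.
# $1: CPU flags, e.g. "fpu sse2 sse4_1"
supported_vec_options() {
    flags=" $1 "
    for pair in sse2:SSE2 sse4_1:SSE4_1 sse4_2:SSE4_2 avx:AVX; do
        flag=${pair%%:*}
        option=${pair##*:}
        case "$flags" in
            *" $flag "*) printf '%s\n' "$option" ;;
        esac
    done
}

# Linux only (assumption): /proc/cpuinfo lists the CPU flags on a "flags" line.
if [ -r /proc/cpuinfo ]; then
    cpu_flags=$(grep -m 1 '^flags' /proc/cpuinfo | cut -d: -f2)
    supported_vec_options "$cpu_flags"
fi
```

Flags are matched as whole words, so `avx2` alone is not reported as `AVX`.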
Licensing
Citation
The underlying methods are based on the following articles:
- F. Sittel and G. Stock, Robust Density-Based Clustering to Identify Metastable Conformational States of Proteins, J. Chem. Theory Comput., 12, 2426, 2016; DOI: 10.1021/acs.jctc.5b01233
- A. Jain and G. Stock, Hierarchical folding free energy landscape of HP35 revealed by most probable path clustering, J. Phys. Chem. B, 118, 7750-7760, 2014; DOI: 10.1021/jp410398a
- D. Nagel, A. Weber, B. Lickert and G. Stock, Dynamical coring of Markov state models, J. Chem. Phys., 150, 094111, 2019; DOI: 10.1063/1.5081767
We kindly ask you to cite these articles if you use this software package for published works.
License
This project was created by lettis and is currently maintained by
moldyn-nagel.
Copyright (c) 2015-2019, Florian Sittel and Daniel Nagel
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright notice,
  this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
  notice, this list of conditions and the following disclaimer in the
  documentation and/or other materials provided with the
  distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.