Get Started

This software package provides highly efficient tools for robust and stable clustering of molecular dynamics (MD) trajectories, including border correction methods. With this package one can obtain well-defined microstates from low-dimensional (\(d\lesssim10\)) input coordinates. Such a dimensionality reduction can be achieved, e.g., via principal component analysis (PCA) using either interatomic distances or backbone dihedral angles as input coordinates.

The shell commands invoking the clustering program are written across multiple lines to highlight the individual parameters and options passed to the program. This is only for better readability in this tutorial; all commands should be executed on a single line, without the additional line breaks.

Advantages

The main advantage of our density-based clustering algorithm (Sittel et al., 2016) over the widely used k-means is that, by design, it cuts the microstates at the energy barriers. This ensures that there are no intrastate barriers, which leads to an optimal assignment in the sense of Markovianity. Hence, it is typically sufficient to split the MD data into approximately 50-100 states. To achieve similar Markovianity, k-means may need several thousand states, for which most transitions are poorly sampled even with \(10^7\) frames.
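
To illustrate why sampling degrades with many states, consider a rough upper bound (an illustrative estimate, not taken from the cited articles). A trajectory of \(N\) frames contains at most \(N-1\) transitions between consecutive frames, which are distributed over \(n^2\) ordered state pairs for \(n\) states. Assuming, unrealistically, a uniform distribution over all pairs, the average number of counts per pair is about

\[
\frac{N}{n^2} = \frac{10^7}{100^2} = 10^3 \quad (n = 100), \qquad \frac{N}{n^2} = \frac{10^7}{3000^2} \approx 1.1 \quad (n = 3000).
\]

Here \(n = 3000\) is only an example value for "several thousand states". In practice most consecutive frames remain in the same state, so the counts for actual inter-state transitions are lower still.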

Scaling asymptotically with \(N \log{N}\) and applying GPU acceleration, the algorithm manages to cluster \(\ge10^7\) points in six dimensions in a couple of hours on a desktop computer.

Installation

Requirements

  • BOOST >= 1.49
  • cmake >= 2.8
  • a recent C++ compiler (e.g. GNU g++ >= 4.9; must support the C++11 standard)
  • CUDA >= 9.1 (optional)
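
Before configuring the build, it can help to confirm that the installed tools meet the minimum versions listed above. The following is a minimal sketch, assuming a shell with GNU `sort -V` (version sort) available; the helper names `version_ge` and `check_tool` are hypothetical and not part of this package.

```shell
# version_ge A B: succeeds if version A >= version B (hypothetical helper).
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME FOUND REQUIRED: report whether FOUND meets REQUIRED.
check_tool() {
    if version_ge "$2" "$3"; then
        echo "$1 $2: ok (>= $3)"
    else
        echo "$1 $2: too old (need >= $3)"
    fi
}

# Query the installed tools, skipping any that are not on the PATH.
if command -v cmake >/dev/null 2>&1; then
    # Take the first dotted version number from the `cmake --version` output.
    check_tool cmake "$(cmake --version | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)" 2.8
fi
if command -v g++ >/dev/null 2>&1; then
    check_tool g++ "$(g++ -dumpversion)" 4.9
fi
```

Boost and CUDA are not checked here, since how their versions are best queried depends on how they were installed.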

Compilation

Both vectorization and parallelization are supported. The cmake-flags are listed below the code block.

              
                # download the package and enter directory
                $ git clone https://github.com/moldyn/Clustering.git
                $ cd Clustering

                # generate and enter build directory
                $ mkdir build
                $ cd build

                # setup compilation and compile
                $ cmake .. -DCMAKE_INSTALL_PREFIX=/my/installation/path
                $ make

                # optional installation
                $ make install
              
            

CUDA

The density-based clustering, invoked as clustering density, can be parallelized with CUDA 9 (and OpenMP). Note that CUDA 9.1 (9.2) supports only gcc5 (gcc7)! Therefore, the following flags need to be set with cmake:

              
                -DUSE_CUDA=1 -DCMAKE_C_COMPILER=/path/to/gcc-5 -DCMAKE_CXX_COMPILER=/path/to/g++-5
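
Combined with the generic build steps, a CUDA-enabled configuration might look like the following sketch. The compiler and installation paths are the placeholders used in this document and must be adapted; the variable names (`GCC5`, `GXX5`, `PREFIX`, `CMAKE_CMD`) are purely illustrative. The script only prints the resulting single-line command, so it can be inspected before it is run from within the build directory.

```shell
# Placeholder paths: adapt these to the local system.
GCC5=/path/to/gcc-5
GXX5=/path/to/g++-5
PREFIX=/my/installation/path

# Assemble the cmake call on a single line, using the flags documented above.
CMAKE_CMD="cmake .. -DCMAKE_INSTALL_PREFIX=$PREFIX -DUSE_CUDA=1 -DCMAKE_C_COMPILER=$GCC5 -DCMAKE_CXX_COMPILER=$GXX5"

# Print the command for inspection instead of executing it.
echo "$CMAKE_CMD"
```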
              
            

Vectorization

The clustering package supports modern vectorization technologies. If your CPU provides vectorizing instruction sets (SSE2, SSE4_2, AVX, ...), set the following cmake option:

              
                -DCPU_ACCELERATION=VEC
              
            

where VEC is one of SSE2, SSE4_1, SSE4_2 or AVX. If an unsupported option is set, the program will terminate with a segmentation fault.
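
Since an unsupported setting leads to a crash, it may be worth checking which instruction sets the CPU advertises before choosing VEC. The sketch below assumes a Linux system on x86, where `/proc/cpuinfo` lists the CPU feature flags (`sse2`, `sse4_1`, `sse4_2`, `avx`); the function name `pick_vec` is hypothetical and not part of this package.

```shell
# pick_vec FLAGS: print the newest of the documented options
# (AVX, SSE4_2, SSE4_1, SSE2) found in a space-separated list of CPU flags.
pick_vec() {
    for candidate in avx sse4_2 sse4_1 sse2; do
        case " $1 " in
            *" $candidate "*)
                echo "$candidate" | tr '[:lower:]' '[:upper:]'
                return 0
                ;;
        esac
    done
    return 1  # none of the documented instruction sets was found
}

# On Linux, read the feature flags of the first CPU from /proc/cpuinfo.
if [ -r /proc/cpuinfo ]; then
    cpu_flags=$(grep -m 1 '^flags' /proc/cpuinfo | cut -d: -f2)
    if vec=$(pick_vec "$cpu_flags"); then
        echo "-DCPU_ACCELERATION=$vec"
    else
        echo "no documented vector instruction set detected"
    fi
fi
```

The function checks the candidates from newest to oldest and stops at the first match, so a flag list containing both `sse2` and `avx` yields AVX.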

Licensing

Citation

The underlying methods are based on the following articles:

  • F. Sittel and G. Stock, Robust Density-Based Clustering to Identify Metastable Conformational States of Proteins, J. Chem. Theory Comput., 12, 2426, 2016; DOI: 10.1021/acs.jctc.5b01233
  • A. Jain and G. Stock, Hierarchical folding free energy landscape of HP35 revealed by most probable path clustering, J. Phys. Chem. B, 118, 7750-7760, 2014; DOI: 10.1021/jp410398a
  • D. Nagel, A. Weber, B. Lickert and G. Stock, Dynamical coring of Markov state models, J. Chem. Phys., 150, 094111, 2019; DOI: 10.1063/1.5081767

We kindly ask you to cite these articles if you use this software package for published works.

License

This project was created by lettis and is currently maintained by moldyn-nagel.
Copyright (c) 2015-2019, Florian Sittel and Daniel Nagel. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.