Dynamical Clustering via most probable path algorithm


This method was proposed in Jain et al., 2014. It lumps microstates into macrostates by their dynamical connectivity.


clustering mpp -s state_file
               -D free_energy_file
               -l lagtime
               --qmin-from FROM
               --qmin-to TO
               --qmin-step STEP
               --concat-nframes NFRAMES
               --concat-limits limit_file
               --tprob transition_matrix
               -o output_basename
               -n N


Input Parameters

Parameter Description
\(\mathtt{\mbox{-}s :}\) The name (path) of the clustered state trajectory file.
\(\mathtt{\mbox{-}D :}\) Filename to read the free energies from.
\(\mathtt{\mbox{-}l :}\) Lagtime \(\tau_\text{mpp}\) to be used for generating a transition matrix [in frames].
\(\mathtt{\mbox{--}qmin\mbox{-}from :}\) Initial value of metastability \(Q_\text{min}\). (Default: \(0.01\)).
\(\mathtt{\mbox{--}qmin\mbox{-}to :}\) Final value of metastability \(Q_\text{min}\). (Default: \(1.00\)).
\(\mathtt{\mbox{--}qmin\mbox{-}step :}\) Step width, starting from \(\mathtt{\mbox{--}qmin\mbox{-}from}\). (Default: \(0.01\)).
\(\mathtt{\mbox{--}concat\mbox{-}nframes:}\) The number of frames per (equally sized) sub-trajectory for concatenated trajectory files.
\(\mathtt{\mbox{--}concat\mbox{-}limits:}\) The name (path) of the limit file. It should be a single-column file with the length of each trajectory, e.g. for a concatenated trajectory of three chunks of sizes 100, 50 and 300 frames: '100 50 300'.
\(\mathtt{\mbox{--}tprob:}\) Initial transition probability matrix. Format: three space-separated columns.


Output Parameters

Parameter Description
\(\mathtt{\mbox{-}o}\) Basename for the output files. (Default: \(\mathtt{mpp}\))

Miscellaneous Parameters

Parameter Description
\(\mathtt{\mbox{-}n}\) The number of parallel threads to use (for SMP machines). This is ignored if CUDA is used.
\(\mathtt{\mbox{-}v}\) Verbose mode with some output.

Detailed Description

To obtain a dynamical description of the state space, the \(\mathtt{MPP}\) method can be used. It lumps the geometric microstates according to their transition probabilities, which separates the dynamically more stable states from the less stable ones. The most important parameter for MPP is the lagtime \(\tau_\text{mpp}\), which the clustering program always expects in numbers of frames. The lagtime \(\tau_\text{mpp}\) in units of time is its value in numbers of frames multiplied by the time step of the underlying simulation.
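The conversion between frames and time units can be sketched as follows. The function name and the example values (25 frames, 0.2 ps between frames) are purely illustrative, and the sketch assumes that "time step" refers to the spacing between saved trajectory frames.

```python
def lagtime_in_time_units(lag_frames, frame_spacing):
    """Convert a lagtime given in frames into simulation time units.

    lag_frames:    lagtime as passed to -l (number of frames)
    frame_spacing: time between two saved frames (assumed unit: ps)
    """
    return lag_frames * frame_spacing


# Illustrative values only: 25 frames at 0.2 ps per frame.
tau_mpp = lagtime_in_time_units(25, 0.2)
print(tau_mpp)
```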

The lagtime \(\tau_\text{mpp}\) is the amount of time (or number of frames) that is skipped when calculating transition probabilities from one state to another. In effect, timescales below the given lagtime are discarded, and the dynamical clustering can only describe processes that take longer than the given lagtime. The lagtime thus acts as a control parameter that filters out (uninteresting) processes on too-short timescales and focuses on processes of essential motion, which typically happen on timescales longer than the simulation time step.

Additionally, at a sufficiently high lagtime the system will be approximately Markovian, resulting in a discrete set of states that is well described by Markovian dynamics.
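To illustrate how a lagtime enters the transition probabilities, here is a minimal sketch that counts transitions between frame t and frame t + lag and normalizes each row. It is not the program's actual implementation. The function name and its arguments are hypothetical, and the handling of chunk boundaries (no transitions counted across the sub-trajectories of a concatenated file) is an assumption about what the concatenation options are meant to achieve.

```python
def transition_matrix(states, lag, chunk_lengths=None):
    """Estimate a row-normalized transition matrix at a lagtime of `lag` frames.

    states:        list of state labels, one per frame
    lag:           lagtime in frames (>= 1)
    chunk_lengths: optional lengths of concatenated sub-trajectories;
                   transitions across chunk boundaries are not counted
    """
    if lag < 1:
        raise ValueError("lag must be at least one frame")
    if chunk_lengths is None:
        chunk_lengths = [len(states)]
    if sum(chunk_lengths) != len(states):
        raise ValueError("chunk lengths do not add up to the trajectory length")

    labels = sorted(set(states))
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    counts = [[0] * n for _ in range(n)]

    start = 0
    for length in chunk_lengths:
        chunk = states[start:start + length]
        # Pair frame t with frame t + lag inside the same chunk only.
        for a, b in zip(chunk, chunk[lag:]):
            counts[index[a]][index[b]] += 1
        start += length

    tprob = []
    for row in counts:
        total = sum(row)
        # States without outgoing counts keep an all-zero row.
        tprob.append([c / total if total else 0.0 for c in row])
    return labels, tprob


traj = [1, 1, 2, 2, 1, 1, 1, 2]
labels, tprob = transition_matrix(traj, lag=1)
print(labels, tprob)
```

Passing `chunk_lengths=[4, 4]` for the same toy trajectory drops the single transition that crosses the boundary between the two chunks, which changes the row of state 2.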

The MPP run generates many new files, by default called \(\mathtt{mpp\_pop\_^*}\) and \(\mathtt{mpp\_traj\_^*}\). The star stands for the metastability (\(Q_\text{min}\)-value) of the run. The \(\mathtt{{}^*pop^*}\)-files hold the population information of the clusters for the given metastability, while the \(\mathtt{{}^*traj^*}\)-files are the resulting cluster trajectories.

The metastability value controls how stable a state has to be to remain a single state. All states with a stability less than the given \(Q_\text{min}\)-value will be lumped according to their most probable state path.
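The lumping rule can be sketched as a single pass over a small transition matrix. This is a simplified illustration, not the program's implementation. It assumes that the stability of a state is its self-transition probability, that a path ends at the first state with stability of at least \(Q_\text{min}\) (or when it would revisit a state), and that all states below \(Q_\text{min}\) are assigned to the state of lowest free energy on their path. All function names and the toy numbers are hypothetical.

```python
def most_probable_path(tprob, start, qmin):
    """Follow the most probable transitions out of `start`.

    The path stops at the first state whose self-transition probability
    is at least `qmin`, or when the next state was already visited.
    """
    path = [start]
    current = start
    while tprob[current][current] < qmin:
        # Candidate moves to other states, ignoring the self-transition.
        candidates = [(p, j) for j, p in enumerate(tprob[current])
                      if j != current and p > 0.0]
        if not candidates:
            break
        _, nxt = max(candidates, key=lambda c: c[0])
        if nxt in path:
            break
        path.append(nxt)
        current = nxt
    return path


def lump_once(tprob, free_energy, qmin):
    """Return, for every state, the state it is lumped into (one pass)."""
    targets = list(range(len(tprob)))
    for state in range(len(tprob)):
        if tprob[state][state] < qmin:
            path = most_probable_path(tprob, state, qmin)
            # Assumption: the lowest free energy state on the path is the sink.
            targets[state] = min(path, key=lambda i: free_energy[i])
    return targets


# Toy example: state 1 is unstable and most likely moves to state 0.
tprob = [[0.9, 0.1, 0.0],
         [0.5, 0.3, 0.2],
         [0.1, 0.1, 0.8]]
free_energy = [0.0, 2.0, 1.0]
print(lump_once(tprob, free_energy, qmin=0.5))  # [0, 0, 2]
```

In this toy example only state 1 has a stability below 0.5, so it is merged into state 0 while states 0 and 2 remain separate.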