
CLI Usage Guide

MPP is invoked via python -m MPP.run. This guide covers inputs, kernel selection, plot generation, and output interpretation for deterministic workflows; stochastic lumping is described in the final section.


Prerequisites

MPP must be installed (pip install mpp-lumping). The following inputs are required:

  • A YAML configuration file
  • A microstate trajectory file (plain-text, one integer state per line)
  • A multi-feature trajectory file (plain-text, one row of floats per frame)
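As a minimal sketch of the two plain-text formats listed above, the following writes and re-reads a synthetic microstate trajectory and feature trajectory with NumPy. The data are random placeholders, and the use of whitespace as the column delimiter in the feature file is an assumption; the guide only states "one row of floats per frame".

```python
import pathlib
import tempfile

import numpy as np

rng = np.random.default_rng(0)
micro = rng.integers(0, 5, size=100)   # synthetic microstate index per frame
features = rng.random((100, 3))        # synthetic features: 100 frames x 3 columns

with tempfile.TemporaryDirectory() as tmp:
    folder = pathlib.Path(tmp)
    # One integer state per line.
    np.savetxt(folder / "traj", micro, fmt="%d")
    # One row of floats per frame (whitespace-delimited: an assumption).
    np.savetxt(folder / "feature_traj", features, fmt="%.6f")

    # Read both files back to confirm the layout round-trips.
    micro_back = np.loadtxt(folder / "traj", dtype=int)
    feat_back = np.loadtxt(folder / "feature_traj")

print(micro_back.shape, feat_back.shape)
```

The file names traj and feature_traj match the values used in the example configuration further down; both trajectories must have the same number of frames.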

Basic Invocation

python -m MPP.run <config.yml> <d> <g> -Z <Z.npy> [-p <plot>] [-o <output>]

Positional arguments:

Argument     Description
config.yml   YAML configuration file
d            Dynamic similarity selector (T, KL, none, or gpcca)
g            Feature similarity selector (JS or none; in gpcca mode, reference_count or an integer)

Options:

Flag                                              Description
-Z <path>                                         Path to save (or load) the Z matrix (.npy). If the file already exists, it is loaded instead of recomputed.
-p <plot>                                         Plot type to generate (see Plot Types)
-o <output>                                       Output file path for the plot or macrostate trajectory
--scale <float>                                   Scaling factor for plot size (default 1)
--n-timescales <N>                                Number of implied timescales to compute (overrides config value)
--rmsd <path>                                     Compute C-alpha RMSD and write to .npy file
--rmsd-feature <CA|feature>                       RMSD variant: CA (default) or feature
-r <N>                                            Draw N random frame indices per macrostate (writes .ndx files)
--get-least-moving-residues <contact_index_file>  Write least-varying residues per macrostate to file
--metrics                                         Print all quality metrics to stdout as key=value pairs
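When the CLI is driven from a script, the argument list can be assembled programmatically. The sketch below uses only the positional arguments and flags documented above; the helper name build_mpp_command is hypothetical and not part of MPP.

```python
import shlex
import sys


def build_mpp_command(config, d, g, z_path, plot=None, output=None, metrics=False):
    """Assemble the argument list for `python -m MPP.run` (hypothetical helper)."""
    cmd = [sys.executable, "-m", "MPP.run", str(config), d, g, "-Z", str(z_path)]
    if plot is not None:
        cmd += ["-p", plot]
    if output is not None:
        cmd += ["-o", str(output)]
    if metrics:
        cmd.append("--metrics")
    return cmd


cmd = build_mpp_command(
    "example/sample_system/input/config.yml", "T", "none",
    "results/t/Z.npy", plot="dendrogram", output="results/t/dendrogram.pdf",
)
# Show the arguments without the interpreter path.
print(shlex.join(cmd[1:]))
# To execute the command: subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems with paths that contain spaces.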

YAML Configuration

The config file specifies input paths and lumping parameters. All keys use snake_case.

# example/sample_system/input/config.yml
source: example/sample_system/input

microstate_trajectory: traj
multi_feature_trajectory: feature_traj

lagtime: 20          # lag time in frames
pop_thr: 0.15        # minimum macrostate population (fraction)
q_min: 0.5           # minimum macrostate metastability
frame_length: 0.2    # frame length in ns

contact_threshold: 0.45  # distance threshold to binarise feature (nm)

Required keys: source, microstate_trajectory, multi_feature_trajectory, lagtime, pop_thr, q_min, frame_length.
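A quick pre-flight check can catch a missing required key before a long run starts. The sketch below works on a plain dict, such as the result of yaml.safe_load from PyYAML; the function name check_config is hypothetical, and MPP's own loader may validate differently. It also converts the lag time from frames to nanoseconds using frame_length.

```python
REQUIRED_KEYS = {
    "source", "microstate_trajectory", "multi_feature_trajectory",
    "lagtime", "pop_thr", "q_min", "frame_length",
}


def check_config(cfg):
    """Raise if a required key is missing; return the lag time in ns (hypothetical helper)."""
    missing = sorted(REQUIRED_KEYS - cfg.keys())
    if missing:
        raise KeyError(f"missing required config keys: {', '.join(missing)}")
    # lagtime is given in frames, frame_length in ns per frame.
    return cfg["lagtime"] * cfg["frame_length"]


# Values copied from the example configuration above.
cfg = {
    "source": "example/sample_system/input",
    "microstate_trajectory": "traj",
    "multi_feature_trajectory": "feature_traj",
    "lagtime": 20,
    "pop_thr": 0.15,
    "q_min": 0.5,
    "frame_length": 0.2,
}
lag_ns = check_config(cfg)
print(f"lag time: {lag_ns} ns")
```

With the example values, a lag time of 20 frames at 0.2 ns per frame corresponds to 4 ns.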

Optional keys:

Key                 Description
contact_threshold   Feature binarisation threshold (default 0.45)
cluster_file        Contact index file for contact plots
contact_index_file  Contact pair index file for structural analysis
topology_file       PDB topology file for structural analysis
xtc_file            XTC trajectory file for structural analysis
xtc_stride          Stride for XTC reading
n_timescales        Number of implied timescales to compute
helices             Helix residue ranges for RMSD annotation
limits              Concatenated trajectory lengths (for multiple independent simulations)

Dynamic Similarity Selectors (d)

The d argument selects how microstate similarity is computed during lumping.

d      Description
T      Transition probability (reference kernel; recommended default)
KL     Kullback-Leibler divergence of transition probability rows
none   Disable dynamic similarity; use feature kernel only
gpcca  Use GPCCA instead of MPP (comparison only)

Feature Similarity Selector (g)

The g argument optionally incorporates geometric information via feature distributions.

g                Description
none             No feature similarity
JS               Jensen-Shannon divergence of feature distributions
reference_count  (gpcca mode only) Use macrostate count from the reference T lumping
<int>            (gpcca mode only) Use a fixed number of macrostates

Kernel Combinations

d     g     Description
T     none  Transition probability only (default/reference)
KL    none  Kullback-Leibler divergence only
T     JS    Combined transition probability + feature similarity
KL    JS    Combined KL divergence + feature similarity
none  JS    Feature similarity only
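The combinations above can be encoded as a small lookup for use in wrapper scripts. This sketch only reports whether a (d, g) pair appears in this guide: the five pairs from the table, plus gpcca with reference_count or an integer. Whether MPP accepts or rejects other pairs is not stated here, so the function name is_documented is hypothetical and its result should not be read as the CLI's own validation.

```python
# (d, g) pairs listed in the Kernel Combinations table.
DOCUMENTED_PAIRS = {
    ("T", "none"),
    ("KL", "none"),
    ("T", "JS"),
    ("KL", "JS"),
    ("none", "JS"),
}


def is_documented(d, g):
    """Return True if the (d, g) pair is described in this guide (hypothetical helper)."""
    if d == "gpcca":
        # In gpcca mode, g is either 'reference_count' or a macrostate count.
        return g == "reference_count" or g.isdigit()
    return (d, g) in DOCUMENTED_PAIRS


for pair in [("T", "JS"), ("none", "none"), ("gpcca", "4")]:
    print(pair, is_documented(*pair))
```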

Examples

Run lumping with transition probability kernel and save Z matrix:

python -m MPP.run example/sample_system/input/config.yml T none \
    -Z results/t/Z.npy

Run with KL divergence kernel:

python -m MPP.run example/sample_system/input/config.yml KL none \
    -Z results/kl/Z.npy

Run with combined transition probability + Jensen-Shannon feature kernel:

python -m MPP.run example/sample_system/input/config.yml T JS \
    -Z results/t_js/Z.npy

Load an existing Z matrix and generate a dendrogram:

python -m MPP.run example/sample_system/input/config.yml T none \
    -Z results/t/Z.npy -p dendrogram -o results/t/dendrogram.pdf

Generate a Sankey diagram:

python -m MPP.run example/sample_system/input/config.yml T none \
    -Z results/t/Z.npy -p sankey -o results/t/sankey.pdf

Save the macrostate trajectory as a text file:

python -m MPP.run example/sample_system/input/config.yml T none \
    -Z results/t/Z.npy -p macrostate_trajectory -o results/t/macrostate_trajectory.txt

Print all quality metrics:

python -m MPP.run example/sample_system/input/config.yml T none \
    -Z results/t/Z.npy --metrics

Output format (one metric per line, key=value; comma-separated values for stochastic runs with n>1):

shannon_entropy=0.7440447
davies_bouldin=2.1873796
gmrq=2.6583023
gmrq2=2.3739562
silhouette=0.20912119
calinski_harabasz=6498.0444
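The key=value output is straightforward to parse in a post-processing script. The sketch below stores every metric as a list of floats so that deterministic runs (one value) and stochastic runs (comma-separated values) are handled the same way; the function name parse_metrics is hypothetical.

```python
def parse_metrics(text):
    """Parse `key=value[,value...]` lines into a dict of float lists (hypothetical helper)."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not key=value
        key, _, value = line.partition("=")
        metrics[key] = [float(v) for v in value.split(",")]
    return metrics


# Sample output copied from the listing above.
sample = """\
shannon_entropy=0.7440447
davies_bouldin=2.1873796
gmrq=2.6583023
gmrq2=2.3739562
silhouette=0.20912119
calinski_harabasz=6498.0444
"""
metrics = parse_metrics(sample)
print(metrics["gmrq"])
```

In practice, the text would come from the captured stdout of a run with --metrics.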

Plot Types

Available values for -p:

Plot                   Description
dendrogram             Lumping tree with macrostate boundaries
timescales             Implied timescales of micro- and macrostate models
sankey                 Sankey diagram comparing this lumping to the reference
contacts               Contact representation per macrostate
macrotraj              Macrostate trajectory as a color-coded time series
ck_test                Chapman-Kolmogorov test
rmsd                   Per-macrostate C-alpha RMSD
delta_rmsd             Per-macrostate delta RMSD relative to macrostate 0
state_network          Macrostate transition network
transition_matrix      Macrostate transition matrix heatmap
transition_time        Mean first-passage times between macrostates
macrostate_trajectory  Write macrostate trajectory to text file (use -o <file.txt>)

Output Interpretation

Z matrix (Z.npy): Shape (n_runs, n_states-1, 4). Each row encodes one merge step: [state_a, state_b, metastability_a, joint_population]. The layout follows the scipy linkage format: the cluster created at merge step i (counting from 0) receives the index n_states + i. For deterministic runs, n_runs = 1.
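To illustrate the indexing convention, the following sketch replays the merge steps of one run and tracks which microstates each cluster contains. It uses a small synthetic Z matrix in place of a real file; the metastability and population values are placeholders, and the function name replay_merges is hypothetical.

```python
import numpy as np


def replay_merges(Z_run, n_steps=None):
    """Return {cluster_index: [microstates]} after the first n_steps merges (hypothetical helper)."""
    n_states = Z_run.shape[0] + 1
    members = {i: [i] for i in range(n_states)}
    for step, row in enumerate(Z_run[:n_steps]):
        a, b = int(row[0]), int(row[1])
        # The merged cluster gets the index n_states + step.
        members[n_states + step] = members.pop(a) + members.pop(b)
    return members


# Synthetic Z with n_runs = 1 and 4 microstates (3 merge steps).
# Columns: state_a, state_b, metastability_a, joint_population (placeholder values).
Z = np.array([[[0, 1, 0.6, 0.3],
               [2, 3, 0.7, 0.7],
               [4, 5, 0.9, 1.0]]])

print(replay_merges(Z[0], n_steps=2))
print(replay_merges(Z[0]))
```

For a real run, Z would be obtained with np.load on the path passed to -Z, and Z[0] selects the single deterministic run.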

Macrostate map (macrostate_map.npy): Integer array of shape (n_states,). Entry i gives the macrostate index assigned to microstate i. Written automatically to the same directory as Z.npy whenever -Z is used.
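Given the macrostate map, a microstate trajectory can be projected onto macrostates with a single NumPy indexing operation. The sketch below uses synthetic arrays and assumes that the microstate labels in the trajectory are 0-based indices into the map; if the labels in a real trajectory are not contiguous or start at 1, they would need to be remapped first.

```python
import numpy as np

# Synthetic stand-ins for np.load("macrostate_map.npy") and the microstate trajectory.
macrostate_map = np.array([0, 0, 1, 1, 2])      # microstate i -> macrostate
micro_traj = np.array([0, 1, 1, 2, 4, 3, 0])    # assumed 0-based microstate labels

# Fancy indexing looks up the macrostate of every frame at once.
macro_traj = macrostate_map[micro_traj]

# Fraction of frames spent in each macrostate.
n_macro = macrostate_map.max() + 1
populations = np.bincount(macro_traj, minlength=n_macro) / macro_traj.size

print(macro_traj)
print(populations)
```

The resulting populations can be compared against pop_thr from the configuration as a sanity check.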

Macrostate trajectory (text): One integer per line, 0-based macrostate index. Written by -p macrostate_trajectory -o macrostate_trajectory.txt.

RMSD file (.npy): Shape (n_macrostates, n_CA_atoms), C-alpha RMSD values per macrostate.

Stochastic Z matrix: Shape (n_runs, n_states-1, 4) with n_runs > 1. The macrostate assignment of run i is accessed by indexing the Lumping object with [i].


Stochastic Lumping

When the stochastic block is present in the YAML config, MPP performs multiple randomised lumping runs and returns a Z matrix of shape (n_runs, n_states-1, 4).

YAML configuration

stochastic:
  method: n       # 'n' = top-N options (or 'p' = probability-mass threshold)
  param: 2        # for 'n': number of candidate target states per merge
  n: 10           # number of independent runs
  seed: 42        # integer seed for reproducible results (optional)

  • method: n + param: 2: at each merge step, the two most similar candidate states are shortlisted, and one of them is drawn at random with probability proportional to its similarity.
  • method: p + param: 0.5: all states whose cumulative similarity exceeds the threshold (here 0.5) are considered as candidates.
  • seed: pass any integer to seed numpy.random.default_rng for reproducible stochastic lumpings. Omit it for a random seed.
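The top-N selection described for method: n can be sketched as follows. This is an illustration of the idea under stated assumptions, not MPP's implementation: the similarity vector is synthetic, the function name pick_target_top_n is hypothetical, and similarities are assumed to be non-negative so that they can be normalised into probabilities.

```python
import numpy as np


def pick_target_top_n(similarity, n, rng):
    """Draw one of the n most similar states, weighted by similarity (hypothetical helper)."""
    similarity = np.asarray(similarity, dtype=float)
    # Indices of the n largest similarities, most similar first.
    top = np.argsort(similarity)[::-1][:n]
    # Normalise the shortlisted similarities into selection probabilities.
    weights = similarity[top] / similarity[top].sum()
    return int(rng.choice(top, p=weights))


rng = np.random.default_rng(42)      # fixed seed, as with `seed: 42` in the config
similarity = [0.05, 0.50, 0.10, 0.35]  # synthetic similarities to four candidate states

picks = [pick_target_top_n(similarity, 2, rng) for _ in range(1000)]
fraction_state_1 = picks.count(1) / len(picks)
print(sorted(set(picks)), fraction_state_1)
```

With param: 2, only states 1 and 3 can be drawn here, and state 1 is expected in roughly 0.50 / (0.50 + 0.35), or about 59%, of the draws.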

Stochastic-specific plot types

Plot                         Description
stochastic_state_similarity  Overlap of macrostate assignments across runs
relative_implied_timescales  Timescales relative to the reference T lumping
macro_feature                Mean feature per macrostate across all stochastic runs