CLI Usage Guide

MPP is invoked via python -m MPP.run. This guide covers inputs, kernel selection, plot generation, and output interpretation for deterministic workflows, and closes with the configuration for stochastic lumping.

Prerequisites

MPP must be installed (pip install mpp-lumping). The following inputs are required:

A YAML configuration file
A microstate trajectory file (plain-text, one integer state per line)
A multi-feature trajectory file (plain-text, one row of floats per frame)

Basic Invocation

python -m MPP.run <config.yml> <d> <g> -Z <Z.npy> [-p <plot>] [-o <output>]

Positional arguments:

Argument	Description
`config.yml`	YAML configuration file
`d`	Dynamic similarity selector (`T`, `KL`, `none`, or `gpcca`)
`g`	Feature similarity selector (`JS` or `none`; in `gpcca` mode, `reference_count` or an integer)

Options:

Flag	Description
`-Z <path>`	Path to save (or load) the Z matrix (`.npy`). If the file already exists, it is loaded instead of recomputed.
`-p <plot>`	Plot type to generate (see Plot Types)
`-o <output>`	Output file path for the plot or macrostate trajectory
`--scale <float>`	Scaling factor for plot size (default `1`)
`--n-timescales <N>`	Number of implied timescales to compute (overrides config value)
`--rmsd <path>`	Compute C-alpha RMSD and write to `.npy` file
`--rmsd-feature <CA|feature>`	RMSD variant: `CA` (default) or `feature`
`-r <N>`	Draw N random frame indices per macrostate (writes `.ndx` files)
`--get-least-moving-residues <contact_index_file>`	Write least-varying residues per macrostate to file
`--metrics`	Print all quality metrics to stdout as `key=value` pairs

YAML Configuration

The config file specifies input paths and lumping parameters. All keys use snake_case.

# example/sample_system/input/config.yml
microstate_trajectory: traj
multi_feature_trajectory: feature_traj

lagtime: 20          # lag time in frames
pop_thr: 0.15        # minimum macrostate population (fraction)
q_min: 0.5           # minimum macrostate metastability
frame_length: 0.2    # frame length in ns

contact_threshold: 0.45  # distance threshold to binarise feature (nm)

Required keys: microstate_trajectory, multi_feature_trajectory, lagtime, pop_thr, q_min, frame_length.
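
The required keys can be checked before starting a long run. The sketch below is not part of MPP: it assumes the YAML file has already been parsed into a plain dict (for example with a YAML library), and the helper names missing_required_keys and lagtime_in_ns are hypothetical. It also shows how lagtime (in frames) and frame_length (in ns) combine into a lag time in ns.

```python
# Required keys as listed in this guide.
REQUIRED_KEYS = (
    "microstate_trajectory",
    "multi_feature_trajectory",
    "lagtime",
    "pop_thr",
    "q_min",
    "frame_length",
)


def missing_required_keys(config):
    """Return the required keys absent from a parsed config mapping."""
    return [key for key in REQUIRED_KEYS if key not in config]


def lagtime_in_ns(config):
    """Convert the lag time from frames to ns using frame_length."""
    return config["lagtime"] * config["frame_length"]


# Values copied from the sample config above (hypothetical helper usage).
config = {
    "microstate_trajectory": "traj",
    "multi_feature_trajectory": "feature_traj",
    "lagtime": 20,
    "pop_thr": 0.15,
    "q_min": 0.5,
    "frame_length": 0.2,
    "contact_threshold": 0.45,
}

print(missing_required_keys(config))  # []
print(lagtime_in_ns(config))          # 4.0
```

With the sample values, a lag time of 20 frames at 0.2 ns per frame corresponds to 4 ns.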

Optional keys:

Key	Description
`source`	Root directory for input file paths. A relative `source` is resolved against the config file's directory; an absolute one is used as-is. Defaults to the config file's own directory, so a config placed next to its data can omit `source`. Individual file entries may be absolute or relative (resolved against `source`), so files in different locations can be referenced without moving them.
`contact_threshold`	Feature binarisation threshold (default `0.45`)
`cluster_file`	Contact index file for contact plots
`contact_index_file`	Contact pair index file for structural analysis
`topology_file`	PDB topology file for structural analysis
`xtc_file`	XTC trajectory file for structural analysis
`xtc_stride`	Stride for XTC reading
`n_timescales`	Number of implied timescales to compute
`helices`	Helix residue ranges for RMSD annotation
`limits`	Concatenated trajectory lengths (for multiple independent simulations)
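
The resolution rules described for `source` can be illustrated with a short sketch. This is an illustration of the documented behaviour under stated assumptions, not MPP's own code: the function name resolve_input and the /home/user and /scratch paths are made up, and PurePosixPath is used so that nothing touches the filesystem.

```python
from pathlib import PurePosixPath


def resolve_input(config_path, entry, source=None):
    """Resolve one file entry following the rules described for `source`."""
    config_dir = PurePosixPath(config_path).parent
    # `source` defaults to the config directory; a relative `source` is
    # resolved against that directory, an absolute one is used as-is.
    root = config_dir if source is None else config_dir / source
    # Joining with an absolute entry discards `root`, so absolute file
    # entries pass through unchanged.
    return root / entry


# No `source`: the entry is looked up next to the config file.
print(resolve_input("example/sample_system/input/config.yml", "traj"))
# example/sample_system/input/traj

# Relative `source` (hypothetical paths).
print(resolve_input("/home/user/run/config.yml", "traj", source="data"))
# /home/user/run/data/traj

# Absolute file entry ignores `source`.
print(resolve_input("/home/user/run/config.yml", "/scratch/traj", source="data"))
# /scratch/traj
```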

Dynamic Similarity Selectors (`d`)

The d argument selects how microstate similarity is computed during lumping.

`d`	Description
`T`	Transition probability (reference kernel; recommended default)
`KL`	Kullback-Leibler divergence of transition probability rows
`none`	Disable dynamic similarity; use feature kernel only
`gpcca`	Use GPCCA instead of MPP (comparison only)

Feature Similarity Selector (`g`)

The g argument optionally incorporates geometric information via feature distributions.

`g`	Description
`none`	No feature similarity
`JS`	Jensen-Shannon divergence of feature distributions
`reference_count`	(gpcca mode only) Use macrostate count from the reference `T` lumping
`<int>`	(gpcca mode only) Use a fixed number of macrostates

Kernel Combinations

`d`	`g`	Description
`T`	`none`	Transition probability only (default/reference)
`KL`	`none`	Kullback-Leibler divergence only
`T`	`JS`	Combined transition probability + feature similarity
`KL`	`JS`	Combined KL divergence + feature similarity
`none`	`JS`	Feature similarity only
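
When several kernel combinations are to be compared, the command lines can be generated rather than typed. The following sketch only assembles argument lists from the positional arguments and the -Z flag documented in this guide; it does not run anything. The helper build_command and the directory names results/kl_js and results/js are assumptions chosen for illustration.

```python
import shlex

CONFIG = "example/sample_system/input/config.yml"

# (d, g, output directory). The first three directories follow the
# examples in this guide; the last two names are made up for illustration.
KERNELS = [
    ("T", "none", "results/t"),
    ("KL", "none", "results/kl"),
    ("T", "JS", "results/t_js"),
    ("KL", "JS", "results/kl_js"),
    ("none", "JS", "results/js"),
]


def build_command(d, g, out_dir, config=CONFIG):
    """Assemble the argv list for one MPP run that saves (or reuses) Z.npy."""
    return ["python", "-m", "MPP.run", config, d, g, "-Z", f"{out_dir}/Z.npy"]


commands = [build_command(d, g, out_dir) for d, g, out_dir in KERNELS]
for argv in commands:
    # Print a copy-pasteable shell command for each kernel combination.
    print(shlex.join(argv))
```

Each list can be handed to subprocess.run(argv, check=True) to execute the run from a script.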

Examples

Run lumping with transition probability kernel and save Z matrix:

python -m MPP.run example/sample_system/input/config.yml T none \
    -Z results/t/Z.npy

Run with KL divergence kernel:

python -m MPP.run example/sample_system/input/config.yml KL none \
    -Z results/kl/Z.npy

Run with combined transition probability + Jensen-Shannon feature kernel:

python -m MPP.run example/sample_system/input/config.yml T JS \
    -Z results/t_js/Z.npy

Load an existing Z matrix and generate a dendrogram:

python -m MPP.run example/sample_system/input/config.yml T none \
    -Z results/t/Z.npy -p dendrogram -o results/t/dendrogram.pdf

Generate a Sankey diagram:

python -m MPP.run example/sample_system/input/config.yml T none \
    -Z results/t/Z.npy -p sankey -o results/t/sankey.pdf

Save the macrostate trajectory as a text file:

python -m MPP.run example/sample_system/input/config.yml T none \
    -Z results/t/Z.npy -p macrostate_trajectory -o results/t/macrostate_trajectory.txt

Print all quality metrics:

python -m MPP.run example/sample_system/input/config.yml T none \
    -Z results/t/Z.npy --metrics

Output format (one metric per line, key=value; comma-separated values for stochastic runs with n>1):

shannon_entropy=0.7440447
davies_bouldin=2.1873796
gmrq=2.6583023
gmrq2=2.3739562
silhouette=0.20912119
calinski_harabasz=6498.0444
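
Because the format is line-oriented, the metrics are easy to consume from a script. The parser below is a minimal sketch, assuming the --metrics output has been captured as a string and that any line without an = sign can be ignored; parse_metrics is a hypothetical helper, not an MPP function. Every metric is returned as a list so that deterministic and stochastic runs share one shape.

```python
def parse_metrics(text):
    """Parse `key=value` lines into a dict of float lists."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not key=value
        key, _, raw = line.partition("=")
        # A deterministic run yields one value; stochastic runs yield
        # several values separated by commas.
        metrics[key.strip()] = [float(value) for value in raw.split(",")]
    return metrics


sample = """\
shannon_entropy=0.7440447
davies_bouldin=2.1873796
gmrq=2.6583023
"""
metrics = parse_metrics(sample)
print(metrics["gmrq"])  # [2.6583023]
```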

Plot Types

Available values for -p:

Plot	Description
`dendrogram`	Lumping tree with macrostate boundaries
`timescales`	Implied timescales of micro- and macrostate models
`sankey`	Sankey diagram comparing this lumping to the reference
`contacts`	Contact representation per macrostate
`macrotraj`	Macrostate trajectory as a color-coded time series
`ck_test`	Chapman-Kolmogorov test
`rmsd`	Per-macrostate C-alpha RMSD
`delta_rmsd`	Per-macrostate delta RMSD relative to macrostate 0
`state_network`	Macrostate transition network
`transition_matrix`	Macrostate transition matrix heatmap
`transition_time`	Mean first-passage times between macrostates
`macrostate_trajectory`	Write macrostate trajectory to text file (use `-o <file.txt>`)

Output Interpretation

Z matrix (Z.npy): Array of shape (n_runs, n_states-1, 4); n_runs = 1 for deterministic runs. Each row of a run encodes one merge step: [state_a, state_b, metastability_a, joint_population]. Each run follows the scipy linkage layout: the cluster created at merge step i receives the index n_states + i.
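
The indexing convention can be made concrete by reconstructing which microstates each cluster contains. The sketch below works on one run of the Z matrix given as rows of four numbers; the helper merged_members and the example values are made up for illustration and are not taken from a real lumping.

```python
def merged_members(Z_run, n_states):
    """Map every cluster index to the microstates it contains.

    Z_run is one run of the Z matrix, i.e. n_states - 1 rows of
    [state_a, state_b, metastability_a, joint_population].
    """
    members = {i: [i] for i in range(n_states)}
    for step, row in enumerate(Z_run):
        a, b = int(row[0]), int(row[1])
        # The cluster created at merge step `step` gets index n_states + step.
        members[n_states + step] = sorted(members[a] + members[b])
    return members


# Made-up Z for 4 microstates (values are illustrative only).
Z_run = [
    [0, 1, 0.60, 0.30],  # step 0: merge 0 and 1 -> cluster 4
    [2, 3, 0.70, 0.70],  # step 1: merge 2 and 3 -> cluster 5
    [4, 5, 0.90, 1.00],  # step 2: merge 4 and 5 -> cluster 6
]
members = merged_members(Z_run, n_states=4)
print(members[4])  # [0, 1]
print(members[6])  # [0, 1, 2, 3]
```

For a real file, one run loaded with NumPy, for example np.load("results/t/Z.npy")[0], should be usable as Z_run, since the function only indexes rows and converts the first two columns to integers.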

Macrostate map (macrostate_map.npy): Integer array of shape (n_states,). Entry i gives the macrostate index assigned to microstate i. Written automatically to the same directory as Z.npy whenever -Z is used.
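
Applying the map to a microstate trajectory is a per-frame lookup. The sketch below assumes that microstate labels are 0-based integers that index the map directly; if a trajectory uses different labels, they would need to be shifted or remapped first. The helper to_macrostates and all values are illustrative.

```python
def to_macrostates(micro_traj, macrostate_map):
    """Replace each microstate label with its macrostate index."""
    return [macrostate_map[state] for state in micro_traj]


macrostate_map = [0, 0, 1, 1, 2]        # made-up map for 5 microstates
micro_traj = [0, 1, 1, 2, 3, 4, 4, 2]   # made-up microstate trajectory
print(to_macrostates(micro_traj, macrostate_map))
# [0, 0, 0, 1, 1, 2, 2, 1]
```

With NumPy arrays the same lookup can be written as macrostate_map[micro_traj].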

Macrostate trajectory (text): One integer per line, 0-based macrostate index. Written by -p macrostate_trajectory -o macrostate_trajectory.txt.
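
A file in this format can be summarised with a few lines of Python. The following sketch computes the fraction of frames spent in each macrostate; macrostate_populations is a hypothetical helper, and an in-memory buffer with made-up data stands in for the real file.

```python
import io
from collections import Counter


def macrostate_populations(lines):
    """Return {macrostate: fraction of frames} from one-integer-per-line text."""
    states = [int(line) for line in lines if line.strip()]
    counts = Counter(states)
    return {state: counts[state] / len(states) for state in sorted(counts)}


# Stand-in for open("results/t/macrostate_trajectory.txt").
handle = io.StringIO("0\n0\n1\n1\n1\n2\n0\n1\n")
populations = macrostate_populations(handle)
print(populations)  # {0: 0.375, 1: 0.5, 2: 0.125}
```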

RMSD file (.npy): Shape (n_macrostates, n_CA_atoms), C-alpha RMSD values per macrostate.

Stochastic Z matrix: Shape (n_runs, n_states-1, 4) with n_runs > 1. The macrostate assignment of run i is obtained by indexing the Lumping object with [i].

Stochastic Lumping

When the stochastic block is present in the YAML config, MPP performs multiple randomised lumping runs and returns a Z matrix of shape (n_runs, n_states-1, 4).

YAML configuration

stochastic:
  method: n       # 'n' = top-N options (or 'p' = probability-mass threshold)
  param: 2        # for 'n': number of candidate target states per merge
  n: 10           # number of independent runs
  seed: 42        # integer seed for reproducible results (optional)

With method: n and param: 2, at each merge step the two most similar candidate states are shortlisted and one of them is drawn at random with probability proportional to its similarity.
With method: p and param: 0.5, all states whose cumulative similarity exceeds the threshold are considered.
seed: pass any integer to pin numpy.random.default_rng for reproducible stochastic lumpings. Omit for a random seed.
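
The top-N rule and the effect of a fixed seed can be sketched in a few lines. This is an illustration of the description above, not MPP's implementation: the function pick_target and the similarity values are made up, and Python's random.Random is used in place of numpy.random.default_rng to keep the example dependency-free.

```python
import random


def pick_target(similarities, n_candidates, rng):
    """Pick a merge target among the n_candidates most similar states.

    similarities maps candidate state -> similarity (positive numbers).
    """
    # Shortlist the n_candidates states with the highest similarity.
    shortlist = sorted(similarities, key=similarities.get, reverse=True)[:n_candidates]
    weights = [similarities[state] for state in shortlist]
    # Draw one state with probability proportional to its similarity.
    return rng.choices(shortlist, weights=weights, k=1)[0]


similarities = {3: 0.50, 7: 0.30, 1: 0.15, 4: 0.05}  # made-up values
rng = random.Random(42)  # fixed seed -> repeatable sequence of draws
draws = [pick_target(similarities, n_candidates=2, rng=rng) for _ in range(5)]
print(draws)  # every entry is 3 or 7
```

Creating a new generator with the same seed reproduces the same sequence of draws, which is the property the seed key relies on.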

Stochastic-specific plot types

Plot	Description
`stochastic_state_similarity`	Overlap of macrostate assignments across runs
`relative_implied_timescales`	Timescales relative to the reference T lumping
`macro_feature`	Mean feature per macrostate across all stochastic runs