CLI Usage Guide¶
MPP is invoked via python -m MPP.run. This guide covers inputs, kernel
selection, plot generation, and output interpretation for deterministic
workflows.
Prerequisites¶
MPP must be installed (pip install mpp-lumping). The
following inputs are required:
- A YAML configuration file
- A microstate trajectory file (plain-text, one integer state per line)
- A multi-feature trajectory file (plain-text, one row of floats per frame)
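The two trajectory formats above can be read with a few lines of Python. This is a minimal sketch, not MPP's own loader: the helper names are hypothetical, and it assumes that the floats in a feature row are separated by whitespace and that both files describe the same frames.

```python
from typing import List


def parse_microstates(text: str) -> List[int]:
    """Read one integer state per non-empty line."""
    return [int(line) for line in text.splitlines() if line.strip()]


def parse_features(text: str) -> List[List[float]]:
    """Read one row of floats per non-empty line (one row per frame).

    Assumption: values within a row are whitespace-separated.
    """
    return [
        [float(value) for value in line.split()]
        for line in text.splitlines()
        if line.strip()
    ]


# Toy file contents, made up for illustration.
micro_text = "3\n3\n7\n1\n"
feature_text = "0.41 0.52\n0.40 0.55\n0.90 0.31\n0.88 0.30\n"

states = parse_microstates(micro_text)
features = parse_features(feature_text)

# Assumption: both files cover the same frames, so their lengths should agree.
if len(states) != len(features):
    raise ValueError("trajectory files have different numbers of frames")
```

Checking the frame counts before a run is a cheap way to catch a mismatched pair of input files.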
Basic Invocation¶
python -m MPP.run config.yml d g [options]
Positional arguments:
| Argument | Description |
|---|---|
| config.yml | YAML configuration file |
| d | Dynamic similarity selector (T, KL, none, or gpcca) |
| g | Feature similarity selector (JS or none; in gpcca mode, reference_count or an integer) |
Options:
| Flag | Description |
|---|---|
| -Z <path> | Path to save (or load) the Z matrix (.npy). If the file already exists, it is loaded instead of recomputed. |
| -p <plot> | Plot type to generate (see Plot Types) |
| -o <output> | Output file path for the plot or macrostate trajectory |
| --scale <float> | Scaling factor for plot size (default 1) |
| --n-timescales <N> | Number of implied timescales to compute (overrides config value) |
| --rmsd <path> | Compute C-alpha RMSD and write to .npy file |
| --rmsd-feature <CA\|feature> | RMSD variant: CA (default) or feature |
| -r <N> | Draw N random frame indices per macrostate (writes .ndx files) |
| --get-least-moving-residues <contact_index_file> | Write least-varying residues per macrostate to file |
| --metrics | Print all quality metrics to stdout as key=value pairs |
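When MPP is driven from a script, the positional arguments and flags listed above can be assembled into an argument list. The sketch below is illustrative only: build_mpp_command is a hypothetical helper, not part of MPP, and it covers just a subset of the documented flags.

```python
from typing import List, Optional


def build_mpp_command(
    config: str,
    d: str,
    g: str,
    z_path: Optional[str] = None,
    plot: Optional[str] = None,
    output: Optional[str] = None,
    metrics: bool = False,
) -> List[str]:
    """Assemble an argv list for `python -m MPP.run` (hypothetical helper)."""
    cmd = ["python", "-m", "MPP.run", config, d, g]
    if z_path is not None:
        cmd += ["-Z", z_path]  # save the Z matrix, or load it if it exists
    if plot is not None:
        cmd += ["-p", plot]
    if output is not None:
        cmd += ["-o", output]
    if metrics:
        cmd.append("--metrics")
    return cmd


cmd = build_mpp_command(
    "example/sample_system/input/config.yml",
    "T",
    "none",
    z_path="results/t/Z.npy",
    plot="dendrogram",
    output="results/t/dendrogram.pdf",
)
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True). Building a list rather than a single string avoids shell quoting problems with paths that contain spaces.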
YAML Configuration¶
The config file specifies input paths and lumping parameters. All keys use
snake_case.
# example/sample_system/input/config.yml
microstate_trajectory: traj
multi_feature_trajectory: feature_traj
lagtime: 20 # lag time in frames
pop_thr: 0.15 # minimum macrostate population (fraction)
q_min: 0.5 # minimum macrostate metastability
frame_length: 0.2 # frame length in ns
contact_threshold: 0.45 # distance threshold to binarise feature (nm)
Required keys: microstate_trajectory,
multi_feature_trajectory, lagtime, pop_thr, q_min, frame_length.
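A quick pre-flight check of a parsed config might look like the sketch below. It assumes the YAML has already been loaded into a dict (for example with yaml.safe_load from PyYAML); the dict is written inline here to keep the example self-contained. Both helper names are hypothetical. The second helper converts the lag time from frames to nanoseconds using frame_length.

```python
REQUIRED_KEYS = (
    "microstate_trajectory",
    "multi_feature_trajectory",
    "lagtime",
    "pop_thr",
    "q_min",
    "frame_length",
)


def missing_required_keys(config: dict) -> list:
    """Return the required keys that are absent from the config."""
    return [key for key in REQUIRED_KEYS if key not in config]


def lagtime_in_ns(config: dict) -> float:
    """Lag time in ns: lagtime (frames) times frame_length (ns per frame)."""
    return config["lagtime"] * config["frame_length"]


# Values copied from the sample config shown above.
config = {
    "microstate_trajectory": "traj",
    "multi_feature_trajectory": "feature_traj",
    "lagtime": 20,
    "pop_thr": 0.15,
    "q_min": 0.5,
    "frame_length": 0.2,
    "contact_threshold": 0.45,
}

missing = missing_required_keys(config)
lag_ns = lagtime_in_ns(config)  # 20 frames * 0.2 ns/frame
```

For the sample config, the lag time of 20 frames at 0.2 ns per frame corresponds to 4 ns.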
Optional keys:
| Key | Description |
|---|---|
| source | Root directory for all input file paths. A relative source is resolved against the config file's directory; an absolute source is used as-is. Defaults to the config file's own directory, so you can place the config next to your data and omit source entirely. Individual file entries may be absolute or relative (resolved against source), so files scattered across different locations can be referenced without moving them. |
| contact_threshold | Feature binarisation threshold (default 0.45) |
| cluster_file | Contact index file for contact plots |
| contact_index_file | Contact pair index file for structural analysis |
| topology_file | PDB topology file for structural analysis |
| xtc_file | XTC trajectory file for structural analysis |
| xtc_stride | Stride for XTC reading |
| n_timescales | Number of implied timescales to compute |
| helices | Helix residue ranges for RMSD annotation |
| limits | Concatenated trajectory lengths (for multiple independent simulations) |
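The resolution rules described for source can be sketched with pathlib. This is an illustration of the documented behaviour under stated assumptions, not MPP's actual code; resolve_input is a hypothetical name, and PurePosixPath is used only so the example behaves the same on every platform.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_input(
    config_file: str, entry: str, source: Optional[str] = None
) -> PurePosixPath:
    """Resolve one file entry of the config to a path (hypothetical helper)."""
    config_dir = PurePosixPath(config_file).parent
    # Joining onto an absolute path discards the left-hand side, so an
    # absolute source (or an absolute entry) is used as-is.
    root = config_dir / source if source is not None else config_dir
    return root / entry


config_file = "/proj/input/config.yml"

default_root = resolve_input(config_file, "traj")
relative_source = resolve_input(config_file, "traj", source="data")
absolute_source = resolve_input(config_file, "traj", source="/scratch/run1")
absolute_entry = resolve_input(config_file, "/archive/feature_traj", source="data")
```

With these rules, a single config can mix files that live next to it with files stored elsewhere.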
Dynamic Similarity Selectors (d)¶
The d argument selects how microstate similarity is computed during lumping.
| d | Description |
|---|---|
| T | Transition probability (reference kernel; recommended default) |
| KL | Kullback-Leibler divergence of transition probability rows |
| none | Disable dynamic similarity; use feature kernel only |
| gpcca | Use GPCCA instead of MPP (comparison only) |
Feature Similarity Selector (g)¶
The g argument optionally incorporates geometric information via feature
distributions.
| g | Description |
|---|---|
| none | No feature similarity |
| JS | Jensen-Shannon divergence of feature distributions |
| reference_count | (gpcca mode only) Use macrostate count from the reference T lumping |
| <int> | (gpcca mode only) Use a fixed number of macrostates |
Kernel Combinations¶
| d | g | Description |
|---|---|---|
| T | none | Transition probability only (default/reference) |
| KL | none | Kullback-Leibler divergence only |
| T | JS | Combined transition probability + feature similarity |
| KL | JS | Combined KL divergence + feature similarity |
| none | JS | Feature similarity only |
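A wrapper script can reject unsupported selector pairs before launching a run. The sketch below encodes only the combinations listed in this guide, plus the two gpcca-mode values of g. The function name is hypothetical. Treating the integer as strictly positive is an assumption, and so is rejecting none/none, which this guide does not list; the real CLI may behave differently.

```python
DOCUMENTED_PAIRS = {
    ("T", "none"),
    ("KL", "none"),
    ("T", "JS"),
    ("KL", "JS"),
    ("none", "JS"),
}


def is_documented_combination(d: str, g: str) -> bool:
    """True if (d, g) matches a combination described in this guide."""
    if d == "gpcca":
        # In gpcca mode, g is either reference_count or a macrostate count.
        # Assumption: the count must be a positive integer.
        is_count = g.isascii() and g.isdigit() and int(g) > 0
        return g == "reference_count" or is_count
    return (d, g) in DOCUMENTED_PAIRS


checks = {
    ("T", "JS"): is_documented_combination("T", "JS"),
    ("none", "none"): is_documented_combination("none", "none"),
    ("gpcca", "6"): is_documented_combination("gpcca", "6"),
}
```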
Examples¶
Run lumping with transition probability kernel and save Z matrix:
python -m MPP.run example/sample_system/input/config.yml T none \
-Z results/t/Z.npy
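Run with KL divergence kernel:
python -m MPP.run example/sample_system/input/config.yml KL none
Run with combined transition probability + Jensen-Shannon feature kernel:
python -m MPP.run example/sample_system/input/config.yml T JS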
Load an existing Z matrix and generate a dendrogram:
python -m MPP.run example/sample_system/input/config.yml T none \
-Z results/t/Z.npy -p dendrogram -o results/t/dendrogram.pdf
Generate a Sankey diagram:
python -m MPP.run example/sample_system/input/config.yml T none \
-Z results/t/Z.npy -p sankey -o results/t/sankey.pdf
Save the macrostate trajectory as a text file:
python -m MPP.run example/sample_system/input/config.yml T none \
-Z results/t/Z.npy -p macrostate_trajectory -o results/t/macrostate_trajectory.txt
Print all quality metrics:
python -m MPP.run example/sample_system/input/config.yml T none \
-Z results/t/Z.npy --metrics
Output format (one metric per line, key=value; comma-separated values for
stochastic runs with n>1):
shannon_entropy=0.7440447
davies_bouldin=2.1873796
gmrq=2.6583023
gmrq2=2.3739562
silhouette=0.20912119
calinski_harabasz=6498.0444
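The key=value output is straightforward to parse for downstream analysis. The following sketch (parse_metrics is a hypothetical helper) always returns a list of floats per metric, so deterministic and stochastic runs are handled in the same way. It assumes that every line containing "=" is a metric line.

```python
from typing import Dict, List


def parse_metrics(stdout_text: str) -> Dict[str, List[float]]:
    """Parse `key=value` lines; comma-separated values become a list."""
    metrics: Dict[str, List[float]] = {}
    for line in stdout_text.splitlines():
        if "=" not in line:
            continue  # skip blank lines and anything that is not key=value
        key, _, value = line.partition("=")
        metrics[key.strip()] = [float(v) for v in value.split(",")]
    return metrics


# First two lines are taken from the sample output above; the second gmrq
# value is made up to illustrate a line from a multi-run stochastic lumping.
sample = """shannon_entropy=0.7440447
davies_bouldin=2.1873796
gmrq=2.6583023,2.6012
"""

metrics = parse_metrics(sample)
```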
Plot Types¶
Available values for -p:
| Plot | Description |
|---|---|
| dendrogram | Lumping tree with macrostate boundaries |
| timescales | Implied timescales of micro- and macrostate models |
| sankey | Sankey diagram comparing this lumping to the reference |
| contacts | Contact representation per macrostate |
| macrotraj | Macrostate trajectory as a color-coded time series |
| ck_test | Chapman-Kolmogorov test |
| rmsd | Per-macrostate C-alpha RMSD |
| delta_rmsd | Per-macrostate delta RMSD relative to macrostate 0 |
| state_network | Macrostate transition network |
| transition_matrix | Macrostate transition matrix heatmap |
| transition_time | Mean first-passage times between macrostates |
| macrostate_trajectory | Write macrostate trajectory to text file (use -o <file.txt>) |
Output Interpretation¶
Z matrix (Z.npy): Shape (n_runs, n_states-1, 4). Each row encodes one
merge step: [state_a, state_b, metastability_a, joint_population]. Each
per-run slice follows the scipy linkage layout: the cluster created at merge
step i (0-based) receives index n_states + i. For deterministic runs, n_runs = 1.
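The indexing convention is easiest to see on a toy example. The sketch below walks the merge rows of a single run and records which microstates each intermediate cluster contains. It is written in plain Python; in practice the rows could come from numpy.load("Z.npy")[0]. The function name and the numbers in the toy matrix are made up for illustration.

```python
from typing import Dict, List, Sequence


def cluster_members(
    z_run: Sequence[Sequence[float]], n_states: int
) -> Dict[int, List[int]]:
    """Map every cluster index to the microstates it contains."""
    # Indices below n_states are the original microstates.
    members: Dict[int, List[int]] = {i: [i] for i in range(n_states)}
    for step, row in enumerate(z_run):
        a, b = int(row[0]), int(row[1])
        # The cluster formed at merge step `step` gets index n_states + step.
        members[n_states + step] = sorted(members[a] + members[b])
    return members


# Toy run with 3 microstates, hence 2 merge steps (values are illustrative).
z_run = [
    [0, 1, 0.9, 0.30],  # merge microstates 0 and 1 -> cluster 3
    [3, 2, 0.8, 1.00],  # merge cluster 3 with microstate 2 -> cluster 4
]

members = cluster_members(z_run, n_states=3)
```

The last intermediate cluster always contains every microstate, which gives a simple sanity check on a loaded Z matrix.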
Macrostate map (macrostate_map.npy): Integer array of shape (n_states,).
Entry i gives the macrostate index assigned to microstate i. Written
automatically to the same directory as Z.npy whenever -Z is used.
Macrostate trajectory (text): One integer per line, 0-based macrostate
index. Written by -p macrostate_trajectory -o macrostate_trajectory.txt.
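The macrostate map and the macrostate trajectory are related by a simple lookup. A minimal sketch, assuming that the microstate labels in the trajectory are 0-based indices into the map (check this against your own data before relying on it); both helper names are hypothetical:

```python
from typing import List, Sequence


def to_macrostate_trajectory(
    micro_traj: Sequence[int], macrostate_map: Sequence[int]
) -> List[int]:
    """Replace each microstate label by its macrostate index."""
    # Assumption: microstate label i refers to entry i of the map.
    return [macrostate_map[state] for state in micro_traj]


def format_trajectory(macro_traj: Sequence[int]) -> str:
    """Render a trajectory as text, one integer per line."""
    return "".join(f"{state}\n" for state in macro_traj)


# Toy data: three microstates lumped into two macrostates.
macrostate_map = [0, 0, 1]
micro_traj = [0, 2, 1, 2]

macro_traj = to_macrostate_trajectory(micro_traj, macrostate_map)
text = format_trajectory(macro_traj)
```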
RMSD file (.npy): Shape (n_macrostates, n_CA_atoms), C-alpha RMSD
values per macrostate.
Stochastic Z matrix: Shape (n_runs, n_states-1, 4) with n_runs > 1.
The per-run macrostate assignment is accessed at index [i] on the Lumping object.
Stochastic Lumping¶
When the stochastic block is present in the YAML config, MPP performs multiple
randomised lumping runs and returns a Z matrix of shape (n_runs, n_states-1, 4).
YAML configuration¶
stochastic:
  method: n   # 'n' = top-N options (or 'p' = probability-mass threshold)
  param: 2    # for 'n': number of candidate target states per merge
  n: 10       # number of independent runs
  seed: 42    # integer seed for reproducible results (optional)
- method: n + param: 2: at each merge step, the two most-similar candidate states are selected; one is chosen randomly with probability proportional to similarity.
- method: p + param: 0.5: all states whose cumulative similarity exceeds the threshold are considered.
- seed: pass any integer to pin numpy.random.default_rng for reproducible stochastic lumpings. Omit for a random seed.
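The top-N rule (method: n) can be illustrated with a short sketch. This is not MPP's implementation: it uses the standard-library random.Random in place of numpy.random.default_rng, the function name and similarity values are made up, and similarities are assumed to be non-negative. The probability-mass rule (method: p) is not sketched here.

```python
import random
from typing import Dict


def pick_merge_target(
    similarity: Dict[int, float], top_n: int, rng: random.Random
) -> int:
    """Pick one of the top_n most similar states, weighted by similarity."""
    ranked = sorted(similarity, key=similarity.get, reverse=True)[:top_n]
    weights = [similarity[state] for state in ranked]
    if sum(weights) <= 0:
        # Degenerate case: no usable weights, fall back to a uniform pick.
        return rng.choice(ranked)
    return rng.choices(ranked, weights=weights, k=1)[0]


# Illustrative similarities of one state to four candidate target states.
similarity = {4: 0.50, 7: 0.30, 9: 0.15, 2: 0.05}

rng = random.Random(42)  # fixed seed, so the draws are reproducible
draws = [pick_merge_target(similarity, 2, rng) for _ in range(1000)]

rng_again = random.Random(42)
draws_again = [pick_merge_target(similarity, 2, rng_again) for _ in range(1000)]
```

With top_n set to 2, only states 4 and 7 can be drawn, and state 4 is drawn more often because its similarity is higher. With top_n set to 1 the choice is always the most similar state, so the procedure is no longer random.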
Stochastic-specific plot types¶
| Plot | Description |
|---|---|
| stochastic_state_similarity | Overlap of macrostate assignments across runs |
| relative_implied_timescales | Timescales relative to the reference T lumping |
| macro_feature | Mean feature per macrostate across all stochastic runs |