Tutorial: End-to-End MPP Analysis

This tutorial walks through a complete MPP analysis using the sample dataset included in the repository (example/sample_system/). The dataset contains a 100 000-frame microstate trajectory with 6 microstates and a corresponding single-feature trajectory.

Each step shows the CLI command, the Python API equivalent, or both.

Setup

Install the package:

pip install mpp-lumping

Clone the repository to access the example data:

git clone https://github.com/moldyn/MPP.git
cd MPP

Step 1 — Inspect the input data

The example config file is example/sample_system/input/config.yml:

source: example/sample_system/input

microstate_trajectory: traj
multi_feature_trajectory: feature_traj
contact_threshold: null

frame_length: 0.2   # ns per frame
lagtime: 20         # frames
pop_thr: 0.15       # minimum macrostate population
q_min: 0.5          # minimum macrostate metastability
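
The time-related settings combine in a simple way: lagtime is given in frames and frame_length in ns per frame, so their product is the lag time in physical units (here 20 × 0.2 ns = 4 ns). A minimal sketch of that arithmetic, using the values from the config above; the variable names are illustrative and not part of the package:

```python
# Values copied from the example config and dataset description above.
frame_length_ns = 0.2   # ns per frame
lagtime_frames = 20     # frames
n_frames = 100_000      # length of the sample trajectory

# Convert frame counts into physical time.
lagtime_ns = lagtime_frames * frame_length_ns
total_time_ns = n_frames * frame_length_ns

print(f"Lag time:          {lagtime_ns:g} ns")
print(f"Trajectory length: {total_time_ns / 1000:g} µs")
```

Keeping both quantities in view helps when choosing a lag time: it should be short compared with the total trajectory length so that enough transitions are sampled.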

Python API:

import numpy as np

traj = np.loadtxt("example/sample_system/input/traj", dtype=int)
feat = np.loadtxt("example/sample_system/input/feature_traj", ndmin=2)

print(f"Trajectory: {traj.shape[0]} frames, {len(np.unique(traj))} unique microstates")
print(f"Features:   {feat.shape[1]} feature(s) per frame")
# Trajectory: 100000 frames, 6 unique microstates
# Features:   1 feature(s) per frame
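
Beyond the shape, it can be useful to check how the frames are distributed over the microstates before lumping. The sketch below does this with np.unique on a small synthetic trajectory; toy_traj is made up for illustration, and with the sample data you would pass traj instead:

```python
import numpy as np

# Made-up microstate trajectory standing in for the real one.
toy_traj = np.array([1, 1, 2, 2, 2, 3, 1, 1, 3, 3])

# Unique state labels and the number of frames spent in each.
states, counts = np.unique(toy_traj, return_counts=True)
fractions = counts / counts.sum()

for state, count, frac in zip(states, counts, fractions):
    print(f"state {state}: {count} frames ({frac:.1%})")
```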

Step 2 — Run the MPP lumping

CLI:

mkdir -p results/t
python -m MPP.run example/sample_system/input/config.yml T none \
    -Z results/t/Z.npy

The -Z results/t/Z.npy flag saves the lumping tree. If the file already exists, it is loaded instead of recomputed.
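
The load-if-present behaviour described above is the common cache-on-disk pattern. The following is only a sketch of that pattern, not the package's actual code: load_or_compute and fake_tree are hypothetical names, and the array is a stand-in for a real lumping tree.

```python
import os
import tempfile

import numpy as np


def load_or_compute(path, compute):
    """Return (array, was_cached): load `path` if present, else compute and save."""
    if os.path.exists(path):
        return np.load(path), True
    result = compute()
    np.save(path, result)
    return result, False


def fake_tree():
    """Stand-in for an expensive computation (hypothetical, not a real tree)."""
    return np.arange(6, dtype=float).reshape(3, 2)


with tempfile.TemporaryDirectory() as tmp:
    z_path = os.path.join(tmp, "Z.npy")
    first, first_cached = load_or_compute(z_path, fake_tree)     # computes and saves
    second, second_cached = load_or_compute(z_path, fake_tree)   # loads from disk

print(first_cached, second_cached)
```

Under this pattern an existing file is reused silently, which is one reason to keep a separate output path per kernel, as the later steps of this tutorial do.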

Python API:

import MPP
import MPP.kernel

kernel = MPP.kernel.LumpingKernel(similarity="T")
mpp = MPP.Lumping(
    traj,
    lagtime=20,
    feature_trajectory=feat,
    pop_thr=0.15,
    q_min=0.5,
    frame_length=0.2,
)
mpp.run_mpp(kernel)
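
The lagtime and q_min arguments both relate to a transition matrix estimated from the microstate trajectory. As a rough illustration, the sketch below counts transitions between frames t and t + lagtime, row-normalizes them, and reads the diagonal as the metastability of each state. That reading of "metastability" (the self-transition probability) is an assumption here, and the function is a hypothetical helper rather than the package's estimator.

```python
import numpy as np


def transition_matrix(traj, lagtime):
    """Row-normalized transition counts between frames t and t + lagtime (lagtime >= 1)."""
    states = np.unique(traj)
    index = np.searchsorted(states, traj)          # map state labels to 0..n-1
    n = len(states)
    counts = np.zeros((n, n))
    # Accumulate one count per observed (from, to) pair.
    np.add.at(counts, (index[:-lagtime], index[lagtime:]), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    return states, counts / np.where(row_sums == 0, 1, row_sums)


# Made-up trajectory with three microstates.
toy_traj = np.array([1, 1, 1, 2, 2, 1, 1, 3, 3, 3, 3, 1])
states, T = transition_matrix(toy_traj, lagtime=1)

# Probability of remaining in the same state after one lag time.
metastability = np.diag(T)
print(dict(zip(states.tolist(), np.round(metastability, 2).tolist())))
```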

Step 3 — Inspect macrostate results

Python API:

n = mpp.n_macrostates[0]
print(f"Number of macrostates: {n}")
# Number of macrostates: 3

# Map from microstate index to macrostate index
print(f"Macrostate map: {mpp.macrostate_map[0]}")

# Population of each macrostate (number of frames)
pop = mpp.macrostate_population[0]
print(f"Populations: {pop}")
print(f"Fractions:   {np.round(pop / pop.sum(), 3)}")
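
To see how a microstate-to-macrostate map and the populations relate, the sketch below applies a made-up map to a made-up trajectory. It assumes zero-based, contiguous microstate indices and a map stored as an array indexed by microstate; the actual layout of mpp.macrostate_map is not specified in this tutorial.

```python
import numpy as np

# Made-up example: 6 microstates (0..5) lumped into 3 macrostates (0..2).
toy_map = np.array([0, 0, 1, 1, 2, 2])
toy_traj = np.array([0, 1, 1, 2, 3, 3, 4, 5, 5, 0])

# Indexing the map with the trajectory replaces each microstate by its macrostate.
macro_traj = toy_map[toy_traj]

# Frames per macrostate, and the same as fractions of the trajectory.
population = np.bincount(macro_traj, minlength=toy_map.max() + 1)
fractions = population / population.sum()

print(f"Macrostate trajectory: {macro_traj}")
print(f"Populations:           {population}")
print(f"Fractions:             {np.round(fractions, 3)}")
```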

Step 4 — Quality metrics

CLI:

python -m MPP.run example/sample_system/input/config.yml T none \
    -Z results/t/Z.npy --metrics
# shannon_entropy=0.99240...
# davies_bouldin=0.43912...
# gmrq=...
# gmrq2=...
# silhouette=...
# calinski_harabasz=...

Python API:

print(f"Shannon entropy:    {mpp.shannon_entropy[0]:.4f}")   # 0.9924
print(f"Davies-Bouldin:     {mpp.davies_bouldin_index[0]:.4f}")  # 0.4391
print(f"GMRQ:               {mpp.gmrq[0]:.4f}")
print(f"Silhouette:         {mpp.silhouette[0]:.4f}")
print(f"Calinski-Harabasz:  {mpp.calinski_harabasz[0]:.1f}")
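
This tutorial does not define the individual metrics. As an aid to reading a population-based score such as the Shannon entropy, the sketch below computes the entropy of the macrostate population fractions and divides it by log(n), so that 1 corresponds to equal populations. Whether the package normalizes in this way is an assumption; normalized_shannon_entropy is a hypothetical helper.

```python
import numpy as np


def normalized_shannon_entropy(populations):
    """Shannon entropy of the population fractions, divided by log(number of states)."""
    counts = np.asarray(populations, dtype=float)
    n = len(counts)
    if n < 2:
        return 0.0
    p = counts / counts.sum()
    p = p[p > 0]                      # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum() / np.log(n))


uniform = normalized_shannon_entropy([100, 100, 100])   # equal populations
skewed = normalized_shannon_entropy([280, 10, 10])      # one dominant macrostate

print(f"uniform: {uniform:.3f}")
print(f"skewed:  {skewed:.3f}")
```

Under this definition, a lumping dominated by a single macrostate scores well below one that splits the trajectory evenly.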

Step 5 — Generate plots

CLI:

# Lumping dendrogram
python -m MPP.run example/sample_system/input/config.yml T none \
    -Z results/t/Z.npy -p dendrogram -o results/t/dendrogram.pdf

# Macrostate trajectory (color-coded time series)
python -m MPP.run example/sample_system/input/config.yml T none \
    -Z results/t/Z.npy -p macrotraj -o results/t/macrotraj.pdf

# Chapman-Kolmogorov test
python -m MPP.run example/sample_system/input/config.yml T none \
    -Z results/t/Z.npy -p ck_test -o results/t/ck_test.pdf

Python API:

mpp.plot.dendrogram("results/t/dendrogram.pdf")
mpp.plot.macrostate_trajectory("results/t/macrotraj.pdf")
mpp.plot.ck_test("results/t/ck_test.pdf")
mpp.plot.state_network("results/t/state_network.pdf")
mpp.plot.transition_matrix("results/t/transition_matrix.pdf")
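
For readers unfamiliar with the Chapman-Kolmogorov test, its core idea is to compare the transition matrix estimated at a lag time τ and raised to the k-th power with the matrix estimated directly at lag time kτ. The sketch below illustrates this numerically on a synthetic two-state Markov chain. It is independent of the package's plotting routine; estimate_T and the chain parameters are made up for illustration.

```python
import numpy as np


def estimate_T(traj, n_states, lag):
    """Row-normalized transition matrix estimated at the given lag (in frames)."""
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (traj[:-lag], traj[lag:]), 1)
    return counts / counts.sum(axis=1, keepdims=True)


# Generate a synthetic two-state Markov chain with known transition probabilities.
rng = np.random.default_rng(0)
T_true = np.array([[0.95, 0.05],
                   [0.10, 0.90]])
n_frames = 50_000
traj = np.empty(n_frames, dtype=int)
traj[0] = 0
u = rng.random(n_frames)
for t in range(1, n_frames):
    # Move to state 0 with probability T_true[previous state, 0], else to state 1.
    traj[t] = 0 if u[t] < T_true[traj[t - 1], 0] else 1

lag, k = 1, 5
predicted = np.linalg.matrix_power(estimate_T(traj, 2, lag), k)   # T(lag)^k
observed = estimate_T(traj, 2, k * lag)                           # T(k * lag)
max_deviation = np.abs(predicted - observed).max()

print(f"max |T(lag)^k - T(k*lag)| = {max_deviation:.4f}")
```

For a truly Markovian process the deviation shrinks towards zero as sampling improves; a systematic gap between prediction and observation suggests that the state definition or lag time does not yield Markovian dynamics.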

Step 6 — Save the macrostate trajectory

CLI:

python -m MPP.run example/sample_system/input/config.yml T none \
    -Z results/t/Z.npy \
    -p macrostate_trajectory -o results/t/macrostate_trajectory.txt

Python API:

mpp.save_macrostate_trajectory("results/t/macrostate_trajectory.txt", one_based=True)

The output file contains one macrostate index per line (1-based).
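
Because the file is 1-based, the indices need to be shifted by one before they can be used for NumPy indexing. A minimal round-trip sketch, using a made-up trajectory and a temporary file that mimics the described format (one integer per line):

```python
import os
import tempfile

import numpy as np

# Made-up zero-based macrostate trajectory.
macro_traj = np.array([0, 0, 1, 2, 2, 1, 0])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "macrostate_trajectory.txt")

    # Write one 1-based index per line, mimicking the described output format.
    np.savetxt(path, macro_traj + 1, fmt="%d")

    # Read the file back and shift to 0-based indices.
    loaded = np.loadtxt(path, dtype=int)
    zero_based = loaded - 1

print(f"As stored (1-based): {loaded}")
print(f"For indexing:        {zero_based}")
```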

Step 7 — Try other kernels

Replace T none with any supported kernel combination to compare results:

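# Create one output directory per kernel so the cached trees do not collide
mkdir -p results/kl results/t_js

# Kullback-Leibler divergence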
python -m MPP.run example/sample_system/input/config.yml KL none \
    -Z results/kl/Z.npy --metrics

# Combined T + Jensen-Shannon feature similarity
python -m MPP.run example/sample_system/input/config.yml T JS \
    -Z results/t_js/Z.npy --metrics

See the CLI Usage guide for the full list of kernel combinations.
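
As background for the KL and JS options, the sketch below computes the Kullback-Leibler and Jensen-Shannon divergences between two discrete probability distributions using their textbook definitions. How the package applies them to transition probabilities or features is not described in this tutorial, and the function names and example distributions are illustrative only.

```python
import numpy as np


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; assumes q > 0 wherever p > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                      # terms with p = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetrized KL against the mixture (p + q) / 2."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Made-up distributions over three states.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.4, 0.4, 0.2])

print(f"KL(p||q) = {kl_divergence(p, q):.4f}")
print(f"KL(q||p) = {kl_divergence(q, p):.4f}")   # differs: KL is not symmetric
print(f"JS(p, q) = {js_divergence(p, q):.4f}")   # symmetric and bounded by log(2)
```

The asymmetry of KL and the symmetry and boundedness of JS are general properties of these divergences, and they are one reason results can differ between the two kernels.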

Step 8 — Config-based Python workflow

For production workflows, MPP.run.Data reads the YAML config and orchestrates the full pipeline, matching the CLI behaviour exactly:

from MPP.run import Data

data = Data("example/sample_system/input/config.yml")
data.setup_mpp("T", "none")
data.perform_mpp("results/t/Z.npy")   # loads if file exists

mpp = data.mpp
print(f"Macrostates: {mpp.n_macrostates[0]}")
mpp.plot.dendrogram("results/t/dendrogram.pdf")

Next steps

Read the CLI Usage guide for the full argument reference.
Read the Python API guide for advanced usage (stochastic lumping, quality metrics, RMSD, concatenated trajectories).
Explore all plot types with CLI and API examples.