Python API Usage Guide¶
This guide covers practical usage of the MPP.Lumping class and supporting
classes for deterministic lumping workflows.
Installation¶
Quick Start¶
import numpy as np
import MPP
# Load input data
trajectory = np.loadtxt("example/sample_system/input/traj", dtype=np.uint16)
feature_trajectory = np.loadtxt("example/sample_system/input/feature_traj", ndmin=2)
# Create Lumping object
mpp = MPP.Lumping(
    trajectory,
    lagtime=20,
    feature_trajectory=feature_trajectory,
    pop_thr=0.15,
    q_min=0.5,
)
# Run MPP with transition probability kernel (default)
kernel = MPP.kernel.LumpingKernel(similarity="T")
mpp.run_mpp(kernel)
# Access results
print(f"Number of macrostates: {mpp.n_macrostates[0]}")
print(f"Macrostate assignment shape: {mpp.macrostate_assignment[0].shape}")
MPP.Lumping¶
The central class. Holds the microstate trajectory, transition matrix, feature data, and all results.
Constructor¶
MPP.Lumping(
    trajectory,               # ndarray of int, shape (N,): microstate trajectory
    lagtime,                  # int: lag time in frames
    feature_trajectory=None,  # ndarray of float, shape (N, M): optional
    contact_threshold=0.45,   # float: feature binarisation threshold (nm)
    pop_thr=0.005,            # float: minimum macrostate population
    q_min=0.5,                # float: minimum macrostate metastability
    frame_length=0.2,         # float: frame length in ns
    limits=None,              # list of int: trajectory lengths for concatenated runs
    quiet=False,              # bool: suppress progress output
)
The trajectory must be 0-based and contiguous. 1-based trajectories are
shifted automatically with a warning.
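Whether a trajectory meets this requirement can be checked with plain NumPy before constructing the object. The following is a minimal standalone sketch; the helper name `check_state_labels` is hypothetical and not part of MPP.

```python
import numpy as np

def check_state_labels(trajectory):
    """Return (is_zero_based, is_contiguous) for an integer state trajectory."""
    states = np.unique(trajectory)  # sorted unique state labels
    is_zero_based = states[0] == 0
    # Contiguous means the labels form an unbroken integer range.
    expected = np.arange(states[0], states[-1] + 1)
    is_contiguous = np.array_equal(states, expected)
    return bool(is_zero_based), bool(is_contiguous)

zero_based = np.array([0, 1, 1, 2, 0, 2])
one_based = zero_based + 1          # labels 1..3, would be shifted by MPP
gapped = np.array([0, 1, 3, 3, 0])  # label 2 never occurs

for name, traj in [("zero_based", zero_based),
                   ("one_based", one_based),
                   ("gapped", gapped)]:
    print(name, check_state_labels(traj))
```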
run_mpp¶
Runs the MPP algorithm and populates all macrostate attributes.
mpp.run_mpp(
    kernel=MPP.kernel.LumpingKernel(),  # LumpingKernel instance
    feature_kernel=None,                # FeatureKernel or None
    n=1,                                # int: number of runs (use 1 for deterministic runs)
)
load_Z / save_Z¶
load_Z accepts either a path string or a NumPy array directly.
assign_macrostates¶
Re-parse the lumping tree without re-running the algorithm. Useful after
manually changing pop_thr or q_min.
Lumping Kernels¶
MPP.kernel.LumpingKernel¶
Determines which microstate is merged and with which neighbour.
# Transition probability (default, recommended)
kernel = MPP.kernel.LumpingKernel(similarity="T")
# Kullback-Leibler divergence
kernel = MPP.kernel.LumpingKernel(similarity="KL")
# Feature-only (use with FeatureKernel)
kernel = MPP.kernel.LumpingKernel(similarity="none")
MPP.kernel.FeatureKernel¶
Incorporates geometric similarity via Jensen-Shannon divergence of feature
distributions. Pass alongside LumpingKernel to run_mpp.
feature_kernel = MPP.kernel.FeatureKernel(
    feature_trajectory,  # binary feature trajectory, shape (N, M)
    trajectory,          # microstate trajectory, shape (N,)
)
mpp.run_mpp(kernel, feature_kernel=feature_kernel)
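For orientation, the Jensen-Shannon divergence between two discrete distributions can be sketched as follows. This is the textbook definition written in NumPy, not the FeatureKernel implementation; how the kernel builds the per-state feature distributions is not covered here, and the function names are illustrative.

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Terms with p == 0 contribute nothing; q must be non-zero wherever p is.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetrised KL against the mixture m."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)  # m > 0 wherever p > 0 or q > 0
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Two states with identical and with completely disjoint feature distributions
same = js_divergence([0.5, 0.5], [0.5, 0.5])
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])
print(same, disjoint)
```

With base-2 logarithms the divergence is bounded between 0 (identical distributions) and 1 (disjoint support).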
Kernel Combinations¶
| LumpingKernel(similarity=...) | feature_kernel | Equivalent CLI |
|---|---|---|
| "T" | None | T none |
| "KL" | None | KL none |
| "T" | FeatureKernel(...) | T JS |
| "KL" | FeatureKernel(...) | KL JS |
| "none" | FeatureKernel(...) | none JS |
Accessing Results¶
After calling run_mpp or load_Z, the following attributes are populated.
For deterministic runs, index [0] selects the single run.
# Number of macrostates
n = mpp.n_macrostates[0]
# Macrostate assignment: bool array, shape (n_macrostates, n_states)
assignment = mpp.macrostate_assignment[0]
# Map from microstate index to macrostate index, shape (n_states,)
macro_map = mpp.macrostate_map[0]
# Macrostate trajectory of this run, shape (n_frames,); the full attribute has shape (n_runs, n_frames)
macrotraj = mpp.macrostate_trajectory[0]
# Macrostate transition matrix, shape (n_macrostates, n_macrostates)
macrotmat = mpp.macrostate_tmat[0]
# Macrostate populations (in frames), shape (n_macrostates,)
pop = mpp.macrostate_population[0]
# Mean feature value per macrostate, list of float
feature = mpp.macrostate_feature[0]
# Z matrix, shape (n_runs, n_states-1, 4)
Z = mpp.Z
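The relationship between these arrays can be illustrated on toy data. The sketch below uses plain NumPy and assumes that each microstate belongs to exactly one macrostate, as the shapes in the comments above suggest; it does not call MPP.

```python
import numpy as np

# Toy example: 5 microstates lumped into 2 macrostates.
macro_map = np.array([0, 0, 1, 1, 0])  # microstate index -> macrostate index
n_macrostates = int(macro_map.max()) + 1

# Boolean assignment matrix, shape (n_macrostates, n_states):
# entry [m, s] is True when microstate s belongs to macrostate m.
assignment = np.arange(n_macrostates)[:, None] == macro_map[None, :]

# The map is recovered as the row index of the single True per column.
recovered = assignment.argmax(axis=0)

# Projecting a microstate trajectory onto macrostates is a lookup.
trajectory = np.array([0, 1, 2, 2, 3, 4, 0])
macrotraj = macro_map[trajectory]

# Macrostate populations in frames.
population = np.bincount(macrotraj, minlength=n_macrostates)
print(macrotraj, population)
```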
Config-Based Workflow¶
For production use, MPP.run.Data reads a YAML config file and orchestrates
the full workflow.
from MPP.run import Data
data = Data("example/sample_system/input/config.yml")
data.setup_mpp("T", "none") # d="T", g="none"
data.perform_mpp("results/t/Z.npy") # run or load Z
mpp = data.mpp
print(f"Macrostates: {mpp.n_macrostates[0]}")
perform_mpp loads an existing Z matrix if the file is already present;
pass overwrite=True to force recomputation.
Generating Plots¶
Plots are accessed via mpp.plot, an instance of Lumping.Plotter.
mpp.plot.dendrogram("results/t/dendrogram.pdf")
mpp.plot.implied_timescales("results/t/timescales.pdf")
mpp.plot.sankey("results/t/sankey.pdf")
mpp.plot.ck_test("results/t/ck_test.pdf")
mpp.plot.macrostate_trajectory("results/t/macrotraj.pdf")
mpp.plot.state_network("results/t/state_network.pdf")
mpp.plot.transition_matrix("results/t/transition_matrix.pdf")
mpp.plot.transition_time("results/t/transition_time.pdf")
Contact and RMSD plots require additional files:
# Contact representation (requires cluster_file)
mpp.plot.contact_rep("path/to/cluster_file", "results/t/contacts.pdf")
# RMSD plots (requires topology and XTC files)
mpp.topology_file = "path/to/structure.pdb"
mpp.xtc_trajectory_file = "path/to/trajectory.xtc"
mpp.plot.rmsd("results/t/rmsd.pdf")
mpp.plot.delta_rmsd("results/t/delta_rmsd.pdf")
Quality Metrics¶
All metrics are lazily computed properties that are cached on first access and
return an ndarray of shape (n_runs,). For a single deterministic run, index [0]
to get a scalar.
Print all metrics at once from the CLI with:
Implied Timescales¶
# Compute implied timescales (shape: n_runs × n_timescales)
ts = mpp.timescales
# Or compute a specific number:
mpp.calc_timescales(n=5)
ts = mpp.timescales # shape (n_runs, 5)
Timescales are in nanoseconds when frame_length is provided in ns.
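For reference, implied timescales are conventionally derived from the eigenvalues of a transition matrix estimated at a given lag time, t_i = -lagtime * frame_length / ln(lambda_i). The sketch below applies that standard formula to a toy matrix. Whether mpp.timescales follows exactly this convention is an assumption, and the function name `implied_timescales` is illustrative.

```python
import numpy as np

def implied_timescales(tmat, lagtime, frame_length, n=None):
    """Implied timescales t_i = -lagtime * frame_length / ln(lambda_i).

    Assumes a row-stochastic matrix whose non-trivial eigenvalues are real
    and lie in (0, 1); other eigenvalues would yield NaN or negative values.
    """
    eigvals = np.sort(np.linalg.eigvals(tmat).real)[::-1]  # descending order
    nontrivial = eigvals[1:]  # skip the stationary eigenvalue (1.0)
    if n is not None:
        nontrivial = nontrivial[:n]
    return -lagtime * frame_length / np.log(nontrivial)

# Two-state toy matrix with eigenvalues 1.0 and 0.7
tmat = np.array([[0.9, 0.1],
                 [0.2, 0.8]])
ts = implied_timescales(tmat, lagtime=20, frame_length=0.2)
print(ts)  # in ns, because frame_length is given in ns
```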
Shannon Entropy¶
Measures how evenly frames are distributed across macrostates. Values range from 0 (all frames in a single macrostate) to 1 (perfectly uniform distribution).
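A range of 0 to 1 suggests that the entropy is normalised by its maximum, the logarithm of the number of macrostates. The sketch below computes such a normalised entropy from macrostate populations. The normalisation and the function name are assumptions, not taken from the MPP source.

```python
import numpy as np

def normalized_shannon_entropy(population):
    """Shannon entropy of macrostate populations, divided by ln(n_macrostates)."""
    population = np.asarray(population, dtype=float)
    if population.size < 2:
        return 0.0  # a single macrostate carries no uncertainty
    p = population / population.sum()
    p = p[p > 0]  # empty macrostates contribute 0 * ln(0) := 0
    entropy = -np.sum(p * np.log(p))
    return float(entropy / np.log(population.size))

uniform = normalized_shannon_entropy([100, 100, 100, 100])
dominated = normalized_shannon_entropy([397, 1, 1, 1])
print(uniform, dominated)
```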
Davies-Bouldin Index¶
Ratio of within-cluster scatter to between-cluster separation. Lower values
indicate better-separated macrostates. Requires multi_feature_trajectory.
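The textbook Davies-Bouldin index can be sketched in NumPy as shown here. It assumes Euclidean distances and centroid-based scatter; the details of how MPP evaluates the index on multi_feature_trajectory may differ, and the function name is illustrative.

```python
import numpy as np

def davies_bouldin(features, labels):
    """Mean over clusters of the worst-case (s_i + s_j) / d_ij ratio."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    centroids = np.array([features[labels == c].mean(axis=0) for c in clusters])
    # Within-cluster scatter: mean distance of members to their centroid.
    scatter = np.array([
        np.linalg.norm(features[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(clusters)
    ])
    # Pairwise centroid distances; the diagonal is set to inf so that
    # the i == j terms evaluate to 0 and never win the maximum.
    dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    ratio = (scatter[:, None] + scatter[None, :]) / dist
    return float(ratio.max(axis=1).mean())

# Four frames with one feature, two macrostates
features = np.array([[0.0], [2.0], [10.0], [12.0]])
labels = np.array([0, 0, 1, 1])
db = davies_bouldin(features, labels)
print(db)
```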
GMRQ and GMRQ2¶
The Generalized Matrix Rayleigh Quotient (GMRQ) is the sum of the 2nd to 4th largest eigenvalues of the macrostate transition matrix. Higher values indicate better-preserved slow dynamics. GMRQ2 uses the sum of the squares of those eigenvalues.
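The definition above translates directly into a few lines of NumPy. The sketch below is a standalone illustration on a toy matrix, assuming real eigenvalues; the function name `gmrq` and its `squared` flag are hypothetical and not the MPP API.

```python
import numpy as np

def gmrq(tmat, squared=False):
    """Sum of the 2nd to 4th largest eigenvalues (or of their squares)."""
    eigvals = np.sort(np.linalg.eigvals(tmat).real)[::-1]  # descending order
    selected = eigvals[1:4]  # fewer than three if the matrix is smaller than 4x4
    if squared:
        selected = selected ** 2
    return float(selected.sum())

# Symmetric 4-state toy matrix: eigenvalues are 1.0 and 0.6 (threefold)
tmat = np.full((4, 4), 0.1)
np.fill_diagonal(tmat, 0.7)

score = gmrq(tmat)
score2 = gmrq(tmat, squared=True)
print(score, score2)
```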
RMSD Sharpness¶
Population-weighted mean of per-macrostate mean RMSDs. Lower values indicate
more compact macrostates. Requires RMSD data (call mpp.rmsd or load via
mpp.load_rmsd(path) first).
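The weighting itself is a plain weighted average. The sketch below assumes the per-macrostate mean RMSDs are already available; how MPP computes them (for example, which reference structure is used) is not shown, and the function name is illustrative.

```python
import numpy as np

def rmsd_sharpness(mean_rmsd, population):
    """Population-weighted mean of per-macrostate mean RMSDs."""
    mean_rmsd = np.asarray(mean_rmsd, dtype=float)
    population = np.asarray(population, dtype=float)
    return float(np.average(mean_rmsd, weights=population))

# Two macrostates: a compact, highly populated one and a broad, rare one
mean_rmsd = [0.1, 0.3]   # mean RMSD per macrostate (same unit as the input)
population = [300, 100]  # frames per macrostate
sharpness = rmsd_sharpness(mean_rmsd, population)
print(sharpness)
```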
Silhouette Coefficient¶
Measures how similar each frame is to its own macrostate compared to neighbouring
macrostates. Values near +1 indicate well-separated, compact macrostates; values
near −1 indicate misclassified frames. Requires multi_feature_trajectory and
at least 2 macrostates.
Calinski–Harabasz Index¶
Ratio of between-macrostate dispersion to within-macrostate dispersion. Higher
values indicate more compact, well-separated macrostates. Requires
multi_feature_trajectory and at least 2 macrostates.
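As a point of reference, the standard form of this index is the between-cluster sum of squares divided by (k - 1), over the within-cluster sum of squares divided by (n - k), for n frames and k macrostates. The NumPy sketch below follows that definition; it is not the MPP implementation, and the function name is illustrative.

```python
import numpy as np

def calinski_harabasz(features, labels):
    """Between-cluster over within-cluster dispersion, each per degree of freedom."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    n, k = len(features), len(clusters)
    if k < 2:
        raise ValueError("at least 2 clusters are required")
    overall_mean = features.mean(axis=0)
    between = 0.0
    within = 0.0
    for c in clusters:
        members = features[labels == c]
        centroid = members.mean(axis=0)
        between += len(members) * np.sum((centroid - overall_mean) ** 2)
        within += np.sum((members - centroid) ** 2)
    return float((between / (k - 1)) / (within / (n - k)))

# Four frames with one feature, two macrostates
features = np.array([[0.0], [2.0], [10.0], [12.0]])
labels = np.array([0, 0, 1, 1])
ch = calinski_harabasz(features, labels)
print(ch)
```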
Saving the Macrostate Trajectory¶
The output is a plain-text file with one integer per line.
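A file in this format can be written and read back with plain NumPy, as sketched below on a toy trajectory. This does not use any MPP saving routine; the file name and temporary directory are placeholders.

```python
import os
import tempfile

import numpy as np

# Toy macrostate trajectory; in practice this would be
# mpp.macrostate_trajectory[0] from a finished run.
macrotraj = np.array([0, 0, 1, 2, 2, 1], dtype=np.uint16)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "macrotraj")
    # fmt="%d" writes plain integers; a 1-D array gives one value per line.
    np.savetxt(path, macrotraj, fmt="%d")

    with open(path) as fh:
        lines = fh.read().splitlines()

    # Reading back mirrors how the microstate trajectory is loaded.
    reloaded = np.loadtxt(path, dtype=np.uint16)

print(lines)
```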
Stochastic Lumping¶
Pass n > 1 to run_mpp to perform multiple randomised lumping runs. Use a
seed in the LumpingKernel for reproducible results.
kernel = MPP.kernel.LumpingKernel(similarity="T", method="n", param=2, seed=42)
mpp.run_mpp(kernel, n=10)
# Access results per run — shape (n_runs, ...)
for i in range(mpp.n_runs):
    print(f"Run {i}: {mpp.n_macrostates[i]} macrostates")
# Stochastic-specific plots
mpp.plot.stochastic_state_similarity("results/stoch/state_similarity.pdf")
mpp.plot.relative_implied_timescales("results/stoch/rel_timescales.pdf")
mpp.plot.macro_feature("results/stoch/macro_feature.pdf")
When using the config-based workflow (MPP.run.Data), stochastic parameters
are set in the YAML stochastic block (see CLI guide for details). The seed
key is optional; omit it for a random seed.
Concatenated Trajectories¶
When the microstate trajectory is composed of several independent simulations,
pass limits as the list of individual trajectory lengths:
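A minimal sketch of building such a list from individual runs is given below, together with an illustration of why the boundaries matter: transitions must not be counted across the junction between two independent simulations. The counting function is an assumption about the general idea, not MPP's internal code, and the MPP.Lumping call is left as a comment so the block runs without the package.

```python
import numpy as np

# Three independent toy simulations of different lengths
runs = [
    np.array([0, 1, 1, 2]),
    np.array([2, 2, 0]),
    np.array([1, 0, 0, 1, 2]),
]
trajectory = np.concatenate(runs)
limits = [len(run) for run in runs]  # individual trajectory lengths

# mpp = MPP.Lumping(trajectory, lagtime=20, limits=limits)

def count_transitions(trajectory, limits, lagtime, n_states):
    """Count i -> j transitions at the given lag (>= 1) within each run only."""
    counts = np.zeros((n_states, n_states), dtype=int)
    start = 0
    for length in limits:
        segment = trajectory[start:start + length]
        # np.add.at accumulates correctly when index pairs repeat.
        np.add.at(counts, (segment[:-lagtime], segment[lagtime:]), 1)
        start += length
    return counts

counts = count_transitions(trajectory, limits, lagtime=1, n_states=3)
naive = count_transitions(trajectory, [len(trajectory)], lagtime=1, n_states=3)
print(limits)
print(counts.sum(), naive.sum())
```

Treating the concatenated array as one long run adds one spurious transition per junction, which is what the comparison with `naive` shows.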