Quality Metrics¶
All metrics are lazily computed properties of MPP.Lumping and are cached on first access.
Scalar metrics return an ndarray of shape (n_runs,); for a single deterministic run,
index [0] to get a scalar. Timescales carry an additional axis, giving shape (n_runs, n_timescales).
Print all metrics at once from the CLI with:
Implied Timescales¶
# Compute implied timescales (shape: n_runs × n_timescales)
ts = mpp.timescales
# Or compute a specific number:
mpp.calc_timescales(ntimescales=5)
ts = mpp.timescales # shape (n_runs, 5)
The \(k\)-th implied timescale of a Markov state model is derived from the \(k\)-th largest eigenvalue of the transition matrix \(\mathbf{T}(\tau)\) at lag time \(\tau\):
\[ t_k = -\frac{\tau}{\ln \lambda_k} \]
where \(\lambda_1 = 1\) is the stationary eigenvalue (excluded) and
\(\lambda_2 \geq \lambda_3 \geq \cdots\) are the remaining eigenvalues sorted
in descending order. Larger timescales correspond to slower dynamical processes.
The values returned by mpp.timescales are in frames; multiply by
frame_length (ns per frame) to obtain physical units.
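The relation between eigenvalues and timescales can be checked by hand with NumPy. The following is a minimal sketch, not the library's implementation: the helper `implied_timescales`, the toy matrix, and the `frame_length` value of 0.2 ns are all assumptions made for illustration, and the code assumes the non-stationary eigenvalues are real and positive.

```python
import numpy as np

def implied_timescales(T, lag, n_timescales=None):
    """Implied timescales of a row-stochastic matrix T, in the units of `lag`.

    Hypothetical helper for illustration; assumes real, positive eigenvalues.
    """
    # Sort eigenvalues in descending order; the real part drops numerical noise.
    eigvals = np.sort(np.linalg.eigvals(T).real)[::-1]
    # Exclude the stationary eigenvalue lambda_1 = 1.
    nontrivial = eigvals[1:]
    if n_timescales is not None:
        nontrivial = nontrivial[:n_timescales]
    return -lag / np.log(nontrivial)

# Toy two-state transition matrix with eigenvalues 1 and 0.7.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
ts_frames = implied_timescales(T, lag=10)  # lag given in frames
frame_length = 0.2                         # assumed value, ns per frame
ts_ns = ts_frames * frame_length           # converted to nanoseconds
```

Because the lag is given in frames, the result is in frames as well, which is why the final multiplication by `frame_length` is needed.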
Shannon Entropy¶
Normalized Shannon entropy of the macrostate population distribution:
\[ H = -\frac{1}{\ln K} \sum_{j=1}^{K} p_j \ln p_j \]
where \(p_j\) is the population fraction of macrostate \(j\) and \(K\) is the number of macrostates. \(H = 0\) when all frames belong to a single macrostate; \(H = 1\) when all macrostates are equally populated.
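As a rough illustration of the two limiting cases, the entropy can be computed directly from a macrostate trajectory. This sketch uses a hypothetical helper `normalized_shannon_entropy` and assumes \(K\) is the number of populated macrostates in the trajectory; it is not taken from the library.

```python
import numpy as np

def normalized_shannon_entropy(macrostate_traj):
    """Normalized Shannon entropy of macrostate populations (hypothetical helper)."""
    # Count frames per macrostate; only populated states appear, so p_j > 0.
    _, counts = np.unique(macrostate_traj, return_counts=True)
    n_states = counts.size
    if n_states < 2:
        return 0.0  # a single macrostate carries no uncertainty
    p = counts / counts.sum()
    # Dividing by ln(K) rescales the entropy to the interval [0, 1].
    return float(-(p * np.log(p)).sum() / np.log(n_states))

h_uniform = normalized_shannon_entropy([0, 0, 1, 1, 2, 2])  # equal populations
h_single = normalized_shannon_entropy([3, 3, 3])            # one macrostate
```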
Reference: Shannon, C. E. (1948). A Mathematical Theory of Communication. The Bell System Technical Journal, 27(3), 379–423. DOI: 10.1002/j.1538-7305.1948.tb01338.x
Davies-Bouldin Index¶
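Ratio of within-cluster scatter to between-cluster separation, averaged over the \(K\) macrostates:
\[ \mathrm{DB} = \frac{1}{K} \sum_{i=1}^{K} \max_{j \neq i} \frac{s_i + s_j}{d(c_i, c_j)} \]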
where \(s_i\) is the mean feature distance from frames in macrostate \(i\) to
their centroid \(c_i\), and \(d(c_i, c_j)\) is the distance between centroids.
Lower values indicate better-separated macrostates. Requires
multi_feature_trajectory.
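The definition can be sketched in a few lines of NumPy. The helper `davies_bouldin` below is hypothetical and assumes Euclidean distances, at least two macrostates, and distinct centroids; the library may compute the index differently.

```python
import numpy as np

def davies_bouldin(features, labels):
    """Davies-Bouldin index with Euclidean distances (hypothetical helper)."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    states = np.unique(labels)
    centroids = np.array([features[labels == s].mean(axis=0) for s in states])
    # s_i: mean distance of the frames in macrostate i to their centroid.
    scatter = np.array([
        np.linalg.norm(features[labels == s] - c, axis=1).mean()
        for s, c in zip(states, centroids)
    ])
    # d(c_i, c_j): pairwise centroid distances.
    dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # the i == j ratio becomes 0 and never wins the max
    ratios = (scatter[:, None] + scatter[None, :]) / dist
    return float(ratios.max(axis=1).mean())

# Two compact macrostates, 10 units apart, each with scatter 1.
features = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
db = davies_bouldin(features, labels)
```

In this toy case each macrostate has scatter 1 and the centroids are 10 apart, so the index is (1 + 1) / 10 = 0.2.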
Reference: Davies, D. L. & Bouldin, D. W. (1979). A Cluster Separation Measure. IEEE Trans. Pattern Anal. Mach. Intell., PAMI-1(2), 224–227. DOI: 10.1109/TPAMI.1979.4766909
GMRQ and GMRQ2¶
The Generalized Matrix Rayleigh Quotient (GMRQ) is the sum of the 2nd through 4th largest eigenvalues of the macrostate transition matrix:
\[ \mathrm{GMRQ} = \sum_{i=2}^{4} \lambda_i \]
where \(\lambda_1 \geq \lambda_2 \geq \cdots\) are the eigenvalues sorted in descending order. Higher values indicate that more slow dynamical modes are preserved in the lumping. GMRQ2 uses the sum of squares:
\[ \mathrm{GMRQ2} = \sum_{i=2}^{4} \lambda_i^2 \]
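Both scores follow directly from the sorted eigenvalue spectrum. The sketch below uses a hypothetical helper `gmrq_scores` and a toy four-state matrix built as a Kronecker product so that its eigenvalues (1, 0.7, 0.6, 0.42) are known in advance; it assumes real eigenvalues and simply sums whatever is available if there are fewer than four macrostates.

```python
import numpy as np

def gmrq_scores(T):
    """Return (GMRQ, GMRQ2) as defined above (hypothetical helper)."""
    eigvals = np.sort(np.linalg.eigvals(T).real)[::-1]
    slow = eigvals[1:4]  # 2nd through 4th largest eigenvalues
    return float(slow.sum()), float((slow ** 2).sum())

# Kronecker product of two 2-state matrices with eigenvalues (1, 0.7) and (1, 0.6).
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.2, 0.8]])
T = np.kron(A, B)  # row-stochastic, eigenvalues 1, 0.7, 0.6, 0.42
gmrq, gmrq2 = gmrq_scores(T)
```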
Reference: McGibbon, R. T. & Pande, V. S. (2015). Variational cross-validation of slow dynamical modes in molecular kinetics. J. Chem. Phys., 142(12), 124105. DOI: 10.1063/1.4916292
RMSD Sharpness¶
Population-weighted mean of per-macrostate mean C\(\alpha\) RMSDs:
\[ \text{Sharpness} = \frac{\sum_j p_j \, \langle\text{RMSD}\rangle_j}{\sum_j p_j} \]
where \(\langle\text{RMSD}\rangle_j\) is the mean C\(\alpha\) RMSD of all frames
in macrostate \(j\) relative to the macrostate mean structure, and \(p_j\) is its
population in frames. Lower values indicate more structurally compact macrostates.
Requires RMSD data (access mpp.rmsd or load via mpp.load_rmsd(path) first).
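The weighting step itself is a one-liner once the per-macrostate mean RMSDs are available. In this sketch the helper `rmsd_sharpness` and the input numbers are made up for illustration; computing the per-macrostate RMSDs from structures is not shown.

```python
import numpy as np

def rmsd_sharpness(mean_rmsd, populations):
    """Population-weighted mean of per-macrostate mean RMSDs (hypothetical helper)."""
    mean_rmsd = np.asarray(mean_rmsd, dtype=float)
    populations = np.asarray(populations, dtype=float)
    # Populations are frame counts, so they are normalised by their sum.
    return float((populations * mean_rmsd).sum() / populations.sum())

# Illustrative values: a large compact macrostate and a small diffuse one.
sharpness = rmsd_sharpness(mean_rmsd=[0.1, 0.3], populations=[300, 100])
```

With these inputs the result is (300 × 0.1 + 100 × 0.3) / 400 = 0.15, in the same units as the input RMSDs.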
Silhouette Coefficient¶
For each frame \(i\), the silhouette value is:
\[ s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} \]
where \(a(i)\) is the mean feature distance to all other frames in the same
macrostate, and \(b(i)\) is the mean feature distance to frames in the nearest
other macrostate. The reported metric is the mean over all frames. Values near
+1 indicate well-separated, compact macrostates; values near −1 indicate
misclassified frames. Requires multi_feature_trajectory and at least 2
macrostates.
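A direct, unoptimised version of this definition is sketched below. The helper `mean_silhouette` is hypothetical, assumes Euclidean distances, builds the full pairwise distance matrix (quadratic in the number of frames), and assigns a silhouette of 0 to frames in singleton macrostates, which is a common convention rather than something stated in this document.

```python
import numpy as np

def mean_silhouette(features, labels):
    """Mean silhouette coefficient with Euclidean distances (hypothetical helper)."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    states = np.unique(labels)
    if states.size < 2:
        raise ValueError("at least 2 macrostates are required")
    # Full pairwise distance matrix; O(N^2) memory, fine for small examples only.
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    sil = np.zeros(labels.size)
    for i in range(labels.size):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton macrostate: keep s(i) = 0
        # a(i): mean distance to the other frames of the same macrostate.
        a = dist[i, same].sum() / (n_same - 1)
        # b(i): smallest mean distance to the frames of any other macrostate.
        b = min(dist[i, labels == s].mean() for s in states if s != labels[i])
        sil[i] = (b - a) / max(a, b)
    return float(sil.mean())

features = np.array([[0.0], [1.0], [10.0], [11.0]])
labels = np.array([0, 0, 1, 1])
score = mean_silhouette(features, labels)
```

For trajectories of realistic length, the pairwise distance matrix will not fit in memory, so a subsampled or chunked computation would be needed.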
Reference: Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math., 20, 53–65. DOI: 10.1016/0377-0427(87)90125-7
Calinski–Harabasz Index¶
Ratio of between-macrostate dispersion to within-macrostate dispersion, normalised by degrees of freedom:
\[ \mathrm{CH} = \frac{\mathrm{SS}_B / (K - 1)}{\mathrm{SS}_W / (N - K)} \]
where \(\mathrm{SS}_B\) is the between-cluster sum of squared distances to the
global centroid, \(\mathrm{SS}_W\) is the within-cluster sum of squared distances
to each macrostate centroid, \(N\) is the total number of frames, and \(K\) is
the number of macrostates. Higher values indicate more compact, well-separated
macrostates. Requires multi_feature_trajectory and at least 2 macrostates.
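The two sums of squares can be accumulated in a single pass over the macrostates. This sketch uses a hypothetical helper `calinski_harabasz` and assumes the standard form of the index, in which each macrostate's contribution to \(\mathrm{SS}_B\) is weighted by its number of frames; whether the library weights it the same way is not confirmed here.

```python
import numpy as np

def calinski_harabasz(features, labels):
    """Calinski-Harabasz index (hypothetical helper, standard definition)."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    states = np.unique(labels)
    n_frames, n_states = features.shape[0], states.size
    global_centroid = features.mean(axis=0)
    ss_b = 0.0
    ss_w = 0.0
    for s in states:
        members = features[labels == s]
        centroid = members.mean(axis=0)
        # Between-cluster term, weighted by the macrostate population.
        ss_b += members.shape[0] * ((centroid - global_centroid) ** 2).sum()
        # Within-cluster term: squared distances to the macrostate centroid.
        ss_w += ((members - centroid) ** 2).sum()
    return float((ss_b / (n_states - 1)) / (ss_w / (n_frames - n_states)))

features = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
ch = calinski_harabasz(features, labels)
```

Here \(\mathrm{SS}_B = 100\), \(\mathrm{SS}_W = 4\), \(N = 4\) and \(K = 2\), so the index is (100 / 1) / (4 / 2) = 50.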
Reference: Calinski, T. & Harabasz, J. (1974). A dendrite method for cluster analysis. Commun. Stat., 3(1), 1–27. DOI: 10.1080/03610927408827101