msm
Markov State Modeling
This submodule contains methods related to Markov state modeling, a powerful technique for analyzing complex systems. It provides a set of functions for constructing and analyzing Markov models, including methods for calculating transition probabilities and estimating various time scales.
The submodule is structured into the following submodules:
- msm: This submodule contains all methods for estimating the Markov state model.
- tests: This submodule holds methods for validating Markov state models.
- timescales: This submodule contains methods for estimating various timescales based on a Markov model.
- utils: This submodule provides some useful linear algebra methods.
chapman_kolmogorov_test(trajs, lagtimes, tmax)
Calculate the Chapman-Kolmogorov equation.
This method evaluates both sides of the Chapman-Kolmogorov equation
\[T(n\tau) = T^n(\tau).\]
The Chapman-Kolmogorov test compares the transition probability estimated at the lag time \(n\tau\) (referred to as "MD") with the transition probability estimated at the lag time \(\tau\) and propagated \(n\) times (referred to as "MSM"). If the model is Markovian, both sides are identical, so the deviation indicates how far the model departs from Markovian behavior. The test is commonly projected onto the diagonal, i.e., limited to \(T_{ii}\). For more details, see the review by Prinz et al. [1].
The returned dictionary can be visualized using msmhelper.plot.plot_ck_test. An example can be found in the tutorial.
[1] Prinz et al., Markov models of molecular kinetics: Generation and validation, J. Chem. Phys., 134, 174105 (2011), doi:10.1063/1.3565032
Parameters:
-
trajs
(StateTraj or list or ndarray or list of ndarray
) –State trajectory/trajectories. The states should start from zero and need to be integers.
-
lagtimes
(list or ndarray int
) –Lagtimes for estimating the Markov model given in [frames].
-
tmax
(int
) –Longest time to evaluate the CK equation given in [frames].
Returns:
-
cktest
(dict
) –Dictionary holding the CK equation for each lagtime and, under 'md', the reference.
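The comparison behind the test can be sketched without the library. The snippet below is a standard-library illustration under stated assumptions, not msmhelper's implementation: the helper names `estimate_tmat` and `matrix_power`, the toy transition matrix, and the seed are all made up for this example. It samples a two-state chain, then compares the diagonal of \(T^n(\tau)\) with that of \(T(n\tau)\).

```python
import random


def estimate_tmat(traj, lagtime, n_states):
    """Row-normalized transition counts at the given lag time (in frames)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(traj[:-lagtime], traj[lagtime:]):
        counts[i][j] += 1
    return [[c / sum(row) for c in row] for row in counts]


def matrix_power(tmat, n):
    """Return the n-th power of the square matrix `tmat`."""
    size = len(tmat)
    result = [[float(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = [
            [sum(result[i][k] * tmat[k][j] for k in range(size))
             for j in range(size)]
            for i in range(size)
        ]
    return result


# Toy data: a two-state Markov chain sampled from a made-up matrix.
rng = random.Random(42)
true_tmat = [[0.9, 0.1], [0.2, 0.8]]
traj = [0]
for _ in range(20000):
    row = true_tmat[traj[-1]]
    traj.append(0 if rng.random() < row[0] else 1)

tau, n = 1, 3
msm = matrix_power(estimate_tmat(traj, tau, 2), n)  # T^n(tau), the "MSM" side
md = estimate_tmat(traj, n * tau, 2)                # T(n tau), the "MD" side
deviation = [abs(msm[i][i] - md[i][i]) for i in range(2)]
print(deviation)
```

Because the toy trajectory is Markovian by construction, the deviations should only reflect sampling noise. For real state trajectories, a systematic gap between the two sides hints at non-Markovian behavior at the chosen lag time.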
Source code in src/msmhelper/msm/tests.py
estimate_markov_model(trajs, lagtime)
Estimate the Markov state model (MSM).
This method estimates the MSM based on the transition count matrix.
Parameters:
-
trajs
(StateTraj or list or ndarray or list of ndarray
) –State trajectory/trajectories used to estimate the MSM.
-
lagtime
(int
) –Lag time for estimating the Markov model given in [frames].
Returns:
-
T
(ndarray
) –Transition probability matrix \(T_{ij}\), containing the transition probability from state \(i \to j\).
-
states
(ndarray
) –Array holding states corresponding to the columns of \(T_{ij}\).
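As a rough illustration of estimating a transition matrix from a transition count matrix, the following standard-library sketch counts transitions at a given lag time and normalizes each row. The helper names `count_matrix` and `row_normalize` and the toy trajectory are hypothetical and chosen for this example; the library's own routine may differ in details such as state relabeling or the handling of multiple trajectories.

```python
def count_matrix(traj, lagtime, n_states):
    """Count transitions i -> j separated by `lagtime` frames."""
    counts = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(traj[:-lagtime], traj[lagtime:]):
        counts[i][j] += 1
    return counts


def row_normalize(counts):
    """Divide each row by its sum; rows without any counts stay zero."""
    tmat = []
    for row in counts:
        total = sum(row)
        tmat.append([c / total if total else 0.0 for c in row])
    return tmat


# Toy state trajectory with integer states starting from zero.
traj = [0, 0, 1, 1, 0, 0, 1, 0]
counts = count_matrix(traj, lagtime=1, n_states=2)
tmat = row_normalize(counts)
print(counts)  # [[2, 2], [2, 1]]
print(tmat)
```

Each row of `tmat` sums to one, so `tmat[i][j]` can be read as the estimated probability of finding the system in state `j` one lag time after it was in state `i`.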
Source code in src/msmhelper/msm/msm.py
equilibrium_population(tmat, allow_non_ergodic=True)
Calculate equilibrium population.
If there are non-ergodic states, their population is set to zero.
Parameters:
-
tmat
(ndarray
) –Square transition matrix, needs to be ergodic.
-
allow_non_ergodic
(bool
, default:True
) –If True, only the largest ergodic subset is used. Otherwise, an error is raised if the matrix is not ergodic.
Returns:
-
peq
(ndarray
) –Equilibrium population of input matrix.
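For intuition, the equilibrium population is the distribution that the transition matrix leaves unchanged. The sketch below finds it by repeatedly propagating a uniform distribution until it stops changing. It is a minimal standard-library illustration that assumes an ergodic, aperiodic matrix; the function name `stationary_distribution` and its parameters are hypothetical, and the treatment of non-ergodic states described above is not reproduced.

```python
def stationary_distribution(tmat, n_iter=1000, tol=1e-12):
    """Propagate a uniform distribution until it is (nearly) stationary."""
    n = len(tmat)
    peq = [1.0 / n] * n
    for _ in range(n_iter):
        # One propagation step: new_j = sum_i peq_i * T_ij
        new = [sum(peq[i] * tmat[i][j] for i in range(n)) for j in range(n)]
        converged = max(abs(a - b) for a, b in zip(new, peq)) < tol
        peq = new
        if converged:
            break
    total = sum(peq)
    return [p / total for p in peq]


tmat = [[0.9, 0.1], [0.2, 0.8]]
peq = stationary_distribution(tmat)
print(peq)  # close to [2/3, 1/3]
```

For this toy matrix the result can be checked by hand: stationarity requires `peq[0] * 0.1 == peq[1] * 0.2`, which together with normalization gives populations of 2/3 and 1/3.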
Source code in src/msmhelper/msm/msm.py
row_normalize_matrix(mat)
Row normalize the given 2d matrix.
Parameters:
-
mat
(ndarray
) –Matrix to be row normalized.
Returns:
-
mat
(ndarray
) –Normalized matrix.
Source code in src/msmhelper/msm/msm.py
implied_timescales(trajs, lagtimes, ntimescales=None, reversible=False)
Calculate the implied timescales.
The implied timescales are defined by
\[t_i = -\frac{\tau}{\ln \lambda_i}\]
where \(\tau\) is the lag time and \(\lambda_i\) is the \(i\)-th eigenvalue of the transition matrix.
Note
It is not checked whether the dimensionality changes for higher lagtimes.
Parameters:
-
trajs
(StateTraj or list or ndarray or list of ndarray
) –State trajectory/trajectories. The states should start from zero and need to be integers.
-
lagtimes
(list or ndarray int
) –Lagtimes for estimating the Markov model given in [frames]. This is not implemented yet!
-
ntimescales
(int
, default:None
) –Number of returned timescales.
-
reversible
(bool
, default:False
) –Whether reversibility should be enforced for the Markov state model.
Returns:
-
ts
(ndarray
) –Matrix containing the implied timescales.
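A small worked example may help to read the definition of the implied timescale as the negative lag time divided by the logarithm of an eigenvalue. For a two-state transition matrix the eigenvalues are 1 and the trace minus 1, so no eigenvalue solver is needed. The function name `implied_timescale_two_state` and the toy matrices are assumptions made for this sketch; the general case requires a proper eigenvalue decomposition.

```python
import math


def implied_timescale_two_state(tmat, lagtime):
    """Implied timescale of a 2x2 transition matrix, in frames.

    The eigenvalues of a 2x2 row-stochastic matrix are 1 and trace - 1.
    Assumes 0 < trace - 1 < 1, so that the logarithm is defined.
    """
    lambda_2 = tmat[0][0] + tmat[1][1] - 1.0
    return -lagtime / math.log(lambda_2)


# Made-up transition matrix at a lag time of 1 frame ...
tmat_lag1 = [[0.9, 0.1], [0.2, 0.8]]
# ... and its square, i.e. the same Markov process at a lag time of 2 frames.
tmat_lag2 = [[0.83, 0.17], [0.34, 0.66]]

t_lag1 = implied_timescale_two_state(tmat_lag1, lagtime=1)
t_lag2 = implied_timescale_two_state(tmat_lag2, lagtime=2)
print(t_lag1, t_lag2)  # both approximately 2.80 frames
```

Both lag times give the same timescale because the second matrix is the exact square of the first. This is the behavior usually looked for in practice: implied timescales that level off with increasing lag time suggest that the model has become approximately Markovian.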
Source code in src/msmhelper/msm/timescales.py
estimate_waiting_times(*, trajs, lagtime, start, final, steps, return_list=False)
Estimate waiting times between the given states.
The given states (start/final) are each treated as a basin. The function records all transitions, timed from first entering the start basin until first reaching the final basin.
Parameters:
-
trajs
(StateTraj or list or ndarray or list of ndarray
) –State trajectory/trajectories. The states should start from zero and need to be integers.
-
lagtime
(int
) –Lag time for estimating the Markov model given in [frames].
-
start
(int or list of int
) –States to start counting.
-
final
(int or list of int
) –States to stop counting.
-
steps
(int
) –Number of propagation steps of the MCMC run.
-
return_list
(bool
, default:False
) –If True, a list of all events is returned; otherwise, the probability density together with the edges is returned.
Returns:
-
ts
(ndarray
) –Probability density of the waiting time distribution. If return_list=True, a sorted list containing all times is returned instead.
-
edges
(ndarray
) –Array containing the edges corresponding to the probability density, given in [frames]. Only returned if return_list=False.
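The idea of timing transitions from first entering a start basin until first reaching a final basin can be sketched with a plain Markov chain Monte Carlo run. The snippet below is a simplified standard-library illustration, not the library's implementation: the function name `sample_waiting_times`, the seed argument, and the three-state toy matrix are assumptions, and the two basins are assumed to be disjoint.

```python
import random


def sample_waiting_times(tmat, start, final, steps, seed=0):
    """Sample waiting times (in units of the lag time) from `start` to `final`."""
    rng = random.Random(seed)
    states = range(len(tmat))
    state = min(start)   # begin the run inside the start basin
    entered_at = 0       # step at which the start basin was first entered
    times = []
    for step in range(1, steps + 1):
        # Draw the next state according to the current row of the matrix.
        state = rng.choices(states, weights=tmat[state])[0]
        if entered_at is None and state in start:
            entered_at = step                 # first entry into the start basin
        elif entered_at is not None and state in final:
            times.append(step - entered_at)   # first arrival in the final basin
            entered_at = None                 # wait for the next entry into start
    return sorted(times)


# Made-up linear three-state model: 0 <-> 1 <-> 2.
tmat = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.1, 0.9],
]
times = sample_waiting_times(tmat, start={0}, final={2}, steps=20000)
print(len(times), times[:5])
```

The sampled times count propagation steps, so multiplying them by the lag time converts them to frames. A histogram of `times` then approximates the waiting time distribution.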
Source code in src/msmhelper/msm/timescales.py
estimate_waiting_time_dist(trajs, max_lagtime, start, final, steps, n_lagtimes=50)
Estimate waiting time distribution.
Parameters:
-
trajs
(StateTraj or list or ndarray or list of ndarray
) –State trajectory/trajectories. The states should start from zero and need to be integers.
-
max_lagtime
(int
) –Maximal lag time for estimating the Markov model given in [frames].
-
start
(int or list of int
) –States to start counting.
-
final
(int or list of int
) –States to stop counting.
-
steps
(int
) –Number of propagation steps of the MCMC run.
Returns:
-
wtd
(dict
) –Dictionary containing waiting time distribution.
Source code in src/msmhelper/msm/timescales.py
estimate_paths(*, trajs, lagtime, start, final, steps)
Estimate paths and waiting times between the given states.
The given states (start/final) are each treated as a basin. The function estimates transitions from first entering the start basin until first reaching the final basin. The results are grouped by the corresponding pathways, with loops removed in the order in which they occur.
Note
This function is a simple wrapper and, in contrast to estimate_wt, it stores the whole MCMC trajectory in memory. Hence, it is memory-hungry.
Parameters:
-
trajs
(StateTraj or list or ndarray or list of ndarray
) –State trajectory/trajectories. The states should start from zero and need to be integers.
-
lagtime
(int
) –Lag time for estimating the Markov model given in [frames].
-
start
(int or list of int
) –States to start counting.
-
final
(int or list of int
) –States to stop counting.
-
steps
(int
) –Number of propagation steps of the MCMC run.
Returns:
-
paths
(dict
) –Dictionary containing the paths as keys and an array holding the times of all paths as value.
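To make the grouping by loop-free pathways concrete, the sketch below erases loops from raw state sequences in the order in which they close and collects the durations per resulting pathway. It is a standard-library illustration only: the helper `remove_loops`, the toy sequences, and the choice of measuring the duration as the number of steps are assumptions, and the library's exact loop-removal rule may differ.

```python
from collections import defaultdict


def remove_loops(path):
    """Erase loops in the order they close, keeping the first visit of a state."""
    clean = []
    for state in path:
        if state in clean:
            # Revisiting a state closes a loop: cut back to its first visit.
            clean = clean[:clean.index(state) + 1]
        else:
            clean.append(state)
    return tuple(clean)


# Made-up raw transitions from state 0 to state 3.
raw_paths = [
    [0, 1, 0, 1, 2, 1, 3],
    [0, 1, 3],
    [0, 2, 3],
    [0, 2, 2, 3],
]

paths = defaultdict(list)
for raw in raw_paths:
    # The duration of an event is its number of steps (in units of the lag time).
    paths[remove_loops(raw)].append(len(raw) - 1)

print(dict(paths))  # {(0, 1, 3): [6, 2], (0, 2, 3): [2, 3]}
```

Events that differ only in the loops they contain end up under the same key, so each pathway collects a distribution of times rather than a single value.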
Source code in src/msmhelper/msm/timescales.py