msmhelper
This package is designed for the analysis of discrete time series data from Molecular Dynamics (MD) simulations. It focuses on Markov state modeling (MSM), a powerful technique for analyzing complex systems, and provides a set of functions for constructing and analyzing Markov models, including methods for calculating transition probabilities and fitting models to data. The package is suitable for researchers and engineers who need to analyze large and complex datasets in order to gain insights into the underlying dynamics.
The module is structured into the following submodules:
-
io: This submodule contains all methods related to reading data from text files and writing data to text files, including helpful header comments.
-
md: This submodule offers techniques for analyzing state trajectories, typically obtained from Molecular Dynamics (MD) simulations, without relying on Markov state models. It encompasses functions for determining timescales, recognizing significant events, correcting dynamical anomalies, and evaluating various state discretization methods. Together, these functions help to analyze time-series data and to understand the underlying dynamics of complex systems.
-
msm: This submodule contains methods related to Markov state modeling, a powerful technique for analyzing complex systems. It provides a set of functions for constructing and analyzing Markov models, including methods for calculating transition probabilities and estimating various time scales.
-
plot: This submodule is dedicated to visualizing results. It offers a collection of functions for generating frequently used figures, such as the CK-test, implied timescales, and waiting times.
-
statetraj: This submodule contains the two classes StateTraj and LumpedStateTraj, which are used to represent the time series and allow for improved performance.
-
utils: This submodule provides utility functions that can be used to manipulate and test data, such as filtering and validation methods. The functions in this submodule can be used in conjunction with other parts of the software to perform a variety of tasks, making it an essential part of the package.
LumpedStateTraj(macrotrajs, microtrajs=None, positive=False)
Bases: StateTraj
Class for using the Hummer-Szabo projection with state trajectories.
Initialize LumpedStateTraj.
If called with a LumpedStateTraj instance, it will be returned instead. This class is an implementation of the Hummer-Szabo projection [1].
-
[1] Hummer and Szabo, Optimal Dimensionality Reduction of Multistate Kinetic and Markov-State Models, J. Phys. Chem. B, 119 (29), 9029-9037 (2015), doi: 10.1021/jp508375q
Parameters:
-
macrotrajs
(list or ndarray or list of ndarray
) –Lumped state trajectory/trajectories. The states need to be integers and each state needs to correspond to a union of microstates.
-
microtrajs
(list or ndarray or list of ndarray
, default:None
) –State trajectory/trajectories. The states should start from zero and need to be integers.
-
positive
(bool
, default:False
) –If True, \(T_{ij}\ge0\) will be enforced, else small negative values are possible.
Source code in src/msmhelper/statetraj.py
states
property
Return active set of macrostates.
Returns:
-
states
(ndarray
) –Numpy array holding active set of states.
nstates
property
Return number of macrostates.
Returns:
-
nstates
(int
) –Number of states.
microstate_trajs
property
Return microstate trajectory.
Returns:
-
trajs
(list of ndarrays
) –List of ndarrays holding the input data.
microstate_trajs_flatten
property
Return flattened state trajectory.
Returns:
-
trajs
(ndarray
) –1D ndarray representation of state trajectories.
microstate_index_trajs
property
Return microstate index trajectory.
Returns:
-
trajs
(list of ndarrays
) –List of ndarrays holding the microstate index trajectory.
microstate_index_trajs_flatten
property
Return flattened microstate index trajectory.
Returns:
-
trajs
(ndarray
) –1D ndarray representation of microstate index trajectories.
trajs
property
Return macrostate trajectory.
Returns:
-
trajs
(list of ndarrays
) –List of ndarrays holding the input macrostate data.
index_trajs
property
Return index trajectory.
Returns:
-
trajs
(list of ndarrays
) –List of ndarrays holding the input data.
microstates
property
Return active set of microstates.
Returns:
-
states
(ndarray
) –Numpy array holding active set of states.
nmicrostates
property
Return number of active microstates.
Returns:
-
nmicrostates
(int
) –Number of microstates.
state_assignment
property
Return micro to macrostate assignment vector.
Returns:
-
state_assignment
(ndarray
) –Micro to macrostate assignment vector.
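The requirement that every macrostate is a union of microstates means that each microstate maps to exactly one macrostate. A minimal plain-Python sketch of deriving and validating such an assignment vector is given here; the helper name `micro_to_macro_assignment_sketch` is hypothetical and is not part of msmhelper, and ordering the microstates by label is an assumption.

```python
def micro_to_macro_assignment_sketch(microtraj, macrotraj):
    """Return (microstates, assignment) with assignment[i] the macrostate of microstates[i]."""
    if len(microtraj) != len(macrotraj):
        raise ValueError("Trajectories must have the same length.")

    mapping = {}
    for micro, macro in zip(microtraj, macrotraj):
        # setdefault stores the first macrostate seen for this microstate.
        if mapping.setdefault(micro, macro) != macro:
            raise ValueError(
                f"Microstate {micro} is assigned to more than one macrostate."
            )

    # Assumption: microstates are ordered by their label.
    microstates = sorted(mapping)
    return microstates, [mapping[micro] for micro in microstates]


microtraj = [0, 1, 1, 2, 3, 3, 0, 2]
macrotraj = [1, 1, 1, 2, 2, 2, 1, 2]
microstates, assignment = micro_to_macro_assignment_sketch(microtraj, macrotraj)
print(microstates)  # [0, 1, 2, 3]
print(assignment)   # [1, 1, 2, 2]
```

A microstate that appears under two different macrostates raises a ValueError in this sketch, since the lumping would then not be a union of microstates.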
estimate_markov_model(lagtime)
Estimates Markov State Model.
This method estimates the microstate MSM based on the transition count matrix, followed by the Hummer-Szabo projection [1] formalism to macrostates.
-
[1] Hummer and Szabo, Optimal Dimensionality Reduction of Multistate Kinetic and Markov-State Models, J. Phys. Chem. B, 119 (29), 9029-9037 (2015), doi: 10.1021/jp508375q
Parameters:
-
lagtime
(int
) –Lag time for estimating the Markov model, given in [frames].
Returns:
-
T
(ndarray
) –Transition probability matrix \(T_{ij}\), containing the probability of a transition from state \(i\to j\).
-
states
(ndarray
) –Array holding states corresponding to the columns of \(T_{ij}\).
Source code in src/msmhelper/statetraj.py
StateTraj(trajs)
Class for handling discrete state trajectories.
Initialize StateTraj and convert to index trajectories.
If called with a StateTraj instance, it will be returned instead.
Parameters:
-
trajs
(list or ndarray or list of ndarray
) –State trajectory/trajectories. The states need to be integers.
Source code in src/msmhelper/statetraj.py
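The conversion to index trajectories can be illustrated with a small plain-Python sketch. It assumes that the active states are ordered by their numerical value and mapped to the contiguous indices 0 to n-1; the helper name `to_index_trajs_sketch` is hypothetical and this is not the class's actual implementation.

```python
def to_index_trajs_sketch(trajs):
    """Map state labels to contiguous indices 0..n-1, sorted by label."""
    # Active set of states: every label that occurs in any trajectory.
    states = sorted({state for traj in trajs for state in traj})
    state_to_idx = {state: idx for idx, state in enumerate(states)}
    index_trajs = [[state_to_idx[state] for state in traj] for traj in trajs]
    return states, index_trajs


trajs = [[4, 4, 7, 2], [2, 7, 7]]
states, index_trajs = to_index_trajs_sketch(trajs)
print(states)       # [2, 4, 7]
print(index_trajs)  # [[1, 1, 2, 0], [0, 2, 2]]
```

Working on indices rather than arbitrary labels allows a state to be used directly as a row or column position of a count matrix.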
states
property
Return active set of states.
Returns:
-
states
(ndarray
) –Numpy array holding active set of states.
nstates
property
Return number of states.
Returns:
-
nstates
(int
) –Number of states.
ntrajs
property
Return number of trajectories.
Returns:
-
ntrajs
(int
) –Number of trajectories.
nframes
property
Return cumulative length of all trajectories.
Returns:
-
nframes
(int
) –Number of frames of all trajectories.
trajs
property
Return state trajectory.
Returns:
-
trajs
(list of ndarrays
) –List of ndarrays holding the input data.
trajs_flatten
property
Return flattened state trajectory.
Returns:
-
trajs
(ndarray
) –1D ndarray representation of state trajectories.
index_trajs
property
Return index trajectory.
Returns:
-
trajs
(list of ndarrays
) –List of ndarrays holding the input data.
index_trajs_flatten
property
Return flattened index trajectory.
Returns:
-
trajs
(ndarray
) –1D ndarray representation of index trajectories.
estimate_markov_model(lagtime)
Estimates Markov State Model.
This method estimates the MSM based on the transition count matrix.
Parameters:
-
lagtime
(int
) –Lag time for estimating the Markov model, given in [frames].
Returns:
-
T
(ndarray
) –Transition probability matrix \(T_{ij}\), containing the probability of a transition from state \(i\to j\).
-
states
(ndarray
) –Array holding states corresponding to the columns of \(T_{ij}\).
Source code in src/msmhelper/statetraj.py
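Estimating an MSM from the transition count matrix can be sketched in plain Python as follows: count all pairs of frames separated by the lag time and normalize each row. The function name `estimate_tmat_sketch` is hypothetical, and the handling of rows without counts is an assumption of this sketch, not a statement about msmhelper's behavior.

```python
def estimate_tmat_sketch(index_trajs, nstates, lagtime):
    """Return a row-normalized transition matrix as nested lists."""
    if lagtime < 1:
        raise ValueError("lagtime must be a positive number of frames.")

    counts = [[0] * nstates for _ in range(nstates)]
    for traj in index_trajs:
        # Pair frame t with frame t + lagtime, never across trajectories.
        for i, j in zip(traj[:-lagtime], traj[lagtime:]):
            counts[i][j] += 1

    tmat = []
    for row in counts:
        total = sum(row)
        # Assumption: states without outgoing counts keep an all-zero row.
        tmat.append([count / total if total else 0.0 for count in row])
    return tmat


index_trajs = [[0, 0, 1, 0, 1, 1], [1, 0, 0]]
tmat = estimate_tmat_sketch(index_trajs, nstates=2, lagtime=1)
print(tmat[0])  # [0.5, 0.5]
```

Note that the last frame of the first trajectory and the first frame of the second one are not counted as a transition, which is why the input is kept as a list of trajectories rather than one concatenated array.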
state_to_idx(state)
Get idx corresponding to state.
Parameters:
-
state
(int
) –State to get idx of.
Returns:
-
idx
(int
) –Idx corresponding to state.
Source code in src/msmhelper/statetraj.py
opentxt(file_name, comment='#', nrows=None, **kwargs)
Open a text file.
This method can load an n x m array of floats from an ASCII file. It uses pandas read_csv when a single comment character is given and falls back to the slower np.loadtxt for multiple comment characters.
Warning
In contrast to pandas, the order of usecols is respected. So when using data = opentxt(..., usecols=[1, 0]), data[:, 0] holds the first listed column (file column 1) and data[:, 1] holds the second listed one (file column 0).
Parameters:
-
file_name
(string
) –Name of file to be opened.
-
comment
(str or array of str
, default:'#'
) –Characters with which a comment starts.
-
nrows
(int
, default:None
) –The maximum number of lines to be read.
-
usecols
(int-array
) –Columns to be read from the file (zero indexed).
-
skiprows
(int
) –The number of leading rows which will be skipped.
-
dtype
(data-type
) –Data-type of the resulting array. Default: float.
Returns:
-
array
(ndarray
) –Data read from the text file.
Source code in src/msmhelper/io.py
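The effect of respecting the order of usecols can be illustrated with a tiny stand-alone reader. This is only a sketch of the column-ordering behavior described in the warning; `read_columns_sketch` is a hypothetical helper that uses neither pandas nor numpy and is not how opentxt is implemented.

```python
import io


def read_columns_sketch(handle, usecols, comment="#"):
    """Read whitespace-separated floats, keeping columns in the order of usecols."""
    rows = []
    for line in handle:
        # Drop everything after the comment character, then skip empty lines.
        line = line.split(comment, 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        rows.append([float(fields[col]) for col in usecols])
    return rows


text = io.StringIO("# time state\n0.0 3\n0.1 5\n")
data = read_columns_sketch(text, usecols=[1, 0])
print(data)  # [[3.0, 0.0], [5.0, 0.1]]
```

Here the first entry of each row is file column 1 and the second entry is file column 0, following the order in which the columns were requested.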
savetxt(file_name, array, header=None, fmt='%.5f')
Save nxm array of floats to a text file.
It uses numpy's savetxt method and extends the header with information about the execution.
Parameters:
-
file_name
(string
) –File name to store data.
-
array
(ndarray
) –Data to be stored.
-
header
(str
, default:None
) –Comment written into the header of the output file.
-
fmt
(str or sequence of strs
, default:'%.5f'
) –See numpy.savetxt.
Source code in src/msmhelper/io.py
opentxt_limits(file_name, limits_file=None, **kwargs)
Load file and split according to limit file.
If limits_file is not provided it will return [traj]
.
Parameters:
-
file_name
(string
) –Name of file to be opened.
-
limits_file
(str
, default:None
) –File name of limit file. Should be single column ascii file.
-
**kwargs
–See parameters defined in opentxt
Returns:
-
traj
(ndarray
) –Return array of subtrajectories.
Source code in src/msmhelper/io.py
openmicrostates(file_name, limits_file=None, **kwargs)
Load 1d file and split according to limit file.
Both the limit file and the trajectory file need to be single column files. If limits_file is not provided it will return [traj]. The trajectory will be of dtype np.int16, so the states need to be smaller than 32767.
Parameters:
-
file_name
(string
) –Name of file to be opened.
-
limits_file
(str
, default:None
) –File name of limit file. Should be single column ascii file.
-
**kwargs
–See parameters defined in opentxt
Returns:
-
traj
(ndarray
) –Return array of subtrajectories.
Source code in src/msmhelper/io.py
open_limits(data_length, limits_file=None)
Load and check limit file.
The limits give the length of each single trajectory. For example, [5, 5, 5] describes 3 equally sized subtrajectories of length 5.
Parameters:
-
data_length
(int
) –Length of data read.
-
limits_file
(str
, default:None
) –File name of limit file. Should be single column ascii file.
Returns:
-
limits
(ndarray
) –Return cumsum of limits.
Source code in src/msmhelper/io.py
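How the limits relate to the concatenated data can be sketched in plain Python: the cumulative sum of the limits gives the end index of each subtrajectory. The helper name `split_by_limits_sketch` and the exact consistency check are assumptions made for illustration only.

```python
from itertools import accumulate


def split_by_limits_sketch(data, limits):
    """Split concatenated data into subtrajectories of the given lengths."""
    # Cumulative sum of the lengths, e.g. [5, 5, 5] -> [5, 10, 15].
    ends = list(accumulate(limits))
    if not ends or ends[-1] != len(data):
        raise ValueError("Limits are inconsistent with the length of the data.")
    starts = [0] + ends[:-1]
    return [data[start:end] for start, end in zip(starts, ends)]


data = list(range(15))
parts = split_by_limits_sketch(data, [5, 5, 5])
print([len(part) for part in parts])  # [5, 5, 5]
print(parts[1])                       # [5, 6, 7, 8, 9]
```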
rename_by_population(trajs, return_permutation=False)
Rename states sorted by their population starting from 1.
Parameters:
-
trajs
(list or ndarray or list of ndarrays
) –State trajectory or list of state trajectories.
-
return_permutation
(bool
, default:False
) –Additionally return the permutation that achieves the performed renaming. Default is False.
Returns:
-
trajs
(ndarray
) –Renamed data.
-
permutation
(ndarray
) –Permutation going from old to new state naming. So the i-th state of the new naming corresponds to the old state permutation[i-1].
Source code in src/msmhelper/utils/_utils.py
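Renaming by population and the meaning of the returned permutation can be sketched in plain Python. The name `rename_by_population_sketch` is hypothetical, and breaking ties by the smaller original label is an assumption of this sketch rather than documented behavior.

```python
from collections import Counter


def rename_by_population_sketch(traj):
    """Rename states so that 1 is the most populated state, 2 the next, and so on."""
    counts = Counter(traj)
    # Most populated first; ties are broken by the smaller old label (assumption).
    permutation = sorted(counts, key=lambda state: (-counts[state], state))
    new_name = {old: new for new, old in enumerate(permutation, start=1)}
    return [new_name[state] for state in traj], permutation


traj = [7, 7, 7, 3, 3, 9, 7, 3, 9, 9, 9, 9]
renamed, permutation = rename_by_population_sketch(traj)
print(renamed)      # [2, 2, 2, 3, 3, 1, 2, 3, 1, 1, 1, 1]
print(permutation)  # [9, 7, 3]
```

In this example the new state 1 corresponds to the old state permutation[0], which is 9, the most populated state.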
rename_by_index(trajs, return_permutation=False)
Rename states sorted by their numerical values starting from 0.
Parameters:
-
trajs
(list or ndarray or list of ndarrays
) –State trajectory or list of state trajectories.
-
return_permutation
(bool
, default:False
) –Additionally return the permutation that achieves the performed renaming. Default is False.
Returns:
-
trajs
(ndarray
) –Renamed data.
-
permutation
(ndarray
) –Permutation going from old to new state naming. So the i-th state of the new naming corresponds to the old state permutation[i-1].
Source code in src/msmhelper/utils/_utils.py
shift_data(array, val_old, val_new, dtype=np.int64)
Shift integer array (data) from old to new values.
Warning
The values of val_old, val_new and data need to be integers.
The basic function is based on Ashwini_Chaudhary solution: https://stackoverflow.com/a/29408060
Parameters:
-
array
(StateTraj or ndarray or list or list of ndarrays
) –1D data or a list of data.
-
val_old
(ndarray or list
) –Values in data which should be replaced. All values need to be within the range of [data.min(), data.max()].
-
val_new
(ndarray or list
) –Values which will be used instead of old ones.
-
dtype
(data-type
, default:int64
) –The desired data-type. Needs to be of type unsigned integer.
Returns:
-
array
(ndarray
) –Shifted data in same shape as input.
Source code in src/msmhelper/utils/_utils.py
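One way to replace integer values is a lookup table that covers the range [data.min(), data.max()], which also explains why val_old has to lie within that range. The plain-Python sketch below illustrates this idea for a flat list; `shift_values_sketch` is a hypothetical name, and the sketch makes no claim to match the library's implementation or the linked answer line by line.

```python
def shift_values_sketch(data, val_old, val_new):
    """Replace each value in val_old by the matching value in val_new."""
    lo, hi = min(data), max(data)
    # Identity lookup table for every integer in [lo, hi]; position = value - lo.
    table = list(range(lo, hi + 1))
    for old, new in zip(val_old, val_new):
        if not lo <= old <= hi:
            raise ValueError(f"Value {old} is outside of [{lo}, {hi}].")
        table[old - lo] = new
    return [table[value - lo] for value in data]


data = [3, 5, 5, 8, 3]
shifted = shift_values_sketch(data, val_old=[3, 8], val_new=[8, 3])
print(shifted)  # [8, 5, 5, 3, 8]
```

Because all replacements are looked up in one table built beforehand, swapping two values (3 and 8 in the example) works in a single pass without one replacement overwriting the other.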
unique(trajs, **kwargs)
Apply numpy.unique to traj.
Parameters:
-
trajs
(list or ndarray or list of ndarrays
) –State trajectory or list of state trajectories.
-
**kwargs
–Arguments of numpy.unique
Returns:
-
unique
(ndarray
) –Array containing all states, see numpy for more details.
Source code in src/msmhelper/utils/_utils.py