msmhelper

This package is designed for the analysis of discrete time series data from Molecular Dynamics (MD) simulations. It focuses on Markov state modeling (MSM) and provides functions for constructing and analyzing Markov models, including methods for calculating transition probabilities and fitting models to data. It is intended for researchers and engineers who need to analyze large, complex datasets to gain insight into the underlying dynamics.

The module is structured into the following submodules:

  • io: This submodule contains all methods related to reading data from text files and writing data to text files, including helpful header comments.

  • md: This submodule offers techniques for analyzing state trajectories, typically obtained from Molecular Dynamics (MD) simulations, without relying on Markov state models. It includes functions for determining timescales, recognizing significant events, correcting dynamical anomalies, and evaluating state discretization methods.

  • msm: This submodule contains methods related to Markov state modeling, a powerful technique for analyzing complex systems. It provides a set of functions for constructing and analyzing Markov models, including methods for calculating transition probabilities and estimating various time scales.

  • plot: This submodule is dedicated to visualizing results. It offers a collection of functions for generating frequently used figures, such as the CK-test, implied timescales, and waiting times.

  • statetraj: This submodule contains the two classes StateTraj and LumpedStateTraj, which represent the time series and allow for improved performance.

  • utils: This submodule provides utility functions for manipulating and testing data, such as filtering and validation methods. They can be used in conjunction with the other parts of the package.

LumpedStateTraj(macrotrajs, microtrajs=None, positive=False)

Bases: StateTraj

Class for using the Hummer-Szabo projection with state trajectories.

Initialize LumpedStateTraj.

If called with LumpedStateTraj instance, it will be returned instead. This class is an implementation of the Hummer-Szabo projection [1].


  1. Hummer and Szabo, Optimal Dimensionality Reduction of Multistate Kinetic and Markov-State Models, J. Phys. Chem. B, 119 (29), 9029-9037 (2015), doi: 10.1021/jp508375q 

Parameters:

  • macrotrajs (list or ndarray or list of ndarray) –

    Lumped state trajectory/trajectories. The states need to be integers and every state needs to correspond to a union of microstates.

  • microtrajs (list or ndarray or list of ndarray, default: None ) –

    State trajectory/trajectories. The states should start from zero and need to be integers.

  • positive (bool, default: False ) –

    If True, \(T_{ij}\ge0\) will be enforced, else small negative values are possible.

Source code in src/msmhelper/statetraj.py
def __init__(self, macrotrajs, microtrajs=None, positive=False):
    r"""Initialize LumpedStateTraj.

    If called with LumpedStateTraj instance, it will be returned instead.
    This class is an implementation of the Hummer-Szabo projection[^1].

    [^1]: Hummer and Szabo, **Optimal Dimensionality Reduction of
          Multistate Kinetic and Markov-State Models**, *J. Phys. Chem. B*,
          119 (29), 9029-9037 (2015),
          doi: [10.1021/jp508375q](https://doi.org/10.1021/jp508375q)

    Parameters
    ----------
    macrotrajs : list or ndarray or list of ndarray
        Lumped state trajectory/trajectories. The states need to be
        integers and every state needs to correspond to a union of
        microstates.
    microtrajs : list or ndarray or list of ndarray
        State trajectory/trajectories. The states should start from zero
        and need to be integers.
    positive : bool
        If `True`, $T_{ij}\ge0$ will be enforced, else small negative
        values are possible.

    """
    if isinstance(macrotrajs, LumpedStateTraj):
        return

    if microtrajs is None:
        raise TypeError(
            'microtrajs may only be None when macrotrajs is of type ' +
            'LumpedStateTraj.',
        )

    self.positive = positive

    # parse macrotraj
    macrotrajs = mh.utils.format_state_traj(macrotrajs)
    self._macrostates = mh.utils.unique(macrotrajs)

    # init microstate trajectories
    super().__init__(microtrajs)

    # cache flattened trajectories to speed up code for many states
    macrotrajs_flatten = np.concatenate(macrotrajs)
    microtrajs_flatten = self.microstate_trajs_flatten

    self._state_assignment = np.zeros(self.nmicrostates, dtype=np.int64)
    for idx, microstate in enumerate(self.microstates):
        idx_first = mh.utils.find_first(microstate, microtrajs_flatten)
        self._state_assignment[idx] = macrotrajs_flatten[idx_first]
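
The loop at the end of the constructor derives the micro-to-macrostate assignment by looking up, for each microstate, the macrostate at the frame where that microstate first occurs. A minimal pure-Python sketch of the same idea follows. The function name `sketch_state_assignment` is hypothetical, and the consistency check is an addition of this sketch that is not part of the listing above.

```python
def sketch_state_assignment(microtraj, macrotraj):
    """Map every microstate to the macrostate it belongs to."""
    if len(microtraj) != len(macrotraj):
        raise ValueError('Both trajectories need the same length.')

    assignment = {}
    for micro, macro in zip(microtraj, macrotraj):
        # the first occurrence decides; later frames have to agree
        # (this check is an extra of the sketch, not of the library)
        if assignment.setdefault(micro, macro) != macro:
            raise ValueError(
                f'Microstate {micro} belongs to several macrostates.',
            )
    # one entry per microstate, ordered by sorted microstate label
    return [assignment[micro] for micro in sorted(assignment)]


microtraj = [0, 1, 2, 1, 3, 0]
macrotraj = [1, 1, 2, 1, 2, 1]
print(sketch_state_assignment(microtraj, macrotraj))  # [1, 1, 2, 2]
```

Here microstates 0 and 1 form macrostate 1, while microstates 2 and 3 form macrostate 2.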

states property

Return active set of macrostates.

Returns:

  • states ( ndarray ) –

    Numpy array holding active set of states.

nstates property

Return number of macrostates.

Returns:

  • nstates ( int ) –

    Number of states.

microstate_trajs property

Return microstate trajectory.

Returns:

  • trajs ( list of ndarrays ) –

    List of ndarrays holding the input data.

microstate_trajs_flatten property

Return flattened state trajectory.

Returns:

  • trajs ( ndarray ) –

    1D ndarray representation of state trajectories.

microstate_index_trajs property

Return microstate index trajectory.

Returns:

  • trajs ( list of ndarrays ) –

    List of ndarrays holding the microstate index trajectory.

microstate_index_trajs_flatten property

Return flattened microstate index trajectory.

Returns:

  • trajs ( ndarray ) –

    1D ndarray representation of microstate index trajectories.

trajs property

Return macrostate trajectory.

Returns:

  • trajs ( list of ndarrays ) –

    List of ndarrays holding the input macrostate data.

index_trajs property

Return index trajectory.

Returns:

  • trajs ( list of ndarrays ) –

    List of ndarrays holding the input data.

microstates property

Return active set of microstates.

Returns:

  • states ( ndarray ) –

    Numpy array holding active set of states.

nmicrostates property

Return number of active microstates.

Returns:

  • nmicrostates ( int ) –

    Number of microstates.

state_assignment property

Return micro to macrostate assignment vector.

Returns:

  • state_assignment ( ndarray ) –

    Micro to macrostate assignment vector.

estimate_markov_model(lagtime)

Estimates Markov State Model.

This method estimates the microstate MSM based on the transition count matrix, followed by the Hummer-Szabo projection [1] formalism to macrostates.


  1. Hummer and Szabo, Optimal Dimensionality Reduction of Multistate Kinetic and Markov-State Models, J. Phys. Chem. B, 119 (29), 9029-9037 (2015), doi: 10.1021/jp508375q 

Parameters:

  • lagtime (int) –

    Lag time for estimating the Markov model given in [frames].

Returns:

  • T ( ndarray ) –

    Transition probability matrix \(T_{ij}\), containing the probability of a transition from state \(i\to j\).

  • states ( ndarray ) –

    Array holding states corresponding to the columns of \(T_{ij}\).

Source code in src/msmhelper/statetraj.py
def estimate_markov_model(self, lagtime):
    r"""Estimates Markov State Model.

    This method estimates the microstate MSM based on the transition count
    matrix, followed by the Hummer-Szabo projection[^1] formalism to
    macrostates.

    [^1]: Hummer and Szabo, **Optimal Dimensionality Reduction of
          Multistate Kinetic and Markov-State Models**, *J. Phys. Chem. B*,
          119 (29), 9029-9037 (2015),
          doi: [10.1021/jp508375q](https://doi.org/10.1021/jp508375q)

    Parameters
    ----------
    lagtime : int
        Lag time for estimating the Markov model given in [frames].

    Returns
    -------
    T : ndarray
        Transition probability matrix $T_{ij}$, containing the probability
        of a transition from state $i\to j$.
    states : ndarray
        Array holding states corresponding to the columns of $T_{ij}$.

    """
    # in the following corresponds 'i' to micro and 'a' to macro
    msm_i, _ = mh.msm.msm._estimate_markov_model(
        self.microstate_index_trajs,
        lagtime,
        self.nmicrostates,
        self.microstates,
    )
    if not mh.utils.tests.is_ergodic(msm_i):
        raise TypeError('tmat needs to be ergodic transition matrix.')
    return (self._estimate_markov_model(msm_i), self.states)
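
The method above refuses to project a microstate MSM that is not ergodic. How `mh.utils.tests.is_ergodic` decides this is not shown here. As an illustration only, one standard criterion is primitivity: a row-stochastic n×n matrix is ergodic (irreducible and aperiodic) exactly when some power of it has only positive entries, and by Wielandt's bound it suffices to check the power (n-1)²+1. The sketch below applies this to the zero/non-zero pattern of the matrix; the name `is_primitive` is hypothetical.

```python
def is_primitive(tmat):
    """Return True if the row-stochastic matrix `tmat` is primitive."""
    n = len(tmat)
    # only the zero/non-zero pattern matters for this test
    pattern = [[value > 0 for value in row] for row in tmat]
    power = pattern
    # (n - 1)^2 multiplications turn A^1 into A^((n - 1)^2 + 1)
    for _ in range((n - 1) ** 2):
        power = [
            [
                any(power[i][k] and pattern[k][j] for k in range(n))
                for j in range(n)
            ]
            for i in range(n)
        ]
    return all(all(row) for row in power)


print(is_primitive([[0.9, 0.1], [0.2, 0.8]]))  # True
print(is_primitive([[0.0, 1.0], [1.0, 0.0]]))  # False, periodic
print(is_primitive([[1.0, 0.0], [0.0, 1.0]]))  # False, disconnected
```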

StateTraj(trajs)

Class for handling discrete state trajectories.

Initialize StateTraj and convert to index trajectories.

If called with StateTraj instance, it will be returned instead.

Parameters:

  • trajs (list or ndarray or list of ndarray) –

    State trajectory/trajectories. The states need to be integers.

Source code in src/msmhelper/statetraj.py
def __init__(self, trajs):
    """Initialize StateTraj and convert to index trajectories.

    If called with StateTraj instance, it will be returned instead.

    Parameters
    ----------
    trajs : list or ndarray or list of ndarray
        State trajectory/trajectories. The states need to be integers.

    """
    if isinstance(trajs, StateTraj):
        return

    self._trajs = mh.utils.format_state_traj(trajs)

    # get number of states
    self._states = mh.utils.unique(self._trajs)

    # enforce true copy of trajs
    if np.array_equal(self._states, np.arange(self.nstates)):
        self._trajs = [traj.copy() for traj in self._trajs]
    # shift to indices
    elif np.array_equal(self._states, np.arange(1, self.nstates + 1)):
        self._states = np.arange(1, self.nstates + 1)
        self._trajs = [traj - 1 for traj in self._trajs]
    else:  # not np.array_equal(self._states, np.arange(self.nstates)):
        self._trajs, self._states = mh.utils.rename_by_index(
            self._trajs,
            return_permutation=True,
        )

states property

Return active set of states.

Returns:

  • states ( ndarray ) –

    Numpy array holding active set of states.

nstates property

Return number of states.

Returns:

  • nstates ( int ) –

    Number of states.

ntrajs property

Return number of trajectories.

Returns:

  • ntrajs ( int ) –

    Number of trajectories.

nframes property

Return cumulative length of all trajectories.

Returns:

  • nframes ( int ) –

    Number of frames of all trajectories.

trajs property

Return state trajectory.

Returns:

  • trajs ( list of ndarrays ) –

    List of ndarrays holding the input data.

trajs_flatten property

Return flattened state trajectory.

Returns:

  • trajs ( ndarray ) –

    1D ndarray representation of state trajectories.

index_trajs property

Return index trajectory.

Returns:

  • trajs ( list of ndarrays ) –

    List of ndarrays holding the input data.

index_trajs_flatten property

Return flattened index trajectory.

Returns:

  • trajs ( ndarray ) –

    1D ndarray representation of index trajectories.

estimate_markov_model(lagtime)

Estimates Markov State Model.

This method estimates the MSM based on the transition count matrix.

Parameters:

  • lagtime (int) –

    Lag time for estimating the Markov model given in [frames].

Returns:

  • T ( ndarray ) –

    Transition probability matrix \(T_{ij}\), containing the probability of a transition from state \(i\to j\).

  • states ( ndarray ) –

    Array holding states corresponding to the columns of \(T_{ij}\).

Source code in src/msmhelper/statetraj.py
def estimate_markov_model(self, lagtime):
    r"""Estimates Markov State Model.

    This method estimates the MSM based on the transition count matrix.

    Parameters
    ----------
    lagtime : int
        Lag time for estimating the Markov model given in [frames].

    Returns
    -------
    T : ndarray
        Transition probability matrix $T_{ij}$, containing the probability
        of a transition from state $i\to j$.
    states : ndarray
        Array holding states corresponding to the columns of $T_{ij}$.

    """
    return mh.msm.msm._estimate_markov_model(
        self.index_trajs,
        lagtime,
        self.nstates,
        self.states,
    )
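
The private helper `_estimate_markov_model` is not listed on this page. As a rough, pure-Python sketch of what "estimating the MSM based on the transition count matrix" usually means, the example below counts transitions at the given lag time with a sliding window and row-normalizes the counts. The function name `sketch_transition_matrix` and the handling of empty rows are assumptions of this sketch, not the library's implementation.

```python
def sketch_transition_matrix(index_trajs, lagtime, nstates):
    """Row-normalized transition counts at the given lag time."""
    if lagtime < 1:
        raise ValueError('lagtime needs to be a positive number of frames.')

    counts = [[0] * nstates for _ in range(nstates)]
    for traj in index_trajs:
        # sliding window: pair every frame with the frame `lagtime` later;
        # transitions are never counted across two trajectories
        for state_from, state_to in zip(traj[:-lagtime], traj[lagtime:]):
            counts[state_from][state_to] += 1

    tmat = []
    for row in counts:
        total = sum(row)
        # rows without outgoing counts stay zero (choice of this sketch)
        tmat.append([count / total if total else 0.0 for count in row])
    return tmat


index_trajs = [[0, 0, 1, 1, 0, 0, 1]]
print(sketch_transition_matrix(index_trajs, lagtime=1, nstates=2))
# [[0.5, 0.5], [0.5, 0.5]]
```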

state_to_idx(state)

Get idx corresponding to state.

Parameters:

  • state (int) –

    State to get idx of.

Returns:

  • idx ( int ) –

    Idx corresponding to state.

Source code in src/msmhelper/statetraj.py
def state_to_idx(self, state):
    """Get idx corresponding to state.

    Parameters
    ----------
    state : int
        State to get idx of.

    Returns
    -------
    idx : int
        Idx corresponding to state.

    """
    idx = np.where(self.states == state)[0]
    if not idx.size:
        raise ValueError(
            'State "{state}" does not exist in trajectory.'.format(
                state=state,
            ),
        )
    return idx[0]
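
The distinction between a state label and its index matters because the transition matrix is addressed by index, not by label. A small stand-alone sketch of the lookup, assuming the active states are kept as a sorted list (the helper name `lookup_index` is hypothetical):

```python
from bisect import bisect_left


def lookup_index(states, state):
    """Return the position of `state` in the sorted list `states`."""
    idx = bisect_left(states, state)
    if idx == len(states) or states[idx] != state:
        raise ValueError(f'State "{state}" does not exist in trajectory.')
    return idx


states = [2, 5, 9]  # active state labels
print(lookup_index(states, 5))  # 1
```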

opentxt(file_name, comment='#', nrows=None, **kwargs)

Open a text file.

This method can load an n×m array of floats from an ASCII file. It uses either pandas read_csv for a single comment character or, as a fallback, the slower np.loadtxt for multiple comment characters.

Warning

In contrast to pandas, the order of usecols is respected. So when using data = opentxt(..., usecols=[1, 0]), data[:, 0] holds file column 1 and data[:, 1] holds file column 0.

Parameters:

  • file_name (string) –

    Name of file to be opened.

  • comment (str or array of str, default: '#' ) –

    Characters with which a comment starts.

  • nrows (int, default: None ) –

    The maximum number of lines to be read.

  • usecols (int-array) –

    Columns to be read from the file (zero indexed).

  • skiprows (int) –

    The number of leading rows which will be skipped.

  • dtype (data-type) –

    Data-type of the resulting array. Default: float.

Returns:

  • array ( ndarray ) –

    Data read from the text file.

Source code in src/msmhelper/io.py
def opentxt(file_name, comment='#', nrows=None, **kwargs):
    r"""Open a text file.

    This method can load an nxm array of floats from an ascii file. It uses
    either pandas read_csv for a single comment or as fallback the slower
    [np.loadtxt][numpy.loadtxt] for multiple comments.

    !!! warning
        In contrast to pandas the order of usecols will be used. So if
        using `data = opentxt(..., usecols=[1, 0])` you access the first column
        by `data[:, 0]` and the second one by `data[:, 1]`.

    Parameters
    ----------
    file_name : string
        Name of file to be opened.
    comment : str or array of str, optional
        Characters with which a comment starts.
    nrows : int, optional
        The maximum number of lines to be read.
    usecols : int-array, optional
        Columns to be read from the file (zero indexed).
    skiprows : int, optional
        The number of leading rows which will be skipped.
    dtype : data-type, optional
        Data-type of the resulting array. Default: float.

    Returns
    -------
    array : ndarray
        Data read from the text file.

    """
    if len(comment) == 1:
        # pandas does not support array of single char
        if not isinstance(comment, str):
            comment = comment[0]

        # force pandas to load in stated order without sorting
        cols = kwargs.pop('usecols', None)
        if cols is not None:
            idx = np.argsort(cols)
            cols = np.atleast_1d(cols).astype(np.int32)[idx]

        array = pd.read_csv(
            file_name,
            sep=r'\s+',
            header=None,
            comment=comment,
            nrows=nrows,
            usecols=cols,
            **kwargs,
        ).values

        if array.shape[-1] == 1:
            array = array.flatten()
        # swap columns back to ensure correct order
        elif cols is not None:
            array = utils.swapcols(array, idx, np.arange(len(idx)))

        return array

    return np.loadtxt(
        file_name,
        comments=comment,
        max_rows=nrows,
        **kwargs,
    )
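
The pandas branch above loads the columns in ascending order and afterwards swaps them back into the order given by `usecols`. The following pure-Python sketch illustrates that argsort-and-restore step on plain lists; `restore_column_order` is a hypothetical helper and does not call pandas or `utils.swapcols`.

```python
def restore_column_order(loaded_rows, usecols):
    """Reorder rows loaded in ascending column order to match `usecols`."""
    # argsort of usecols: order[k] is the position in `usecols` of the
    # k-th smallest column index
    order = sorted(range(len(usecols)), key=lambda pos: usecols[pos])
    restored = []
    for row in loaded_rows:
        new_row = [None] * len(usecols)
        for k, pos in enumerate(order):
            new_row[pos] = row[k]
        restored.append(new_row)
    return restored


# file columns: 0 -> time, 1 -> state, 2 -> energy
# a reader that sorts usecols=[2, 0] returns (time, energy) per row
loaded = [[0.0, -1.5], [0.1, -1.2]]
print(restore_column_order(loaded, usecols=[2, 0]))
# [[-1.5, 0.0], [-1.2, 0.1]]
```

After restoring, the first output column is file column 2 and the second is file column 0, matching the requested order.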

savetxt(file_name, array, header=None, fmt='%.5f')

Save nxm array of floats to a text file.

It uses NumPy's savetxt method and extends the header with information about the execution.

Parameters:

  • file_name (string) –

    File name to store data.

  • array (ndarray) –

    Data to be stored.

  • header (str, default: None ) –

    Comment written into the header of the output file.

  • fmt (str or sequence of strs, default: '%.5f' ) –

    See numpy.savetxt.

Source code in src/msmhelper/io.py
def savetxt(file_name, array, header=None, fmt='%.5f'):  # noqa: WPS323
    """Save nxm array of floats to a text file.

    It uses NumPy's savetxt method and extends the header with information
    about the execution.

    Parameters
    ----------
    file_name : string
        File name to store data.
    array : ndarray
        Data to be stored.
    header : str, optional
        Comment written into the header of the output file.
    fmt : str or sequence of strs, optional
        See [numpy.savetxt][].

    """
    # prepare header comments
    RUI = _get_runtime_user_information()

    header_comment = (
        'This file was generated by {script_dir}/{script_name}:\n{args}' +
        '\n\n{date}, {user}@{pc}'
    ).format(**RUI, args=' '.join(sys.argv))

    if header:  # print column title if given
        header_comment += '\n{0}'.format(header)

    # save file
    np.savetxt(file_name, array, fmt=fmt, header=header_comment)
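
To make the resulting file layout concrete, the sketch below builds a comparable provenance header and writes it with a `# ` prefix in front of every header line, followed by the formatted rows. It is a simplified stand-in: the helper names `build_header_comment` and `write_table` are hypothetical, script directory and name are merged into a single `script` field, and date, user and host are passed in explicitly because `_get_runtime_user_information` is not shown on this page.

```python
import io


def build_header_comment(argv, date, user, host, header=None):
    """Compose the provenance comment; all values are passed in."""
    comment = (
        'This file was generated by {script}:\n{args}\n\n{date}, {user}@{host}'
    ).format(
        script=argv[0], args=' '.join(argv), date=date, user=user, host=host,
    )
    if header:  # append the column title if given
        comment += '\n{0}'.format(header)
    return comment


def write_table(handle, rows, header_comment, fmt='%.5f'):
    """Write '# '-prefixed header lines followed by formatted rows."""
    for line in header_comment.split('\n'):
        handle.write('# ' + line + '\n')
    for row in rows:
        handle.write(' '.join(fmt % value for value in row) + '\n')


buffer = io.StringIO()
comment = build_header_comment(
    ['analysis.py', '--lag', '10'],
    date='2024-01-01',
    user='alice',
    host='node1',
    header='time state',
)
write_table(buffer, [[0.0, 1.0], [0.1, 2.0]], comment)
print(buffer.getvalue())
```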

opentxt_limits(file_name, limits_file=None, **kwargs)

Load file and split according to limit file.

If limits_file is not provided it will return [traj].

Parameters:

  • file_name (string) –

    Name of file to be opened.

  • limits_file (str, default: None ) –

    File name of limit file. Should be single column ascii file.

  • **kwargs

    See parameters defined in opentxt

Returns:

  • traj ( ndarray ) –

    Return array of subtrajectories.

Source code in src/msmhelper/io.py
def opentxt_limits(file_name, limits_file=None, **kwargs):
    """Load file and split according to limit file.

    If limits_file is not provided it will return `[traj]`.

    Parameters
    ----------
    file_name : string
        Name of file to be opened.
    limits_file : str, optional
        File name of limit file. Should be single column ascii file.
    **kwargs
        See parameters defined in [opentxt][msmhelper.io.opentxt]

    Returns
    -------
    traj : ndarray
        Return array of subtrajectories.

    """
    # open trajectory
    traj = opentxt(file_name, **kwargs)

    # open limits
    limits = open_limits(limits_file=limits_file, data_length=len(traj))

    # split trajectory
    return np.split(traj, limits)[:-1]

openmicrostates(file_name, limits_file=None, **kwargs)

Load 1d file and split according to limit file.

Both the limit file and the trajectory file need to be single-column files. If limits_file is not provided it will return [traj]. The trajectory will be of dtype np.int16, so state labels must not exceed 32767.

Parameters:

  • file_name (string) –

    Name of file to be opened.

  • limits_file (str, default: None ) –

    File name of limit file. Should be single column ascii file.

  • **kwargs

    See parameters defined in opentxt

Returns:

  • traj ( ndarray ) –

    Return array of subtrajectories.

Source code in src/msmhelper/io.py
def openmicrostates(file_name, limits_file=None, **kwargs):
    """Load 1d file and split according to limit file.

    Both the limit file and the trajectory file need to be single-column
    files. If limits_file is not provided it will return [traj]. The
    trajectory will be of dtype np.int16, so state labels must not exceed
    32767.

    Parameters
    ----------
    file_name : string
        Name of file to be opened.
    limits_file : str, optional
        File name of limit file. Should be single column ascii file.
    **kwargs
        See parameters defined in [opentxt][msmhelper.io.opentxt]

    Returns
    -------
    traj : ndarray
        Return array of subtrajectories.

    """
    # open trajectory
    if 'dtype' in kwargs and not np.issubdtype(kwargs['dtype'], np.integer):
        raise TypeError('dtype should be integer')
    else:
        kwargs['dtype'] = np.int16

    # load split trajectory
    traj = opentxt_limits(file_name, limits_file, **kwargs)

    if len(traj[0].shape) != 1:
        raise FileError('Microstate trajectory should be single column file.')

    return traj
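
Because the trajectory is stored as np.int16, state labels above 32767 cannot be represented. A small pre-flight check can catch this before a file is loaded. This is an illustrative sketch in plain Python; `check_int16_states` is a hypothetical helper, not part of msmhelper, and it only checks the upper bound.

```python
INT16_MAX = 2**15 - 1  # 32767, largest value of a signed 16-bit integer


def check_int16_states(states):
    """Return the states as a list if every label fits into int16."""
    largest = max(states)
    if largest > INT16_MAX:
        raise ValueError(
            f'State {largest} exceeds the int16 maximum of {INT16_MAX}.',
        )
    return list(states)


print(check_int16_states([0, 1, 32767]))  # [0, 1, 32767]
```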

open_limits(data_length, limits_file=None)

Load and check limit file.

The limits give the length of each single trajectory, e.g. [5, 5, 5] for 3 equally-sized subtrajectories of length 5.

Parameters:

  • data_length (int) –

    Length of data read.

  • limits_file (str, default: None ) –

    File name of limit file. Should be single column ascii file.

Returns:

  • limits ( ndarray ) –

    Return cumsum of limits.

Source code in src/msmhelper/io.py
def open_limits(data_length, limits_file=None):
    """Load and check limit file.

    The limits give the length of each single trajectory. So e.g.
    [5, 5, 5] for 3 equally-sized subtrajectories of length 5.

    Parameters
    ----------
    data_length : int
        Length of data read.
    limits_file : str, optional
        File name of limit file. Should be single column ascii file.

    Returns
    -------
    limits : ndarray
        Return cumsum of limits.

    """
    if limits_file is None:
        return np.array([data_length])  # for single trajectory

    # open limits file
    limits = opentxt(limits_file)
    if len(limits.shape) != 1:
        raise FileError('Should be single column file.')

    # convert to cumulative sum
    limits = np.cumsum(limits)
    if data_length != limits[-1]:
        raise ValueError('Limits are inconsistent with data.')

    return limits
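
The interplay between the limits file and the split can be sketched without NumPy: the per-trajectory lengths are turned into cumulative end positions, checked against the total data length, and then used as slice boundaries. The function name `split_by_limits` is hypothetical, and the sketch returns plain lists rather than ndarrays.

```python
from itertools import accumulate


def split_by_limits(traj, lengths):
    """Split `traj` into consecutive pieces of the given lengths."""
    limits = list(accumulate(lengths))  # cumulative end positions
    if not limits or limits[-1] != len(traj):
        raise ValueError('Limits are inconsistent with data.')
    starts = [0] + limits[:-1]
    return [traj[start:stop] for start, stop in zip(starts, limits)]


traj = [1, 1, 2, 2, 2, 3]
print(split_by_limits(traj, [2, 3, 1]))  # [[1, 1], [2, 2, 2], [3]]
```

Since the last cumulative limit equals the data length, splitting at every limit with `np.split` produces an empty trailing piece, which is why `opentxt_limits` drops the last element with `[:-1]`.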

rename_by_population(trajs, return_permutation=False)

Rename states sorted by their population starting from 1.

Parameters:

  • trajs (list or ndarray or list of ndarrays) –

    State trajectory or list of state trajectories.

  • return_permutation (bool, default: False ) –

    Additionally return the permutation used for the renaming. Default is False.

Returns:

  • trajs ( ndarray ) –

    Renamed data.

  • permutation ( ndarray ) –

    Permutation going from old to new state naming. So the ith state of the new naming corresponds to the old state permutation[i-1].

Source code in src/msmhelper/utils/_utils.py
def rename_by_population(trajs, return_permutation=False):
    r"""Rename states sorted by their population starting from 1.

    Parameters
    ----------
    trajs : list or ndarray or list of ndarrays
        State trajectory or list of state trajectories.
    return_permutation : bool
        Additionally return the permutation used for the renaming.
        Default is False.

    Returns
    -------
    trajs : ndarray
        Renamed data.
    permutation : ndarray
        Permutation going from old to new state naming. So the `i`th state
        of the new naming corresponds to the old state `permutation[i-1]`.

    """
    # get unique states with population
    states, pop = unique(trajs, return_counts=True)

    # get decreasing order
    idx_sort = np.argsort(pop)[::-1]
    states = states[idx_sort]

    # rename states
    trajs_renamed = shift_data(
        trajs,
        val_old=states,
        val_new=np.arange(len(states)) + 1,
    )
    if return_permutation:
        return trajs_renamed, states
    return trajs_renamed
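
A pure-Python sketch of the same renaming for a single trajectory, using `collections.Counter`. The name `sketch_rename_by_population` is hypothetical, and ties between equally populated states are broken by the smaller label here, which may differ from the `np.argsort`-based ordering above.

```python
from collections import Counter


def sketch_rename_by_population(traj):
    """Rename states to 1..n, ordered by decreasing population."""
    counts = Counter(traj)
    # most populated state first; ties by smaller label (sketch choice)
    permutation = sorted(counts, key=lambda state: (-counts[state], state))
    mapping = {old: new for new, old in enumerate(permutation, start=1)}
    return [mapping[state] for state in traj], permutation


renamed, permutation = sketch_rename_by_population([7, 7, 3, 7, 9, 3])
print(renamed)      # [1, 1, 2, 1, 3, 2]
print(permutation)  # [7, 3, 9]
```

New state 1 corresponds to `permutation[0] = 7`, the most populated old state.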

rename_by_index(trajs, return_permutation=False)

Rename states sorted by their numerical values starting from 0.

Parameters:

  • trajs (list or ndarray or list of ndarrays) –

    State trajectory or list of state trajectories.

  • return_permutation (bool, default: False ) –

    Additionally return the permutation used for the renaming. Default is False.

Returns:

  • trajs ( ndarray ) –

    Renamed data.

  • permutation ( ndarray ) –

    Permutation going from old to new state naming. Since the new states start from 0, the new state i corresponds to the old state permutation[i].

Source code in src/msmhelper/utils/_utils.py
def rename_by_index(trajs, return_permutation=False):
    r"""Rename states sorted by their numerical values starting from 0.

    Parameters
    ----------
    trajs : list or ndarray or list of ndarrays
        State trajectory or list of state trajectories.
    return_permutation : bool
        Additionally return the permutation used for the renaming.
        Default is False.

    Returns
    -------
    trajs : ndarray
        Renamed data.
    permutation : ndarray
        Permutation going from old to new state naming. Since the new
        states start from 0, the new state `i` corresponds to the old
        state `permutation[i]`.

    """
    # get unique states
    states = unique(trajs)

    # rename states
    trajs_renamed = shift_data(
        trajs,
        val_old=states,
        val_new=np.arange(len(states)),
    )
    if return_permutation:
        return trajs_renamed, states
    return trajs_renamed
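
For several trajectories the mapping has to be shared, so that the same label receives the same index everywhere. A minimal sketch on plain lists (the name `sketch_rename_by_index` is hypothetical):

```python
def sketch_rename_by_index(trajs):
    """Rename states of all trajectories to 0..n-1, sorted by label."""
    # collect the labels of all trajectories before building the mapping
    permutation = sorted({state for traj in trajs for state in traj})
    mapping = {old: new for new, old in enumerate(permutation)}
    renamed = [[mapping[state] for state in traj] for traj in trajs]
    return renamed, permutation


trajs = [[7, 7, 3], [9, 3, 7]]
renamed, permutation = sketch_rename_by_index(trajs)
print(renamed)      # [[1, 1, 0], [2, 0, 1]]
print(permutation)  # [3, 7, 9]
```

Because the new labels start from 0, the new state `i` maps back to the old state `permutation[i]`.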

shift_data(array, val_old, val_new, dtype=np.int64)

Shift integer array (data) from old to new values.

Warning

The values of val_old, val_new and data need to be integers.

The basic function is based on Ashwini_Chaudhary solution: https://stackoverflow.com/a/29408060

Parameters:

  • array (StateTraj or ndarray or list or list of ndarrays) –

    1D data or a list of data.

  • val_old (ndarray or list) –

    Values in data which should be replaced. All values need to be within the range of [data.min(), data.max()].

  • val_new (ndarray or list) –

    Values which will be used instead of old ones.

  • dtype (data-type, default: int64 ) –

    The desired data-type. Needs to be an integer type.

Returns:

  • array ( ndarray ) –

    Shifted data in same shape as input.

Source code in src/msmhelper/utils/_utils.py
def shift_data(array, val_old, val_new, dtype=np.int64):
    """Shift integer array (data) from old to new values.

    !!! warning
        The values of `val_old`, `val_new` and `data` need to be integers.

    The basic function is based on Ashwini_Chaudhary solution:
    https://stackoverflow.com/a/29408060

    Parameters
    ----------
    array : StateTraj or ndarray or list or list of ndarrays
        1D data or a list of data.
    val_old : ndarray or list
        Values in data which should be replaced. All values need to be
        within the range of `[data.min(), data.max()]`.
    val_new : ndarray or list
        Values which will be used instead of old ones.
    dtype : data-type, optional
        The desired data-type. Needs to be an integer type.

    Returns
    -------
    array : ndarray
        Shifted data in same shape as input.

    """
    # check data-type
    if not np.issubdtype(dtype, np.integer):
        raise TypeError('An integer type is needed.')

    # flatten data
    array, shape_kwargs = _flatten_data(array)

    # offset data, val_old and val_new to allow negative values
    offset = np.min([np.min(array), np.min(val_new)])

    # convert to np.array
    val_old = (np.asarray(val_old) - offset).astype(dtype)
    val_new = (np.asarray(val_new) - offset).astype(dtype)

    # convert data and shift
    array = (array - offset).astype(dtype)

    # shift data
    conv = np.arange(array.max() + 1, dtype=dtype)
    conv[val_old] = val_new
    array = conv[array]

    # shift data back
    array = array.astype(np.int32) + offset

    # reshape and return
    return _unflatten_data(array, shape_kwargs)
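
The core of this function is a lookup table: an identity table over the value range is built once, the entries for `val_old` are overwritten with `val_new`, and every data point is then translated by a single table access. A pure-Python sketch of this technique is given below. The name `sketch_shift_data` is hypothetical, and unlike the listing above the table stores the new values directly instead of offset values.

```python
def sketch_shift_data(data, val_old, val_new):
    """Replace every `val_old[k]` in `data` by `val_new[k]`."""
    # offset so that the smallest value maps to table index 0
    offset = min(min(data), min(val_old))
    upper = max(max(data), max(val_old))
    # identity table: values not listed in val_old stay unchanged
    table = list(range(offset, upper + 1))
    for old, new in zip(val_old, val_new):
        table[old - offset] = new
    return [table[value - offset] for value in data]


print(sketch_shift_data([-1, 0, 2, 2, 5], val_old=[-1, 2], val_new=[10, 20]))
# [10, 0, 20, 20, 5]
print(sketch_shift_data([1, 2, 1], val_old=[1, 2], val_new=[2, 1]))
# [2, 1, 2]
```

Since all replacements are read from the table in one pass, swapping two labels works without an intermediate placeholder value.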

unique(trajs, **kwargs)

Apply numpy.unique to traj.

Parameters:

  • trajs (list or ndarray or list of ndarrays) –

    State trajectory or list of state trajectories.

  • **kwargs

    Arguments of numpy.unique

Returns:

  • unique ( ndarray ) –

    Array containing all states, see numpy for more details.

Source code in src/msmhelper/utils/_utils.py
def unique(trajs, **kwargs):
    r"""Apply numpy.unique to traj.

    Parameters
    ----------
    trajs : list or ndarray or list of ndarrays
        State trajectory or list of state trajectories.
    **kwargs
        Arguments of [numpy.unique][]

    Returns
    -------
    unique : ndarray
        Array containing all states, see numpy for more details.

    """
    # flatten data
    trajs, _ = _flatten_data(trajs)

    # get unique states with population
    return np.unique(trajs, **kwargs)