io

Input and Output Text Files

This submodule contains all functions for reading data from text files and writing data to text files, including informative header comments.

`FileError`

Bases: Exception

An exception for incorrectly formatted input files.

`opentxt(file_name, comment='#', nrows=None, **kwargs)`

Open a text file.

This function loads an n×m array of floats from an ASCII file. If comment has length one (a single character), it uses pandas read_csv; otherwise it falls back to the slower np.loadtxt, which supports multiple comment strings.

Warning

In contrast to pandas, the order given in usecols is preserved. So with data = opentxt(..., usecols=[1, 0]), data[:, 0] holds the first requested column (file column 1) and data[:, 1] holds the second requested column (file column 0).
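The column-order convention from the warning can be illustrated with NumPy alone. This is a minimal sketch, not the opentxt implementation: it calls np.loadtxt directly on an in-memory file (the sample data is made up) and shows that the requested order in usecols determines the column order of the result.

```python
import io

import numpy as np

# Hypothetical two-column sample file held in memory.
sample = io.StringIO("# x y\n1.0 10.0\n2.0 20.0\n3.0 30.0\n")

# Request file column 1 first, then file column 0.
data = np.loadtxt(sample, comments='#', usecols=[1, 0])

first_requested = data[:, 0]   # values taken from file column 1
second_requested = data[:, 1]  # values taken from file column 0
```

According to the warning, opentxt follows the same convention even when it reads the file through pandas.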

Parameters:

file_name (string) – Name of file to be opened.

comment (str or array of str, default: '#') – Characters with which a comment starts.

nrows (int, default: None) – The maximum number of lines to be read.

usecols (int-array) – Columns to be read from the file (zero indexed).

skiprows (int) – The number of leading rows which will be skipped.

dtype (data-type) – Data-type of the resulting array. Default: float.

Returns:

array (ndarray) – Data read from the text file.

Source code in src/msmhelper/io.py

def opentxt(file_name, comment='#', nrows=None, **kwargs):
    r"""Open a text file.

    This method can load an nxm array of floats from an ascii file. It uses
    either pandas read_csv for a single comment or as fallback the slower
    [np.loadtxt][numpy.loadtxt] for multiple comments.

    !!! warning
        In contrast to pandas the order of usecols will be used. So if
        using `data = opentxt(..., usecols=[1, 0])` you access the first column
        by `data[:, 0]` and the second one by `data[:, 1]`.

    Parameters
    ----------
    file_name : string
        Name of file to be opened.
    comment : str or array of str, optional
        Characters with which a comment starts.
    nrows : int, optional
        The maximum number of lines to be read.
    usecols : int-array, optional
        Columns to be read from the file (zero indexed).
    skiprows : int, optional
        The number of leading rows which will be skipped.
    dtype : data-type, optional
        Data-type of the resulting array. Default: float.

    Returns
    -------
    array : ndarray
        Data read from the text file.

    """
    if len(comment) == 1:
        # pandas does not support array of single char
        if not isinstance(comment, str):
            comment = comment[0]

        # force pandas to load in stated order without sorting
        cols = kwargs.pop('usecols', None)
        if cols is not None:
            idx = np.argsort(cols)
            cols = np.atleast_1d(cols).astype(np.int32)[idx]

        array = pd.read_csv(
            file_name,
            sep=r'\s+',
            header=None,
            comment=comment,
            nrows=nrows,
            usecols=cols,
            **kwargs,
        ).values

        if array.shape[-1] == 1:
            array = array.flatten()
        # swap columns back to ensure correct order
        elif cols is not None:
            array = utils.swapcols(array, idx, np.arange(len(idx)))

        return array

    return np.loadtxt(
        file_name,
        comments=comment,
        max_rows=nrows,
        **kwargs,
    )

`savetxt(file_name, array, header=None, fmt='%.5f')`

Save nxm array of floats to a text file.

It uses NumPy's savetxt and extends the header with information about the execution: the generating script, its command-line arguments, the date, and the user and host name.

Parameters:

file_name (string) – File name to store data.

array (ndarray) – Data to be stored.

header (str, default: None) – Comment written into the header of the output file.

fmt (str or sequence of strs, default: '%.5f') – See numpy.savetxt.
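How a multi-line header ends up in the output file can be sketched with np.savetxt directly. The header text below is a made-up stand-in for the runtime information that savetxt collects; only the np.savetxt behaviour is shown, not the msmhelper function itself.

```python
import io

import numpy as np

# Hypothetical header: a generated-by line, an empty line, and column titles.
header_comment = 'This file was generated by ./example.py\n\ncolumn_a column_b'

buffer = io.StringIO()
np.savetxt(
    buffer,
    np.array([[1.0, 2.0], [3.0, 4.0]]),
    fmt='%.5f',
    header=header_comment,
)

# Every header line is prefixed with the comment marker '# ',
# followed by one formatted line per data row.
lines = buffer.getvalue().splitlines()
```

Because the header lines start with '#', a file written this way can be read back with the default comment character of opentxt.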

Source code in src/msmhelper/io.py

def savetxt(file_name, array, header=None, fmt='%.5f'):  # noqa: WPS323
    """Save nxm array of floats to a text file.

    It uses NumPy's savetxt method and extends the header with information
    about the execution.

    Parameters
    ----------
    file_name : string
        File name to store data.
    array : ndarray
        Data to be stored.
    header : str, optional
        Comment written into the header of the output file.
    fmt : str or sequence of strs, optional
        See [numpy.savetxt][].

    """
    # prepare header comments
    RUI = _get_runtime_user_information()

    header_comment = (
        'This file was generated by {script_dir}/{script_name}:\n{args}' +
        '\n\n{date}, {user}@{pc}'
    ).format(**RUI, args=' '.join(sys.argv))

    if header:  # print column title if given
        header_comment += '\n{0}'.format(header)

    # save file
    np.savetxt(file_name, array, fmt=fmt, header=header_comment)

`opentxt_limits(file_name, limits_file=None, **kwargs)`

Load file and split according to limit file.

If limits_file is not provided, it returns [traj], i.e. the whole trajectory as the single element of a list.

Parameters:

file_name (string) – Name of file to be opened.

limits_file (str, default: None) – File name of limit file. Should be a single-column ASCII file.

**kwargs – See the parameters defined in opentxt.

Returns:

traj (ndarray) – Array of subtrajectories.
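The splitting step can be sketched with np.split on made-up data. The boundaries here are written out by hand as cumulative indices; in the library they come from open_limits.

```python
import numpy as np

traj = np.arange(15)                # hypothetical trajectory with 15 frames
boundaries = np.array([5, 10, 15])  # cumulative end index of each piece

# np.split also yields a trailing empty array after the last boundary,
# which is dropped with [:-1].
pieces = np.split(traj, boundaries)[:-1]

# Without a limits file the only boundary is the data length,
# so the result is a list holding the full trajectory.
single = np.split(traj, np.array([len(traj)]))[:-1]
```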

Source code in src/msmhelper/io.py

def opentxt_limits(file_name, limits_file=None, **kwargs):
    """Load file and split according to limit file.

    If limits_file is not provided it will return `[traj]`.

    Parameters
    ----------
    file_name : string
        Name of file to be opened.
    limits_file : str, optional
        File name of limit file. Should be single column ascii file.
    **kwargs
        See parameters defined in [opentxt][msmhelper.io.opentxt]

    Returns
    -------
    traj : ndarray
        Return array of subtrajectories.

    """
    # open trajectory
    traj = opentxt(file_name, **kwargs)

    # open limits
    limits = open_limits(limits_file=limits_file, data_length=len(traj))

    # split trajectory
    return np.split(traj, limits)[:-1]

`openmicrostates(file_name, limits_file=None, **kwargs)`

Load 1d file and split according to limit file.

Both the limit file and the trajectory file need to be single-column files. If limits_file is not provided, it returns [traj]. The trajectory is loaded with dtype np.int16, so state indices must not exceed 32767.

Parameters:

file_name (string) – Name of file to be opened.

limits_file (str, default: None) – File name of limit file. Should be a single-column ASCII file.

**kwargs – See the parameters defined in opentxt.

Returns:

traj (ndarray) – Array of subtrajectories.
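The np.int16 limit mentioned above can be checked before loading. The helper below is a hypothetical convenience for illustration and is not part of msmhelper.

```python
import numpy as np

INT16_MAX = np.iinfo(np.int16).max  # largest value np.int16 can hold


def fits_int16(states):
    """Return True if all state indices can be stored as np.int16."""
    states = np.asarray(states)
    info = np.iinfo(np.int16)
    return bool(states.min() >= info.min and states.max() <= info.max)
```

A check like this is useful when the microstate indices were generated by a clustering step with a very large number of states.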

Source code in src/msmhelper/io.py

def openmicrostates(file_name, limits_file=None, **kwargs):
    """Load 1d file and split according to limit file.

    Both the limit file and the trajectory file need to be single-column
    files. If limits_file is not provided it will return [traj]. The
    trajectory will be of dtype np.int16, so the states must not exceed 32767.

    Parameters
    ----------
    file_name : string
        Name of file to be opened.
    limits_file : str, optional
        File name of limit file. Should be single column ascii file.
    **kwargs
        See parameters defined in [opentxt][msmhelper.io.opentxt]

    Returns
    -------
    traj : ndarray
        Return array of subtrajectories.

    """
    # open trajectory
    if 'dtype' in kwargs and not np.issubdtype(kwargs['dtype'], np.integer):
        raise TypeError('dtype should be integer')
    else:
        kwargs['dtype'] = np.int16

    # load split trajectory
    traj = opentxt_limits(file_name, limits_file, **kwargs)

    if len(traj[0].shape) != 1:
        raise FileError('Microstate trajectory should be single column file.')

    return traj

`open_limits(data_length, limits_file=None)`

Load and check limit file.

The limits give the length of each individual trajectory, e.g. [5, 5, 5] for three equally sized subtrajectories of length 5.

Parameters:

data_length (int) – Length of data read.

limits_file (str, default: None) – File name of limit file. Should be a single-column ASCII file.

Returns:

limits (ndarray) – Cumulative sum of the limits.
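The conversion from per-trajectory lengths to cumulative limits, together with the consistency check, can be sketched as follows. The function name cumulative_limits is hypothetical; it takes the lengths directly instead of reading them from a file.

```python
import numpy as np


def cumulative_limits(lengths, data_length):
    """Hypothetical helper: cumulative sum of lengths, checked against data."""
    limits = np.cumsum(lengths)
    # The last cumulative value must equal the number of frames read.
    if limits[-1] != data_length:
        raise ValueError('Limits are inconsistent with the data.')
    return limits


limits = cumulative_limits([5, 5, 5], data_length=15)
```

The check catches a limits file that belongs to a different trajectory, since its lengths would not add up to the number of frames read.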

Source code in src/msmhelper/io.py

def open_limits(data_length, limits_file=None):
    """Load and check limit file.

    The limits give the length of each individual trajectory, e.g.
    [5, 5, 5] for three equally sized subtrajectories of length 5.

    Parameters
    ----------
    data_length : int
        Length of data read.
    limits_file : str, optional
        File name of limit file. Should be single column ascii file.

    Returns
    -------
    limits : ndarray
        Return cumsum of limits.

    """
    if limits_file is None:
        return np.array([data_length])  # for single trajectory

    # open limits file
    limits = opentxt(limits_file)
    if len(limits.shape) != 1:
        raise FileError('Should be single column file.')

    # convert to cumulative sum
    limits = np.cumsum(limits)
    if data_length != limits[-1]:
        raise ValueError('Limits are inconsistent with the data.')

    return limits
