dcTMD via Force
Introduction¶
Another option to directly analyze the constraint force time traces, is to calculate $\Delta G$ on-the-fly starting from the friction estimate ${\it{\Gamma}}$ which is calculated from the force $f_c(t)$ auto correlation via:
$$ \begin{align} \Gamma &= \beta \int_{0}^{t} \text{d}\tau \left<\delta f_c(t)\delta f_c(t-\tau)\right>_\text{N},\\ \Delta G(s) &= - v_c\int_{s_0}^{s} \text{d}s' \Gamma(s') + \int_{s_0}^{s} \text{d}s' \left<f_c(s')\right>_\text{N} \end{align} $$
This approach is computionally more demanding, since the full resolution of the force time traces is needed to determine friction and free energy estimates. Therefore, we advise the use of the WorkEstimator class, which is computationally less demanding, since it does not require the full resolution of the force time traces.
import numpy as np
from dcTMD.dcTMD import ForceEstimator
from dcTMD.storing import ForceSet, load
import dcTMD
# define variables
velocity = 0.001
res = 1
verbose = True
temperature = 300
I. create forceset¶
To calculate free energy and friction estimates a forceset containing all the force time traces is needed.
- an array containing the filenames is generated. This can be done via the function dcTMD.io.load_pullf() which takes either a glob pattern or a file containing the pullf file names as argument.
pullf_files = '../../tests/testdata/pullf_filenames.dat'
pullf_files = '../../tests/testdata/*pullf.xvg'
filenames = dcTMD.io.load_pullf(pullf_files)
filenames
file ../../tests/testdata/*pullf.xvg not found. using glob.glob(../../tests/testdata/*pullf.xvg)
['../../tests/testdata/t_middle_32_pullf.xvg', '../../tests/testdata/t_middle_03_pullf.xvg', '../../tests/testdata/t_middle_34_pullf.xvg', '../../tests/testdata/t_middle_24_pullf.xvg', '../../tests/testdata/t_middle_21_pullf.xvg', '../../tests/testdata/t_middle_04_pullf.xvg', '../../tests/testdata/t_middle_29_pullf.xvg', '../../tests/testdata/t_middle_16_pullf.xvg', '../../tests/testdata/t_middle_30_pullf.xvg', '../../tests/testdata/t_middle_19_pullf.xvg', '../../tests/testdata/t_middle_01_pullf.xvg', '../../tests/testdata/t_middle_28_pullf.xvg', '../../tests/testdata/t_middle_26_pullf.xvg', '../../tests/testdata/t_middle_31_pullf.xvg', '../../tests/testdata/t_middle_09_pullf.xvg', '../../tests/testdata/t_middle_17_pullf.xvg', '../../tests/testdata/t_middle_25_pullf.xvg', '../../tests/testdata/t_middle_05_pullf.xvg']
- the forceset is created by creating a ForceSet instance which is fitted with the filenames.
# create ForceSet instance
forceset = ForceSet(velocity=velocity,
resolution=res,
verbose=verbose,
)
forceset
ForceSet(velocity=0.001, verbose=True)
# fit/fill workset
forceset.fit(filenames)
# save workset
#dcTMD.storing.save('my_forceset', forceset)
Using ../../tests/testdata/t_middle_32_pullf.xvg to initialize arrays. length of pullf file is 20001 reduced length is 20001
Loading force files: 44%|████▍ | 8/18 [00:00<00:00, 79.37it/s]
Reading file ../../tests/testdata/t_middle_32_pullf.xvg Reading file ../../tests/testdata/t_middle_03_pullf.xvg Reading file ../../tests/testdata/t_middle_34_pullf.xvg Reading file ../../tests/testdata/t_middle_24_pullf.xvg Reading file ../../tests/testdata/t_middle_21_pullf.xvg Reading file ../../tests/testdata/t_middle_04_pullf.xvg Reading file ../../tests/testdata/t_middle_29_pullf.xvg Reading file ../../tests/testdata/t_middle_16_pullf.xvg Reading file ../../tests/testdata/t_middle_30_pullf.xvg Reading file ../../tests/testdata/t_middle_19_pullf.xvg Reading file ../../tests/testdata/t_middle_01_pullf.xvg Reading file ../../tests/testdata/t_middle_28_pullf.xvg
Loading force files: 44%|████▍ | 8/18 [00:00<00:00, 79.37it/s]
Reading file ../../tests/testdata/t_middle_26_pullf.xvg Reading file ../../tests/testdata/t_middle_31_pullf.xvg Reading file ../../tests/testdata/t_middle_09_pullf.xvg
Loading force files: 44%|████▍ | 8/18 [00:00<00:00, 79.37it/s]
Reading file ../../tests/testdata/t_middle_17_pullf.xvg Reading file ../../tests/testdata/t_middle_25_pullf.xvg
Loading force files: 100%|██████████| 18/18 [00:00<00:00, 80.68it/s]
Reading file ../../tests/testdata/t_middle_05_pullf.xvg
ForceSet(velocity=0.001, verbose=True)
II. check normality of work distribution¶
One of the main conditions which need to be fulfilled for dcTMD is a normally distributed work.
This can be checked via different methods. e.g. plotting the work time traces, normality checks at different x positions, Kolmogorov-Smirnov Test, Shapiro-Wilk Test, Anderson-Darling Test,
CAUTION: if the work distribution is not normal you results are compromised. And a path separation is necessary. For the theory on path separation see...
# plot workset
import matplotlib.pyplot as plt
from dcTMD.utils import plotting
fig, ax = plt.subplots()
plotting.plot_worklines(forceset, ax)
plt.show()
# check if work distribution follows a normal distribution
from scipy.stats import kstest, shapiro, anderson
from dcTMD.utils import plotting
index = [5000, 15000]
x = forceset.position_
fig, axs = plotting.plot_worknormalitychecks(forceset, index)
for i, p in enumerate(index):
# Shapiro-Wilk Test
shapiro_test = shapiro(forceset.work_[:,p])
print(f'shapiro wilkins results at x={x[p]} is {shapiro_test}')
# Anderson-Darling Test
# If the test statsitics is larger than the critical value of a given
# significance_level in percent, the null hypothesis that the work
# is normally distributed has to be rejected.
anderson_test = anderson(forceset.work_[:,p], 'norm')
print(f'anderson darling results at x={x[p]} is {anderson_test}.')
# Kolmogorov-Smirnov Test (requires centering and scaling of input data)
kstest_test = kstest(
(forceset.work_[:,p]-np.mean(forceset.work_[:,p]))/np.std(forceset.work_[:,p]),
'norm'
)
print(f'Kolmogorov-Smirnov results at x={x[p]} is {kstest_test}')
shapiro wilkins results at x=0.5 is ShapiroResult(statistic=0.9809889793395996, pvalue=0.9596364498138428) anderson darling results at x=0.5 is AndersonResult(statistic=0.1889729620402001, critical_values=array([0.503, 0.573, 0.687, 0.802, 0.954]), significance_level=array([15. , 10. , 5. , 2.5, 1. ])). Kolmogorov-Smirnov results at x=0.5 is KstestResult(statistic=0.1222997987555296, pvalue=0.920955734836253) shapiro wilkins results at x=1.5 is ShapiroResult(statistic=0.9350144863128662, pvalue=0.23768189549446106) anderson darling results at x=1.5 is AndersonResult(statistic=0.38109849215264546, critical_values=array([0.503, 0.573, 0.687, 0.802, 0.954]), significance_level=array([15. , 10. , 5. , 2.5, 1. ])). Kolmogorov-Smirnov results at x=1.5 is KstestResult(statistic=0.14917443864214264, pvalue=0.7646028245726414)
III. derive estimates from forceset¶
- create ForceEstimator instance
- fit ForceEstimator instance with previously created forceset
## load force
# force = load('my_force_set')
# Instantiate a ForceEstimator instance and fit it with the ForceSet
# instance
forceestimator = ForceEstimator(temperature)
forceestimator.fit(forceset)
vars(forceestimator)
{'temperature': 300, 'verbose': False, 'force_set': ForceSet(velocity=0.001, verbose=True), 'names_': array(['t_middle_32_pullf.xvg', 't_middle_03_pullf.xvg', 't_middle_34_pullf.xvg', 't_middle_24_pullf.xvg', 't_middle_21_pullf.xvg', 't_middle_04_pullf.xvg', 't_middle_29_pullf.xvg', 't_middle_16_pullf.xvg', 't_middle_30_pullf.xvg', 't_middle_19_pullf.xvg', 't_middle_01_pullf.xvg', 't_middle_28_pullf.xvg', 't_middle_26_pullf.xvg', 't_middle_31_pullf.xvg', 't_middle_09_pullf.xvg', 't_middle_17_pullf.xvg', 't_middle_25_pullf.xvg', 't_middle_05_pullf.xvg'], dtype='<U32'), 'delta_force_array': array([[-8.73736883e+02, 2.06404027e+03, -1.64675083e+03, ..., -3.84514611e+02, -6.60017244e+02, 1.03823811e+03], [-1.21761218e+03, -1.68094973e+03, -5.91505833e+02, ..., -1.45385561e+03, 6.41780156e+02, -6.25663889e+02], [-2.50703183e+02, 6.28820267e+02, -7.42814833e+02, ..., 1.87819439e+03, 6.96557756e+02, -1.87626089e+03], ..., [-2.08374518e+03, 1.03954027e+03, 2.25121917e+03, ..., 1.37587439e+03, 1.52341276e+03, -5.30872889e+02], [-2.15387183e+02, 2.92739027e+03, 1.24891917e+03, ..., -2.93190561e+03, 1.06975556e+00, -6.28384889e+02], [ 2.67058167e+01, -4.78409733e+02, -7.94146833e+02, ..., -3.65687561e+03, 1.25278176e+03, -2.15804889e+02]]), 'position_': array([0.0000e+00, 1.0000e-04, 2.0000e-04, ..., 1.9998e+00, 1.9999e+00, 2.0000e+00]), 'W_mean_': array([0.00000000e+00, 4.41837458e-02, 4.44572742e-02, ..., 9.33136501e+01, 9.32887153e+01, 9.32516402e+01]), 'W_diss_': array([0.00000000e+00, 1.31284676e-03, 5.44972880e-03, ..., 1.03329455e+02, 1.03220321e+02, 1.03098458e+02]), 'dG_': array([ 0. , 0.0428709 , 0.03900755, ..., -10.01580464, -9.93160537, -9.846818 ]), 'friction_': array([ 0. , 26256.93524359, 56480.70560122, ..., 1453172.40056802, -3635855.06739585, 1198605.76619241])}
Visualize results¶
In the package a couple of simple plot functions to get an overview of the results are implemented. e.g. plot_dcTMD_results()
# plot dcTMD results
from dcTMD.utils import plotting
fig, axs = plotting.plot_dcTMD_results(forceestimator)
plt.show()
Smooth friction estimate¶
Finally, the friction estimate needs to be smoothed. This can be done via dcTMD.utils.smoothing.gaussfilter_friction() or dcTMD.WorkEstimator.smooth_friction(sigma, mode) sigma is the standard deviation of gaussian kernel in nm the mode parameter determines how the input array is extended beyond its boundaries.
Caution: this can lead to long computations using large datasets and a big smoothing window.
# smooth friction and plot results
forceestimator.smooth_friction(sigma=0.1)
# if the friction is smoothed that value is automatically plotted
fig, axs = plotting.plot_dcTMD_results(forceestimator)
plt.show()
# but one can also specify the friction
#fig, axs = plotting.plot_dcTMD_results(
# forceestimator,
# friction=forceestimator.friction_
#)
Using different smoothing windows and modes changes the results significantly. Because of the different boundary handling in the modes. mode='nearest' leads to an overestimation at the right hand boarder (end of the simulation), while mode='reflect' leads to a wash out of the boarder on the left hand side.
Here, are some examples:
fig, axs = plt.subplots()
x = forceestimator.position_
plotting.plot_Gamma(x,
forceestimator.friction_smooth_,
axs,
label='default (reflect) 0.1nm'
)
# using different smoothing windows and modes changes the results significantly
smooth_friction = dcTMD.utils.gaussfilter_friction(forceestimator.friction_,
x,
sigma=.01,
mode='reflect',
)
axs.plot(x, smooth_friction, label='reflect .01nm')
smooth_friction = dcTMD.utils.gaussfilter_friction(forceestimator.friction_,
x,
sigma=.01,
mode='nearest',
)
axs.plot(x, smooth_friction, label='nearest .01nm')
smooth_friction = dcTMD.utils.gaussfilter_friction(forceestimator.friction_,
x,
sigma=0.2,
mode='reflect',
)
axs.plot(x, smooth_friction, label='reflect .2nm')
axs.legend()
plt.show()
IV. Save and load results¶
# save forceestimator instance
# dcTMD.storing.save('my_forceestimator', forceestimator)
# save data as .npz and .dat file
outname = 'my_forceestimator_results'
dcTMD.io.write_output(outname, forceestimator)
results = np.load(f'{outname}_N{len(forceestimator.names_)}_dG.npz')
results.files
save file my_forceestimator_results_N18_dG.dat save file my_forceestimator_results_N18_dG.npz
['x', 'Wmean', 'Wdiss', 'dG', 'Gamma', 'Gamma_smooth']