Commit f9431fa3 authored by Evelyne Deplazes

Update README.md

parent 3fd0aa10
@@ -261,22 +261,101 @@
The umbrella integration analysis code expects the folder
integration_with_uncertainty to be in the same parent folder.
If this is not the case, the following line in the script
umbrella_integration.py needs to be changed to the correct path.
```
sys.path.append("../")
```
Two types of error analysis are performed:
1. Bootstrapping:
The error in the derivatives is calculated for each bin using
a bootstrapping approach. The number of iterations for bootstrapping
and the sample portion used for resampling (i.e. for creating subsets
of data) can be set by changing the following global variables:
```
N_BOOTSTRAP_ITERATIONS = 100
BOOTSTRAP_SAMPLE_PORTION = 0.05
```
For a description of bootstrapping see
[Wikipedia] https://en.wikipedia.org/wiki/Bootstrapping_(statistics)
2. Kästner & Thiel:
The variance in the derivatives is calculated using the approach described in
Kästner & Thiel, J. Chem. Phys. (2006).
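The bootstrapping approach in item 1 can be illustrated with a minimal sketch. It assumes that each subset is drawn with replacement, that the subset size is the sample portion times the number of data points in the bin, and that the error is taken as the standard deviation of the subset means. The function name `bootstrap_error` and its `seed` argument are hypothetical; the actual implementation in the script may differ.

```python
import random
import statistics

# Same names and defaults as the global variables described above.
N_BOOTSTRAP_ITERATIONS = 100
BOOTSTRAP_SAMPLE_PORTION = 0.05


def bootstrap_error(values, n_iterations=N_BOOTSTRAP_ITERATIONS,
                    sample_portion=BOOTSTRAP_SAMPLE_PORTION, seed=None):
    """Estimate the error of the mean of `values` (hypothetical helper)."""
    rng = random.Random(seed)
    # Size of each resampled subset; use at least one data point.
    subset_size = max(1, int(len(values) * sample_portion))
    means = []
    for _ in range(n_iterations):
        # Assumption: subsets are drawn with replacement.
        subset = rng.choices(values, k=subset_size)
        means.append(statistics.fmean(subset))
    # Assumption: the spread of the subset means is reported as the error.
    return statistics.stdev(means)


# Example: made-up derivative values for a single bin.
derivatives_in_bin = [0.1 * i for i in range(200)]
error = bootstrap_error(derivatives_in_bin, seed=0)
```

A larger `BOOTSTRAP_SAMPLE_PORTION` gives larger subsets and therefore a smaller spread between the subset means.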
----------------------
Convergence analysis
----------------------
The code can run a convergence check to test whether the
simulations in each window have converged. This is important for the
reliability of the final PMF. The convergence analysis is switched on
with the -c flag in the command line options.
The umbrella integration code expects the folder convergence_analysis
to be in the same parent folder. If this is not the case, the
following line in the script umbrella_integration.py needs to be
changed to the correct path.
```
sys.path.append("../")
```
The convergence analysis is based on the two-sample Kolmogorov-Smirnov (KS)
statistical test as described in
1. Massey Jr FJ (1951) The Kolmogorov-Smirnov test for
goodness of fit. Journal of the American Statistical
Association 46:68-78.
2. Smirnov N (1948) Table for estimating the goodness of
fit of empirical distributions. The Annals of Mathematical
Statistics 19:279-281.
Essentially, the two-sample KS statistic is the maximum vertical
distance between the empirical cumulative distribution functions of
the two samples. For a more detailed description, consult
the papers listed above.
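As a rough illustration of this definition, the statistic can be computed in a few lines of plain Python. This is a sketch of the textbook definition, not the code used in convergence_analysis; the function name `ks_two_sample_statistic` is hypothetical.

```python
from bisect import bisect_right


def ks_two_sample_statistic(sample_a, sample_b):
    """Maximum vertical distance between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must contain at least one value")
    max_distance = 0.0
    # Both empirical CDFs are step functions that only change at data
    # points, so it is enough to compare them at every observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        max_distance = max(max_distance, abs(cdf_a - cdf_b))
    return max_distance


identical = ks_two_sample_statistic([1, 2, 3], [1, 2, 3])   # 0.0
disjoint = ks_two_sample_statistic([1, 2, 3], [4, 5, 6])    # 1.0
```

A value close to 0 means the two samples have similar distributions; a value of 1 means they do not overlap at all. If SciPy is available, `scipy.stats.ks_2samp` returns the same statistic together with a p-value.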
For our analysis the convergence test is run on the derivative
data and determines the longest continuous converged section in the data
set for a given window. If multiple blocks meet
the convergence criteria, they are all listed but only the longest one is
kept for the calculation of the PMF. If the
convergence test fails, this is reported.
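The block selection described above can be sketched as follows. The sketch assumes the data of a window has already been split into consecutive segments and that each segment carries a True/False flag saying whether it passed the convergence criteria; how the real code produces these flags is not described here, and the names `converged_blocks` and `longest_block` are hypothetical.

```python
def converged_blocks(flags):
    """Return (start, end) pairs, end exclusive, for each run of True flags."""
    blocks = []
    start = None
    for i, ok in enumerate(flags):
        if ok and start is None:
            start = i                      # a new converged block begins
        elif not ok and start is not None:
            blocks.append((start, i))      # the current block ends
            start = None
    if start is not None:
        blocks.append((start, len(flags)))  # block runs to the end
    return blocks


def longest_block(blocks):
    """Return the longest block, or None if no block was found."""
    if not blocks:
        return None
    return max(blocks, key=lambda block: block[1] - block[0])


# Made-up flags for one window: two converged blocks, the second is longer.
flags = [False, True, True, False, True, True, True, False]
blocks = converged_blocks(flags)
kept = longest_block(blocks)
```

In this example both blocks would be listed, and only the block covering segments 4 to 6 would be kept. A return value of None corresponds to a failed convergence test.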
The -t option can be used to truncate the data used for the
calculation of the final PMF. Alternatively, the following global
variables control whether
non-converged data is used for the calculation of the final PMF.
`TRUNCATE_DATA_BASED_ON_CONVERGENCE_ANALYSIS = True`
If True, the data set for a given window is truncated and only the converged
section (block) is used for calculating the final PMF. If False, the data is not
truncated and the original data set is used for calculating the PMF.
`DONT_TRUNCATE_IF_NO_CONVERGED_REGION = True`
If the convergence test fails for a given window and this flag is True, the data
is not truncated, i.e. the original
data set is used to calculate the PMF.
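The interaction of the two flags can be summarised in a short sketch. The function `select_window_data` and its arguments are hypothetical and only mirror the description above. What happens when the convergence test fails and DONT_TRUNCATE_IF_NO_CONVERGED_REGION is False is not stated in this README, so returning no data for that case is purely an assumption.

```python
TRUNCATE_DATA_BASED_ON_CONVERGENCE_ANALYSIS = True
DONT_TRUNCATE_IF_NO_CONVERGED_REGION = True


def select_window_data(data, converged_block,
                       truncate=TRUNCATE_DATA_BASED_ON_CONVERGENCE_ANALYSIS,
                       keep_if_no_region=DONT_TRUNCATE_IF_NO_CONVERGED_REGION):
    """Pick the data of one window that goes into the PMF (hypothetical).

    `converged_block` is a (start, end) index pair with end exclusive,
    or None if the convergence test failed for this window.
    """
    if not truncate:
        return data                # truncation switched off: use everything
    if converged_block is None:
        if keep_if_no_region:
            return data            # test failed, keep the original data set
        # Assumption: not covered by the README; drop the window's data.
        return []
    start, end = converged_block
    return data[start:end]         # keep only the converged block


window_data = list(range(10))
truncated = select_window_data(window_data, (2, 6))
untouched = select_window_data(window_data, None)
```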
The code for the convergence analysis can be run as a stand-alone program for
any data file with x,y values:
`python runAnalysis.py -d <path to file> -g`
The -g option illustrates the results with graphs (which may be somewhat
cryptic).
----------------------
Test data / Examples
----------------------
The folder test_data contains input data for three types of umbrella sampling
simulations.
python umbrella_integration.py -i /Users/e.deplazes/Desktop/MD/ASIC1a/PMF/3S3X/analysis_new/0to10ns_UI/pdo_new_windows_all_windows_0to10ns_merged.dat -n 50 -m 3.2 6.9 -r left -o /Users/e.deplazes/Desktop/MD/ASIC1a/PMF/3S3X/analysis_new/0to10ns_UI/pmfUI_ASIC1aPcTx1_newwindows_0to10ns.dat