# L. Error estimation in the Monte Carlo optimization

The aim of this appendix is to clarify the sources of error in the estimates of the NNBP values obtained with the MC optimization. There are three kinds of errors, arising at different levels, which we denote $\sigma_1$, $\sigma_2$ and $\sigma_3$:

1. The first error comes from the fitting algorithm. The uncertainties of the estimated NNBP energies ($\delta\epsilon_i$) indicate how much the error function ($E$, see Eq. 5.2) changes when the fitting parameters are varied around the minimum. For instance, a variation of the AA/TT motif ($\epsilon_{\rm AA/TT}$) around the minimum (see Fig. L.1) produces a larger change in the error function than a variation of the TA/AT motif ($\epsilon_{\rm TA/AT}$). This indicates that the uncertainty of AA/TT is lower than that of TA/AT: the curvature of the minimum in each direction gives the uncertainty, and a steeper curvature means a smaller uncertainty. There is a different set of uncertainties for each fit (i.e., each molecule). A quantitative evaluation of the uncertainty of the NNBP parameters requires the evaluation of the $\chi^2$ function for each FDC (i.e., each fit), which is given by:

   $$\chi^2(\vec{\epsilon}) = \sum_{i=1}^{N} \left( \frac{f_i - f(x_i; \vec{\epsilon})}{\sigma_y} \right)^2 \tag{L.1}$$

   where $N$ is the number of experimental points of the FDC; $x_i$ and $f_i$ are the position and the force measurements, respectively; $\vec{\epsilon}$ is the vector of fitting parameters $\epsilon_i$ ($i = 1, \dots, 10$) and $\epsilon_{\rm loop}$; $f(x_i; \vec{\epsilon})$ is the theoretically predicted FDC according to the model (see Sec. 3.4.1); and $\sigma_y$ is the experimental error of the force measurements performed with the optical tweezers. The resolution of the instrument is taken as $\sigma_y =$ […] pN. The uncertainty of the fit parameters is given by the following expression [162]:

   $$\delta\epsilon_i = \sqrt{C_{ii}} \tag{L.2}$$

   where $C_{ii}$ are the diagonal elements of the variance-covariance matrix $C_{ij}$. In a non-linear least-squares fit, this matrix can be obtained from $C_{ij} = 2 H^{-1}_{ij}$, where $H^{-1}$ is the inverse of the Hessian matrix

   $$H_{ij} = \frac{\partial^2 \chi^2(\vec{\epsilon})}{\partial \epsilon_i \, \partial \epsilon_j} \tag{L.3}$$

   of $\chi^2(\vec{\epsilon})$ evaluated at the point $\vec{\epsilon}_m$ that minimizes the error. Note that the error function $E$ and the $\chi^2$ function differ only by a constant factor, so their Hessians differ by the same constant factor. The calculation of $\delta\epsilon_i$ is straightforward and gives values between […] kcal/mol. These values represent the first type of error, which we call $\sigma_1$. Note that the Hessian matrix evaluated at the minima found with the heat-quench algorithm is very similar to the Hessian matrix evaluated at the minimum $\vec{\epsilon}_m$, which means that the curvature is almost the same in all heat-quench minima. Therefore the error of the fit ($\sigma_1$) takes the same value within a region of […] kcal/mol.
2. The second error comes from the dispersion of the heat-quench minima. As we saw previously, the same molecule has several minima that correspond to different possible solutions (each solution being a set of 10 NNBP energies). The values of the NNBP energies corresponding to the different solutions are Gaussian distributed (see Fig. 5.8c) and the average standard deviation is about […] kcal/mol. This gives a second typical error, $\sigma_2 =$ […] kcal/mol.
3. Finally, the third error corresponds to the molecular heterogeneity intrinsic to single-molecule experiments. Such heterogeneity results in a variability of solutions among different molecules. Indeed, the FDCs of the molecules are never identical, and this variability leads to differences in the values of the NNBP energies. This variability is the major source of error in our results. The error bars in Figs. 5.12c,d and 5.13 indicate the standard error of the mean, which is around 0.1 kcal/mol on average. This is what finally determines the statistical error of our analysis, $\sigma_3 \approx 0.1$ kcal/mol.
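
The fit uncertainty of the first item (Eqs. L.1 to L.3) can be illustrated with a minimal sketch. It assumes the standard least-squares convention in which the covariance matrix is twice the inverse Hessian of $\chi^2$, and it replaces the unzipping model with a hypothetical two-parameter straight line so that the result can be checked analytically. The function names, the toy data and the value `sigma_y=0.1` are illustrative assumptions, not the code or the parameters used in this work.

```python
import math

def chi2(params, xs, fs, sigma_y, model):
    """Chi-square: sum of squared residuals, each normalized by sigma_y."""
    return sum(((f - model(x, params)) / sigma_y) ** 2 for x, f in zip(xs, fs))

def hessian(func, p, h=1e-3):
    """Central finite-difference Hessian of func at point p."""
    n = len(p)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                q = list(p)
                q[i] += di
                q[j] += dj
                return func(q)
            H[i][j] = (shifted(h, h) - shifted(h, -h)
                       - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return H

def invert(M):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        d = A[c][c]
        A[c] = [v / d for v in A[c]]
        for r in range(n):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [row[n:] for row in A]

def parameter_errors(params_min, xs, fs, sigma_y, model):
    """Uncertainty of each parameter: sqrt of the diagonal of C = 2 * H^-1."""
    H = hessian(lambda p: chi2(p, xs, fs, sigma_y, model), params_min)
    C = [[2.0 * v for v in row] for row in invert(H)]
    return [math.sqrt(C[i][i]) for i in range(len(params_min))]

# Hypothetical toy model (a straight line), standing in for the FDC model.
def line(x, p):
    return p[0] + p[1] * x

xs = [float(i) for i in range(10)]
fs = [line(x, [1.0, 0.5]) for x in xs]  # noiseless toy data, minimum at (1.0, 0.5)
errors = parameter_errors([1.0, 0.5], xs, fs, sigma_y=0.1, model=line)
print(errors)
```

For a real fit, `line` would be replaced by the model prediction and `params_min` by the minimum found by the optimization. A direction in which $\chi^2$ is steeper yields a larger Hessian entry and therefore a smaller uncertainty, as in the AA/TT versus TA/AT comparison.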

Since the major source of error is the variability of the results from molecule to molecule, we report only this last error in the manuscript. We can safely conclude that propagating the errors of the heat-quench algorithm would not increase the final value of the error bar.
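
The two statistical estimates discussed above can be sketched as follows: the dispersion of one NNBP energy over the heat-quench minima of a single molecule, and the standard error of the mean of that energy across molecules. The numbers below are hypothetical placeholders chosen only to make the example runnable; they are not measured values. How each per-molecule value is obtained from its heat-quench minima is not specified here, so the second list is simply assumed to hold one value per molecule.

```python
import math
import statistics

def dispersion_of_minima(minima_energies):
    """Sample standard deviation of one NNBP energy over heat-quench minima."""
    return statistics.stdev(minima_energies)

def standard_error_of_mean(per_molecule_energies):
    """Standard error of the mean of one NNBP energy across molecules."""
    n = len(per_molecule_energies)
    return statistics.stdev(per_molecule_energies) / math.sqrt(n)

# Hypothetical values (kcal/mol) for a single motif; not data from this work.
minima_one_molecule = [-1.20, -1.25, -1.15, -1.22, -1.18]
values_per_molecule = [-1.1, -1.3, -1.4, -1.0, -1.2]

sigma_2 = dispersion_of_minima(minima_one_molecule)
sigma_3 = standard_error_of_mean(values_per_molecule)
print(sigma_2, sigma_3)
```

Reporting only the largest contribution is a simplification that holds when the other two errors are clearly smaller, which is the situation described in this appendix.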

JM Huguet 2014-02-12