3.3 The Nearest-Neighbor model

The Nearest-Neighbor (NN) model for DNA was developed to predict the free energy of formation (hereafter referred to simply as the energy) of a DNA double helix from two complementary strands [26,27]. The model has been extended to include all the secondary structures that a single strand can form. The NN model assumes that the ssDNA already exists (i.e., the phosphate backbone is already formed), so only base-pairing interactions are taken into account. The model predicts the formation energy of every secondary structure that a given sequence can form, but it does not by itself identify the state of minimum energy (i.e., the native state). To find the native state, it is necessary to search for the lowest-energy state among all possible secondary structures [118]. This requires optimization algorithms such as Mfold, which use the NN model to minimize the energy of a given ssDNA sequence [119].

The basic idea of the NN model is that the base-pairing energy of two complementary bases depends only on the pair itself and on its first neighbor located in the same strand. This allows the global energy to be split into many contributions called motifs (see Fig. 3.13a). Each motif represents a possible pairing of bases, and the total energy of a state is the sum of the energies of its motifs. The energy of a given motif includes three main contributions: 1) the hydrogen bonds between bases, 2) the stacking interaction between consecutive bases and 3) the loss of entropy (see Fig. 3.13b).
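The additivity described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `duplex_energy`, the table `PLACEHOLDER_ENERGY` and the uniform value of -1.0 (arbitrary units) are hypothetical and are not measured NN energies. The sketch also ignores any contribution that is not a nearest-neighbor motif.

```python
# Hypothetical placeholder table: one energy per 5'->3' dinucleotide step.
# The value -1.0 (arbitrary units) is NOT a measured NN energy.
BASES = "ACGT"
PLACEHOLDER_ENERGY = {a + b: -1.0 for a in BASES for b in BASES}


def duplex_energy(sequence, motif_energy):
    """Sum the motif energies of consecutive base pairs (i, i+1) along one strand."""
    total = 0.0
    for i in range(len(sequence) - 1):
        motif = sequence[i:i + 2]      # base i and its nearest neighbor i+1
        total += motif_energy[motif]   # each motif contributes additively
    return total


print(duplex_energy("ACGTAG", PLACEHOLDER_ENERGY))  # 5 motifs -> -5.0
```

A sequence of $n$ bases contains $n-1$ motifs, so with the uniform placeholder table the result simply counts them. With a real table, each step would contribute its own sequence-dependent value.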

Figure 3.13: NN model. (a) Duplex formation. The formation energy ($\Delta g$) of the base pair $i$ depends on itself and on the NN base pair $i+1$. Each base pair contributes to the total formation energy of the duplex ($G(n)$). (b) Interactions between bases. The hydrogen bonds between complementary bases, the stacking interactions and the loss of entropy (not depicted) constitute the free energy of one base pair. (c) The 16 NNBP motifs. The 12 highlighted motifs are symmetric with respect to the anti-diagonal. Each of them has the same energy as its symmetric counterpart. In the end, there are 10 different energies (6 from the symmetric pairs + 4 on the anti-diagonal).

If the NN model is restricted to Watson-Crick base pairs, the number of possible motifs is dramatically reduced. Since DNA has four types of bases and the interaction involves the nearest neighbor, we end up with $4\times4=16$ different possible motifs. However, 12 of them form 6 pairs whose members must have the same formation energy by symmetry (see Fig. 3.13c). Indeed, under the Watson-Crick restriction a given ssDNA fully determines the sequence of its complementary strand. This means that the energy of the duplex can be computed from either strand, and the value must be the same in both cases, which imposes 6 constraints. Therefore, only 10 different energy values define the free energy of formation of Watson-Crick sequences of DNA.
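The counting argument can be checked with a short Python sketch. It assumes, as is conventional, that each motif is written as a dinucleotide read 5'->3' on one strand, so that its symmetric counterpart is the reverse complement read on the other strand. The names `reverse_complement` and `classes` are illustrative only.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(motif):
    """Return the same motif as read 5'->3' on the complementary strand."""
    return "".join(COMPLEMENT[base] for base in reversed(motif))


# All 4 x 4 = 16 nearest-neighbor motifs.
motifs = ["".join(pair) for pair in product("ACGT", repeat=2)]

# Group each motif with its reverse complement; both must share one energy.
classes = {}
for motif in motifs:
    key = min(motif, reverse_complement(motif))
    classes.setdefault(key, set()).add(motif)

paired = sorted(k for k, v in classes.items() if len(v) == 2)
self_complementary = sorted(k for k, v in classes.items() if len(v) == 1)

print(len(motifs), len(classes), len(paired), len(self_complementary))
# 16 10 6 4
```

The four motifs that are their own reverse complement (AT, TA, CG, GC) impose no constraint. The remaining 12 collapse into 6 pairs, which leaves 10 independent energies.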

The NN model has been extensively investigated in calorimetry and UV absorbance experiments. Calorimetry measurements allow us to infer the enthalpy and entropy of formation of chemical substances by measuring the heat absorbed or emitted by a vessel that contains the reacting substances while the temperature is increased. In particular, the contribution of each motif can be isolated by combining calorimetric experiments performed on several DNA sequences. In the end, a table of enthalpies and entropies for each motif can be obtained.
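One way to picture how the motif contributions are isolated is as a linear system. The notation here is an assumption introduced for illustration and is not taken from the cited experiments: $\Delta H_s$ is the measured enthalpy of sequence $s$, $n_{s,m}$ is the number of times motif $m$ occurs in it, and $\Delta h_m$ is the unknown enthalpy of motif $m$. Additivity then gives one equation per measured sequence:

```latex
\Delta H_s = \sum_{m=1}^{10} n_{s,m}\,\Delta h_m , \qquad s = 1,\dots,S
```

To the extent that the motif counts of the $S$ sequences are linearly independent, the system can be solved for the motif enthalpies, typically in a least-squares sense. The same reasoning applies to the entropies. Any contribution that is not a nearest-neighbor motif is neglected in this sketch.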

JM Huguet 2014-02-12