Needleman-Wunsch Algorithm
From DrugPedia: A Wikipedia for Drug discovery
The Needleman-Wunsch Algorithm
The Needleman-Wunch (NW) algorithm Needleman and Wunsch (1970)# is a nonlinear global optimization method that was developed for amino acid sequence alignment in proteins. This was the first of many important alignment techniques which now find application in the Human Genome Project.
Human DNA consists of some 30,000 genes which are in turn composed of 20 amino acids represented by letters of a reduced alphabet (ADCEFGHILKMNPQRSTVWY). The total genome is composed of about 3 billion chemical base pairs, or about 100,000 per gene. Finding where a particular string of amino acids fits is an optimization problem that aims to find the optimal alignment of the two strings with respect to a defined set of rules and parameter values for comparing different alignments.
The algorithm is an iterative method in which all possible pairs of amino acids (one from each string) are set up in a 2D matrix and alignments are represented as pathways through this array. The optimum alignment is the path (or paths) connecting maximum scoring values. This approach is an example of dynamic programming, which has also been applied to seismic modeling Darby and Neidell (1966) and travel time computation Schneider et al. (1992).
It is a global optimization process which yields a solution to the problem of pairwise alignment, meaning that we are interested in finding the best fit between only two strings. If alignment of more than two strings is of interest, the problem can be broken down into a cascade of pairwise alignments and thus solved.
In its simplest form, the Needleman-Wunsch algorithm can be summarized by Figure 1. A matrix is formed by placing the two strings, possibly of different length, along the left column and top row. In this step a one is allocated to a cell in the matrix if the letter in each list at this location is the same. Otherwise no entry is made (which is a defacto zero). It is at this stage that the letter-alignment problem becomes purely numerical. In fact, the original string could just as easily consist of integers as letters. The result of this process is the similarity matrix in Figure 1a.