BLAST Algorithm
From DrugPedia: A Wikipedia for Drug discovery
Ektapathak
Revision as of 08:48, 13 September 2008
BLAST stands for Basic Local Alignment Search Tool. It is based on Karlin-Altschul statistics.
It compares a novel sequence with already characterized sequences in nucleotide and protein databases. It finds regions of sequence similarity, which can yield functional and evolutionary clues. Regions of similarity can be:
local: the region of similarity is based on small stretches in the sequence
global: regions of similarity are present across the whole sequence
The main idea of basic BLAST is that "homologous sequences are likely to contain a short, high-scoring region of similarity, called a hit. Each hit gives a seed that BLAST tries to extend on both sides".
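The seed-and-extend idea can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: seeds are exact shared words, and a seed is extended while letters are identical, whereas real BLAST scores the extension with a scoring matrix. The function names find_seeds and extend_seed and the example sequences are hypothetical, chosen for this sketch.

```python
def find_seeds(query, subject, word_length):
    """Return (query_pos, subject_pos) for every word shared exactly by both sequences."""
    # Index every word of the subject by its start position.
    index = {}
    for j in range(len(subject) - word_length + 1):
        index.setdefault(subject[j:j + word_length], []).append(j)
    # Look up each query word in the index.
    seeds = []
    for i in range(len(query) - word_length + 1):
        for j in index.get(query[i:i + word_length], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, word_length):
    """Toy extension (assumption): grow the seed on both sides while letters are identical."""
    start_q, start_s = i, j
    while start_q > 0 and start_s > 0 and query[start_q - 1] == subject[start_s - 1]:
        start_q -= 1
        start_s -= 1
    end_q, end_s = i + word_length, j + word_length
    while end_q < len(query) and end_s < len(subject) and query[end_q] == subject[end_s]:
        end_q += 1
        end_s += 1
    return query[start_q:end_q]


query, subject = "APQGEFGK", "TTPQGEFGAA"
seeds = find_seeds(query, subject, 3)
print(seeds)                                  # [(1, 2), (2, 3), (3, 4), (4, 5)]
print(extend_seed(query, subject, 2, 3, 3))   # PQGEFG
```

Indexing the subject once and looking up the query words keeps the seed search close to linear in the sequence lengths, which is the reason for searching by short words first.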
How does BLAST work?
First, BLAST optionally filters out low-complexity regions, which are not useful for producing meaningful sequence alignments. It then applies three steps in sequence to refine the “good alignments”:
1 Seeding step
2 Extension step
3 Evaluation step
The Seeding step
BLAST assumes that significant alignments have words in common. A word is a run of a fixed number of consecutive letters. For example, if a word is 3 letters long, then the sequence PQGEFG has the words PQG, QGE, GEF and EFG. The word length is 3 for protein sequences and 11 for DNA sequences.
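The word listing described above can be reproduced with a short Python sketch. The function name list_words is hypothetical and is used only for this illustration.

```python
def list_words(sequence, word_length):
    """Return every overlapping word of word_length letters, from left to right."""
    return [sequence[i:i + word_length]
            for i in range(len(sequence) - word_length + 1)]


print(list_words("PQGEFG", 3))  # ['PQG', 'QGE', 'GEF', 'EFG']
```

A sequence of length n yields n - w + 1 words of length w, so a sequence shorter than the word length yields none.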
[Figure: listing of words in a sequence]
BLAST keeps only the high-scoring words. Each word in the query list (e.g. PQG) is compared with every possible 3-letter word; with 20 amino acids there are 20^3 = 8000 such words, so each query word has 20^3 possible match scores. Each comparison is scored by summing the scoring matrix (substitution matrix) values of the aligned residue pairs. For example, comparing PQG with PEG and with PQA gives scores of 15 and 12, respectively. For DNA words, a match is scored as +5 and a mismatch as -4. A neighborhood word score threshold T is then used to reduce the number of possible matching words: words scoring at least T remain in the list of possible matching words, while those with lower scores are discarded. For example, when T is 13, PEG is kept but PQA is discarded.
PQG aligns with:

Seq    Score             Seq    Score
...    ...               ...    ...
PEG    15                PEG    15
PRG    14      T=13      PRG    14
PSG    13                PSG    13
PQA    12                PQA    12
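The scoring and thresholding of neighborhood words can be checked with a small Python sketch. Several things here are assumptions rather than part of the text: the pair scores are a tiny hand-entered subset chosen to reproduce the scores quoted above (they agree with BLOSUM62, which the text does not name), a word is kept when its score is at least T, and all function names are hypothetical.

```python
# Hand-entered subset of substitution scores (assumption: chosen to match the
# scores quoted in the text; a real run would use a full 20 x 20 matrix).
PAIR_SCORES = {
    ("P", "P"): 7, ("Q", "Q"): 5, ("G", "G"): 6,
    ("Q", "E"): 2, ("Q", "R"): 1, ("Q", "S"): 0, ("G", "A"): 0,
}


def pair_score(a, b):
    """Look up a residue pair symmetrically; raises KeyError outside the toy subset."""
    if (a, b) in PAIR_SCORES:
        return PAIR_SCORES[(a, b)]
    return PAIR_SCORES[(b, a)]


def word_score(query_word, candidate):
    """Sum the substitution scores of the aligned residue pairs."""
    return sum(pair_score(a, b) for a, b in zip(query_word, candidate))


def neighborhood(query_word, candidates, threshold):
    """Keep the candidate words whose score is at least the threshold T."""
    scored = [(c, word_score(query_word, c)) for c in candidates]
    return [(c, s) for c, s in scored if s >= threshold]


def dna_word_score(a, b, match=5, mismatch=-4):
    """Score two equal-length DNA words with +5 per match and -4 per mismatch."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))


print(neighborhood("PQG", ["PEG", "PRG", "PSG", "PQA"], 13))
# [('PEG', 15), ('PRG', 14), ('PSG', 13)]
print(dna_word_score("ACGT", "ACGA"))  # 11
```

Only four candidate words are scored here for brevity; a full implementation would enumerate all 20^3 possible 3-letter words for each query word.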