|
Pairwise
comparison and alignment of protein or nucleic acid sequences is
the foundation upon which most other bioinformatics tools are
built.
There
are three types of alignment to consider when aligning
sequences: optimal, global, and local alignments.
-
Optimal alignments
The best alignment under a defined set of rules and parameter
values for comparing different alignments. There is no such
thing as a single best alignment in absolute terms, since
optimality always depends on the assumptions on which the
alignment is based. For example, what penalty should gaps carry? All
sequence alignment procedures make some such assumptions.
-
Global alignment
An alignment that assumes that the two proteins are similar
over their entire lengths. The alignment attempts to match
them to each other from end to end, even if parts of the
alignment are not very convincing. A tiny
example:
NLGPSTKDFGKISESREFDNQ
 |      ||||    |
QLNQLERSFGKINMRLEDALV
-
Local alignment
An alignment that searches for segments of the two sequences
that match well. There is no attempt to force the entire sequences
into an alignment, only those parts that appear to have good
similarity according to some criterion. Using sequences that
share the FGKI segment seen above, one could get:
NLGPSTKDDFGKILGPSTKDDQ
         ||||
QNQLERSSNFGKINQLERSSNN
It may seem that one should always use
local alignments. However, an overall similarity, as opposed
to a domain-to-domain similarity, can be difficult to spot
with local alignment alone, so global
alignment remains useful in some cases.
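The contrast between global and local alignment, and the way the "optimal" score shifts with the gap penalty, can be illustrated with a short Python sketch. It fills in a score table row by row, which is the dynamic programming idea this section goes on to introduce. The function name `align_score` and the scoring values (match +1, mismatch -1, linear gap penalty -2) are illustrative assumptions, not part of the original text; real protein comparisons normally use a substitution matrix rather than a flat match/mismatch score.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return the best alignment score of a and b (linear gap penalty).

    local=False scores an end-to-end (global) alignment.
    local=True scores the best-matching pair of segments (local).
    All scoring values are illustrative assumptions.
    """
    # prev[j] is the best score for the previous row of the table,
    # i.e. a[:i-1] against b[:j].
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0  # best cell seen anywhere in the table (used for local)
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score = max(
                prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # a[i-1] opposite a gap
                curr[j - 1] + gap,   # b[j-1] opposite a gap
            )
            if local:
                # A local alignment may start afresh at any cell.
                score = max(score, 0)
                best = max(best, score)
            curr.append(score)
        prev = curr
    return best if local else prev[-1]


if __name__ == "__main__":
    s1 = "NLGPSTKDFGKISESREFDNQ"
    s2 = "QLNQLERSFGKINMRLEDALV"
    print("global:", align_score(s1, s2))
    print("local: ", align_score(s1, s2, local=True))
    # The optimal global score changes with the gap penalty.
    for g in (0, -2, -5):
        print("gap", g, "->", align_score("ACGT", "AGT", gap=g))
```

The only differences between the two modes are the first row and column (gap penalties for global, zeros for local) and the floor at zero, which lets a local alignment discard poorly matching flanks. This is why the local score can never be lower than the global score for the same sequences and parameters.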
In this section, we will briefly
introduce the concept of dynamic programming, the
algorithmic technique that allows efficient and complete comparison of
two (or more) biological sequences. We will also investigate the
effects of various parameters on the results of these
comparisons and begin to look at database searching. |