Both FASTA and BLAST use a rapid word-based lookup strategy to speed the initial phase of the similarity search. In protein searches, FASTA looks for pairs of aligned identical amino acids (marked with ^ below), e.g.
seq1 KDKEAYADRQELQDELRQEREARQKLEMMIKELKLQILKSSKTAKE
     . ::.    .::..::..::. ::: :.. :::. .:
seq2 NAKEGLEKIEELEEELENERKLRQKSELQRKELESRIEELQDQLET
       ^^      ^^  ^^  ^^  ^^^     ^^^
With ktup=2, FASTA would ignore a region like:
seq1 LNKKLLNLKQAGEHLKPE
     .....:. ..  :.:. .
seq2 FEEEFLETREQYEKLQKD
in the initial scanning phase, because it contains no two adjacent identities. Thus, searches with ktup=1 can be more sensitive than searches with ktup=2. However, a more sensitive algorithm may also raise the scores of unrelated sequences, so that the statistical significance of an intermediate-distance match is reduced, while the significance of a very distant match is improved.
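The effect of ktup on the two example regions can be checked with a short Python sketch. It is only an illustration: it inspects a pair of sequences that are already aligned without gaps, whereas FASTA finds its words with a lookup table during the scan. The function name ktup_seeds is hypothetical and not part of FASTA.

```python
def ktup_seeds(aligned_a, aligned_b, ktup):
    """Return (start, length) for each run of at least ktup aligned identities.

    Starts are 0-based offsets into the ungapped alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    seeds = []
    run_start = None
    # A trailing sentinel mismatch flushes a run that reaches the end.
    pairs = list(zip(aligned_a, aligned_b)) + [("x", "y")]
    for i, (a, b) in enumerate(pairs):
        if a == b:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= ktup:
                seeds.append((run_start, i - run_start))
            run_start = None
    return seeds


# The two regions shown above.
first = ("KDKEAYADRQELQDELRQEREARQKLEMMIKELKLQILKSSKTAKE",
         "NAKEGLEKIEELEEELENERKLRQKSELQRKELESRIEELQDQLET")
second = ("LNKKLLNLKQAGEHLKPE",
          "FEEEFLETREQYEKLQKD")

print(ktup_seeds(*first, ktup=2))   # six runs, matching the ^ marks
print(ktup_seeds(*second, ktup=2))  # no runs: ignored with ktup=2
print(ktup_seeds(*second, ktup=1))  # three isolated identities
```

With ktup=2 the second region yields no seeds at all, while ktup=1 picks up its three isolated identities.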
BLAST also looks for initial similarities, using a word size (the equivalent of ktup) of 3, but BLAST looks for conservative substitutions as well as identities. Thus, BLAST with a word size of 3 is often more sensitive than FASTA with ktup=2.
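The difference between an identity-only word and a word that tolerates conservative substitutions can be sketched in Python. Everything here is an assumption made for illustration: the residue groups, the scores (5 for an identity, 2 for a same-group substitution, -1 otherwise) and the threshold of 11 are toy values, not a real substitution matrix or BLAST's actual parameters, and the words EKL and ERL are hypothetical.

```python
# Hypothetical residue groups standing in for "conservative substitution".
# This is NOT a real substitution matrix such as BLOSUM62.
TOY_GROUPS = ["KRH", "DENQ", "ILVM", "FWY", "ST", "AG"]
GROUP_OF = {res: n for n, group in enumerate(TOY_GROUPS) for res in group}


def toy_score(a, b):
    """Toy pair score: identity 5, same-group substitution 2, otherwise -1."""
    if a == b:
        return 5
    if a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b):
        return 2
    return -1


def word_score(word_a, word_b):
    """Sum of toy pair scores over two equal-length words."""
    return sum(toy_score(a, b) for a, b in zip(word_a, word_b))


def is_word_hit(word_a, word_b, threshold=11):
    """Score-based word match; the threshold is an assumed toy value."""
    return len(word_a) == len(word_b) and word_score(word_a, word_b) >= threshold


def has_identical_pair(word_a, word_b):
    """Identity-based seed with ktup=2: two adjacent identical residues."""
    matches = [a == b for a, b in zip(word_a, word_b)]
    return any(x and y for x, y in zip(matches, matches[1:]))


print(is_word_hit("EKL", "ERL"))         # identity, K/R substitution, identity
print(has_identical_pair("EKL", "ERL"))  # no two adjacent identities
```

Under these toy scores the word pair EKL/ERL passes the score threshold even though it contains no two adjacent identities, which is the kind of match a ktup=2 identity scan would not seed. Real scores and thresholds may give different outcomes for other word pairs.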
For DNA sequences, FASTA uses ktup=6 by default. DNA searches with ktup=3 are more sensitive, but ktup=1 is less sensitive (at a given statistical significance threshold) than ktup=3 for DNA. ktup=1 is appropriate when searching for oligonucleotides (< 20 nt).
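A word lookup of this kind can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not FASTA's implementation: the function names build_lookup and diagonal_hits and the two sequences are hypothetical, and only exact word hits are counted per diagonal (subject offset minus query offset). The subject contains the 11 nt query with a single mismatch in the middle.

```python
from collections import defaultdict


def build_lookup(query, ktup):
    """Map every ktup-length word in the query to its start positions."""
    table = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        table[query[i:i + ktup]].append(i)
    return table


def diagonal_hits(query, subject, ktup):
    """Count exact word hits on each diagonal (subject offset - query offset)."""
    table = build_lookup(query, ktup)
    diagonals = defaultdict(int)
    for j in range(len(subject) - ktup + 1):
        # .get() avoids adding empty entries for words absent from the query.
        for i in table.get(subject[j:j + ktup], ()):
            diagonals[j - i] += 1
    return dict(diagonals)


# Hypothetical 11 nt oligonucleotide and a subject holding a copy of it
# with one mismatch (G -> C) at the sixth base, starting at offset 3.
oligo = "ACGTTGCAAGC"
subject = "TTT" + "ACGTTCCAAGC" + "GGG"

print(diagonal_hits(oligo, subject, ktup=6))  # no exact 6-mer survives
print(diagonal_hits(oligo, subject, ktup=3))  # hits cluster on diagonal 3
```

Because the mismatch splits the oligonucleotide into two 5 nt exact runs, no 6 nt word is shared and the region is never seeded with ktup=6, whereas shorter words still place hits on the correct diagonal.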