WU-BLAST2 2006-01-01 review

by rbytes.net on

WU-BLAST is the original BLAST with gapped alignments and statistics, supporting sequences and databases of virtually unlimited size, with important features, speed and reliability for the power user.

License: Freeware
OS: Mac OS X
File size: 4K
Developer: Warren Gish
Price: $0.00
Updated: 16 Jan 2006

Gapped alignments and Sum statistics are provided in all search modes (BLASTP, BLASTN, BLASTX, TBLASTN, TBLASTX), along with segmented sequences, enhanced tabular output, XML output, and much more.

Free for academic and non-profit use.

Here are some key features of "WU BLAST2":
Gapped alignment routines are available (and used by default) in all BLAST search modes: BLASTP, BLASTN and TBLASTN (Altschul et al., 1990), as well as BLASTX (Gish and States, 1993) and TBLASTX (Gish, W., 1994, unpublished). Gaps can optionally be turned off in any mode.
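
BLASTX, TBLASTN and TBLASTX compare nucleotide sequences at the protein level, which means translating them in all six reading frames (three per strand). A minimal Python sketch of that translation step follows, assuming the standard genetic code; the function names and the "+1" to "-3" frame labels are illustrative, and WU BLAST's own handling (for example of alternative genetic codes) is not shown.

```python
# Standard genetic code (NCBI translation table 1); codons are indexed
# with the bases ordered T, C, A, G, the conventional compact layout.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an uppercase DNA string (ambiguity codes are left as-is)."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; codons containing ambiguity codes (e.g. N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    """Translate all three reading frames on each strand (+1..+3 forward, -1..-3 reverse)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


print(six_frames("ATGGCCTAA"))
```

For example, `six_frames("ATGGCCTAA")["+1"]` is `"MA*"`, with `*` marking the stop codon.
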
Potentially multiple regions of similarity are identified and reported for each database sequence, yielding increased sensitivity and selectivity. This feature is essential for finding all exons in a multi-exon gene sequence, not just the longest or best-matching exon; all complete or partial copies of a repetitive element in a genomic sequence, not just the best-matching one; and multiple, discrete domains of similarity between sequences, not just the highest-scoring one.
Karlin and Altschul (1993) "Sum statistics" are available (and used by default) in all search modes to evaluate the joint probability of multiple regions of similarity, as described by Altschul and Gish (1996). With this technique, sets of similar regions are often found to be statistically significant even though each region on its own would be insignificant and go unreported. The combination of well-chosen heuristics and statistics in WU BLAST is often more sensitive and selective than two alternatives: the full dynamic programming approach of Smith and Waterman (1981), which finds and evaluates the significance of only the highest-scoring alignment with each database sequence; and other approaches or BLAST implementations that identify multiple regions of local similarity but then evaluate each one individually, rather than jointly, for statistical significance.
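
The joint evaluation can be sketched with the large-score approximation given by Karlin and Altschul (1993) for the sum of r normalized scores. This is a simplified Python illustration, not WU BLAST's code: the helper names are hypothetical, and production implementations typically add further corrections (for example for how the set of regions was chosen) that are omitted here.

```python
import math


def normalized_score(raw: float, lam: float, k: float, m: int, n: int) -> float:
    """Karlin-Altschul normalized score S' = lambda*S - ln(K*m*n) for a search space of m*n."""
    return lam * raw - math.log(k * m * n)


def sum_pvalue(normalized_scores: list[float]) -> float:
    """Approximate probability that r chance HSPs have normalized scores summing to >= x.

    Large-x approximation from Karlin and Altschul (1993):
        P(T_r >= x) ~ exp(-x) * x**(r - 1) / (r! * (r - 1)!)
    Only meaningful when x is reasonably large.
    """
    r = len(normalized_scores)
    x = sum(normalized_scores)
    return math.exp(-x) * x ** (r - 1) / (math.factorial(r) * math.factorial(r - 1))


# One region that is only marginally significant on its own...
single = sum_pvalue([3.0])
# ...versus three such regions evaluated jointly.
joint = sum_pvalue([3.0, 3.0, 3.0])
print(f"one region: {single:.3g}, three regions jointly: {joint:.3g}")
```

Here a single region with normalized score 3 comes out near 0.05, while three such regions considered jointly come out around 8e-4, which is the effect described above.
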
Poisson statistics are available as an alternative to Karlin-Altschul Sum statistics in all search modes. Simpler Karlin-Altschul (1990) statistics, which do not involve joint probability calculations, are also available as an option.
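
For comparison, the two alternatives can be written in a few lines, assuming the textbook formulas: the simple Karlin-Altschul expectation E = Kmn·e^(-λS) with P = 1 - e^(-E), and a Poisson tail for observing at least r HSPs when E are expected. The parameter values in the example are hypothetical, and the function names are illustrative, not WU BLAST options.

```python
import math


def expected_hsps(raw_score: float, lam: float, k: float, m: int, n: int) -> float:
    """Karlin-Altschul (1990): expected chance HSPs scoring >= S, E = K*m*n*exp(-lambda*S)."""
    return k * m * n * math.exp(-lam * raw_score)


def karlin_altschul_p(e: float) -> float:
    """Probability of at least one chance HSP scoring >= S: P = 1 - exp(-E)."""
    return -math.expm1(-e)


def poisson_p(e: float, r: int) -> float:
    """Probability of at least r chance HSPs scoring >= S (S = lowest score in the set)
    when E are expected: P = 1 - sum_{i=0}^{r-1} exp(-E) * E**i / i!."""
    return 1.0 - sum(math.exp(-e) * e ** i / math.factorial(i) for i in range(r))


# Hypothetical parameters and sequence lengths, for illustration only.
e = expected_hsps(raw_score=60, lam=0.3, k=0.1, m=300, n=1_000_000)
print(f"E = {e:.3g}, P(>=1 HSP) = {karlin_altschul_p(e):.3g}, P(>=2 HSPs) = {poisson_p(e, 2):.3g}")
```

With r = 1 the Poisson form reduces to the simple Karlin-Altschul P-value; larger r asks how surprising it is to see several HSPs at or above the weakest one's score.
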
Using the postsw option, a full Smith-Waterman alignment is performed on query-subject pairs of sequences that are to be reported by BLASTP. The Smith-Waterman scores and alignments are combined with the initial BLAST results and redundancy is removed. This may alter the relative ranking of database matches before output. Use of this option is recommended, although it may be supplanted in the future by other option(s) or by a redefined default behavior.
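
The kind of computation postsw adds is sketched below: a score-only Smith-Waterman local alignment with affine gap penalties (Gotoh's formulation), in Python. The gap-cost convention (open + k × extend for a gap of length k), the toy match/mismatch scores and all names are assumptions for illustration; the source does not describe WU BLAST's implementation.

```python
from typing import Callable

NEG_INF = -10**9  # stands in for minus infinity while keeping integer arithmetic


def smith_waterman_score(a: str, b: str, score: Callable[[str, str], int],
                         gap_open: int, gap_extend: int) -> int:
    """Best local alignment score (Smith-Waterman with Gotoh's affine gaps).

    A gap of length k costs gap_open + k * gap_extend (an assumed convention).
    Only the previous row is kept, so memory is O(len(b)).
    """
    n = len(b)
    h_prev = [0] * (n + 1)        # row i-1: best local score ending at each column
    e_prev = [NEG_INF] * (n + 1)  # row i-1: best score ending with a residue of a against a gap
    best = 0
    for i in range(1, len(a) + 1):
        h = [0] * (n + 1)
        e = [NEG_INF] * (n + 1)
        f = NEG_INF  # this row: best score ending with b[j-1] against a gap
        for j in range(1, n + 1):
            e[j] = max(e_prev[j] - gap_extend, h_prev[j] - gap_open - gap_extend)
            f = max(f - gap_extend, h[j - 1] - gap_open - gap_extend)
            h[j] = max(0, h_prev[j - 1] + score(a[i - 1], b[j - 1]), e[j], f)
            best = max(best, h[j])
        h_prev, e_prev = h, e
    return best


def match_mismatch(x: str, y: str) -> int:
    return 5 if x == y else -4


print(smith_waterman_score("ACGTTACG", "ACGTACG", match_mismatch, gap_open=10, gap_extend=1))
```

The example prints 24: seven matches at +5 each (35) less one single-residue gap costing 11, which beats any ungapped local alignment (at most 20).
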
The execution of WU BLAST 2.0 has been optimized so that gapped searches typically run faster, sometimes many times faster, than the ungapped version 1.4 programs ever did, without reducing sensitivity. An exception is BLASTN (already quite fast), which typically runs about 10% slower with its default parameters because of the added gapped alignment steps.
The classical 1-hit BLAST algorithm has not been changed in WU BLAST 2.0 and remains the default method for finding the ungapped alignments that are then used as "seeds" for finding gapped alignments. WU BLAST 2.0 thus retains the sensitivity and control characteristics that users became accustomed to with previous versions of BLAST; the addition of gapped alignments in version 2.0 only improves sensitivity. When assessed at the same sensitivity level, the optimized classical BLAST algorithm in WU BLAST 2.0 ran at nearly the same speed as the 2-hit algorithm implemented at the NCBI (Altschul et al., 1997) while using significantly less memory. For users who desire still higher speed, WU BLAST offers a 2-hit implementation that performs better (more efficiently and sensitively) than the NCBI's (see the hitdist option). Note, however, that even the WU version of the 2-hit algorithm requires more memory than the classical 1-hit algorithm to reach the same level of sensitivity. While the 1-hit algorithm remains the default in all WU BLAST search modes, the 2-hit algorithm is available as an option in all of them, including BLASTN.
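
The difference between 1-hit and 2-hit seeding can be illustrated with exact word matches. In this hypothetical Python sketch, every word hit is a seed under the 1-hit method, while the 2-hit method keeps a hit only when another non-overlapping hit lies on the same diagonal within a window. Here `window` plays a role similar to the hitdist option, but its exact WU semantics are assumed, and real BLAST also seeds from neighborhood words rather than identical words alone.

```python
from collections import defaultdict


def word_hits(query: str, subject: str, w: int) -> list[tuple[int, int]]:
    """All (query_pos, subject_pos) pairs where both sequences share an identical word of length w.

    Simplification: real BLAST seeding also uses neighborhood words (threshold T).
    Under the 1-hit method, every hit returned here is a seed.
    """
    index = defaultdict(list)
    for q in range(len(query) - w + 1):
        index[query[q:q + w]].append(q)
    return [(q, s)
            for s in range(len(subject) - w + 1)
            for q in index.get(subject[s:s + w], [])]


def two_hit_seeds(hits: list[tuple[int, int]], w: int, window: int) -> list[tuple[int, int]]:
    """Keep a hit only if an earlier, non-overlapping hit on the same diagonal
    lies at most `window` positions before it."""
    last_on_diagonal: dict[int, int] = {}
    seeds = []
    for q, s in sorted(hits, key=lambda hit: (hit[1], hit[0])):
        diagonal = s - q
        prev = last_on_diagonal.get(diagonal)
        if prev is not None and s - prev < w:
            continue  # overlaps the previous hit on this diagonal; ignore it
        if prev is not None and s - prev <= window:
            seeds.append((q, s))
        last_on_diagonal[diagonal] = s
    return seeds


hits = word_hits("AAAACCCCGGGG", "AAAACCCCGGGG", w=4)
print(len(hits), two_hit_seeds(hits, w=4, window=40))
```

For two identical 12-residue sequences and W = 4 this prints `9 [(4, 4), (8, 8)]`: nine 1-hit seeds, but only two hits that have a qualifying partner earlier on the same diagonal. Discarding isolated hits means fewer costly extensions.
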
WU BLAST 2.0 is a virtual drop-in replacement for version 1.4: it uses the same inputs and command-line arguments and produces almost the same output format as before. A parser written for version 1.4 output is likely to break on version 2.0 output only if it checks for certain kinds of consistency in the results, such as equal lengths for the aligned query and subject sequences (which is often not the case once gaps are introduced), or if it does not accept the hyphens that signify gaps. If gapped output simply cannot be tolerated by one's parser, one can still take advantage of the bug fixes, improved speed and other features of WU BLAST 2.0 by using the compat1.4 compatibility option.
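
A parser can be made tolerant of gapped output by counting residues rather than alignment columns. The Python sketch below contrasts a 1.4-style length check with a gap-aware one; the function names are hypothetical, and it assumes 1-based inclusive coordinates on the same strand of two protein sequences (translated searches would need frame-aware arithmetic).

```python
def residue_count(aligned: str) -> int:
    """Residues in one row of an alignment, ignoring '-' gap characters."""
    return len(aligned) - aligned.count("-")


def naive_check(q_start: int, q_end: int, s_start: int, s_end: int) -> bool:
    """1.4-era assumption: an ungapped HSP spans the same number of residues in both sequences."""
    return q_end - q_start == s_end - s_start


def gap_aware_check(q_aln: str, s_aln: str, q_start: int, q_end: int,
                    s_start: int, s_end: int) -> bool:
    """Checks that still hold for gapped alignments (1-based, inclusive coordinates assumed)."""
    return (len(q_aln) == len(s_aln)
            and residue_count(q_aln) == q_end - q_start + 1
            and residue_count(s_aln) == s_end - s_start + 1)


# One-residue gap in the query row.
q_aln, s_aln = "ACDE-FG", "ACDEHFG"
print(naive_check(1, 6, 1, 7), gap_aware_check(q_aln, s_aln, 1, 6, 1, 7))
```

The gapped example prints `False True`: the naive check rejects a valid gapped HSP that the gap-aware check accepts.
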
Gapped alignments in the blastn search mode are evaluated correctly, as in all WU BLAST search modes, using different statistical parameter values (λ, K and H) from those used to evaluate the significance of ungapped alignments. If appropriate parameters are unavailable for the particular combination of scoring matrix and gap penalties being used, a prominent WARNING is displayed.
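
Separate parameter sets and the missing-parameter WARNING can be sketched as a lookup keyed on the scoring scheme and gap penalties. Everything here is hypothetical: the table layout, key names and numbers are placeholders chosen for illustration, not WU BLAST's precomputed values. The bit-score conversion (λS - ln K)/ln 2 is the standard one, and it shows why the same raw score must be interpreted with the matching parameters.

```python
import math
import warnings

# Hypothetical lookup table: (scoring scheme, gap_open, gap_extend) -> (lambda, K, H).
# gap_open/gap_extend of None means ungapped. The numbers are placeholders
# for illustration, NOT WU BLAST's precomputed values.
PARAMS = {
    ("+5/-4", None, None): (0.20, 0.10, 0.40),
    ("+5/-4", 10, 10): (0.15, 0.05, 0.30),
}


def lookup_params(scheme: str, gap_open=None, gap_extend=None):
    """Return (lambda, K, H) for the scoring setup, or None with a warning if unknown."""
    key = (scheme, gap_open, gap_extend)
    if key not in PARAMS:
        warnings.warn(f"WARNING: no statistical parameters for {key}; "
                      "significance estimates may be unreliable")
        return None
    return PARAMS[key]


def bit_score(raw: float, lam: float, k: float) -> float:
    """Normalized (bit) score: (lambda*S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


ungapped = lookup_params("+5/-4")
gapped = lookup_params("+5/-4", 10, 10)
# The same raw score maps to different bit scores under the two parameter sets.
print(bit_score(100, *ungapped[:2]), bit_score(100, *gapped[:2]))
lookup_params("+5/-4", 3, 3)  # unknown combination: emits a warning
```
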
Unique to WU blastn is support for fully-specified scoring matrices, not just simple match/mismatch scoring systems. This allows (for example) transitions to be scored differently from transversions, and permits positive G-A substitution scores for the design of siRNAs (small interfering RNAs), where G-U base pairing is allowed. Scoring matrices may also be tailored to improve the design of PCR primers. Contrary to W. Miller (2001), scoring matrices were first supported in 1994 by the NCBI's ungapped BLASTN version 1.4 (Gish, W., unpublished; see http://blast.wustl.edu/blast-1.4). Support for nucleotide scoring matrices was dropped by the NCBI's blastall 2.0 program, first released in 1997, but has been maintained continuously in all WU versions of the software since the migration to Washington University in 1994.
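
A full scoring matrix can be represented as a lookup over all 16 base pairs. The Python sketch below scores transitions (A/G, C/T) more leniently than transversions; the default scores and helper names are hypothetical and do not reflect WU BLAST's matrix file format.

```python
BASES = "ACGT"
TRANSITION_PAIRS = ({"A", "G"}, {"C", "T"})  # purine<->purine, pyrimidine<->pyrimidine


def make_matrix(match: int = 5, transition: int = -1, transversion: int = -4) -> dict:
    """Full 4x4 nucleotide scoring matrix that scores transitions and transversions differently."""
    matrix = {}
    for x in BASES:
        for y in BASES:
            if x == y:
                matrix[x, y] = match
            elif {x, y} in TRANSITION_PAIRS:
                matrix[x, y] = transition
            else:
                matrix[x, y] = transversion
    return matrix


def ungapped_score(a: str, b: str, matrix: dict) -> int:
    """Score two equal-length, ungapped aligned nucleotide strings."""
    return sum(matrix[x, y] for x, y in zip(a, b))


m = make_matrix()
print(ungapped_score("ACGT", "GCGT", m))  # one transition (A/G)
print(ungapped_score("ACGT", "CCGT", m))  # one transversion (A/C)

# Individual cells can be changed too, e.g. a positive G/A score of the kind
# mentioned above for siRNA design.
m["G", "A"] = m["A", "G"] = 1  # hypothetical value
```

With these defaults the transition example scores 14 and the transversion example 11, whereas a simple match/mismatch scheme would give both the same score.
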
Word lengths (the W parameter) as short as 1 have been supported continuously by WU blastn, as have nucleotide neighborhood words, which are controlled by the neighborhood word score threshold parameter, T. Using neighborhood words, nucleotide sequence similarity can be detected even when two sequences share no identical residues. Users are cautioned, however, that careless use of the T parameter can cause the software to request vast, overwhelming amounts of memory; T should likely be used only in conjunction with very short word lengths.
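
Neighborhood words, and the memory warning about T, can be illustrated by brute-force enumeration: a word belongs to the neighborhood of a query word when its score against that word is at least T. This Python sketch is hypothetical (real implementations need not enumerate every candidate, and both scoring functions are toy examples), but it shows how quickly the neighborhood grows as T is lowered.

```python
from itertools import product
from typing import Callable


def neighborhood(word: str, threshold: int, score: Callable[[str, str], int],
                 alphabet: str = "ACGT") -> list[str]:
    """All words of len(word) scoring >= threshold against `word` (brute force over alphabet**len(word))."""
    return ["".join(candidate)
            for candidate in product(alphabet, repeat=len(word))
            if sum(score(x, y) for x, y in zip(word, candidate)) >= threshold]


def match_mismatch(x: str, y: str) -> int:
    return 5 if x == y else -4


def with_transitions(x: str, y: str) -> int:
    """Hypothetical matrix: identities +5, transitions +2, transversions -4."""
    if x == y:
        return 5
    if {x, y} in ({"A", "G"}, {"C", "T"}):
        return 2
    return -4


# Lowering T makes the neighborhood (and the memory needed to hold it) grow quickly.
for t in (20, 11, 2):
    print(t, len(neighborhood("ACGT", t, match_mismatch)))

# With positive mismatch scores, a neighbor may share no residue with the query word.
print(neighborhood("AC", 4, with_transitions))
```

For the word ACGT the neighborhood holds 1, 13 and 67 words at T = 20, 11 and 2. Under the transition-friendly matrix, the neighborhood of AC at T = 4 is AC, AT, GC and GT, and GT shares no identical residue with AC.
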
Information describing “consistent” groups of alignments (HSPs) is provided by licensed BLAST 2.0 when the topcomboN or links options are used. This facility can help with constructing distinct gene structures from a barrage of alignments.
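
One plausible reading of "consistent" is a set of HSPs that are collinear, each starting after the previous one ends in both sequences, as the exons of a gene are. The source does not define how topcomboN or links decide consistency, so this Python sketch of a standard chaining dynamic program is an assumption, and the HSP coordinates in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HSP:
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    score: int


def best_consistent_chain(hsps: list[HSP]) -> list[HSP]:
    """Highest-scoring subset of HSPs that are collinear: each one starts after the
    previous one ends in both the query and the subject (same strand assumed).
    Simple O(n^2) dynamic programming."""
    order = sorted(hsps, key=lambda h: (h.q_start, h.s_start))
    total = [h.score for h in order]  # best chain score ending at order[i]
    back = [-1] * len(order)          # predecessor index in that chain
    for i, h in enumerate(order):
        for j in range(i):
            g = order[j]
            if g.q_end < h.q_start and g.s_end < h.s_start and total[j] + h.score > total[i]:
                total[i] = total[j] + h.score
                back[i] = j
    if not order:
        return []
    i = max(range(len(order)), key=total.__getitem__)
    chain = []
    while i != -1:
        chain.append(order[i])
        i = back[i]
    return chain[::-1]


# Hypothetical HSPs of a query against a genomic subject: three exon-like hits plus one inconsistent hit.
hsps = [HSP(1, 50, 101, 150, 80), HSP(60, 120, 300, 360, 90),
        HSP(130, 200, 500, 570, 100), HSP(55, 100, 120, 165, 50)]
print([h.score for h in best_consistent_chain(hsps)])
```

The example prints `[80, 90, 100]`: the fourth HSP overlaps the first one in the subject and is left out of the chain.
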
Multiple output formats are available — and can in fact be produced simultaneously from a single program run — including an informative tabular output and XML output that conforms to the NCBI DTD.
Licensed WU BLAST 2.0 supports the eXtended Database Format (XDF), a power user's dream in many ways for working with peptide and nucleotide sequences. Both the NCBI BLAST 2.0 database format and the NCBI implementation of the BLAST search algorithm are restricted to sequences under 16 Mbp in length, whereas human genome contigs exceeded 25 Mbp in the previous century (Hattori et al., 2) and extend to several tens of megabases today. In contrast, XDF can accurately store individual sequences of up to 1 Gbp (one billion bp) with ambiguity codes intact. Other BLAST software, such as the NCBI's, limits database files to 2 gigabytes, whereas WU BLAST's XDF supports databases (and database files) of virtually unlimited size, provided the underlying operating system supports these so-called “large files”, as common operating systems have for years.

The computing platforms currently supported by BLAST 2.0 (licensed version only) include the following:

• Apple Mac OS X 10.1 through 10.4 for PowerPC G3 and G4; and Mac OS X 10.3 and 10.4 for PowerPC G5, including 64-bit support under 10.4 ("Tiger")
• Compaq Tru64 UNIX 5.0A for generic Alpha and Alpha EV5, EV56, EV6, and EV67
• FreeBSD 4.8 for Intel i686 (PentiumPro/II/III)
• FreeBSD 5.1 for Intel i686 (PentiumPro/II/III)
• Hewlett-Packard HP-UX 11 for Intel IA-64 (Itanium)
• IBM AIX 5.2 for Power3, Power4 and Power5
• Linux kernel version 2.2 for i586 (original Pentium) and i686 (PentiumPro/II/III)
• Linux kernel version 2.4 with Linux threads for 32-bit i686 (PentiumPro/II/III) and i786 (Pentium4); and 64-bit Intel IA-64 (Itanium), Intel EM64T and AMD Opteron AMD64
• Linux kernel version 2.6 with Native POSIX Threads (NPTL) support for 32-bit i686 and i786; and 64-bit Intel IA-64 (Itanium), Intel EM64T and AMD Opteron AMD64
• SGI IRIX 6.5 for MIPS R5, R10, R12, R14 and R16
• Sun Solaris 8 for SPARC and UltraSPARC
• Sun Solaris 9 for Intel i686

What's New:
More effective use of memory and processors by the 64-bit binaries running on multi-processor (or multi-core) G5 systems configured with more than 2-3 GB of memory.
Fixed: the "span" and "span1" options had no effect if the "stats" option was also specified on the command line.
Fixed: when the "nogaps" option was specified, nonsensical values for X, E2 and S2 were mistakenly reported in the Parameters section for the irrelevant case of gapped alignments.

Requirements:
Mac OS X 10.1 or higher (Tiger Compatible);
Darwin 1.4.1 or higher on PowerPC.
