How to Blast one genome against another

I presume that you have already downloaded and installed Blast (if not, then click here) and downloaded two sets of proteins deduced from genomic sequences, one from the genomic sequence of E. coli K-12 and the other from the genomic sequence of either E. coli O157:H7 EDL933 or E. coli O157 Sakai (if not, then click here). (If you don't know which strain to choose, click here.)

Blasting the proteins of one genome against the proteins of another proceeds in two steps. First, you need to let Blast analyze one set of proteins to create a database it can understand. Second, you need to run Blast to compare each protein of the OTHER set to that database. You'll make the database from the set of E. coli K-12 proteins. You'll run the set of proteins from your pathogenic strain against that database.
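
If you would rather script the two steps than type them, here is a minimal Python sketch. It only wraps the two command lines given in the steps of this page (same programs, same flags, same file names); the function names are made up for illustration, and it assumes that formatdb and blastall can be found from the directory you run it in.

```python
import subprocess


def formatdb_command(fasta="eck12.FA", db_name="K12-Prot"):
    """Build the step-1 command: turn a protein FASTA file into a Blast database."""
    # -p T : the input is protein; -o T : parse sequence IDs; -n : database name
    return ["formatdb", "-i", fasta, "-p", "T", "-o", "T", "-n", db_name]


def blastall_command(query="Edl-Prot.FA", db_name="K12-Prot",
                     out="EdVsK12.txt", evalue="0.001"):
    """Build the step-2 command: compare every query protein to the database."""
    # -p blastp : protein-vs-protein search; -e : E-value cutoff
    return ["blastall", "-p", "blastp", "-d", db_name,
            "-i", query, "-o", out, "-e", evalue]


def run_pipeline():
    """Run step 1, then step 2, stopping if either program reports failure."""
    for command in (formatdb_command(), blastall_command()):
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)
```

Calling run_pipeline() from the Blast directory should do the same thing as typing the two commands by hand, including the long wait during the second one.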

1. Create a database of E. coli K-12 proteins

a. Get into a DOS window (Run Command or Cmd)

b. Get into the directory where Blast and the FA files reside (CD \Blast)

c. Type the following command to format the database:

    formatdb -i eck12.FA -p T -o T -n K12-Prot

d. Don't worry about the following kinds of error messages:
[NULL_Caption] WARNING: lcl|1445 has zero-length sequence
[NULL_Caption] WARNING: lcl|2827 has zero-length sequence
[NULL_Caption] WARNING: lcl|3800 has zero-length sequence
[NULL_Caption] WARNING: lcl|3973 has zero-length sequence

These messages mean that the protein sequences b1445, b2827, b3800, and b3973 don't have any amino acids, which is not very likely. TIGR evidently screwed up, but problems with four out of about four thousand proteins aren't going to hurt us much.
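
If you want to see for yourself which records are empty, a short Python sketch like the following will list them. It assumes the file is ordinary FASTA (a header line starting with ">" followed by sequence lines); find_empty_records is a name invented for this example and is not part of Blast.

```python
def find_empty_records(fasta_lines):
    """Return the headers of FASTA records that contain no sequence letters."""
    empty = []
    header = None   # header of the record currently being read
    length = 0      # number of sequence letters seen for that record
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: check the one we just finished.
            if header is not None and length == 0:
                empty.append(header)
            header = line[1:]
            length = 0
        elif header is not None:
            length += len(line)
    # Don't forget the last record in the file.
    if header is not None and length == 0:
        empty.append(header)
    return empty
```

To use it on the K-12 file, something like `with open("eck12.FA") as handle: print(find_empty_records(handle))` should print the headers of the records that formatdb complained about.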


2. Run Blast to compare the set of proteins from the pathogenic strain to the database

a. Type the following command to run Blast:

    blastall -p blastp -d K12-Prot -i Edl-Prot.FA -o EdVsK12.txt -e .001

b. Be prepared to wait a while. It may be a couple of hours, depending on how fast your computer is. The program is done when it brings you back to a DOS prompt (>). It will not ring a bell. It will not print a message.

c. Be prepared to fill your disk drive with LOTS of output, something on the order of 40 megabytes.

d. How do you know whether the program worked? Don't try to read the output into something like Word (you risk choking it). I don't think that Microsoft has any solution for us, but there is an ancient freeware program from the pre-Windows era that will do the job. Click here to download DR (standing for DiRectory). Put it in the Blast directory.

e. Run DR (>DR) to get a list of files in \Blast, then press the F10 key to sort the files by date of creation, then press the End key to go to the end of the list. You should see the file you just made. Press the Enter key to see the contents of the file (you can scroll through the file using the keys you expect).

f. You should see something like:

BLASTP 2.1.3 [Apr-1-2001]
 

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= thrL, thr operon leader peptide, Escherichia coli O157:H7
(EDL933)
         (27 letters)

Database: K12-Prot
           4289 sequences; 1,355,879 total letters

If so, you win!
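
If you don't have DR handy, here is a minimal Python sketch that checks the output without loading the whole file into memory. It assumes, as in the sample above, that each query in the output begins with a line starting with "Query="; the function names are invented for this example.

```python
from itertools import islice


def head(lines, n=20):
    """Return the first n lines, reading no further than necessary."""
    return [line.rstrip("\n") for line in islice(lines, n)]


def count_queries(lines):
    """Count the lines that start a new query, one line at a time."""
    return sum(1 for line in lines if line.startswith("Query="))
```

For example, `with open("EdVsK12.txt") as handle: print("\n".join(head(handle)))` should show the BLASTP banner and the first query, and calling count_queries on a freshly opened handle should give a number close to the number of proteins in the file for your pathogenic strain.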