How to set up a database with which to run Blast

I presume that you have already downloaded and installed Blast (if not, then click here) and downloaded two sets of proteins deduced from genomic sequences: one from the genomic sequence of E. coli K-12 and the other from the genomic sequence of either E. coli O157:H7 EDL933 or E. coli O157 Sakai (if not, then click here). (If you don't know which strain to choose, click here.)

Blasting the proteins of one genome against the proteins of another proceeds in two steps. First, you need to let Blast analyze one set of proteins to create a database it can understand. Second, you need to run Blast to compare each protein of the OTHER set to that database. You'll make the database from the set of E. coli K-12 proteins. You'll run the set of proteins from your pathogenic strain against that database.
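As a rough sketch of how the two steps fit together, the Python snippet below only assembles the two command lines and prints them; it does not run anything. It assumes the legacy formatdb and blastall programs. The database name K12-Prot and the input file eck12.FA come from the formatting step on this page, while the query file name (o157.FA) and the output file name (o157-vs-k12.txt) are made up for illustration, so substitute your own.

```python
def formatdb_command(fasta_path, db_name):
    """Step 1: arguments that turn a protein FASTA file into a Blast database."""
    # -p T says the input is protein; -o T asks formatdb to parse sequence IDs;
    # -n sets the base name of the database files.
    return ["formatdb", "-i", fasta_path, "-p", "T", "-o", "T", "-n", db_name]


def blastall_command(query_path, db_name, out_path):
    """Step 2: arguments that compare every query protein to that database."""
    # -p blastp selects a protein-versus-protein search.
    return ["blastall", "-p", "blastp", "-d", db_name,
            "-i", query_path, "-o", out_path]


# "o157.FA" and "o157-vs-k12.txt" are hypothetical file names.
steps = [
    formatdb_command("eck12.FA", "K12-Prot"),
    blastall_command("o157.FA", "K12-Prot", "o157-vs-k12.txt"),
]
for command in steps:
    print(" ".join(command))
```

Keeping each command as a list of separate arguments makes it easy to hand to subprocess.run later if you want Python to launch the programs for you.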

1. Create a database of E. coli K-12 proteins

a. Open a DOS window (run Command or Cmd)

b. Change to the directory where Blast and the FA files reside (CD \Blast)

c. Type the following command to format the database:

    formatdb -i eck12.FA -p T -o T -n K12-Prot

d. Don't worry about warning messages like the following:
[NULL_Caption] WARNING: lcl|1445 has zero-length sequence
[NULL_Caption] WARNING: lcl|2827 has zero-length sequence
[NULL_Caption] WARNING: lcl|3800 has zero-length sequence
[NULL_Caption] WARNING: lcl|3973 has zero-length sequence

These messages mean that four of the protein sequences (b1445, b2827, b3800, and b3973) don't have any amino acids, which is not very likely. TIGR evidently screwed up, but problems with four out of about four thousand proteins aren't going to hurt us much.
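If you'd like to see for yourself which entries are empty, a short Python sketch like the one below can scan the FASTA file and list every record whose header is not followed by any sequence. The function name zero_length_records is made up for this example. How the numbers in the warnings map onto the header lines of your file is an assumption you should check by eye.

```python
def zero_length_records(fasta_path):
    """Return the header lines of FASTA records that contain no residues."""
    empty = []
    header = None   # header of the record currently being read
    length = 0      # residues counted so far for that record
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new record begins; report the previous one if it was empty.
                if header is not None and length == 0:
                    empty.append(header)
                header = line[1:]
                length = 0
            elif header is not None:
                length += len(line)
    # The last record in the file still has to be checked.
    if header is not None and length == 0:
        empty.append(header)
    return empty
```

Calling zero_length_records("eck12.FA") from the directory that holds the file should return a short list; an empty list would mean every record has at least one residue.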