Programs to manipulate and visualize protein sequences

Biol 591

Introduction to Bioinformatics
Notes for Scenario 7: Protein threading

Fall 2003

I. Our goal

We want to be able to visualize the structure of UDP-glucose dehydrogenase from Mesorhizobium loti. Since no structure has yet been determined for that molecule, we'll take the structure of a similar enzyme and superimpose the M. loti amino acid sequence on it. Then we can superimpose on top of that the amino acid changes found in proteins with greater solubility than wild-type. From their positions in the structure, we may deduce which regions are responsible for the precipitation of overexpressed UDP-glucose dehydrogenase and devise a strategy to minimize precipitation.

To accomplish all this, we need:

1. The amino acid sequence of UDP-glucose dehydrogenase (UDPGD) from Mesorhizobium loti
2. An amino acid sequence of some protein similar to UDPGD from M. loti whose 3-dimensional structure has been determined
3. The coordinates of all atoms of the protein found in Step 2 (in a file in PDB format)
4. A program that displays the three-dimensional structure of proteins, so that we can see the structure of the protein found in Step 2 and eventually the predicted structure of UDPGD from M. loti and its mutant derivatives
5. A program that aligns two amino acid sequences, so that we can line up UDPGD from M. loti and the protein found in Step 2 whose structure is known
6. A program that will facilitate altering the PDB file to superimpose the sequence of the M. loti enzyme onto the sequence of the protein with known 3-dimensional structure
7. A program that will facilitate altering the PDB file made in Step 6 to highlight the positions of mutations we specify, i.e. those that increase the solubility of UDPGD from M. loti

Let's go through this list one item at a time.

1. Get amino acid sequence of UDP-glucose dehydrogenase from Mesorhizobium loti

The sequence for the enzyme can be found using Entrez at NCBI under the accession number BAB51744. You'll want to save it in FASTA format, and perhaps also in the default GenBank format (so you keep a record of what's known about the sequence).
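Once the sequence is saved in FASTA format, it's easy to read back into a program. The sketch below, in Python for illustration, collects each header line (starting with >) and joins the possibly wrapped sequence lines that follow it. The function name read_fasta and the example record are made up; they are not part of any course software, and the example is not the real BAB51744 sequence.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes off the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over many lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical record (NOT the real BAB51744 sequence).
example = """>example hypothetical fragment
MKIAVIG
AGYVGL
"""
print(read_fasta(example))
# [('example hypothetical fragment', 'MKIAVIGAGYVGL')]
```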

2. Find an amino acid sequence of a protein similar to UDPGD from M. loti whose 3-dimensional structure is known

BlastP is the program of choice for this. Go to Blast at NCBI and choose BlastP (under Protein). You can paste in the UDPGD sequence, but it's easier just to type in BAB51744 and let Blast look up the sequence itself. Click on the Choose database window and select pdb. This limits the search to sequences within the Protein Data Bank, i.e. those proteins whose structures have been determined. The search should net you four reasonable candidates: the first two are labeled "GDP-mannose dehydrogenase" and the second two "UDP-glucose dehydrogenase". I chose the third-best hit because its enzyme name was what I was looking for. Why not hit #4? The link to the Medline abstract in its GenBank file will give you the answer. From the GenBank file (not the Blast output) you should now have the amino acid sequence of hit #3.
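The same search can nowadays also be submitted through NCBI's BLAST URL API instead of the web form. The sketch below only builds the request URL and does not contact NCBI. CMD=Put, PROGRAM, DATABASE and QUERY are parameters of that API, but the helper name blast_put_url is hypothetical, and the interface postdates these notes, so check NCBI's current documentation and usage guidelines before sending real requests.

```python
from urllib.parse import urlencode

# Base address of NCBI's BLAST URL API (a current interface, not the 2003 web form).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_put_url(query, program="blastp", database="pdb"):
    """Build (but do not send) a URL that submits a BLAST search.

    `query` may be an accession such as BAB51744; database="pdb" restricts
    the search to proteins with solved structures, as in the web form.
    """
    params = {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": query}
    return BLAST_URL + "?" + urlencode(params)


print(blast_put_url("BAB51744"))
# https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Put&PROGRAM=blastp&DATABASE=pdb&QUERY=BAB51744
```

Sending a Put request returns a request ID (RID), which is then used with CMD=Get to fetch the results once the search has finished.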

3. Get a PDB file with the coordinates of all atoms of the protein found in Step 2

In the GenBank screen for the protein, go to the search window, select Structure, and search for the four-letter ID of the protein structure. One more click should take you to an MMDB (Molecular Modeling Database) page for the protein. If you click on the name of the protein structure (next to PDB:), you'll reach a Structure Explorer page, which gives you the opportunity (on the left side of the page) to display or download the file. Choose the PDB text file format with coordinates.

This is a complicated route. If you can't reach the end of it, let it go; there will be later opportunities to get the same file.
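A PDB file is plain text with fixed-width columns, so the coordinates are easy to pull out. As a sketch (Python, with a hypothetical function name parse_atoms), this reads ATOM and HETATM records using the column positions of the PDB format: atom name in columns 13-16, residue name in 18-20, chain in 22, residue number in 23-26, and x, y, z in 31-38, 39-46 and 47-54. For simplicity it ignores alternate locations and insertion codes. The sample line is made up.

```python
def parse_atoms(pdb_text):
    """Return a list of dicts, one per ATOM/HETATM record (fixed PDB columns)."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            # Python slices are 0-based, so columns 23-26 become line[22:26], etc.
            atoms.append({
                "serial": int(line[6:11]),
                "name": line[12:16].strip(),
                "res_name": line[17:20].strip(),
                "chain": line[21],
                "res_seq": int(line[22:26]),
                "x": float(line[30:38]),
                "y": float(line[38:46]),
                "z": float(line[46:54]),
            })
    return atoms


# Made-up example record, laid out in standard PDB columns.
sample = "ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00"
atom = parse_atoms(sample)[0]
print(atom["res_name"], atom["res_seq"], atom["x"])
# MET 1 11.104
```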

4. A program that displays the three-dimensional structure of proteins

We'll use a publicly available program called Protein Explorer. It's a very nice program whose major drawback is that it is not compatible with Netscape 6.0 or higher; it works with Netscape 4.7. I also had some problems with Internet Explorer 6.0, but I didn't try very hard. Even with Netscape 4.7, you'll need to download a plugin called Chime, which enables you to display molecules in three dimensions. If you don't already have it, trying to access Protein Explorer should bring you to a page directing you to Chime.

If all goes well, you should be able to get to the front door of Protein Explorer, scroll down to Find Any Molecule, type the four-letter ID into the box (second bullet), click Go, and end up with a beautiful protein spinning in space. The first thing you'll want to do is click Explore more with Quick Views, which brings you to a menu-driven list of commands. You can use Protein Explorer directly from the web, but you can also download a copy to your own computer if you like.

If you really want to get into the program, you might take the 1-hour tour, but you can manage without it. The menus provided are intuitive enough and will suffice for most of the manipulations we'll do, but we'll still need to work from the command line on occasion. To learn what is possible from the command line, try the RasMol Reference Manual (RasMol was the precursor of Protein Explorer). I'll try to develop and put online a much shorter guide to the commands we'll actually need.
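Working from the command line mostly means typing RasMol-style commands. As an illustration, this hypothetical Python helper writes out a short command sequence that colours the whole molecule grey and then shows one residue range as red space-filling atoms. select, color and spacefill are standard RasMol commands and start-end is RasMol's residue-number range syntax; that Chime/Protein Explorer treats each of them exactly as RasMol does is an assumption worth checking against the RasMol manual. The residue numbers are placeholders, and a multi-chain file would also need the selection restricted to one chain.

```python
def highlight_commands(start, end, colour="red"):
    """Return RasMol-style commands highlighting residues start..end (hypothetical helper)."""
    return "\n".join([
        "select all",
        "color grey",             # neutral colour for the whole molecule
        f"select {start}-{end}",  # RasMol residue-number range
        f"color {colour}",
        "spacefill",              # show the selected residues as spheres
    ])


print(highlight_commands(120, 135))
```

The printed lines can be pasted into the command line one at a time.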

5. A program that aligns two amino acid sequences

We'll use a popular, publicly available program called Clustal X. You'll need to download the program to your own computer.
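Clustal X does the alignment for us, but the core idea of a pairwise global alignment fits in a few lines. The sketch below is a plain Needleman-Wunsch dynamic-programming alignment in Python with made-up toy scores (+1 match, -1 mismatch, -2 per gap). It is not Clustal X's algorithm: Clustal X scores residue pairs with amino acid substitution matrices and uses separate gap-opening and gap-extension penalties, so its alignments can differ.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with toy scores; returns (a_aln, b_aln, score)."""

    def pair(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align two residues
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


print(global_align("MKVL", "MVL"))
# ('MKVL', 'M-VL', 1)
```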

6. A program that will facilitate altering the PDB file downloaded in Step 3

You can download the Perl program, ThreadProtein.pl, when it becomes available.
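Until ThreadProtein.pl is available, here is a rough idea of what such a program has to do, sketched in Python rather than Perl. It is not ThreadProtein.pl, and the names residue_map and thread are hypothetical. Walking along the two aligned sequences, it maps each template residue number to the M. loti residue aligned with it, then rewrites the residue name (columns 18-20) of the matching ATOM records. It assumes the template residues are numbered consecutively from first_resnum with no numbering gaps, leaves template residues opposite an M. loti gap unchanged, and only relabels residues: side-chain atoms are not rebuilt.

```python
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def residue_map(aligned_template, aligned_target, first_resnum=1):
    """Map template residue numbers to the one-letter target residue aligned with them."""
    mapping = {}
    resnum = first_resnum
    for t, q in zip(aligned_template, aligned_target):
        if t == "-":
            continue  # insertion in the target: no template residue to relabel
        if q != "-":
            mapping[resnum] = q
        resnum += 1  # advance only over real template residues
    return mapping


def thread(pdb_text, mapping, chain=None):
    """Rename residues in ATOM records according to `mapping` (coordinates untouched)."""
    out = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM  ") and (chain is None or line[21] == chain):
            resnum = int(line[22:26])
            if resnum in mapping:
                new_name = THREE_LETTER.get(mapping[resnum], "UNK")
                line = line[:17] + new_name + line[20:]
        out.append(line)
    return "\n".join(out)


print(residue_map("M-AK", "KGV-"))
# {1: 'K', 2: 'V'}
```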

7. A program that will facilitate altering the PDB file made in Step 6 to highlight the positions of mutations we specify

You'll modify ThreadProtein.pl to accomplish this.
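One possible approach (an assumption, not necessarily the modification you'll make to ThreadProtein.pl) is to overwrite the temperature-factor column (columns 61-66) of the PDB file: give every atom of a mutated residue a high value and all other atoms zero, then apply RasMol's color temperature scheme in Protein Explorer so the mutated residues stand out. The sketch below, in Python with the hypothetical function name mark_mutations, does exactly that. Note that it discards the crystallographic B-factors, and the positions must be given in the PDB file's residue numbering. The sample line is made up.

```python
def mark_mutations(pdb_text, positions, high=99.0, low=0.0):
    """Set the temperature factor of mutated residues to `high`, all others to `low`."""
    marked = set(positions)
    out = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")) and len(line) >= 66:
            resnum = int(line[22:26])  # residue number, columns 23-26
            value = high if resnum in marked else low
            # Columns 61-66 hold the temperature factor as a 6-character field.
            line = line[:60] + f"{value:6.2f}" + line[66:]
        out.append(line)
    return "\n".join(out)


# Made-up example record for residue 1.
sample = "ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00"
print(mark_mutations(sample, [1]))
# ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 99.00
```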