Biol 591 
Introduction to Bioinformatics
Scenarios
Fall 2003 
Applying structural information from one protein to another

Scientific story (html)

In brief: You are trying to produce an enzyme in E. coli for an industrial process, but at high levels of expression, the protein precipitates in an inactive form. If you knew the three-dimensional structure of the protein, you might be able to predict which amino acids to change to prevent precipitation. You don't, but the structure of a moderately similar protein is available. How can you use the sparse similarity between the two proteins to suggest the unknown three-dimensional structure of the protein you're trying to overproduce?
Bioinformatic tools
Parsing of PDB files
PDB (Protein Data Bank) files are the most common way of representing protein structural information. Many publicly available programs read them to visualize macromolecular structures.
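As a rough illustration of what parsing a PDB file involves, the sketch below pulls the main fields out of a single ATOM or HETATM record using the fixed column positions of the PDB format. The subroutine name parse_atom_line and the sample coordinates are assumptions made up for this example. They are not taken from the course programs.

```perl
use strict;
use warnings;

# Parse one ATOM/HETATM line using the fixed-column layout of the PDB format.
# Returns a hash reference, or nothing for any other record type.
# (parse_atom_line is a hypothetical name chosen for this sketch.)
sub parse_atom_line {
    my ($line) = @_;
    return unless $line =~ /^(?:ATOM  |HETATM)/;
    return if length($line) < 54;          # coordinates end at column 54

    my %atom = (
        serial   => substr($line,  6, 5),  # columns  7-11
        name     => substr($line, 12, 4),  # columns 13-16
        res_name => substr($line, 17, 3),  # columns 18-20
        chain    => substr($line, 21, 1),  # column  22
        res_seq  => substr($line, 22, 4),  # columns 23-26
        x        => substr($line, 30, 8),  # columns 31-38
        y        => substr($line, 38, 8),  # columns 39-46
        z        => substr($line, 46, 8),  # columns 47-54
    );
    s/^\s+|\s+$//g for values %atom;       # trim the column padding
    return \%atom;
}

# Build a sample record with sprintf so the columns are guaranteed to line up.
# The coordinates are invented for illustration only.
my $sample_line = sprintf("%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f",
    "ATOM", 1, " N  ", "", "MET", "A", 1, "", 12.345, -6.789, 0.5);

my $atom = parse_atom_line($sample_line);
print "$atom->{res_name} $atom->{res_seq} $atom->{name}: ",
      "$atom->{x} $atom->{y} $atom->{z}\n";
```

Fields are taken by column position rather than by splitting on whitespace because adjacent PDB fields can run together with no space between them.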

Protein threading
Points of similarity between a protein of known structure and one whose structure is unknown constrain the positioning of the dissimilar regions and permit an approximation of the unknown structure.
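One simple way to picture this step: given a pairwise alignment, each residue of the protein with the known structure (the template) is relabeled with the residue aligned to it in the other protein (the target). The sketch below computes only that position-to-residue mapping. It is not the method used by ThreadProtein.pl, and the subroutine name and toy alignment are invented for this example.

```perl
use strict;
use warnings;

# Given two rows of a pairwise alignment (equal length, '-' for gaps),
# return a hash reference mapping each template residue position
# (1-based, counting only non-gap template characters) to the target
# residue aligned with it. Template positions that face a gap in the
# target are left out of the hash.
# (map_target_onto_template is a hypothetical name for this sketch.)
sub map_target_onto_template {
    my ($template_row, $target_row) = @_;
    die "Alignment rows differ in length\n"
        unless length($template_row) == length($target_row);

    my %target_at;
    my $template_pos = 0;
    for my $col (0 .. length($template_row) - 1) {
        my $template_aa = substr($template_row, $col, 1);
        my $target_aa   = substr($target_row,   $col, 1);
        next if $template_aa eq '-';   # insertion in target: no coordinates exist
        $template_pos++;
        next if $target_aa eq '-';     # deletion in target: nothing to place here
        $target_at{$template_pos} = $target_aa;
    }
    return \%target_at;
}

# Toy alignment, invented for illustration only
my $map = map_target_onto_template('MKT-AYIAK', 'MRTGA-IGK');
for my $pos (sort { $a <=> $b } keys %$map) {
    print "template residue $pos -> $map->{$pos}\n";
}
```

The positions here count from the start of the aligned template sequence, which need not match the residue numbers in the PDB file. A real program would have to reconcile the two.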

Molecular biology concepts:
Transcriptional and translational gene fusions
Random mutagenesis

Perl focus: Modules

Presentations and notes

Notes: Overexpression of protein
Notes: Resources for protein threading
Notes: Modifying a program that threads a protein sequence through a structure
Notes: Hints for modifying the program
Programs
ThreadProtein.pl - Superimposes one protein sequence on the structure of another
FastA_module.pm - (Used by ThreadProtein) Reads FastA files
AA_module.pm - (Used by ThreadProtein) Interconverts name formats of amino acids
Data: UDPGD-mutants.txt, identifies amino acid residues of mutant UDP glucose dehydrogenase from M. loti
Data: File aligning UDPGD sequences from Streptococcus pyogenes and M. loti - Make it via Clustal (see below)
Data: PDB-formatted file containing coordinates for UDPGD from S. pyogenes - Find it
Protein Explorer - Visualize proteins in three dimensions (Warning! Doesn't work with Netscape 6 and higher!)
Reference: Rasmol reference manual
Tutorial: 1-hour tour of Protein Explorer
Data: 1GZX.pdb, contains coordinates for three-dimensional structure of oxygenated human hemoglobin
Clustal - Align two or more sequences (DNA or amino acid)
Problem Set 7: (pdf)
Program: Protein_mol_weight.pl, used by Problem 7.3 as shell to test new AA_module
Data: List of molecular weights of amino acids, used by Problem 7.3
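Since the Perl focus of this scenario is modules, here is a minimal sketch of what a module that interconverts amino acid name formats (the job of AA_module.pm in the list above) might look like. It is not the course's AA_module.pm. The package name AA_sketch and both subroutine names are assumptions made up for this example, and it covers only the 20 standard amino acids.

```perl
use strict;
use warnings;

package AA_sketch;   # hypothetical package name, not the course module

# Three-letter to one-letter codes for the 20 standard amino acids
my %one_from_three = (
    ALA => 'A', ARG => 'R', ASN => 'N', ASP => 'D', CYS => 'C',
    GLN => 'Q', GLU => 'E', GLY => 'G', HIS => 'H', ILE => 'I',
    LEU => 'L', LYS => 'K', MET => 'M', PHE => 'F', PRO => 'P',
    SER => 'S', THR => 'T', TRP => 'W', TYR => 'Y', VAL => 'V',
);
# Invert the table to get one-letter to three-letter codes
my %three_from_one = reverse %one_from_three;

# Both subroutines ignore case and return undef for an unknown code.
sub three_to_one { my ($code) = @_; return $one_from_three{uc $code}; }
sub one_to_three { my ($code) = @_; return $three_from_one{uc $code}; }

package main;

print AA_sketch::three_to_one('Met'), "\n";
print AA_sketch::one_to_three('w'), "\n";
```

To turn this into a real module, the package would go in its own file (here it would be AA_sketch.pm) ending with the line 1; and the calling program would load it with use AA_sketch;. A conversion like this matters for threading because PDB files name residues with three-letter codes while FastA files use one-letter codes.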