BNFO 653 – Pattern Recognition and Gene Finding (2019)          

Projects: Computation to Solve Problems

The following projects each guide you through a problem that requires computer programming to solve. Choose whichever one (or more) strikes your fancy. It is better to do one well than to attempt too many.

  1. What determines the beginning of a gene?
    This is a reasonable choice for those with no experience with BioBIKE or programming, unless one of the other topics particularly attracts you.
     
  2. Where in a bacterial genome are viruses integrated?
    An example of data mining that combines a search of text with a search of protein sequences.
     
  3. Determination of short tandem repeats (STRs)
    STRs are commonly used in forensic applications.
     
  4. Analysis of gene expression data
    Extracting useful insights from microarray data sliced by metabolic pathways.
     
  5. CRISPRs in related Streptococci
    How to find them by repetitive structure, and how to compare their locations in different genomes.
     
  6. Finding targets for DNA-binding proteins given known binding sites
    Using position-specific scoring matrices to extend experimental knowledge of the genomic targets of a certain DNA-binding protein and find previously unknown sites. Uses a well-studied cyanobacterial transcription factor as an example.
     
  7. Finding targets for DNA-binding proteins given known target genes
    Certain genes are known or suspected to be co-regulated. Perhaps the genes share a common upstream sequence that is the target of a transcription factor. Uses a motif-finding program (MEME) to investigate possible regulatory motifs in Streptococcus genes.
     
  8. Identifying proteins by pattern
    A family of proteins does not show great sequence similarity, except within certain amino acid motifs. Can these motifs be used to find additional family members? Uses a plant protein involved in floral symmetry as an example.
     
  9. Statistics by the numbers (chi-squared)
    A hypergeometric derivative of the central tendency summed over all conjugate variables... huh? Scrap the statistics and play with a simulation of Mendel's experiment to get at the key question: Are the results likely to have arisen by chance or not?
     
  10. Statistics by the numbers (t-test)
    This is a somewhat more involved simulation, showing that computational simulation can sometimes take the place of statistical tests and can often shed light on what those tests are trying to do.
     
  11. Alignments of viral proteins
    Protein alignments can help you peer into the mind of Nature and see what parts of the proteins she considers important. In this tour, the object is to learn something about proteins from herpesviruses.
     
  12. Analysis of protein structure
    Protein structural information is generally contained in PDB files. Usually applications handle the file for you, but sometimes it's useful to dig into the file yourself.
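
As a taste of what "digging into the file yourself" can look like for project 12, here is a minimal Python sketch (not part of the BioBIKE tour itself) that reads ATOM/HETATM records from PDB-format text. It relies on the fixed-column layout of the PDB format: atom name in columns 13-16, residue name in 18-20, chain in 22, residue number in 23-26, and x, y, z in columns 31-54. The sample text, its coordinates, and the names `SAMPLE_PDB`, `parse_atoms`, and `ca_distance` are made up for illustration.

```python
import math

# Made-up coordinates, laid out in the fixed-column PDB format.
SAMPLE_PDB = """\
HEADER    MADE-UP EXAMPLE
ATOM      1  N   MET A   1       1.000   2.000   3.000
ATOM      2  CA  MET A   1       0.000   0.000   0.000
ATOM      9  CA  GLY A   2       3.000   4.000   0.000
END
"""


def parse_atoms(pdb_text):
    """Return one dict per ATOM/HETATM record, sliced by column position."""
    atoms = []
    for line in pdb_text.splitlines():
        # Columns 1-6 hold the record type; skip everything that is not an atom.
        if line[0:6] not in ("ATOM  ", "HETATM"):
            continue
        atoms.append({
            "serial": int(line[6:11]),       # columns 7-11
            "name": line[12:16].strip(),     # columns 13-16
            "res_name": line[17:20].strip(), # columns 18-20
            "chain": line[21],               # column 22
            "res_seq": int(line[22:26]),     # columns 23-26
            "x": float(line[30:38]),         # columns 31-38
            "y": float(line[38:46]),         # columns 39-46
            "z": float(line[46:54]),         # columns 47-54
        })
    return atoms


def ca_distance(atoms, res_a, res_b):
    """Distance between the alpha carbons (CA) of two residues, by residue number."""
    ca = {a["res_seq"]: (a["x"], a["y"], a["z"])
          for a in atoms if a["name"] == "CA"}
    return math.dist(ca[res_a], ca[res_b])


atoms = parse_atoms(SAMPLE_PDB)
print(len(atoms), "atoms read")
print("CA-CA distance, residues 1 and 2:", ca_distance(atoms, 1, 2))
```

To try it on a real structure, read the text of a downloaded PDB file into a string and pass it to `parse_atoms` in place of `SAMPLE_PDB`. Slicing by column rather than splitting on whitespace matters because adjacent fields in real PDB files sometimes run together with no space between them.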