Tuesday, March 15, 2005

Simple Human Genome Analysis Program

The latest issue of PyZine has an article about using Python to analyze the human genome. As an example, it provides a 14-line program that counts the number of TATA sequences in the human genome.
Genes are parts of the DNA that code for specific proteins, but there's no 'treasure map' or Rosetta stone that documents where the genes are in a complete DNA sequence. (Moreover, Nature seems to have made a point of preemptively striking down any rule we formulate.) But there are some classic tell-tale signs, and one of them is called a TATA box. A TATA box occurs upstream of where transcription actually begins, but not at a fixed distance upstream. The presence of a TATA box is only indicative, not conclusive, but it still helps to look for them.
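For a rough idea of what such a counter involves, here is a minimal sketch. It is not the article's 14-line program; it assumes the genome is stored as a FASTA file (">" header lines followed by sequence lines), and the function names and the genome.fa filename are hypothetical. It streams the file line by line instead of loading the whole genome into memory, carries the last few bases over between lines so a TATA split across a line break is still found, and counts overlapping hits, so "TATATA" counts as two.

```python
from typing import TextIO


def count_motif(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, including overlapping ones."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)  # step by one so overlaps are counted
    return count


def count_motif_in_fasta(handle: TextIO, motif: str = "TATA") -> int:
    """Stream a FASTA file (assumed format) and count motif hits.

    Matches may span line breaks within a record, but never two records:
    a header line resets the carried-over tail.
    """
    motif = motif.upper()
    keep = len(motif) - 1  # bases to carry over to catch boundary-spanning hits
    total = 0
    tail = ""
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):  # new record, e.g. a new chromosome
            tail = ""
            continue
        chunk = tail + line.upper()  # uppercase to include soft-masked bases
        total += count_motif(chunk, motif)
        # The tail is shorter than the motif, so no hit is counted twice.
        tail = chunk[-keep:] if keep else ""
    return total
```

Usage, with a hypothetical file name: `with open("genome.fa") as f: print(count_motif_in_fasta(f))`.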
The article also includes an example of searching for protein sequences in a file of all known human proteins.
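A protein search can be sketched the same way. This version is only an illustration, not the article's code: it assumes the protein file is in FASTA format, reads it record by record with a small hypothetical parser, and reports every protein whose sequence contains a given peptide, along with the position of the first match. The function names are hypothetical and the records in the usage line are made-up toy data.

```python
from typing import Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted file."""
    header = None
    parts: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header = line[1:]  # drop the leading ">"
            parts = []
        else:
            parts.append(line)
    if header is not None:  # emit the last record
        yield header, "".join(parts)


def find_peptide(handle: TextIO, peptide: str) -> List[Tuple[str, int]]:
    """Return (header, position) for each protein containing peptide."""
    peptide = peptide.upper()
    hits = []
    for header, seq in read_fasta(handle):
        pos = seq.upper().find(peptide)
        if pos != -1:
            hits.append((header, pos))
    return hits
```

For example, `find_peptide(io.StringIO(">P1 toy\nMKTAYIAK\nQRQISFVK\n>P2 toy\nMSTNPKPQ\n"), "AKQR")` returns `[("P1 toy", 6)]`, since the match spans the line break inside the first record.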
