I ran GlimmerHMM on the nucleotide FASTA file, but now I'm stuck with GFF3 output when what I actually want is the proteins. It looks like this should be possible with BioPerl, but I can't get it to work.
- Michael Kuhn
I've looked at the Ensembl API, but since my genome is not in Ensembl, I couldn't figure out how to apply it to my problem.
- Michael Kuhn
Michael, I could put together a script with Biopython. Could you post a sample of the GFF to Pastebin (http://pastebin.com/) to give a sense of how the output looks?
- Brad Chapman
Thanks, Michael. Here's a script that does this: http://github.com/chapman.... You need Biopython (http://biopython.org/) installed, and the output is here: http://gist.github.com/raw.... The first prediction is strange (just a stop codon), and the last one appears truncated. Does the documentation call that GlimmerHMM file GFF3? Unfortunately, it looks like an invented format.
- Brad Chapman
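The core step such a script performs can be sketched as below. This is a minimal, dependency-free illustration, not the linked script: it assumes the CDS coordinates have already been pulled out of the prediction file as 1-based, inclusive (start, end) pairs, splices the exons in genomic order, reverse-complements minus-strand genes, and translates with the standard genetic code. The function names are hypothetical. It also assumes every prediction starts in frame (phase 0), so a gene truncated at its 5' end would need the GFF phase column; a trailing partial codon, as in a 3'-truncated prediction, is simply dropped. With Biopython installed, `Seq(spliced).reverse_complement()` and `.translate()` cover the last two steps.

```python
# Hypothetical helpers (not from the linked script): splice CDS exons and
# translate them with the standard genetic code (NCBI table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # e.g. "ATG" -> "M", "TAA" -> "*"

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(cds: str) -> str:
    """Translate codon by codon; unknown codons (e.g. containing N) become X."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def cds_to_protein(contig: str, exons: list[tuple[int, int]], strand: str) -> str:
    """exons: 1-based, inclusive (start, end) pairs, as in GFF coordinates.

    Assumes the CDS starts in frame (phase 0).
    """
    # Join exons in ascending genomic order; on the minus strand the
    # reverse complement of that join gives the gene in transcription order.
    spliced = "".join(contig[start - 1:end] for start, end in sorted(exons))
    if strand == "-":
        spliced = reverse_complement(spliced)
    return translate(spliced)


print(cds_to_protein("ATGAAACCCCTTTTAA", [(1, 6), (11, 16)], "+"))  # MKF*
```

A prediction consisting of a single stop codon, like the odd first one above, would come out of this as just `*`.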
Yes, you can use this in a cookbook example. This is from Trichinella spiralis (http://genomeold.wustl.edu/genome...), one of those genome projects that apparently stalled at the assembly stage (the contigs are from 2006).
- Michael Kuhn
Michael, here is the GFF3 version of that script: http://github.com/chapman.... I kept it structured identically to the custom-output one, which highlights how nice it is to deal with standard formats using standard libraries. The code is also much more general now and can handle predictions for multiple contigs. In addition to Biopython, you also need the in-progress GFF parsing library: http://github.com/chapman.... Output is here: http://gist.github.com/321721
- Brad Chapman
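Handling multiple contigs mostly comes down to grouping CDS features by sequence ID and parent transcript before splicing. The sketch below does that with a hand-rolled reader of the nine tab-separated GFF3 columns; it is not the API of the GFF parsing library linked above, and the function names and sample lines are hypothetical. It simplifies GFF3 on purpose: it ignores percent-encoding in attributes, assumes a single `Parent` value per CDS, and skips anything that is not a nine-column feature line (comments, directives, and embedded `##FASTA` sequence).

```python
from collections import defaultdict


def parse_attributes(field: str) -> dict:
    """Parse the GFF3 column-9 'key=value;key=value' string (no unescaping)."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def group_cds(gff_lines):
    """Group CDS features into transcripts keyed by (seqid, Parent).

    Returns {(seqid, parent): {"strand": str, "exons": [(start, end), ...]}}.
    """
    genes = defaultdict(lambda: {"strand": None, "exons": []})
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank lines, comments and ## directives
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # not a CDS feature line
        seqid, _source, _type, start, end, _score, strand, _phase, attrs = cols
        parent = parse_attributes(attrs).get("Parent", "unknown")
        record = genes[(seqid, parent)]
        record["strand"] = strand
        record["exons"].append((int(start), int(end)))
    return dict(genes)


# Invented example lines, for illustration only (not real GlimmerHMM output).
sample = [
    "##gff-version 3",
    "contig1\tGlimmerHMM\tmRNA\t1\t16\t.\t+\t.\tID=t1",
    "contig1\tGlimmerHMM\tCDS\t1\t6\t.\t+\t0\tID=t1.cds1;Parent=t1",
    "contig1\tGlimmerHMM\tCDS\t11\t16\t.\t+\t0\tID=t1.cds2;Parent=t1",
    "contig2\tGlimmerHMM\tCDS\t1\t9\t.\t-\t0\tID=t2.cds1;Parent=t2",
]
for key, record in group_cds(sample).items():
    print(key, record["strand"], record["exons"])
```

Each grouped record, together with the matching contig sequence from the FASTA file, can then go through a splice-and-translate step, for example joining the exon slices and calling Biopython's `Seq.translate()`.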
Well, I'm totally fine with the Biopython version, so I won't ask for another one :)
- Michael Kuhn
The simplest way to do this would be to load the data in Artemis, choose Select->All CDS Features, and then File->Write->Amino Acids of Selected Features.
- Morgan Langille