Ruchira S. Datta
Adrian Altenhoff: Phylogenetic and Functional Assessment of Orthologs Inference Projects and Methods
There are many orthology methods, but there have been few comparison studies. - Ruchira S. Datta
Hulsen et al. (2006): Benchmarking ortholog identification methods using functional genomics data. Genome Biol 7(4):R31. Compared methods only with respect to functional similarity. - Ruchira S. Datta
Chen, Mackey, Vermunt & Roos (2007): Assessing performance of orthology detection strategies applied to eukaryotic genomes. PLoS ONE 2:e383. Used Latent Class Analysis, based on agreement and disagreement among prediction methods. Altenhoff & Dessimoz think this might push the validation problem into the error model used. - Ruchira S. Datta
They wanted to assess predictions against the phylogenetic definition of orthology. Also, how do graph-based and tree-based methods differ? Graph-based methods are still the most widely used. Ensembl was the only large project using a tree-based method, specifically gene-tree/species-tree reconciliation. What is the influence of the clustering method? Options: no grouping (e.g. Roundup); groups of orthologs and in-paralogs (e.g. InParanoid); clusters of orthologs (COG, EggNOG; he says clustering seems to work, but what kind of biological sense does it make?); strict cliques of orthologs (OMA). - Ruchira S. Datta
Projects vary strongly in size and in genome composition (bacteria, archaea, eukaryotes). EggNOG, Roundup, and OMA have representation of Bacteria. They also included commonly used methods (not projects), i.e., RBH (reciprocal BLAST hits; referred to as BBH further down). - Ruchira S. Datta
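For readers unfamiliar with RBH/BBH, here is a minimal sketch of the idea: two genes are paired if each is the other's best-scoring hit. The function name, the dictionary input format, and the toy scores are all assumptions for illustration; the notes do not say how the speakers computed their hits, and ties are broken here simply by taking the first hit seen.

```python
def reciprocal_best_hits(scores_ab, scores_ba):
    """Return the set of (a, b) pairs that are each other's best hit.

    scores_ab maps (query_in_A, target_in_B) -> alignment score,
    scores_ba maps (query_in_B, target_in_A) -> alignment score.
    Both input formats are hypothetical, chosen for this sketch.
    """
    def best_hits(scores):
        best = {}
        for (query, target), score in scores.items():
            # Keep the highest-scoring target per query (first one wins on ties).
            if query not in best or score > best[query][1]:
                best[query] = (target, score)
        return {query: target for query, (target, _) in best.items()}

    best_ab = best_hits(scores_ab)
    best_ba = best_hits(scores_ba)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


# Toy scores, invented for the example.
toy_ab = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b2"): 90, ("a3", "b1"): 120}
toy_ba = {("b1", "a1"): 195, ("b1", "a3"): 110, ("b2", "a1"): 160, ("b2", "a2"): 95}
toy_rbh = reciprocal_best_hits(toy_ab, toy_ba)
```

In the toy data only a1 and b1 choose each other; a2 and a3 each have a best hit that prefers a1, so they get no ortholog call.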
For each project, they mapped protein sequences to OMA: 7.16 million sequences in total, 329.2 million orthology relations. They did intersection tests: "pairwise" among all projects, plus an intersection test on a subset of projects (the intersection of all projects would be empty). - Ruchira S. Datta
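One possible reading of the intersection test, sketched below: once all projects share an identifier namespace, take the sequences covered by every project in the chosen set and compare predictions only on those. This is an interpretation of the note, not a description of the speakers' pipeline; the function names and toy identifiers are made up.

```python
def covered_sequences(relations):
    """All sequence IDs that appear in at least one predicted pair."""
    return {seq for pair in relations for seq in pair}


def restrict_to_common(relations_by_project, projects):
    """Restrict each project's predictions to sequences covered by all `projects`.

    relations_by_project maps a project name to an iterable of (id1, id2) pairs,
    with IDs already mapped to one common namespace (an assumption of this sketch).
    Returns the common sequence set and, per project, its unordered pairs on that set.
    """
    common = set.intersection(
        *(covered_sequences(relations_by_project[p]) for p in projects)
    )
    restricted = {
        p: {
            frozenset((a, b))
            for a, b in relations_by_project[p]
            if a != b and a in common and b in common
        }
        for p in projects
    }
    return common, restricted


# Toy predictions, invented for the example.
toy_relations = {
    "P1": [("s1", "s2"), ("s2", "s3"), ("s4", "s5")],
    "P2": [("s2", "s1"), ("s3", "s6")],
}
toy_common, toy_restricted = restrict_to_common(toy_relations, ["P1", "P2"])
```

Calling this on every pair of projects gives the "pairwise" variant; calling it on a hand-picked subset gives the subset variant. With many projects the common set shrinks quickly, which matches the remark that the intersection of all would be empty.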
Assessment of Orthologs: For phylogeny, checked species tree discordance and phylogenetic analyses from literature - Ruchira S. Datta
For conserved function: checked similarity of Gene Ontology terms (experimentally verified ones only), EC numbers (agreement at all 4 digits), gene expression, and genomic context - Ruchira S. Datta
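The EC-number criterion is simple enough to sketch: a predicted ortholog pair counts as agreeing only if both genes carry a fully specified EC number and all four fields match. How unannotated or partially specified entries (e.g. "2.7.11.-") were handled is not stated in the talk; skipping unannotated genes and counting partial numbers as disagreement are assumptions here, as are the function names and toy annotations.

```python
def ec_agree(ec1, ec2, digits=4):
    """True if two EC numbers are fully specified and equal on the first `digits` fields."""
    head1 = ec1.split(".")[:digits]
    head2 = ec2.split(".")[:digits]
    if len(head1) < digits or len(head2) < digits:
        return False
    # A "-" marks an unspecified field; treated as non-agreement in this sketch.
    if "-" in head1 or "-" in head2:
        return False
    return head1 == head2


def ec_agreement_fraction(pairs, ec_of):
    """Fraction of predicted pairs whose EC numbers agree, over pairs with both genes annotated."""
    annotated = [(a, b) for a, b in pairs if a in ec_of and b in ec_of]
    if not annotated:
        return None
    return sum(ec_agree(ec_of[a], ec_of[b]) for a, b in annotated) / len(annotated)


# Toy annotations, invented for the example.
toy_ec = {"g1": "1.1.1.1", "g2": "1.1.1.1", "g3": "1.1.1.2", "g4": "2.7.11.-"}
toy_pairs = [("g1", "g2"), ("g1", "g3"), ("g2", "g4"), ("g1", "g5")]
toy_fraction = ec_agreement_fraction(toy_pairs, toy_ec)
```

In the toy data g5 has no annotation, so three pairs are scored and one of them agrees.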
Checked species tree discordance using a very unresolved, comb-like tree: Homo sapiens - other primates - other mammals - other vertebrates - Protostomia - fungi. - Ruchira S. Datta
Not all methods provide a grouping, and orthology is not a transitive relation. - Ruchira S. Datta
Took the orthogroups and checked whether they gave this (rather unresolved) species tree. - Ruchira S. Datta
The measure is the mean fraction of correct splits. Only possible to compare with OMA. EggNOG got a significantly smaller fraction of correct splits. - Ruchira S. Datta
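A rough sketch of what a "fraction of correct splits" score could look like, assuming both the inferred tree and the reference comb tree are rooted, written as nested tuples, and cover the same leaves. It counts rooted clades rather than unrooted splits, and the leaf labels stand in for the six taxonomic ranks of the comb tree; the exact definition used in the talk is not given in these notes, so treat every detail here as an assumption.

```python
def clades(tree):
    """Return (leaf set, set of clades) for a rooted tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves = leaves | child_leaves
        found |= child_clades
    found.add(leaves)  # the clade rooted at this internal node
    return leaves, found


def fraction_correct_splits(inferred, reference):
    """Fraction of the inferred tree's non-root clades that also occur in the reference tree."""
    all_leaves, inferred_clades = clades(inferred)
    _, reference_clades = clades(reference)
    informative = {c for c in inferred_clades if c != all_leaves}
    if not informative:
        return None
    return len(informative & reference_clades) / len(informative)


# Comb-like reference; labels are placeholders for the six ranks.
reference_tree = ((((("human", "primate"), "mammal"), "vertebrate"), "protostome"), "fungus")
# Hypothetical inferred tree with "mammal" and "vertebrate" swapped.
inferred_tree = ((((("human", "primate"), "vertebrate"), "mammal"), "protostome"), "fungus")
toy_score = fraction_correct_splits(inferred_tree, reference_tree)
```

Averaging this score over many sampled groups would give a mean fraction of correct splits per project. Note that it only penalizes wrong groupings among the sequences a project did predict, so it is blind to missed orthologs.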
Toni: EggNOG provides larger groups; how do you deal with that? How do you deal with in-paralogs? A: If there is a hierarchy, go all the way down to the lowest cluster. - Ruchira S. Datta
Christophe: they sample randomly from many species. - Ruchira S. Datta
Many projects perform indistinguishably well on this test. BBH is surprisingly good. - Ruchira S. Datta
Ensembl should in theory be more powerful, but did not recover higher accuracy, which was quite surprising. - Ruchira S. Datta
This test does not say anything about false negative predictions. - Ruchira S. Datta
Checked also Gene Ontology similarity and PANTHER ontology - Ruchira S. Datta
Albert thinks the comparison based on the PANTHER ontology is biased by cluster size. David Roos suggests sampling only one pair from each cluster. - Ruchira S. Datta
For both OMA Pairwise and Ensembl Compara, later builds had a lower fraction of correct splits than earlier ones, rather than a higher one. - Ruchira S. Datta
Toni: Due to low quality genomes. - Ruchira S. Datta
The best "method" depends on the application. For phylogeny: OMA, HomoloGene. For function: strict: OMA, HomoloGene; moderate: OrthoMCL (InParanoid, Ensembl Compara); loose: EggNOG - Ruchira S. Datta
In these tests, tree reconciliation methods do not (yet?) outperform pairwise approaches, and BBH is surprisingly good. So, methods should validate against BBH. - Ruchira S. Datta
Future work: provide the benchmark as a web service to the community. - Ruchira S. Datta
Teresa: When comparing with BBH, should also compare with another bidirectional method. BBH is only pairwise, so the multiple methods may be confused by more data. Christophe: you just arrive at a different point on the sensitivity-specificity curve. - Ruchira S. Datta