ISMB/ECCB
HL22: Dannie Durand - Sequence Similarity Network Reveals Common Ancestry of Multidomain Proteins
Multidomain proteins are difficult to categorize because different parts have different histories. - Gabriele Sales
Multidomain homologs: finding homologs is an important aspect of functional genomics. - Roland Krause
Song et al 2008 PLoS Computational Biology 4(5) - Allyson Lister
Genes that share common ancestry tend to have similar structure and function. - Gabriele Sales
also used to build comparative maps of synteny - Ruchira S. Datta
Sequence comparison can be used to identify chromosomal regions that share common ancestry. - Gabriele Sales
this is called "spatial genomics" - Ruchira S. Datta
How do multidomain proteins fit this picture? - Gabriele Sales
Example: tyrosine kinases, which are associated with many different domains. - Roland Krause
example: protein tyrosine kinases - Ruchira S. Datta
one family with many domain architectures, all sharing a kinase domain - Ruchira S. Datta
Multidomain sequences evolve via gene duplication and domain shuffling. - Gabriele Sales
multidomain sequences evolve via gene duplication and domain shuffling - Ruchira S. Datta
The same domain may appear in multiple, unrelated proteins. - Gabriele Sales
A definition will be presented that is in line with Fitch's proposition of homology. - Roland Krause
can have case where genes share common ancestry, but domain architecture has changed - Ruchira S. Datta
Difference between sequences related by vertical descent and related by domain insertion. - Roland Krause
Two kinds of relations among genomes: relation by vertical descent or relation by domain insertion. - Gabriele Sales
similarly can have the converse: through domain shuffling, genes that are not homologous can come to have the same domain architecture - Ruchira S. Datta
Is it possible to distinguish these two cases? - Gabriele Sales
Given two sequences with similarity: can one distinguish the two scenarios? - Roland Krause
homologs are related by vertical descent - Ruchira S. Datta
orthologs are related by speciation - Ruchira S. Datta
orthologs are a subset of homologs, and homologs intersect with the set of significantly similar sequences - Ruchira S. Datta
also have distant homologs which don't appear to be significantly similar - Ruchira S. Datta
A Venn diagram, including orthologs, homologs, distant homologs and significantly similar sequences with modification. - Roland Krause
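The set relationships in the Venn diagram can be written down with plain Python sets. This is only an illustrative sketch: the gene identifiers and the `venn_regions` helper are hypothetical and do not come from the talk.

```python
def venn_regions(homologs, orthologs, significant):
    """Split sequences into the regions of the Venn diagram described above.

    homologs:    sequences related to a query by vertical descent
    orthologs:   homologs related by speciation (must be a subset of homologs)
    significant: sequences with significant similarity to the query
    """
    if not orthologs <= homologs:
        raise ValueError("orthologs must be a subset of homologs")
    return {
        # homologs that a similarity search can find
        "detectable_homologs": homologs & significant,
        # homologs that do not appear significantly similar
        "distant_homologs": homologs - significant,
        # significantly similar but not homologous, e.g. via a shared inserted domain
        "similar_non_homologs": significant - homologs,
    }


# Hypothetical identifiers, for illustration only.
homologs = {"geneA", "geneB", "geneC", "geneD"}
orthologs = {"geneA", "geneB"}
significant = {"geneA", "geneB", "geneC", "geneX"}

regions = venn_regions(homologs, orthologs, significant)
```

The `similar_non_homologs` region is the one the talk is concerned with: pairs that look similar only because of domain insertion.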
inferences that can be drawn from vertical descent (similar molecular functions) and domain insertion (binding partners) are different - Allyson Lister
Biological interpretation of vertical descent: molecular function; regulation; comparative mapping; processes of gene duplication and genome rearrangement. - Gabriele Sales
Interpretations of domain insertion: protein specialization; ligand specificity; localization; process of domain shuffling. - Gabriele Sales
vertical descent implies similar molecular function and regulation; it is useful for comparative mapping and for studying the processes of duplication and genome rearrangement - Ruchira S. Datta
domain insertion leads to relationships of protein specialization, ligand binding, and cellular localization - Ruchira S. Datta
In animals and plants multidomain sequences become more important than in bacteria. - Gabriele Sales
The more higher eukaryotes are sequenced, the more this problem needs to be addressed. - Roland Krause
therefore, among similar sequences, want to distinguish which are related by vertical descent, and which by domain insertion - Ruchira S. Datta
people look at sequence similarity E-value, and at alignment coverage - Ruchira S. Datta
Alignment length is typically used to distinguish domain re-arrangements. Needs a decent mode model. - Roland Krause
Good example showing that sequence similarity or E-values cannot distinguish the two cases. - Roland Krause
The goal of this method is to identify sequence pairs related by vertical descent (VD) and by domain insertion (DI), and it should work on a broad range of families - Allyson Lister
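For reference, the conventional pairwise heuristic mentioned in these notes (E-value plus alignment coverage) might look like the sketch below. The `Hit` fields, the `coverage` definition, and both thresholds are assumptions chosen for illustration; the talk does not give specific cutoffs, and the speaker's point is that this kind of filter is not reliable for multidomain proteins.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One pairwise alignment result (hypothetical record layout)."""
    query_len: int      # length of the query sequence
    subject_len: int    # length of the matched sequence
    aligned_query: int  # query residues covered by the alignment
    aligned_subject: int  # subject residues covered by the alignment
    evalue: float


def coverage(hit):
    """Fraction of the less-covered sequence that lies inside the alignment."""
    return min(hit.aligned_query / hit.query_len,
               hit.aligned_subject / hit.subject_len)


def passes_filter(hit, max_evalue=1e-10, min_coverage=0.8):
    """Accept a pair only if it is both significant and aligned end to end.

    Both thresholds are arbitrary illustrative values.
    """
    return hit.evalue <= max_evalue and coverage(hit) >= min_coverage


# A near full-length match versus a match confined to one shared domain.
full_length = Hit(500, 480, 450, 450, 1e-50)
single_domain = Hit(1000, 300, 250, 250, 1e-40)
```

Both example hits have strong E-values; only the coverage term separates them, and a homologous pair whose architecture has changed would be rejected by the same rule.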
And needs to be computationally feasible. - Roland Krause
To test, they looked at 20 well-studied families related by vertical descent. - Allyson Lister
They had a much larger set of negative examples (40,000). - Allyson Lister
PSI-BLAST performs worse than BLAST for multidomain proteins with variable architecture (!), as it pulls in non-homologous parts of sequences. - Roland Krause
All methods do well with conserved multidomain proteins. They were more challenged by variable multidomain proteins, where PSI-BLAST doesn't do as well as BLAST. Both methods are extremely challenged when all the sequences are put into the analysis together. - Allyson Lister
Pairwise comparisons are not sufficient. Try networks instead. - Gabriele Sales
Pairwise sequences might not be enough, use the structure of the similarity networks. - Roland Krause
Two sequences are compared in the context of their respective neighborhoods (i.e. other sequences that show similarity). - Gabriele Sales
Domain architecture is implicitly present in the network. - Allyson Lister
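One way to picture "comparing two sequences in the context of their neighborhoods" is to correlate their similarity-score profiles across the whole network. The sketch below uses a Pearson correlation over raw scores; that choice, the toy scores, and the names `build_network`, `profile`, and `neighborhood_similarity` are assumptions for illustration and may differ from the scoring used in the work presented.

```python
import math


def build_network(edges, self_score=300.0):
    """Build a symmetric similarity network from (seq_a, seq_b, score) edges."""
    network = {}
    for a, b, score in edges:
        network.setdefault(a, {})[b] = score
        network.setdefault(b, {})[a] = score
    for seq in network:
        network[seq][seq] = self_score  # every sequence matches itself
    return network


def profile(network, seq, all_seqs, floor=0.0):
    """Similarity scores of `seq` against every sequence; `floor` if no hit."""
    return [network[seq].get(other, floor) for other in all_seqs]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def neighborhood_similarity(network, a, b):
    """Correlate the score profiles of two sequences over the whole network."""
    all_seqs = sorted(network)
    return pearson(profile(network, a, all_seqs), profile(network, b, all_seqs))


# Toy network: K1-K3 form one family; X1 and X2 form another.
# K1 and X1 share a weak hit, standing in for a shared inserted domain.
toy = build_network([
    ("K1", "K2", 200.0), ("K1", "K3", 180.0), ("K2", "K3", 190.0),
    ("X1", "X2", 150.0),
    ("K1", "X1", 60.0),
])

same_family = neighborhood_similarity(toy, "K1", "K2")
shared_domain_only = neighborhood_similarity(toy, "K1", "X1")
```

In this toy case K1 and K2 hit the same neighbors and correlate strongly, while K1 and X1 have a direct hit but largely disjoint neighborhoods, so their correlation is low. No domain annotation is needed, which matches the remark that domain architecture is implicit in the network.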
Open question. The model is explicitly based on insertion and deletion. What about de novo sequence formation? - Gabriele Sales
Comment by Kevin Karplus: Use log scale for false positives in the ROC plots. - Roland Krause
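The suggestion about log-scaled false positives can be sketched as follows. With far more negative examples than positives, the informative part of the curve lies at low false-positive counts, and a log axis spreads that region out. The function name, the input layout, and the toy scores are assumptions; ties between scores are not handled specially.

```python
import math


def roc_points(scored_pairs):
    """Return (false_positive_count, true_positive_rate) as the threshold drops.

    scored_pairs: list of (score, is_positive) with higher score = more confident.
    """
    ranked = sorted(scored_pairs, key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for _, is_pos in ranked if is_pos)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    points = []
    tp = fp = 0
    for _, is_pos in ranked:
        if is_pos:
            tp += 1
        else:
            fp += 1
        points.append((fp, tp / n_pos))
    return points


def log_scaled(points):
    """Put false-positive counts on a log10 axis; zero counts cannot be shown."""
    return [(math.log10(fp), tpr) for fp, tpr in points if fp > 0]


# Hypothetical scores for five sequence pairs.
pairs = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.5, False)]
points = roc_points(pairs)
log_points = log_scaled(points)
```

The resulting `log_points` can be handed to any plotting library; most also offer a log-scaled x axis directly, which avoids dropping the zero-false-positive points by hand.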