
Khader Shameer › Likes

Dan Gezelter
Short-term OpenScience consulting gig with the American Heart Association: http://www.openscience.org/blog...
Paging Michael Nielsen? Daniel Mietchen? MrGunn? - Bill Hooker
I left them a note inquiring whether they would be willing to give an open science spin to the consultancy. For instance, we could define the criteria for the individual points collaboratively, and this could then serve as the basis for other organizations that might consider getting their feet wet in this area as well. - Daniel Mietchen
Their response required clarification as to what I had in mind with "adding an open science component to the project", and I replied "What I had in mind was that the consultant could in principle gather the required information in public, such that (a) others can join in the drafting, or at least comment on it and (b) others can reuse the information." I also submitted a formal expression of interest. - Daniel Mietchen
Reply just came in: "I wanted to let you know that we have hired a consultant whose experience was a better match for our needs at this time." - Daniel Mietchen
I'll be interested to see who that person is. - Bill Hooker
Rajarshi Guha
With Big Data Comes Big Responsibilities - Technology Review - http://www.technologyreview.com/computi...
Michael Nielsen
Amira
Interplanetary Transport Network - Wikipedia - http://amira.amplify.com/2010...
"The Interplanetary Transport Network (ITN) is a collection of gravitationally determined pathways through the solar system that require very little energy for an object to follow. The ITN makes particular use of Lagrange points as locations where trajectories through space can be redirected using little or no energy. These points have the peculiar property of allowing objects to orbit around them, despite the absence of any material object therein. (...) The transfers are so low-energy that they make travel to almost any point in the solar system possible. " - Amira from Bookmarklet
I've seen science fiction writers refer to these paths as "chaotic" trajectories, on account of the complexity of the mapping process. I imagine we're now at a place where there's enough affordable computing power to actually map these paths in real-time. - Steel Penguin Slippy
Rajarshi Guha
New "data scientist" is but old "statistician" writ large - http://cscs.umich.edu/~crshal...
Not to be confused with the "number scientist" a.k.a. mathematician... - Eric Jain
Jan Aerts
Shirley Wu
A co-worker mentioned to me yesterday that a colleague of his is thinking about starting an online journal club type website for scientists. The idea seems to be discussions about papers, data sets, and other web-publishable materials, from any source, in a central location. It would also have discussions about scientific culture, which made me...
It would be a place where people (students, junior faculty, etc) could learn the ropes of academia and science without the pain and misery that traditionally is required. The differences I can see from existing services are the focus on journal club-style discussions and maybe a low barrier to entry - Shirley Wu from twhirl
But obviously, whatever he ends up pursuing should learn from the trials and tribulations of the many related services out there (including services like FF, which is also discussion-oriented) - Shirley Wu from twhirl
It's easy to immediately discount any proposal that sounds like yet another facebook for scientists, but there are still some interesting and potentially good ideas out there. Unfortunately, people who aren't as familiar with the existence of these tools always think of facebook as the ideal and as a brand new idea if applied to the scientist community. Hopefully I convinced my co-worker otherwise, while still encouraging the more innovative aspects of the concept. <end rant> - Shirley Wu from twhirl
Thanks for doing that. - Mr. Gunn
AcaWiki is built around a very similar concept, and John Wilbanks makes an argument for bringing journal clubs online (cf. http://ff.im/airoV ). - Daniel Mietchen
Shirley, Besides AcaWiki (great place to have these discussions, but I'm biased! http://acawiki.org/ ) your colleague also might be interested in GradTurkey, a journal-club discussion wiki originally aimed at grad students: http://gradturkey.fastcoder.net/ - Jodi Schneider
Can discussion on AcaWiki be linkable and embeddable for the public like you can do on FF? If not, why not do the journal club on FF? I don't get it. - Alexey
my comments on the topic in 08/07 http://pimm.wordpress.com/2007... - Attila Csordas
Knol has many journal features built-in. Here is an example of a successful research journal on H1N1: http://knol.google.com/k... - no name
John Wilbanks mentioned doing journal clubs online in his talk here recently: http://bit.ly/3jxnxr - Walter Jessen
this topic came up during a discussion today with Mike Eisen of PLoS, re: why commenting hasn't really taken off - his thought is that people are more likely to comment if there's a central place to do it rather than individually at each journal website for each paper (how many of us access papers directly through journal websites except through PubMed anyway?). The whole time I was... more... - Shirley Wu from twhirl
Can somebody point to a platform for an online journal club better than a blog post? It combines everything: presentation (ppt embedded from SlideShare or Gdocs, video embedded from YouTube/Vimeo...), presenter's opinion, discussion section under the post, embedded comments from FF, ranking of the presentation and number of views. Importantly, you don't need to register or get an account for commenting; it's public, linkable, and moderatable. The whole world can participate. What can be better? - Alexey
@Neil Saunders Were you thinking of JournalFire? We recently updated the site and are looking for feedback. I posted about it yesterday: http://friendfeed.com/the-lif... - John Delacruz
Amira
Why we need plant scientists | Bristol University - http://www.bristol.ac.uk/news...
"‘Plant scientist’ should take its rightful place beside ‘doctor’, ‘lawyer’ and ‘vet’ in the list of top professions to which our most capable young people aspire, according to a hard-hitting letter by an international group of botanists and crop scientists published today. The letter calls for a radical rethink of our approaches to plant science research and underlines how, with the Earth’s growing human population, this often neglected branch of science is crucial to our long-term survival. (...) “Plant scientists are tackling many of the most important challenges facing humanity in the twenty-first century, including climate change, food security, and fossil fuel replacement. Making the best possible progress will require exceptional people. We need to radically change our culture so that ‘plant scientist’ (or, if we can rehabilitate the term, ‘botanist’) can join ‘doctor’, ‘vet’ and ‘lawyer’ in the list of top professions to which our most capable young people aspire.” (...) One... more... - Amira from Bookmarklet
Mr. Gunn
Consultation on scientific information in the digital age - European Commission wants your input! - http://ec.europa.eu/researc...
wish i'd known about this a few weeks ago ... could have had my boss (leading researcher in neuroimaging) respond. he's especially interested in this sort of thing as he's interested in data sharing for his research.... - henry
Rajarshi Guha
ISMB/ECCB
HL20: Philip Kim - Bringing order to protein disorder through comparative genomics and genetic interactions
Bellay et al., collaboration with Chad Myers and Gary Bader. http://genomebiology.com/2011... - Roland Krause
Many proteins contain disordered regions, which are difficult to study and tend to be neglected. The focus is usually on domains, but interdomain regions contain many relevant motifs for binding. - Roland Krause
General systematic survey of disordered interactions. Disorder is correlated with genetic interaction degree, holds for other data set, known for years. - Roland Krause
gene interaction network as way to define functional disorder - Shannon McWeeney
Paradox: disordered proteins evolve fast, disordered hubs are conserved at the gene level. - Roland Krause
Many disordered regions are conserved but not their content. - Roland Krause
detect level of conservation in regions characterized by high level of disorder - Shannon McWeeney
use of disorder conservation score and sequence conservation score - allows elucidation of 3 classes of disorder - Shannon McWeeney
3 classes: constrained disorder (residue); flexible disorder (residue); non-conserved disorder (residue) - Shannon McWeeney
different functions with each of these classes - Shannon McWeeney
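The three classes noted above can be read as a decision on two per-residue scores. The sketch below is only an illustration of that reading: the cutoff values and the function name are hypothetical, since the notes give no numeric thresholds, and the mapping of "flexible" to "disorder conserved, sequence not conserved" is inferred from the remark that many disordered regions are conserved but not their content.

```python
# Hypothetical cutoffs; the talk notes do not give numeric thresholds.
DISORDER_CONSERVATION_CUTOFF = 0.5
SEQUENCE_CONSERVATION_CUTOFF = 0.5


def classify_disordered_residue(disorder_conservation, sequence_conservation):
    """Assign a disordered residue to one of the three classes named in the notes."""
    if disorder_conservation < DISORDER_CONSERVATION_CUTOFF:
        # Disorder itself is not maintained across species.
        return "non-conserved disorder"
    if sequence_conservation >= SEQUENCE_CONSERVATION_CUTOFF:
        # Both the disordered state and the residues are maintained.
        return "constrained disorder"
    # Disorder is maintained, but the sequence content drifts.
    return "flexible disorder"


print(classify_disordered_residue(0.9, 0.8))
print(classify_disordered_residue(0.9, 0.1))
print(classify_disordered_residue(0.2, 0.7))
```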
Constrained disorder in the chaperone Hsp90 forms a loop on the surface. Might work on the refolding of other proteins. - Roland Krause
cancer driver mutations are enriched in constrained disorder - Shannon McWeeney
Q: Does your work suggest that with the usual tools we are not able to detect conservation of disordered regions? A: Yes. There are different properties of disorder. We look at more data types. - Roland Krause
Q: Hubs vs non-hubs? Previous speakers showed a correlation with multiple functions. Is there a correlation? A: It's basically the same thing. Q: Do the floppy disordered regions allow it to do more things? A: Hmm, if you have more disordered regions you have more opportunity. - Roland Krause
Q: Strong signal in the location of the protein, context important. A: Not explicitly looked at. - Roland Krause
Q: Definition of disorder: Is there a number of AAs to define disorder? A: Might be a technical issue. We use DISOPRED2. Some artifacts from sliding window approaches, needs a minimal window. - Roland Krause
ISMB/ECCB
TT20: Christine Chichester - End user tools for data interoperability
AKA: Tools for Nanopublishing. Nanopublications = assertion (including provenance). - Scott Edmunds
Potentially 10^14 assertions, but for ones with more provenance thinking about DataCite DOIs. Also looking to link with ORCID IDs. - Scott Edmunds
ISMB/ECCB
PT19: Gunhan Gulsoy - RINQ: Reference-based Indexing for Network Queries
Network alignment problems are computationally hard. - Roland Krause
Reference based indexing: A distance in 2D can be estimated by a reference point with lower and upper bound. - Roland Krause
Multiple reference points will shrink the possible space, so distance might not be required to be measured directly. - Roland Krause
Filter possible networks for comparison. - Roland Krause
Lower bound calculation doable. Upper bound calculation more difficult, fails in some specific cases. - Roland Krause
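The 2D illustration of reference-based indexing in the notes above follows from the triangle inequality. The sketch below covers only that plain metric case, not the network-alignment version from the talk (where the upper bound is said to be harder). The function name and the sample points are made up for illustration.

```python
import math


def distance_bounds(query_to_refs, target_to_refs):
    """Bound d(query, target) from distances to shared reference points.

    For each reference r, the triangle inequality gives
    |d(q, r) - d(r, t)| <= d(q, t) <= d(q, r) + d(r, t).
    Each extra reference can only tighten the interval.
    """
    lower, upper = 0.0, math.inf
    for d_qr, d_rt in zip(query_to_refs, target_to_refs):
        lower = max(lower, abs(d_qr - d_rt))
        upper = min(upper, d_qr + d_rt)
    return lower, upper


# Hypothetical points in the plane.
references = [(0.0, 0.0), (10.0, 0.0)]
query, target = (6.0, 8.0), (3.0, 4.0)

q_dists = [math.dist(query, r) for r in references]
t_dists = [math.dist(target, r) for r in references]  # precomputed in an index

lower, upper = distance_bounds(q_dists, t_dists)
true_distance = math.dist(query, target)
print(lower, true_distance, upper)
```

Used as a filter, a candidate can be discarded without a direct comparison when its lower bound already exceeds the search radius, and accepted when its upper bound is within it.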
Creation of reference networks: should be small (local alignment is NP-complete), comprehensive, non-redundant. - Roland Krause
For construction: Use an existing network from a database C, modify the network by deletion and extension, compare it to the existing reference networks R, and exclude it if too similar. - Roland Krause
Results: KEGG gene regulatory networks, used all with > 15 nodes. Extracted 6-8 nodes using random walks. QNet algorithm for comparison. - Roland Krause
Q: Comprehensiveness of the reference set? How comprehensive can a reference network be? A: No checking after training due to redundancy removal. - Roland Krause
Q: Webserver or available implementation? A: Work in progress, query can take 5 hours. But code is available. - Roland Krause
Q: How did you change parameters to achieve different accuracies? A: Different number of references. - Roland Krause
Q: Different meanings of networks. Can this be considered in the alignment? A: Interesting question but level of abstraction does not allow currently. Could be expanded. - Roland Krause
ISMB/ECCB
Keynote: Janet Thornton - The Evolution of Enzyme Mechanisms and Functional Diversity
10 year Keynote for ECCB - Shannon McWeeney
Special call-out to Elixir session today at 2:30 Hall F2 www.elixir-europe.org - Shannon McWeeney
Trying to understand life from molecules to systems - Venkata P. Satagopam
She's a "data junkie" -- everything depends on having your data properly organized and being able to extract information from it. - Barb Bryant
Most of our information is still at the parts level, with emerging data on interactions, reactions and pathways - Barb Bryant
at EBI - data doubling every 5 months 12 petabytes of storage currently - Shannon McWeeney
EBI contains presently 12 petabytes of data - Venkata P. Satagopam
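To get a feel for what "doubling every 5 months" means, here is a back-of-the-envelope projection. It assumes the 12 petabyte figure grows at exactly the stated doubling rate, which the notes do not claim; the function name is made up for illustration.

```python
def projected_petabytes(current_pb, months_ahead, doubling_months=5.0):
    """Exponential growth: size(t) = size(0) * 2 ** (t / doubling_time)."""
    return current_pb * 2 ** (months_ahead / doubling_months)


# Starting from the 12 PB quoted in the keynote notes.
for months in (0, 5, 10, 20):
    print(months, "months:", projected_petabytes(12, months), "PB")
```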
We need to look not only at proteins but also at the small molecules, the metabolites. - Barb Bryant
Plants have way more metabolites than we do. - Barb Bryant
Cheminformatics is older but smaller than bioinformatics; largely confined to industry. The tools are not freely available, with notable exceptions. - Barb Bryant
Differences between the proteome and the metabolome, e.g. no evolution and hierarchical structure of metabolites. - Roland Krause
"Way back in the 90s" they were trying to define the reactome - the reactions necessary for life. - Barb Bryant
From the proteome and the metabolome to the reactome: How many reactions are necessary for life? - Roland Krause
Enzymes are an important part of biological molecular reactions - Venkata P. Satagopam
Enzymes are called by name and EC number. - Roland Krause
Handling the reactions computationally is a challenge - Venkata P. Satagopam
Predicting enzyme function automatically: most powerful and most popular method is to recognize a homologue and transfer functional annotation. - Vangelis Simeonidis
EC numbers explained: they conform to the following format: C.SC.SSC.SN - Vangelis Simeonidis
The classification of enzymes is four-part: class, subclass, sub-subclass, serial number (typically the substrate) - Roland Krause
where: C = Class, SC = Sub-class, SSC = Sub-subclass, SN = Serial number - Vangelis Simeonidis
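The C.SC.SSC.SN format described above is easy to split in code. This is a minimal sketch; the `ECNumber` type and `parse_ec_number` function are names made up here, not part of any tool mentioned in the talk. The fields are kept as strings because partially assigned EC numbers use "-" as a placeholder.

```python
from typing import NamedTuple


class ECNumber(NamedTuple):
    enzyme_class: str   # C
    subclass: str       # SC
    sub_subclass: str   # SSC
    serial: str         # SN


def parse_ec_number(text):
    """Split an EC number such as 'EC 6.1.1.1' into its four parts."""
    cleaned = text.strip()
    if cleaned.upper().startswith("EC"):
        cleaned = cleaned[2:].strip()
    parts = cleaned.split(".")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected four dot-separated fields, got {text!r}")
    return ECNumber(*parts)


ec = parse_ec_number("EC 6.1.1.1")  # class 6 is the ligases
print(ec.enzyme_class, ec.subclass, ec.sub_subclass, ec.serial)
```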
EC numbers do not capture the mechanism of the enzyme. - Vangelis Simeonidis
Capture only the chemical level, no biological dependence such as co-factors - Roland Krause
There is no one to one relationship between EC numbers and protein families - Venkata P. Satagopam
The reactome contains 4154 reactions - Venkata P. Satagopam
They wanted to build tools that would handle the actual chemistry. - Barb Bryant
There has been a lot of work in the past 10 years in tools to handle the chemistry. Includes Kanehisa 2004, Gasteiger 2008, Aires-de-Sousa 2008, Schomburg 2010. Unfortunately, most of the software isn't freely available, and only tackles part of the problem. - Barb Bryant
There is a huge literature on comparing small molecules to each other. So that's well covered. - Barb Bryant
They also needed to map the atoms from each side of the equation to each other: atom-atom mapping. This works by matching the largest common moiety first, and iterating. The Mesa (?) database of about 300 reactions is a gold standard to check the quality of the mapping. - Barb Bryant
You need to be able to compare reactions to each other - reaction similarity. - Barb Bryant
To describe the changes in the bonds that take place, you use the Dugundji-Ugi model -- you make a matrix showing the bonds for reactants and products; subtracting the matrices gives you the reaction matrix. - Barb Bryant
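The matrix subtraction described above can be shown on a toy reaction, H2 + Cl2 -> 2 HCl. This sketch assumes the usual bond-electron convention (off-diagonal entries are bond orders, diagonal entries are free valence electrons). The atom ordering and the function name are chosen here for illustration, and the atom-atom mapping is taken as given.

```python
def reaction_matrix(reactant_be, product_be):
    """Reaction matrix R = E - B (product matrix minus reactant matrix)."""
    return [
        [p - r for r, p in zip(r_row, p_row)]
        for r_row, p_row in zip(reactant_be, product_be)
    ]


# Atom order: H1, H2, Cl1, Cl2.
# Off-diagonal = bond order, diagonal = free valence electrons.
reactants = [
    [0, 1, 0, 0],  # H1 bonded to H2
    [1, 0, 0, 0],
    [0, 0, 6, 1],  # Cl1 bonded to Cl2, six free electrons
    [0, 0, 1, 6],
]
products = [
    [0, 0, 1, 0],  # H1 bonded to Cl1
    [0, 0, 0, 1],  # H2 bonded to Cl2
    [1, 0, 6, 0],
    [0, 1, 0, 6],
]

R = reaction_matrix(reactants, products)
for row in R:
    print(row)
```

Negative entries mark broken bonds and positive entries mark formed bonds; the entries sum to zero because no electrons are created or destroyed.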
EC-BLAST created by Syed Asad Rahman; it allows you to compare reactions by bond similarity, reaction centre similarity or substrate structure similarity. - Barb Bryant
Chemicals have several fingerprints: bond change, structure, stereo fingerprint - Venkata P. Satagopam
(See KillerApp talk I think Tues 11:45am) - Barb Bryant
CDK (Chemistry Development Kit) is free software - Venkata P. Satagopam
They looked into redefining the enzyme classification system. - Barb Bryant
Ligases are in principle simple; most 6.1s are aminoacyl-tRNA synthetases - Venkata P. Satagopam
The EC-BLAST-server (URL above) is in closed beta. - Roland Krause
Compared two reactions using Tanimoto coefficient - Venkata P. Satagopam
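For reference, the Tanimoto coefficient of two fingerprints is the size of their intersection divided by the size of their union. The sketch below treats a fingerprint as a set of feature identifiers; the feature labels are invented for the example, and how EC-BLAST actually encodes its fingerprints is not covered in these notes.

```python
def tanimoto(fingerprint_a, fingerprint_b):
    """Tanimoto (Jaccard) similarity between two sets of features."""
    a, b = set(fingerprint_a), set(fingerprint_b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty fingerprints are identical
    return len(a & b) / len(union)


# Hypothetical bond-change features for two reactions.
reaction_1 = {"C-O formed", "O-H broken", "C-N broken", "N-H formed"}
reaction_2 = {"C-N broken", "N-H formed", "P-O formed"}

print(tanimoto(reaction_1, reaction_2))  # 2 shared features out of 5 in total
```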
"This heatmap might look good to you, to me it looks fantastic!" Similarity between substrates is now close the EC classification. Differences might be based on the EC classification. - Roland Krause
FunTree - Understanding enzyme families and evolution Poster #Z06 - Venkata P. Satagopam
Why are some structures capable of so many different enzymatic functions? Which are the residues that led to change of function? - Roland Krause
Examples from the Phosphatidylinositol-Phosphodiesterase-Superfamily, a multi-domain protein family. - Roland Krause
They looked at the multi-domain architecture of the phosphatidylinositol-phosphodiesterase superfamily. Adding new domains doesn't add enzyme function to members of this family. - Barb Bryant
One needs to understand the evolution to better understand the EC classification - Venkata P. Satagopam
The tree constructed from structure has three main groups. Branches of the tree are distinguished by differences in substrate, product, presence of a metal co-factor, or mechanism. - Barb Bryant
Matrix showing how frequently there are evolutionary changes within and between classes. Evolution tends to create new enzymes within the same class, having the same mechanism but changing the substrate or product. - Barb Bryant
Most of the enzyme evolution is happening at the last subclass level - Venkata P. Satagopam
Question from the floor: is this an opportunity to abandon the EC classification method and move on to a better one? Answer: no. The EC structure is very sensible. Also, it is powerful because everybody uses it. Also, in the first class we examined, it matches pretty well to the similarity measure we developed. - Barb Bryant
# Best keynote so far - Roland Krause
Question: sometimes you have a huge protein to carry out a single small reaction. Have you noticed any clues to why this happens? A: we have some thoughts related to protein function. First, most proteins are multi-functional. They interact with other proteins and do other sorts of things. Secondly, some of the substrates are quite large. We have a sort of domino theory of enzyme... more... - Barb Bryant
ISMB/ECCB
Keynote: Olga Troyanskaya - Integrating computation and experiments for a molecular-level understanding of human disease
Olga is not here but will have talk via video and then live Q&A remotely - Shannon McWeeney
Introduction by Alfonso Valencia. Olga Troyanskaya cannot be here because of a recent new family member of hers. (# Congratulations) - Roland Krause
Her lab takes large-scale molecular biology datasets and develops pathway level models - Barb Bryant
Tissue- and developmental-stage-specificity is important. - Barb Bryant
Example: DNA damage repair with many different types of interactions. - Roland Krause
Pathway connections between proteins meaning correlated or connected function as opposed to protein-protein or regulatory interactions. Connection means confidence that the two proteins are working together to accomplish some biological function. - Barb Bryant
Bayesian networks for data integration. - Roland Krause
The graph is context-specific. - Barb Bryant
A case study of mitochondria: Yeasts unlike humans can live without mitochondria. - Roland Krause
Do these networks discover novel biology? Case study of mitochondria. - Barb Bryant
Goal is to see if we can find proteins previously unknown to be involved in mitochondrial function, and test those predictions by knocking out the gene and looking at phenotype. - Barb Bryant
Two iterations results in finding all predictions that can be tested with single knock-out. - Barb Bryant
Went from 106 to 350 proteins known to be involved in mitochondrial biogenesis, in a few months. - Barb Bryant
109 of these predictions are completely novel; 135 more do have prior literature evidence but hadn't yet made it into Gene Ontology. - Barb Bryant
Instead of this computational candidate approach, they could have done genome-wide knock-out, and tested with that assay; it would have taken them 8 years, so this is a huge time savings (and cost!) - Barb Bryant
More than 50% of the genes have an ortholog in human. - Roland Krause
The newly annotated genes that they find tend to show a quantitative phenotype as opposed to being necessary for any respiration. These are more likely to be relevant to human disease, I think because the ones that are strictly necessary would be lethal mutations... - Barb Bryant
Computational predictions from large collection of genomic data can be accurate despite incomplete or misleading old standards. - Vangelis Simeonidis
So that was yeast, but you can also take an approach of looking directly at human data. - Barb Bryant
Their system allows you to ask what diseases a particular gene is involved in. - Barb Bryant
They use 650 datasets that include 30,000 conditions. I assume this is mainly gene expression profiling data. - Barb Bryant
These predictions can be tested. Hilary Kohler (sp?) at Princeton has tested 7 of the predictions; 6/7 confirmed. - Barb Bryant
Tissue specific gene expression -- in worm, this has been carefully elucidated; see WormBase. - Barb Bryant
With Coleen Murphy at Princeton - taking out whole worm expression and figure out tissue-specific expression from it. ? - Barb Bryant
They do genetic perturbation of genes in the untranslated response pathway and make some interesting findings that were not previously known. - Barb Bryant
I am confused about the finding that they can predict tissue specificity even if they take that tissue out of the compendium. Did I get that right? How can you predict the tissue if you have zero examples of it? - Barb Bryant
# Not sure what the UTR pathway was here either. - Roland Krause
Now moving on to a human example: kidney disease - Barb Bryant
Damage to glomerular filter causes disease. - Barb Bryant
There is no way to microdissect podocytes; at most can dissect glomerular filters. A collaborator at MI, Kretzler, has some gene expression profiles of this tissue. They're going to try and predict podocyte expression. - Barb Bryant
They take the expression compendium and positive and negative examples of mixed samples (various cell lineages) that include podocytes, I think. - Barb Bryant
They can refer to other datasets like mouse, in situ staining, and so on. - Barb Bryant
They want podocyte-specific genes that are enriched for clinically relevant genes. - Barb Bryant
DACH1 expressed in the human kidney, also expressed in the homologous organ in fish. - Roland Krause
She now talks about follow-up work, unpublished, on some specific genes. - Barb Bryant
One way to assess the predictions is to look at the glomerular filtration rate (GFR) which reflects kidney function; the expression of the predicted genes does relate to GFR. So the predictions are clinically relevant. - Barb Bryant
HOW are these proteins involved in the process? What are these genes doing in the podocytes or in the kidney? (she asks) - Barb Bryant
She looks at the genes in a network built based on what we already know about gene and protein relationships. - Barb Bryant
She shows the brain network and genes known to be involved in Alzheimer's disease. - Barb Bryant
Moving on to the topic of finding genotype-disease associations with functional genomics. This is like GWAS done with genetic data, but here with functional genomics. - Barb Bryant
Input: tissue-specific functional relationship networks. + genes involved in specific phenotypes (from mouse data, Jackson Labs). Then an SVM classifier tries to find new genes with evidence of connection to different phenotypes. - Barb Bryant
This works. And it definitely helps to be considering tissue specificity instead of just using global networks. - Barb Bryant
Let's look at a specific prediction: bone mineral density. - Barb Bryant
There are 20 GWAS loci but these only explain 3% of heritability. ! - Barb Bryant
Two genes: Timp2 and Abcg8 are in the top 100 predictions; they don't overlap with other previous studies, and there is a KO model in mouse. - Barb Bryant
So they looked at the bone density phenotype in these knockout (KO) mice. - Barb Bryant
Male mice do have significant reduction in BMD. (One graph looks like increase - what am I missing?) - Barb Bryant
With Hess & Huttenhower, infer physical, genetic, regulatory and functional networks, from the functional genomics data. Specific interaction types (like phosphorylation) are hard to predict because of the small amount of gold standard data. - Barb Bryant
They have an ontology of interaction types; this hierarchical relation can improve their systems. - Barb Bryant
I've lost track of what the input data is here; the output predictions are specific types of relationships between pairs of proteins. - Barb Bryant
Example of JAG1, looking for its targets; it is known to be a mediator of bone metastasis in breast cancer. They made a prediction of one target that they hope to confirm. - Barb Bryant
Another example in the NOD-like receptor signaling pathway. They held out some data and showed that they could regenerate it. They also had a novel prediction of an inhibitory relationship, which is consistent with some indirect experimental evidence in the literature, and relevant to a disease. - Barb Bryant
Summary: computational analyses of diverse large datasets, especially using tissue-specific information and modeling, can help pinpoint disease genes and processes that have been missed by other techniques like GWAS. - Barb Bryant
We need to link the micro-level biochemical events with physiological-level events like blood pressure. We also want to do this in the context of individual genomes - personalized medicine - to guide appropriate treatment for each patient. - Barb Bryant
Acknowledgements: Huttenhower (now at Harvard). Chad Myers, David Hess, Matthew Hibbs. Maria Chikina - worm. Casey Greene - podocytes. Yuanfang Guan - collaborating with Jackson Labs. Chris Park -- pathways work. Many more. - Barb Bryant
function.princeton.edu; open source library C++, highly optimized, developing with Huttenhower, useful for repeating the analyses or applying tools to other datasets. - Barb Bryant
Q: Are the edges in tissue specific networks? A: Both edges and nodes are tissue specific. - Roland Krause
Roland Krause
SS02: Insights & outlook for individual genome interpretation: lessons from the Critical Assessment of Genome Interpretation
Part A. Steven Brenner: Questions can be sent online via E-mail, Twitter, etc. (@CAGInews) - Roland Krause
Critical Assessment of Genome Interpretation (CAGI): Blind assessment of phenotypes in the sense of CASP. Determine state of art, identify progress, bottlenecks, focus of efforts, innovation. - Roland Krause
Also, propose new problems and bring the community together. Not a competition but a learning exercise. - Roland Krause
Susanna Repo: Introduction to CAGI2010. - Roland Krause
CAGI2011: 9 datasets available, 3 more pending. Submission closes in fall. - Roland Krause
Personal Genome variants, Risk SNPs, [....], .exome sequencing comparison of patients and healthy individuals. - Roland Krause
Q: p53 double mutant dataset similar to last year. A: Yes, several issues with the data set(?) - Roland Krause
Pauline Ng: Results from 2010: Successes and challenges. - Roland Krause
SIFT algorithm - Roland Krause
Dataset 1. Mutations in cystathionine-beta-synthase, causing homocystinuria. - Roland Krause
52 substitutions assay, comparison of high pyridoxine and low pyridoxine conditions. - Roland Krause
23 conditions, comparison of AUC, Accuracy, spearman, z-score, RMSD. - Roland Krause
Successful algorithms performed well in all metrics. - Roland Krause
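Three of the metrics listed above (RMSD, Spearman correlation, AUC) can be computed in a few lines. The sketch below uses made-up numbers, not CAGI data, and the function names are chosen here. It leaves out accuracy and z-score, and it says nothing about how the CAGI assessors actually computed their scores.

```python
import math


def rmsd(predicted, observed):
    """Root-mean-square deviation between predicted and observed values."""
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    )


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(predicted, observed):
    """Pearson correlation of the ranks (undefined if either input is constant)."""
    rp, ro = average_ranks(predicted), average_ranks(observed)
    mp, mo = sum(rp) / len(rp), sum(ro) / len(ro)
    cov = sum((a - mp) * (b - mo) for a, b in zip(rp, ro))
    sp = math.sqrt(sum((a - mp) ** 2 for a in rp))
    so = math.sqrt(sum((b - mo) ** 2 for b in ro))
    return cov / (sp * so)


def auc(scores, labels):
    """Probability that a random positive outscores a random negative."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up predictions for illustration only.
print(rmsd([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(spearman([0.1, 0.4, 0.35, 0.8], [10, 30, 20, 40]))
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```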
The best two algorithms used structure, sequence and annotation. Third sequence alone, fourth structure alone. - Roland Krause
Caveats: only one protein, cannot be extrapolated, most algorithms used structure, but not known for other data. - Roland Krause
Some good predictions made but some are wrong by most algorithms. - Roland Krause
Methods are missing activating mutations (2 examples shown) - key area for field - Shannon McWeeney
General problem with activating mutations. Problematic for drug targets in cancer. - Roland Krause
Dataset 2: CHEK mutations. Increasing the risk of breast cancer. Sequenced in 1482 breast cancers and 1089 healthy controls. - Roland Krause
32 AA mutations, 2 double substitutions. 3 pre-termination, 4 frameshifts. - Roland Krause
Predicting phenotype - Karchin Lab video- BN20 approach - focused on significant variants (from association studies) and affected genes - Shannon McWeeney
Sequential bayesian with priors and BN20 approach - for each of 10 individuals - Shannon McWeeney
wiki.chasmsoftware.org - Shannon McWeeney
SNPedia - wiki like website for SNP annotation used in assessment - Shannon McWeeney
Panel discussion: - Michael Kuhn
Russ Altman: prediction: in 10 years, prediction of phenotype of SNPs will be about learning from known cases (parallel: for protein structure prediction, homology modelling is becoming more important), i.e. variants that are rare now will be seen more often due to increased coverage - Michael Kuhn
ISMB/ECCB
HL07: Yanay Ofran - Survival of the Friendly - the Importance of Protein-Protein Interactions in the Evolution of Bacterial Genomes
Lateral gene transfer - how can moving of parts work in other systems and even infer a selective advantage? - Roland Krause
Complexity hypothesis: In order to be beneficial for a new protein it cannot have many interactions. - Roland Krause
Do you have to be a lone wolf to integrate successfully? - Roland Krause
Genes undergoing LGT were found to be less connected. - Roland Krause
The study presented is a new large scale study inspecting the binding interfaces. Developed a method to identify interfaces. - Roland Krause
# Missed the punchline of the presentation due to WLAN problems. Hmrg. - Roland Krause
Roland Krause
OPT11: Virginie Bernard - The functional importance and detection of regulatory sequence variants
Genetic diseases: Identification of causal mutations with exome sequencing. - Roland Krause
Access to upstream and downstream areas around exomes through reads overlapping with exomes. - Roland Krause
Thus using exome sequencing to analyze regulatory regions or splice sites. - Roland Krause
Application of splice sites: Variation to the highly conserved donor and acceptor sites. - Roland Krause
Novel splice sites emerging from variation. - Roland Krause
Variations damaging transcription factor binding sites, using PAZAR and JASPAR. - Roland Krause
Not all variations damage TFBSs; some create new binding sites. - Roland Krause
Poster A6. - Roland Krause
ISMB/ECCB
HL02: Jacques Colinge - The Central Human Proteome
Which proteins are commonly expressed in human? - Roland Krause
7 cell lines, 1D gels, 50 bands, Orbitrap MS-MS - Roland Krause
Membrane proteins underrepresented, no other biases for characteristics such as pI, MW found. Abundant proteins. - Roland Krause
Overlap of 45% with Su, PNAS 2002 transcriptome study. Similar processes enriched. - Roland Krause
Human protein atlas (Ponten, MSB, 2009), overlap of 40%, mass sensitivity limited. - Roland Krause
Proteins found are generally well conserved. More interestingly, the genes are exon-rich. - Roland Krause
Increased number of interactions. - Roland Krause
The central proteome is central in the interactome (by centrality measures). - Roland Krause
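The notes do not say which centrality measures were used. As one common example, degree centrality is the fraction of other nodes a protein interacts with. The sketch below uses a hypothetical toy interaction list; the protein labels and the function name are placeholders.

```python
def degree_centrality(edges):
    """Degree of each node divided by the number of other nodes."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions in this sketch
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    return {node: len(partners) / (n - 1) for node, partners in neighbors.items()}


# Hypothetical toy interactome.
interactions = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
centrality = degree_centrality(interactions)
for protein in sorted(centrality, key=centrality.get, reverse=True):
    print(protein, round(centrality[protein], 2))
```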
83% enzymes, 77% primary metabolism, significant enrichment in drug targets (176 from DrugBank) - Roland Krause
Specialized functions of the core proteome interwoven by connectors. - Roland Krause
10% of the core proteome are poorly annotated. - Roland Krause
Significant overlap with viral host factors. - Roland Krause
Data publicly available. - Roland Krause
Q: Map of the central proteome against housekeeping genes? - Roland Krause
A: No, not directly but other works defined abundant genes as such. - Roland Krause
Q: Kinases as cancer targets, relevance in the data set? A: 40 kinases in the data set. Overlap with cancer much bigger than only kinases. - Roland Krause
Q: Both expression levels and abundance levels. Correlation? A: Poor overlap between the two suggests expression levels are little correlated. [..] - Roland Krause
Q: Have you looked at recent sequencing studies for transcriptomics [not only Su et al.]? A: Studies by Chris Burge report 10,000 genes, much too many to compare because copy number might be very low, comparison very difficult. Unclear functional relevance. - Roland Krause
Q: Centrality. Have you looked at the complexome? A: You recognize proteasome, spliceosome. Q: How many are in complexes? A: Two thirds, there are not many complexes that work alone. - Roland Krause
Q: Are poorly annotated proteins underrepresented? A: Depends on the number of PubMed abstracts that make a protein underrepresented. - Roland Krause
ISMB/ECCB
SIG: Bio-Ontologies
Kicked off by Olga Tcheremenskaia on OpenTox Predictive Toxicology Framework: toxicological ontology and semantic media wiki-based OpenToxipedia (see: http://www.opentox.org/opentox...) - Scott Edmunds
Keeping on a toxicology/pharmacological theme, Paea Le Pendu next up with "Annotation for Testing Drug Safety signals." - Scott Edmunds
Enrichment analysis for off-label use of drugs, e.g. Avastin (normally cancer drug) – can see what other uses people use it for (macular degeneration, etc.). - Scott Edmunds
No show for the 3rd talk (boo), so extended coffee break to let people put up posters. - Scott Edmunds
After a refreshing break Chao Pang (EBI) has started the session talking about the Coriell Cell Line ontology: rapidly developing large ontologies. - Scott Edmunds
>2000 cell lines. 93 organisms map OK to NCBI taxonomy ontology. 11 cell types and 61 anatomy types map OK to EFO ontology, but 337 disease types map a bit with OMIM but don’t have direct ontologies, - Scott Edmunds
Is a large ontology (>28,000), but easy to add new classes/cell-types. - Scott Edmunds
Now up is James Eales with a kidney disease data mining talk: An exercise in kidney factomics: from article titles to RDF knowledge base. - Scott Edmunds
Why titles: succinct, easy to collect, hard to lie, your advert to the world. - Scott Edmunds
Found ~86,000 titles for "renal" or "kidney". - Scott Edmunds
Keynote talk now from Andrew Su (Scripps) on cultivating and mining the gene wiki for crowdsourced gene annotation. One of the few wikis and crowdsourcing efforts that works. - Scott Edmunds
Of genes in pubmed: 59% have <5 entries, 38% have none. Poorly annotated because sparsely curated. - Scott Edmunds
Wikipedia best example of how to utilise "long tail" of internet users, and is generally quite accurate. Currently ~10,000 gene “stubs” within Wikipedia. - Scott Edmunds
For something like Fibronectin – 28,000 articles in pubmed that can be integrated into 1 article. Community writing of review articles for every gene would be v powerful – great example is Reelin: http://en.wikipedia.org/wiki.... - Scott Edmunds
To make it more reliable need to rank quality of edits/editors. Novartis wikipedia entry said company name "is derived from old Greek and means 'destroyer of birds'" [false]. - Scott Edmunds
Hairball backlash! Argument these are mostly decorative – AS made a better visualisation linking top 100% genes, most active editors and GO terms. - Scott Edmunds
After lunch first talk from Mary Shimoyama (Rat Genome Database http://rgd.mcw.edu/) on “Using multiple ontologies to annotate and integrate phenotype records from multiple sources.” - Scott Edmunds
Now up is Ben Good – linking genes to diseases with a SNPedia-Gene wiki mashup. - Scott Edmunds
SNPedia http://www.snpedia.com/ has no info on gene function, so combine with Andrew's Gene Wiki using mediawiki API. - Scott Edmunds
SNPedia has “Medical Condition” category, but used NCBO annotator as more accepted link to disease ontologies. Enhanced disease pages in gene wiki with tables on related genes and SNPs. - Scott Edmunds
Easy to integrate many of these types of applications as many share the same API - Scott Edmunds
Ravensara Travillian couldn't make it, but has a stand-in on "a vertebrate bridging ontology VBO". See: http://vbo.sourceforge.net/ - Scott Edmunds
Paolo Ciccarese next with DOMEO: a web based tool for semantic annotation of online documents. No more notes for now as battery about to die. - Scott Edmunds
Susanna Asante overcame serious powerpoint conversion issues to present on the Biosharing network: http://www.biosharing.org/. Discussed its evolution and relationship to ISA-tab and MIBBI - Scott Edmunds
Trish Whetzel next with an update on collaborative development of ontologies using webprotege and Bioportal. - Scott Edmunds
Michael Schroeder talk from the morning session rescheduled, so now presenting: Maximum-Entropy for Annotation - Scott Edmunds
Works by MESH terms (4078 for disease, anatomy, etc). Ambiguity = more docs in pubmed and web. - Scott Edmunds
After an interactive session around the ontology tools it’s now time for final flash updates. First up is Astrid Laegreid with Automated Assessment of High Throughput Hypotheses on Gene Regulatory Mechanisms Involved in the Gastrin Response. - Scott Edmunds
Next is Fidel Ramirez on a new search method to mine biological data. Input gene/protein annotations. - Scott Edmunds
Can discover new disease annotations by using functional similarity using OMIM data. See http://biomyn.de - Scott Edmunds
Next is Warren Kibbe on: Coupling disease and disease genes using Disease Ontology (DO), NCBI GeneRIFs and the NCBO annotator service. See: http://doga.nubic.northwestern.edu - Scott Edmunds
Up next is Anna Zhukova on KiSAO: Kinetic Simulation Algorithm Ontology. Follows MIASE guidelines. - Scott Edmunds
Enables accurate & repeatable execution of computational simulation tasks. Page here: http://biomodels.net/kisao/ - Scott Edmunds
Now up is Jon Ison (EBI) on EDAM Ontology for bioinformatics tools and data. EDAM = Embrace Data and Methods Ontology (not classifying Dutch cheese). - Scott Edmunds
Rescheduled talk from Robert Yao on machine learning on a translational biomedical ontology for Alzheimer’s disease. - Scott Edmunds
Internet problems this morning, but already had talks from Matthew Horridge on the state of biomedical ontologies (from a logic based perspective), Robert Stevens on Exploring Gene Ontology Annotations with OWL, and now up is Stefan Schulz on Records and Situations. Integrating contextual aspects in clinical ontologies. - Scott Edmunds
First talk of the final afternoon was Nils Grewe with a talk on “Relating Processes and Events for Granularity-neutral modeling”. - Scott Edmunds
Janna Hastings next with a talk on "Processes and Properties". Used heart rate modeling as an example. - Scott Edmunds
Further Flash updates from Janna Hastings (ChEBI) and Julius Jacobsen on an ontology integrating Uniprot-Macie+Catalytic Site Atlas. 5/6 Ontology updates were from the EBI. - Scott Edmunds
Final invited data-mining talk from Andrew Chatr-Aryamontri and Martin Krallinger on detecting associations between scientific articles and ontology terms – the Molecular Interaction Ontology and BioCreative text mining challenges experience. Work on BioGrid database. - Scott Edmunds
See: http://thebiogrid.org/. Lots of similar PPI databases – federated in IMEx consortium. Using compatible models allows sharing of curation load. - Scott Edmunds
ISMB/ECCB
3Dsig: Structural Bioinformatics & Computational Biophysics Satellite Meeting
Yves Dehouch: Prediction of thermal vs. thermodynamic stability changes upon mutagenesis in proteins. Statistical potential for prediction of stability changes upon mutations in proteins. Potential contains terms based on amino acid distance distribution, solvent accessibility and so called coupling (residues have different configurations in protein surface vs. core) - Anne Tuukkanen
Fast method, thousands of mutations evaluated in a second - Anne Tuukkanen
Developing a temperature-independent version of the potential for more accurate/consistent thermal stability vs. melting temperature prediction - Anne Tuukkanen
Anne Goupil: Computational scanning mutagenesis of proteins and protein interactions. Design studio 3.1 contains new protocols: Calculation of effect of mutation on binding affinity in protein complexes and effect of mutation on stability. - Anne Tuukkanen
CHARMM based algorithm using modified CHARMM force field, Generalized Born implicit solvent (also implicit membrane modeling possible) - Anne Tuukkanen
Keynote 2: Rebecca Wade, Insights into molecular recognition from simulation of protein diffusion. Protein-protein docking: Complex structure prediction with SDA method using first rigid body docking with BD, then select representatives, and in the end flexible docking using MD. Experimental constraints used in the process. - Anne Tuukkanen
Multiple protein simulations with SDAMM. Application on hydrophobin I. Found encounter complexes of tetramers that have same shape as crystallized tetramer, but not fully bound conformation. - Anne Tuukkanen
Prosurf: computational toolbox for protein surface docking. BD, QM, MD and experiments combined. - Anne Tuukkanen
Roland L. Dunbrack: Identifying biologically relevant interactions in protein crystals. - Anne Tuukkanen
Biological unit vs. asymmetric unit problem. ProtCID procedure done on whole PDB. Sequences grouped using PFAM. Chain architectures and pair architectures are compared. Interface comparisons inside each group. Clustering of interfaces by hierarchical clustering. - Anne Tuukkanen
Guido Capitani: Is it biologically relevant? An evolutionary method for distinguishing biological interfaces from crystal contacts. - Anne Tuukkanen
Biologically relevant interfaces are the result of evolution, but crystal contacts are not. Hence, biological interfaces have a detectable signal. Core-Rim Ka/Ks ratio used as a measure of selection pressure. - Anne Tuukkanen
Christine Orengo talks about enzyme evolution with specificity changes in their binding regions - Hedi Hegyi
Keynote 3: Christine Orengo, Sub-classifying relatives in CATH domain structure superfamilies to explore protein function evolution - Anne Tuukkanen
a lot of effects by a new interacting domain - Hedi Hegyi
Funtree developed by Janet Thornton - Hedi Hegyi
HUP CATH superfamily tree - 6 distinct structural clusters - Hedi Hegyi
CORA and FugueAli str and seq alignment methods - Hedi Hegyi
FLORA - identifies conserved and distinct structural elements - Hedi Hegyi
FLORA algorithm studies common/distinct features within a family - Anne Tuukkanen
Changes in conserved residues in the active sites results in totally different enzymes - Hedi Hegyi
Optimization of FunFams with SFLD Structure-Function linkage database - Hedi Hegyi
FunFam - optimisation and validation using structure function linkage database (SFLD), homology models can be built using this data with low seq. id. - Anne Tuukkanen
what does domain function mean? good point - Hedi Hegyi
Next release of CATH will contain also SNPs, conservation, functional site information - Anne Tuukkanen
Nicholas Furnham: Investigating enzyme evolution in structurally defined superfamilies - Anne Tuukkanen
half of all proteins are enzymes? - Hedi Hegyi
FunTree pipeline overview - Hedi Hegyi
FunTree pipeline - domain view and seq. view combined, phylogenetic tree generated and processed as well as annotated with additional data such as KEGG, catalytic site atlas etc. - Anne Tuukkanen
Similarities based on small molecules (reactions to them or are they metabolites?) - Hedi Hegyi
Julian Gough, Virosphere-specific protein folds - Anne Tuukkanen
the virosphere is too often overlooked - Anne Tuukkanen
several viral specific domain families exist (63), found in all major functional classes of viruses and most fold classes - Anne Tuukkanen
Noah Ollikainen: Structure based prediction of natural residue covariation using computational protein design - Anne Tuukkanen
Residue covariation is a general property in natural protein sequences, want to use this in computationally designed sequences. Backbone flexibility increases similarity between designed and natural covariation. Amino acid composition is similar in designed and natural sequences. - Anne Tuukkanen
Andrew J. Bordner, Orientation-dependent backbone-only scoring functions for protein design - Anne Tuukkanen
Keynote 4: Charlotte Deane, Modelling of Membrane Proteins - Anne Tuukkanen
Medeller outperforms Modeller on membrane proteins. Using membrane protein specific substitution matrix in scoring. - Anne Tuukkanen
Loop modelling done with FREAD and using membrane protein specific database. - Anne Tuukkanen
There are now enough data on membrane protein structures that can be used to parametrize MP specific tools - Anne Tuukkanen
Alpan Raval, Homology model refinement via long all-atom molecular dynamics simulations. They used all-atom CHARMM FF, well-sampled 100 microsecond simulations, explicit solvent model. Study shows that force field errors are still a limitation. - Anne Tuukkanen
Alessandro Pandini: Detection of allosteric signal transmission by information-theoretic analysis of protein dynamics - Anne Tuukkanen
MD simulation used to sample conformational space of a protein, small fragment conformations (structural alphabet) observed at different time points, network of correlated fragment changes studied. - Anne Tuukkanen
They found major similarities in dynamics of homologous domains - Anne Tuukkanen
Keynote 6: Ruth Nussinov, Structural proteome scale prediction of protein-protein interactions using interfaces. - Anne Tuukkanen
A step towards adding time dimension in protein-protein interaction networks. Interaction interfaces predicted on monomers, which interactions can take place simultaneously are studied. - Anne Tuukkanen
Hegyi Hedi, The relationship between proteome size, structural disorder and organism complexity. G-value paradox: complexity does not correlate with gene number. Resolved: one should consider I-value (information content). Protein families expanding in evolution are more disordered. - Anne Tuukkanen
Keynote 7: Torsten Schwede, Sins and Virtues in Protein Structure Homology Modelling - Anne Tuukkanen
Torsten Schwede's current challenges for Homology Modelling to make the models easily usable for end users: 1) absolute local error estimates; 2) models including cofactors and local detail; 3) oligomers - Andrea
Keynote 7: Torsten Schwede, Sins and Virtues in Protein Structure Homology Modelling. Why use automated homology modelling? Modelling experts do not scale very well; you can't fight the data deluge - Anne Tuukkanen
ISMB/ECCB
HiTSEQ 2011 SIG: High Throughput Sequencing
Kick off with Keynote 1 - Pamela Hoodless from Department of Medical Genetics, University of British Columbia. title -"From genomics to embryology " - Saravanamuttu Gnaneshan
Introduction to embryology and especially on liver development followed by need to generate new liver cells for transplantation. - Saravanamuttu Gnaneshan
Comparison of hepatoblast and hepatocyte transcriptomes - 14% genes differentially expressed. - Saravanamuttu Gnaneshan
FoxA2 and HNF4a comparisons - Saravanamuttu Gnaneshan
Bioinformatics challenge from a biologist's point of view - Need effective data integration tools - Saravanamuttu Gnaneshan
Talk: Ali Mortazavi ChIP-seq regulatory analysis using ChIA-PET - Shannon McWeeney
Overview - abundance of directed graphs in biology - Shannon McWeeney
goal: chip-seq to regulatory networks. Can id 1000s of binding sites - which are functional? which genes do they regulate? - Shannon McWeeney
Simonis 2007 - can we connect distal chip-seq peaks to their target? - Shannon McWeeney
ChIA-PET (Nature 2009 - estrogen receptor). Ongoing collaboration with Yijun Ruan @ GIS. Same steps as ChIP-seq in beginning. Then add linkers, ligate them, cut enzymatically and map to look for overlaps. - Shannon McWeeney
ChIA-PET vs 5C vs Hi-C: 5C - high resolution/local; Hi-C - global but low resolution - Shannon McWeeney
ChIP-seq Regulatory Analysis using connections. Map ChIA-PET between peaks. Convert into graph - Shannon McWeeney
Graphs: datasets form local graph components - no mega-components. Direct connections clear meaning. Indirect less clear. - Shannon McWeeney
TSS of highly expressed genes are more likely to be in the chromatin interaction graphs - Shannon McWeeney
Many TSS show high connectivity. Restrict to edges only connected to other TSS - 30.6% connected to 2 or more other TSS - Shannon McWeeney
Change in TSS degree correlates with change in expression - Shannon McWeeney
Example: TSS clique in myogenin locus. 12% of expressed genes in promoter cliques -> transcription factories (genes not transcribed singly but as group) - Shannon McWeeney
Myogenin cliques - idea of "TF depots" - Shannon McWeeney
19% of interactions to TSS are more than one TSS away - Shannon McWeeney
TSS form preferential attachment sites within their chromatin interaction graphs (CIGs) - Shannon McWeeney
Q: exact or near cliques? A: exact Q: clarification of attachment sites preference A: based on inference from TSS connectivity patterns - Shannon McWeeney
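The TSS connectivity figure from this talk (share of TSS linked to 2 or more other TSS) can be illustrated with a small sketch. The edge list, node names and the choice of denominator (all supplied TSS) are assumptions made up for illustration; they are not the speaker's data or pipeline.

```python
from collections import defaultdict

def tss_degrees(edges, tss_nodes):
    """Degree of each TSS counting only edges whose two ends are both TSS."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u in tss_nodes and v in tss_nodes and u != v:
            neighbours[u].add(v)
            neighbours[v].add(u)
    return {node: len(neighbours[node]) for node in tss_nodes}

def fraction_with_min_degree(degrees, k=2):
    """Share of TSS connected to at least k other TSS."""
    return sum(1 for d in degrees.values() if d >= k) / len(degrees)

# Hypothetical toy graph: A, B, C, D are TSS; enh1 is a distal (non-TSS) peak.
toy_tss = {"A", "B", "C", "D"}
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "enh1"), ("D", "enh1")]

toy_degrees = tss_degrees(toy_edges, toy_tss)
toy_fraction = fraction_with_min_degree(toy_degrees, k=2)
print(toy_degrees, toy_fraction)
```

In the toy graph A, B and C form a TSS-only triangle, while D only touches a distal peak and so has TSS degree 0.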
Talk: Andrew Roth: JointSNVMix : A Probabilistic Model For Accurate Detection Of Somatic Mutations In Normal/Tumour Paired Sample Sequence Data - Shannon McWeeney
germline mutations are often mistaken for somatic due to under-sampling in normal data. algorithms that analyze data individually would be more susceptible to this. Rationale for joint approach - Shannon McWeeney
joint genotype space (gn, gt) - Shannon McWeeney
Benchmark data set - Metrics: dbSNP concordance (surrogate for true germlines)- can explicitly ask if joint analysis reduces number of germline mutations mistakenly called as somatic - Shannon McWeeney
2nd metric: ROC - is there a gain / loss in performance using joint methods? - Shannon McWeeney
model admixture of normal/tumor in tumor sample needs to be addressed. - Shannon McWeeney
need to address temporal aspect - normal -> primary->metastasis - Shannon McWeeney
ISMB/ECCB
SNP-SIG: Identification and annotation of SNPs in the context of structure, function, and disease
Highlight Speaker: Mauno Vihinen, Tampere University (Finland) Genetic variations: origin, effects and prediction. - Venkata P. Satagopam
Usually we think variations as harmful, while most of them are not - Venkata P. Satagopam
Genome size or gene number does not directly correlate with complexity of organism - Venkata P. Satagopam
Less than 5% of the human genome codes for proteins; the rest is often called junk DNA, but it is not: the majority of the genome is still expressed - Venkata P. Satagopam
Variation increases the total number of genes and gene products - Venkata P. Satagopam
Total no. of variations in genome is not known - Venkata P. Satagopam
Variation type - chromosome set number variation (euploidy) - Venkata P. Satagopam
and chromosome structure variation - Venkata P. Satagopam
Variations at DNA level - binding sites - TFs, promoter, start and stop codons and genome organisation - Venkata P. Satagopam
at RNA level - most important effects on splice sites, generation of novel splice sites, alt splicing, variation destroying splice sites - Venkata P. Satagopam
at protein level -- many ways to affect function: functional sites, sub-cellular localization, stability effects, changes in disorder, aggregation, electrostatic properties, interactions, steric effects - Venkata P. Satagopam
Large genomes projects - Venkata P. Satagopam
the 1000 genomes project www.1000genomes.org - Venkata P. Satagopam
Data management, storage, analysis, access, ethics, integration with other data is a big challenge, but extremely useful - Venkata P. Satagopam
Human Variome Project www.humanvariomeproject.org - Venkata P. Satagopam
Variome - Variation in a genome - Venkata P. Satagopam
a public catalogue of curated variations in each of 20k genes and associated phenotypes/studies - Venkata P. Satagopam
worldwide agreed standard systems, notations - Venkata P. Satagopam
about 2000 LSDBs exist - Venkata P. Satagopam
HGVS www.hgvs.org - Venkata P. Satagopam
Diseases names - OMIM, ICD10 - www.who.int/classifications/icd/en - Venkata P. Satagopam
Variation nomenclature - www.hgvs.org/ - Venkata P. Satagopam
GEN2PHEN project ...pan european effort - Venkata P. Satagopam
Humans have 99.9% identical genomes - Venkata P. Satagopam
Genome consists of about 3 billion bases, this means that there are millions of differences, some of them are harmful, but which ones? - Venkata P. Satagopam
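As a quick check of the arithmetic behind "millions of differences", using only the two figures quoted in the notes (about 3 billion bases, 99.9% identity); the variable names are illustrative:

```python
# Back-of-envelope estimate from the figures quoted in the talk notes.
GENOME_BASES = 3_000_000_000   # "about 3 billion bases"
DIFFERENT_PER_THOUSAND = 1     # 99.9% identical => 0.1% different

# Integer arithmetic avoids floating-point rounding on 1 - 0.999.
differing_bases = GENOME_BASES * DIFFERENT_PER_THOUSAND // 1000
print(f"{differing_bases:,} differing bases between two genomes")
```

That gives roughly 3 million differing positions, consistent with the "millions of differences" in the note.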
how to interpret variations and their effects? NGS and other methods produce very large datasets. Impossible to investigate all cases with experimental approaches - Venkata P. Satagopam
Majority of variations are not harmful, only a few cause diseases - Venkata P. Satagopam
extensive deletions, frameshift variations, nonsense variations and other cases that clearly destroy the function of the gene or product - Venkata P. Satagopam
paper - Thusberg and Vihinen Hum.Mut.2009 - Venkata P. Satagopam
Performance analysis - test data sets needed, it should be representative, unbiased, large enough, not used for training, and have systematic description - Venkata P. Satagopam
Benchmark datasets available from "VariBench", other one Variation Ontology (VariO), AmiVario - Venkata P. Satagopam
validated prediction tools and described in the paper- Thusberg et al. Hum Mutat 2011 - Venkata P. Satagopam
Performance of protein localization predictors - Laurila and Vihinen BMC Bioinformatics; Laurila and Vihinen Amino acids 2011 - Venkata P. Satagopam
case study : BTK kinase domain Mao et al . JBC 2002, Khan and Vihinen In silico biol 2009 - Venkata P. Satagopam
PON-P (Pathogenic-Or-Not - Pipeline) - meta predictor for variations - Venkata P. Satagopam
-------------------------------------------------------------------------------------------------------------------------------- - Venkata P. Satagopam
Christian Schaefer. Technische Universitat Munchen (Germany) Can we predict structural change upon point mutation? - Venkata P. Satagopam
Checking the effect of a point mutation, for example S7M, on the local structure of the protein - Venkata P. Satagopam
Protein chains collected from PDB, structural similarity measure used: RMSD - Venkata P. Satagopam
Machine learning - seq based features from PredictProtein (www.predictprotein.org/) used - Venkata P. Satagopam
dataset - what constitutes a structural change? structurally neutral: RMSD < 0.2 Å (13,646 mutants); structurally non-neutral: RMSD > 0.4 Å (12,056 mutants) -- 6,409 sequences - Venkata P. Satagopam
some success in learning local structural change; no success yet in correlating that with change in stability or function. But some intrinsic relation pertaining to stability - Venkata P. Satagopam
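The labelling rule from this talk (neutral below 0.2 Å RMSD, non-neutral above 0.4 Å) can be sketched as follows. This is a minimal illustration, not the speaker's code: it assumes the two coordinate sets are already superposed and atom-matched, and treating the 0.2 to 0.4 Å band as unlabelled is an assumption, since the notes do not say what happens there.

```python
import math

NEUTRAL_MAX = 0.2       # Å, threshold quoted in the notes
NON_NEUTRAL_MIN = 0.4   # Å, threshold quoted in the notes

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    Assumes the structures are already superposed (no fitting is done here).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

def label_structural_change(value):
    """Map an RMSD value (Å) to a class; the middle band is an assumption."""
    if value < NEUTRAL_MAX:
        return "neutral"
    if value > NON_NEUTRAL_MIN:
        return "non-neutral"
    return "unlabelled"

# Hypothetical two-atom fragments, shifted by 0.5 Å along z.
wild_type = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
mutant = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5)]
print(label_structural_change(rmsd(wild_type, mutant)))
```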
----------------------------------------------------------------------------------------------------------------------------------- - Venkata P. Satagopam
Gilad Wainreb. Tel Aviv University (Israel) Protein stability: A single recorded mutation aids in predicting the effects of other mutations in the same amino acid site. - Venkata P. Satagopam
used BellKor collaborative filtering model, previously applied to e-commerce applications - Venkata P. Satagopam
3 models used - baseline estimation model, the neighborhood model, the latent factor model - Venkata P. Satagopam
Pro-Maya algorithm, precision is very high compared to other tools, it also performs well in its sequence based prediction scheme - Venkata P. Satagopam
------------------------------------------------------------------------------------------------------------------------------------- - Venkata P. Satagopam
Piero Fariselli. University of Bologna (Italy) Predicting cancer-associated germline variations in proteins. - Venkata P. Satagopam
Problem - given a residue mutation (variation) in a protein predict if the mutation can be disease associated or neutral - Venkata P. Satagopam
predictors - mutation type and local environment (physical - chemistry); evolutionary information (HMM, profiles); protein function (GO); systemic properties (interactions, regulation, time-dependence...) - Venkata P. Satagopam
protein function has some problems - over-fitting problem (cross-validation of scores is needed); - Venkata P. Satagopam
SNPs & GO - the seq. profile and GO-based predictor - Venkata P. Satagopam
our lab has two predictors - PhD-SNP and SNPs&GO - Venkata P. Satagopam
From uniprot a first of 6478 germline variations associated and listed in OMIM - Venkata P. Satagopam
Using blast as predictor is very crude method - Venkata P. Satagopam
methods : neural networks and SVMs - Venkata P. Satagopam
GO terms enriched are related to Cancer - Venkata P. Satagopam
Problems and future: Reliability of the annotations; experimental validation on real cases; extension of different diseases; definition of a pipeline - Venkata P. Satagopam
------------------------------------------------------------------------------------------------------------------------------------- - Venkata P. Satagopam
Alain Laederach. University of North Carolina, Chapel Hill (USA) Effects of disease-associated SNPs on the structure of the transcriptome. - Venkata P. Satagopam
alain@unc.edu - Venkata P. Satagopam
Mutations in 5' UTR of HBB causes beta-thalassemia - Venkata P. Satagopam
disease associated SNPs in UTRs are from HGMD - Biobase - Venkata P. Satagopam
Looked genome-wide for structure-stabilizing haplotypes - Venkata P. Satagopam
------------------------------------------------------------------------------------------------------------------------------------- - Venkata P. Satagopam
Keynote: Burkhard Rost, Technische Universitat Munchen (Germany) Trivial step from predicting the effects of SNPs to medicine. - Venkata P. Satagopam
www.rostlab.org - Venkata P. Satagopam
Lingo: change of amino acid: nsSNPs (non-synonymous SNPs), they are central players of diseases - Venkata P. Satagopam
individual medicine : make it connects, genes, epigenetics - Venkata P. Satagopam
get diseases high -throughput manner: GWAS, many times (patients, drugs); env. factors (virus and bacteria) - Venkata P. Satagopam
many challenges: high volumes of data, communication, annotation; function is known for 10-50% human - Venkata P. Satagopam
prediction of nsSNPs - some in silico methods SIFT, PolyPhen, SNPs3D - Venkata P. Satagopam
Misfunction/neutral - for machine learning method need lot of data, collected 80k mutations with known effects on functions -- Y Bromberg and B Rost 2007 NAR 35: 3823-35 - Venkata P. Satagopam
Y Bromberg and B Rost 2008 Bioinformatics 24: i207-212 - Venkata P. Satagopam
New directions - in silico alanine scan; comprehensive in silico mutagenesis; prediction of binding hot spots - Venkata P. Satagopam
nsSNP effect: more detail - secondary structure (helix, strand) robust under random mutation, disorder not. Molecular dynamics (MD) also playing an important role. - Venkata P. Satagopam
MD2: new SNPs causing Parkinson's disease - A Zimprich et al. & T Meitinger & Tm Strom (201) - exome seq reveals mutations in the retromer protein vps35 - Venkata P. Satagopam
Disordered regions - eukaryotes dominated disorder (4 - 10x) - Venkata P. Satagopam
prediction methods used MD, IUPred - eukaryota - 36 - 43%, whereas bacteria 10 - 13% - Venkata P. Satagopam
Trivial step from predicting the effects of SNPs to individual medicine - connect many resources, and look at the details, so we have to work together. - Venkata P. Satagopam
------------------------------------------------------------------------------------------------------------------------------------- - Venkata P. Satagopam
Company Presentation: Frank Schacherer, BIOBASE GmbH. SNP analysis with high quality data - Venkata P. Satagopam
only 56% of known disease mutations are Mis/Non-sense splice mutations - 22% of mis/nonsense mutations are estimated to affect alt splicing and 10% of all disease-causing mutations are splice site mutations - Venkata P. Satagopam
Biobase contains manually curated data, the quality of data is very high - Venkata P. Satagopam
HGMD professional - completed inherited disease mutations ... - Venkata P. Satagopam
Nice paper listing different mutation databases Baralle et al EMBO report VOL 10 No 8 2009 - Venkata P. Satagopam
email : frank.schacherer@biobase-international.com - Venkata P. Satagopam
----------------------------------------------------------------------------------------------------------------------------------------- - Venkata P. Satagopam
Highlight Speaker: Atul J. Butte, Stanford University (USA) Data-driven Personalized Medicine - Venkata P. Satagopam
addressing the medical use of personal genome - Venkata P. Satagopam
Ref genome contains 2.8 million SNPs, 752 copy number variants - Venkata P. Satagopam
3 genes TMEM43, DSP, MYBPC3 associated with sudden cardiac death ... Ashley et al (2010), Lancet 375:1525 - Venkata P. Satagopam
Current SNP-disease DBs are too limited for application to a human genome - Venkata P. Satagopam
Due to the current limitations with the existing dbs (mess of IDs), developed a new dbs from 5k manually curated papers - Venkata P. Satagopam
Curation challenges - most findings are in tables and figures making NLP hard - Venkata P. Satagopam
SNPs were reported as from Positive strand and negative strand - Venkata P. Satagopam
There is quite a difference between Odds ratio (OR) and Likelihood ratio (LR) - Venkata P. Satagopam
Fagan TJ. Nomogram for Bayes theorem. N Engl J Med 1975 Jul 31;293(5):257 - Venkata P. Satagopam
Medical practitioners have been using LR for ages. - Venkata P. Satagopam
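To make the OR vs LR distinction and the Bayes nomogram concrete, here is a minimal sketch using the standard textbook definitions. The 2x2 counts and the 10% pre-test probability are invented for illustration and do not come from the talk.

```python
def odds_ratio(pos_cases, pos_controls, neg_cases, neg_controls):
    """OR from a 2x2 table: odds of a positive finding in cases vs controls."""
    return (pos_cases * neg_controls) / (pos_controls * neg_cases)

def positive_likelihood_ratio(pos_cases, pos_controls, neg_cases, neg_controls):
    """LR+ = sensitivity / (1 - specificity)."""
    sensitivity = pos_cases / (pos_cases + neg_cases)
    false_positive_rate = pos_controls / (pos_controls + neg_controls)
    return sensitivity / false_positive_rate

def post_test_probability(pre_test_prob, likelihood_ratio):
    """What the Bayes nomogram reads off: post-test odds = pre-test odds * LR."""
    if not 0.0 < pre_test_prob < 1.0:
        raise ValueError("pre_test_prob must be strictly between 0 and 1")
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

# Hypothetical counts: 30 of 100 cases and 10 of 100 controls carry the variant.
table = dict(pos_cases=30, pos_controls=10, neg_cases=70, neg_controls=90)
example_or = odds_ratio(**table)
example_lr = positive_likelihood_ratio(**table)
example_post = post_test_probability(0.10, example_lr)
print(round(example_or, 2), round(example_lr, 2), round(example_post, 2))
```

On the same table the OR (about 3.86) and the LR+ (3.0) differ. Only the LR can be multiplied directly onto pre-test odds, which here moves a 10% pre-test probability to 25%.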
Pacific Biosciences: sequencing human genome in 15 minutes - 20TB in 15 minutes - Venkata P. Satagopam
During famine nobody is obese, so environment plays an important role alongside the genome - Venkata P. Satagopam
Ontology-driven Indexing of Public Datasets for Translational Bioinformatics N. H. Shah, C. Jonquet, A. P. Chiang, A. J. Butte, R. Chen, M. A. Musen BMC Bioinformatics, Volume 10. Published in 2009 - Venkata P. Satagopam
Family history (genomics) is also very important to understand the diseases - Venkata P. Satagopam
human toxome project - Venkata P. Satagopam
He recommended a book - Experimental Man by David Ewing Duncan - Venkata P. Satagopam
----------------------------------------------------------------------------------------------------------------------------------------- - Venkata P. Satagopam
Konrad Karczewski. Stanford University (USA). Assessing Functional and Clinical Significance of Regulatory Variants. - Venkata P. Satagopam
SNPs in coding regions slightly high - Venkata P. Satagopam
In whole genome 7.5% SNPs found in TF binding sites, so SNPs can affect the TF binding site .....Karczewski et al 2011 PNAS in press - Venkata P. Satagopam
SNPs in NFkB regions are associated with more diseases compared to other regions - Venkata P. Satagopam
Disease ~ SNP ~ Binding ~ Expression ~ Disease .. gave example SNP associated to AIDS susceptibility and Asthma - Venkata P. Satagopam
----------------------------------------------------------------------------------------------------------------------------------------- - Venkata P. Satagopam
Joel Dudley. Stanford University (USA). Evolutionary meta-analysis reveals ancient constraints affecting missing heritability and reproducibility in disease association studies. - Venkata P. Satagopam
Varimed : manually curated database - Venkata P. Satagopam
---------------------------------------------------------------------------------------------------------------------------------------- - Venkata P. Satagopam
Adam Frankish. Wellcome Trust Sanger Institute, Hinxton (UK) Is Understanding Alternative Splicing IPotential Functional Effects of SNPs? - Venkata P. Satagopam
GENCODE rich in Alt splicing - Venkata P. Satagopam
LoF project - Loss of function variants - Venkata P. Satagopam
Catalogue of LoF variation created from all three 1000 genome pilot projects - Venkata P. Satagopam
Manual annotation reveals .. ~36% (213/597) putative LoF variants present in AS exons - Venkata P. Satagopam
Artefacts - 44 putative genome seq errors. 243 errors in gene models (<5% pre-existing manually annotated models) - Venkata P. Satagopam
MS data, and published evidence can help to determine where the alt splicing is functional - Venkata P. Satagopam
GENCODE manual annotation provides a rich set of alt splice variants - Venkata P. Satagopam
Loss of function should be considered at the level of transcription, not at the locus level - Venkata P. Satagopam
www.gencodegenes.org version v8 - Venkata P. Satagopam
---------------------------------------------------------------------------------------------------------------------------------------- - Venkata P. Satagopam
Sudhir Kumar. Arizona State University, Tempe (USA). Comparative Genomics as an Evolutionary Telescope for Medicine to Peer into the Universe of Human Mutations. - Venkata P. Satagopam
Phylomedicine .... Kumar et al 2011 - Trends in Genetics coming in September issue - Venkata P. Satagopam
Genomic Medicine - Healthcare tailored to the individual based on genomic information - Green et al (2011) Nature - Venkata P. Satagopam
Interested in Protein alleles (nSNVs) - Venkata P. Satagopam
$100 exome may come soon - Venkata P. Satagopam
Number of alleles are increasing - Ng et al 2010 Human Mol Biol - Venkata P. Satagopam
Green et al 2011 Nature 470, 204 - 213 ... very nice paper related to genomic medicine - Venkata P. Satagopam
At this moment Genome medicine lost in variation space, so the evolutionary history can help, at least provide first glimpse. Phylomedicine can help - Venkata P. Satagopam
Evol. telescope, two pieces of glass - Evol divergence and evol conservation - Venkata P. Satagopam
The evol rate is directly proportional to number of SNPs - Venkata P. Satagopam
Mol evolutionary and phylogenetic information is going to be crucial in improving diagnosis accuracy; Long term evolutionary properties can be estimated with high accuracy a priori - Venkata P. Satagopam
---------------------------------------------------------------------------------------------------------------------------------------- - Venkata P. Satagopam
Keynote: Steven Brenner, University of California, Berkeley (USA). CAGI Experiments - Venkata P. Satagopam
CAGI- Critical Assessment of Genome Interpretation - www.genomeinterpretation.org - Venkata P. Satagopam
CAGI 2010 (http://cagi2010.org ) - 6 datasets, 108 prediction submissions etc - Venkata P. Satagopam
Personal genome project (PGP) - predict individual's phenotypes - Venkata P. Satagopam
riskSNPs: Mechanisms underlying Disease Associated Loci - Venkata P. Satagopam
WTCCC Loci, number of loci related to 7 diseases (Bipolar disorder, coronary artery disease, Crohn's disease, hypertension, Rheumatoid arthritis, Type 1 and Type 2 diabetes) - Venkata P. Satagopam
CAGI 2011 - datasets - Breast cancer cell line pharmacogenomic dataset - 54 cell lines - 54 disease cell lines and several other data sets .... look at www.genomeinterpretation.org - Venkata P. Satagopam
Justin H. Johnson
Ion Torrent's Data Quality Is Pretty Good (and Better Than Ion Claims) http://omicsomics.blogspot.com/2011...
Rajarshi Guha
has anybody visualized networks with ~ 30K nodes and 1.5M edges? What tools have you used? Were they interactive? Or batch mode? Pointers would be appreciated
I think the most important bit is here how to summarize the networks, rather than visualizing all edges... - Egon Willighagen
Try networkx for summary/exploration. Otherwise Cytoscape might work if you've got a system with tons of memory; I've used it to look at compound similarity across a chemical library. - Donnie Berkholz from BuddyFeed
Not sure that visualization will necessarily be that useful at that scale. Could you do dimensionality reduction by PCA or something? - Mr. Gunn
not sure that dimension reduction or PCA makes sense for this type of data - the connectivity is what I'm after (which I can obviously characterize via various metrics). Might have to go in to get subsets etc. But also, the visualization is more to get a (hopefully) pretty picture than anything else - Rajarshi Guha
I never had to look at any network of that size. I am not sure you can get much from looking at it. I would try to reduce the complexity in some way. If you have genes/proteins you could bring it down to complexes/pathways/GO terms. - Pedro Beltrao
I would happily recommend igraph, callable from python, and also very nice integration as an R package. The interactive plotter works only for small graphs; you'll want to experiment with the layouts if your goal is to get a pretty picture (e.g. the Fruchterman-Reingold layout). Looking at network properties (degree distribution, assortativity, clustering coefficient, etc.) is also very straightforward. - Michael Krein
Thanks, I second your recommendation - I'm using igraph to get the numeric metrics and it's sweet! - Rajarshi Guha
+1 iGraph. Have you tried Gephi (http://gephi.org/)? Gephi is interactive, and I have used it to visualize a similarly big network. You could also check an alternative way to visualize networks as linear networks using Hive http://mkweb.bcgsc.ca/linnet/. - Khader Shameer
Try this one for more information about using Graphviz for very large network graphs http://www2.research.att.com/~yifanh... . Also Python NetworkX (http://networkx.lanl.gov/) is quite robust; someone has already done a Hadoop plug-in for NetworkX, though I can't find the URL. - Abhishek Tiwari
The 30,000 nodes shouldn't be a problem. Showing the 1.5M edges creates a visual mess at the overview level. I found it best to leave them out of the projection altogether, since proximity often shows the same macroscopic relationships. Perhaps use semantic zoom and/or edge bundling to show them only when zooming in. For algorithms LGL and OpenOrd would be good. OpenOrd is implemented... more... - Don Pellegrino
@don, thanks for the pointer. I think the general view is that looking at the entire network is likely not useful - subsets are easier to view and work with. igraph is performing like a champ! - Rajarshi Guha
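As a rough illustration of the "summarize rather than draw every edge" advice in this thread, here is a minimal sketch that computes two of the metrics mentioned (degree distribution and clustering coefficient) from a plain edge list. It uses only the Python standard library and is not the igraph or NetworkX API; the function names and the toy edge list are made up for the example. On a graph with ~1.5M edges the pairwise neighbor check gets slow for high-degree nodes, so a dedicated library would likely be the better choice in practice.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue
        adj[u].add(v)
        adj[v].add(u)
    return adj


def degree_distribution(adj):
    """Map each degree value to the number of nodes having that degree."""
    return Counter(len(neighbors) for neighbors in adj.values())


def local_clustering(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    if not adj:
        return 0.0
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Hypothetical toy graph: a triangle (a, b, c) with one pendant node d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
adj = build_adjacency(toy_edges)
print(degree_distribution(adj))
print(round(average_clustering(adj), 3))
```

The same summaries could then be computed per subset (for example per pathway or per cluster), which is closer to what the thread ends up recommending than plotting the full network.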
Michael Barton
GitWrite - blogging for nerds - http://gitwrite.com/
looks like a potentially great platform for a lab notebook... - Carl Boettiger
Sounds an awful lot like http://pages.github.com/ GitHub Pages is an awesome platform for hosting documentation. For example, we use it to host http://gunicorn.org from this GitHub branch: http://github.com/benoitc... - Paul J. Davis
There's a great desktop/cloud notebook app to be written using git as a back end. Anybody got into the guts of this and seen what is different if anything under the hood? - Cameron Neylon from twhirl
I know that the Mercurial vs. Git debate seems to have been won by git in the Open Science community, but if you want a saner life you should look at http://hatta-wiki.org/ (Also see http://www.mzlinux.org/node... for a list of Mercurial/Git backed wiki engines.) True nerds should consider an Emacs Org mode backed blog or wiki http://orgmode.org/worg... which can also be combined with revision control. - Matt Leifer
Just thought about an Emacs org-mode + git solution, too. Especially as it can include executable code snippets via Babel: http://orgmode.org/worg... (at the bottom of this page you can find a description how this can be used for reproducible research). - Konrad Förstner
The server software used for display must be available for the content to be truly portable. Would be happy to see even if only basic features were available so far. - Mike Chelen
Hatta looks good, wonder how difficult it is to get set up? - Mike Chelen
Ended up using Github's Git-backed wikis https://github.com/blog.... Now if only there were a way to allow comments similar to blog posts... - Mike Chelen
Cameron Neylon
A collaborative proposal on research metrics - http://cameronneylon.net/blog...
When we talk about open research practice, more efficient research communication, wider diversity of publication we always come up against the same problem. What's in it for the jobbing scientist? This is so prevalent that it has been reformulated as "Singh's Law" (by analogy with Godwin's law) that any discussion of research practice will inevitably end when someone brings up career advancement or tenure. The question is what do we actually do about this? An opportunity has arisen for some funding to support a project here. My proposal is to bring a relevant group of stakeholders together; funders, technologists, scientists, administrators, media, publishers, and aggregators, to identify needs and then to actually build some things. - Cameron Neylon
Wonder whether Andrew Treloar might have insight to contribute http://andrew.treloar.net/ - Kubke
AT certainly has experience of the problem and also seems to be collecting a lot of data... - Cameron Neylon
He was strong about making data sets a primary citable object at the data matters meeting where I met him, to make the 'use' of the data as trackable as any publication - Kubke
Yep, I think that's a key theme. Need to make sure have Dryad and Datacite involvement with this. - Cameron Neylon
He suggested giving the 'data' a DOI and tap onto the search/link/tracking that already exists for paper publications as something which could provide an initial viable solution - Kubke
Essentially this is what the DataCite project are doing. I think Dryad is connected with them as well. I worry about using dois for this because I would rather URLs [ducks in case geoff bilder arrives to pummel me (-; ] but the reality is that the scientific mindset seems to be doi = real so I think it's going to be the way we move forward in practice. - Cameron Neylon
The redirection mainly. Essentially it breaks a whole set of fun things that you can do with URLs. None of which to be fair are very commonly done. Also in principle they could be enabled through the doi redirect. Essentially it's a minor quibble that it doesn't add anything technically and breaks some stuff. But the reasons for dois are fundamentally social not technical. - Cameron Neylon from twhirl
I'm a big DOI fan. Just yesterday I ran into a few duplicate papers in the Mendeley library. One reason is that there are several URLs associated with a paper: PubMed, Journal HTML page, Journal PDF, institutional repository URL,... Not to mention that many URLs break over time. - Martin Fenner
And DOIs would really help to automatically find science blog posts associated with a paper. The Journal of Neuroscience editorial about supplementary information in August has no DOI. Almost impossible to find all blog posts (there are many) that talk about this editorial. - Martin Fenner
All true but I guess my problem is that fundamentally dois are a special system so that generic web tools (and yes particularly semantic web tools) will break over them and the info won't get incorporated into the wider web. e.g. Martin's problem could be solved by an owl:sameAs which would mean that it would be transparent to other web systems. But with dois someone has to build... more... - Cameron Neylon
None of which changes the reality. DOIs are dominant and are here to stay as far as I can tell so I just need to get over it and move on... - Cameron Neylon
Just added a comment and we need to do more than offering incentives for "re-use". - joergkurtwegner
While I am a big fan of finer-grained measures of contribution (e.g. peer review), I am not convinced that the current proposal should be stretched to cover such measures. It seems to me that what is wanted here is a hard-nosed, outcome-focused project designed to answer hard questions about ROI. For that purpose, a close focus on re-use is probably optimal. - Bill Hooker
Nice proposal. It currently focuses strongly on funder needs. Useful, perhaps necessary, but it means delayed and indirect rewards to the jobbing scientist. Would it make sense to work in a more direct focus on "jobbing scientist needs" too, with possible hacks of automated reuse-metric CVs and reuse-metricful tenure packages? - Heather Piwowar
Joerg - I commented on your comment back at the proposal. Two things really. My view is that the "measuring re-use" idea covers things like review. A good review needs to be "cited" in some way and that re-use measured and rewarded. I'm not quite sure where you're going with the micropayments thing. A kind of micropayments has been tried by EPSRC in the UK where they pay £50 or... more... - Cameron Neylon
Heather, good point. Although the response I've got suggests that people are interested enough in the possibility of getting credit for a more diverse range of things that that's enough at the moment. But yes, the CV is a very good place to realise some benefits. Need to talk to someone inside VIVO I think to see if there is some low hanging fruit there. - Cameron Neylon
@Bill , @Cameron - I understand your views and the ROI question is important. As long as we agree that there is some measure, recognition, and reward scheme I agree. So, a re-use needs to be tracked otherwise, we need to prevent ab-use of re-use. - joergkurtwegner
So if I'm reading you right your core point is that we shouldn't assume that simply by measuring something that the rewards will flow from that. That it is important to consider explicit reward schemes, which might include payment or aggregated micropayment? - Cameron Neylon
Actually it strikes me that both Joerg and Heather are pointing in the same direction here. Keeping up the issue of direct benefits. - Cameron Neylon
Cameron, what is your timeline for the proposal and what sort of additional help would be useful at this point? The current draft doesn't have references: do you need any? Other help? - Heather Piwowar
References (both to literature and to other projects) would be a big help. Contacts with funders would be useful if anyone has them. And proof reading still valuable. I'm currently checking with the funder whether this is heading in the right direction and then I'll start pulling in contributors more. Timeline is really about a week though the quicker the better I suspect. - Cameron Neylon
Also anyone got a good contact on the inside of VIVO? I know that Mackenzie Smith is on their advisory board but don't have any good contacts on the inside. It would be an obvious source of data. - Cameron Neylon
Sounds good. I've added (REF) indications where it seems like references might be appropriate, to facilitate crowdsourcing. Cameron, please edit as needed based on the intended document length and number of refs expected by the funding body? - Heather Piwowar
Yep, still working on those last two questions... ;-) - Cameron Neylon
what came of this? - Claudia Koltzenburg
Got the money, running the workshop: http://beyond-impact.org - Cameron Neylon
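The DOI sub-thread above (several URLs pointing at the same paper, duplicates in a reference library) can be illustrated with a small sketch that groups records by a normalized DOI. This is only an assumption about how such deduplication might be done, not how Mendeley or DataCite actually handle it; the record layout, the example.org URLs and the DOIs are all hypothetical. The one fact relied on is that DOI names are case-insensitive, so lowercasing is a safe normalization.

```python
from collections import defaultdict

# Common ways a DOI shows up in metadata; the list is illustrative, not exhaustive.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Strip a resolver or 'doi:' prefix and lowercase the remaining DOI name."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip().lower()


def group_by_doi(records):
    """Collect the URLs of all records that share the same normalized DOI."""
    groups = defaultdict(list)
    for record in records:
        doi = record.get("doi")
        if doi:  # records without a DOI cannot be matched this way
            groups[normalize_doi(doi)].append(record["url"])
    return dict(groups)


# Hypothetical library entries: two URLs for one paper, one other paper, one without a DOI.
records = [
    {"url": "https://example.org/journal/html/5678", "doi": "doi:10.1234/Example.5678"},
    {"url": "https://example.org/repository/5678.pdf", "doi": "https://doi.org/10.1234/example.5678"},
    {"url": "https://example.org/other", "doi": "10.9999/other.1"},
    {"url": "https://example.org/no-doi", "doi": None},
]
groups = group_by_doi(records)
for doi, urls in groups.items():
    print(doi, urls)
```

Items without a DOI, like the editorial mentioned in the thread, fall through untouched, which is exactly the gap being complained about.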
Jeremy Leipzig
what fraction of bioinformatics programs use MapReduce: .2%? use MPI: 2%?? are threaded: 20%???
Alejandro Montenegro
Manuel
Personal Genomic Software: A Review of What Is Available -- http://manuelcorpas.com/2011...