One well-turned phrase is worth a thousand PowerPoints....
- Shannon McWeeney
He started his undergrad in Cambridge almost 50 years ago, in geology, to venture into paleontology. His developing interest in zoology was not matched by the respective department. He was asked to be the chairman of the department.
- Roland Krause
PhD on Drosophila in Cambridge, PostDoc at Caltech.
- Roland Krause
Stressed the importance of a godfather for young scientists, as a role model etc.
- Roland Krause
No problems in funding in the 60s and early 70s.
- Roland Krause
Six months in Bruce Alberts' lab. # not so easy to keep up with all the great people that Michael Ashburner worked with
- Roland Krause
No knowledge of repetitive DNA, formulation of the C-value paradox.
- Roland Krause
Work on Drosophila alcohol dehydrogenase spanning 20 years.
- Roland Krause
Sequencing of ADH of 4 species using radionucleotides led to a PhD and a Nature paper. Almost no software available, major hardware incompatibility. Only one ARPAnet node in Europe at the University London.
- Roland Krause
No relational integrity, lots of integrity. Moaning about it led to a position on the advisory board.
- Roland Krause
Promoter of the establishment of the EBI at the Cambridge site. No genomics at the EMBL in the 80s. Raised 30 M GBP to convince the EMBL council.
- Roland Krause
FlyBase establishment. Built in Sybase, output in files, distribution via Gopher. Later, contact with Amos Bairoch and ExPASy led to use of a webserver.
- Roland Krause
We need to look not only at proteins but also at the small molecules, the metabolites.
- Barb Bryant
Plants have way more metabolites than we do.
- Barb Bryant
Cheminformatics is older but smaller than bioinformatics; largely confined to industry. The tools are not freely available, with notable exceptions.
- Barb Bryant
Differences between the proteome and the metabolome, e.g. no evolution and hierarchical structure of metabolites.
- Roland Krause
"Way back in the 90s" they were trying to define the reactome - the reactions necessary for life.
- Barb Bryant
From the proteome and the metabolome to the reactome: How many reactions are necessary for life?
- Roland Krause
Enzymes are an important part of biological molecular reactions.
- Venkata P. Satagopam
Enzymes are called by name and EC number.
- Roland Krause
Predicting enzyme function automatically: most powerful and most popular method is to recognize a homologue and transfer functional annotation.
- Vangelis Simeonidis
EC numbers explained: they conform to the following format: C.SC.SSC.SN
- Vangelis Simeonidis
The classification of enzymes is four-part: class, subclass, sub-subclass, serial number (typically the substrate).
- Roland Krause
where: C = Class, SC = Sub-class, SSC = Sub-subclass, SN = Serial number
- Vangelis Simeonidis
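The C.SC.SSC.SN format above can be illustrated with a small parser. This is a minimal sketch, not a reference implementation: the names `ECNumber`, `parse_ec` and `shared_levels` are made up for illustration, and it assumes that a "-" marks an unspecified level (as in "1.1.-.-") and ignores preliminary serial numbers.

```python
from typing import NamedTuple, Optional


class ECNumber(NamedTuple):
    """The four levels of an EC number: C.SC.SSC.SN."""
    enzyme_class: int
    subclass: Optional[int]
    sub_subclass: Optional[int]
    serial: Optional[int]


def parse_ec(ec: str) -> ECNumber:
    """Split an EC number such as '1.1.1.1' into its four levels.

    A '-' field is treated as "not specified" and stored as None.
    """
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()  # drop an optional "EC" prefix
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec!r}")
    levels = [None if p == "-" else int(p) for p in parts]
    if levels[0] is None:
        raise ValueError("the class (first field) must be specified")
    return ECNumber(*levels)


def shared_levels(a: ECNumber, b: ECNumber) -> int:
    """Count how many leading levels two EC numbers have in common."""
    count = 0
    for x, y in zip(a, b):
        if x is None or x != y:
            break
        count += 1
    return count
```

For example, `shared_levels(parse_ec("1.1.1.1"), parse_ec("1.1.1.2"))` compares two enzymes that differ only in the serial number.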
EC numbers do not capture the mechanism of the enzyme.
- Vangelis Simeonidis
Capture only the chemical level, no biological dependence such as co-factors
- Roland Krause
There is no one-to-one relationship between EC numbers and protein families.
- Venkata P. Satagopam
They wanted to build tools that would handle the actual chemistry.
- Barb Bryant
There has been a lot of work in the past 10 years on tools to handle the chemistry. Includes Kanehisa 2004, Gasteiger 2008, Aires-de-Sousa 2008, Schomburg 2010. Unfortunately, most of the software isn't freely available, and only tackles part of the problem.
- Barb Bryant
There is a huge literature on comparing small molecules to each other. So that's well covered.
- Barb Bryant
They also needed to map the atoms from each side of the equation to each other: atom-atom mapping. This works by matching the largest common moiety first, and iterating. The Mesa (?) database of about 300 reactions is a gold standard to check the quality of the mapping.
- Barb Bryant
You need to be able to compare reactions to each other - reaction similarity.
- Barb Bryant
To describe the changes in the bonds that take place, you use the Dugundji-Ugi model -- you make a matrix showing the bonds for reactants and products; subtracting the matrices gives you the reaction matrix.
- Barb Bryant
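The matrix subtraction described above can be sketched in a few lines. This is a simplified illustration, assuming the toy reaction H2 + Cl2 -> 2 HCl with atoms ordered H1, H2, Cl1, Cl2; only bond orders are recorded, and the diagonal (free valence electrons in the full Dugundji-Ugi model) is left at zero. The function name `reaction_matrix` is hypothetical.

```python
def reaction_matrix(begin, end):
    """Return R = E - B for bond matrices B (reactants) and E (products)."""
    n = len(begin)
    if len(end) != n or any(len(row) != n for row in begin + end):
        raise ValueError("B and E must be square matrices over the same atoms")
    return [[end[i][j] - begin[i][j] for j in range(n)] for i in range(n)]


# Atom order: H1, H2, Cl1, Cl2
# Reactants: H1-H2 and Cl1-Cl2 single bonds.
B = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
# Products: H1-Cl1 and H2-Cl2 single bonds.
E = [
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]

R = reaction_matrix(B, E)
# Negative entries are bonds broken, positive entries are bonds formed.
for row in R:
    print(row)
```

Each row of R sums to zero here, reflecting that every atom loses as many bonding partners as it gains in this toy reaction.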
EC-BLAST created by Syed Asad Rahman; it allows you to compare reactions by bond similarity, reaction centre similarity or substrate structure similarity.
- Barb Bryant
Chemicals have several fingerprints: bond change, structure, and stereo fingerprints.
- Venkata P. Satagopam
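A common way to compare two fingerprints is the Tanimoto coefficient. The notes do not say which measure EC-BLAST actually uses, so the sketch below is only an assumption about how fingerprint comparison could work; the feature labels are invented for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint features."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical bond-change fingerprints of two reactions.
reaction_1 = {"C-O formed", "O-H cleaved", "C=O order change"}
reaction_2 = {"C-O formed", "O-H cleaved", "C-N cleaved"}

print(tanimoto(reaction_1, reaction_2))
```

The same function could be applied separately to bond change, structure and stereo fingerprints to give one similarity score per fingerprint type.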
(See KillerApp talk I think Tues 11:45am)
- Barb Bryant
"This heatmap might look good to you, to me it looks fantastic!" Similarity between substrates is now close to the EC classification. Differences might be based on the EC classification.
- Roland Krause
Why are some structures capable of so many different enzymatic functions? Which are the residues that led to change of function?
- Roland Krause
Examples from the Phosphatidylinositol-Phosphodiesterase-Superfamily, a multi-domain protein family.
- Roland Krause
They looked at the multi-domain architecture of the phosphatidylinositol-phosphodiesterase superfamily. Adding new domains doesn't add enzyme function to members of this family.
- Barb Bryant
One needs to understand the evolution to better understand the EC classification.
- Venkata P. Satagopam
The tree constructed from structure has three main groups. Branches of the tree are distinguished by differences in substrate, product, presence of a metal co-factor, or mechanism.
- Barb Bryant
Matrix showing how frequently there are evolutionary changes within and between classes. Evolution tends to create new enzymes within the same class, having the same mechanism but changing the substrate or product.
- Barb Bryant
Most of the enzyme evolution is happening at the last subclass level.
- Venkata P. Satagopam
Question from the floor: is this an opportunity to abandon the EC classification method and move on to a better one? Answer: no. The EC structure is very sensible. Also, it is powerful because everybody uses it. Also, in the first class we examined, it matches pretty well to the similarity measure we developed.
- Barb Bryant
Question: sometimes you have a huge protein to carry out a single small reaction. Have you noticed any clues to why this happens? A: we have some thoughts related to protein function. First, most proteins are multi-functional. They interact with other proteins and do other sorts of things. Secondly, some of the substrates are quite large. We have a sort of domino theory of enzyme...
- Barb Bryant
Redundancy in genomics can be exploited --> CaBLAST. Works on compressed data. Size of compressed DB is proportional to the size of non-redundant data
- Iddo Friedberg
coarse analysis on compressed data - refined analysis on relevant regions
- Shannon McWeeney
Kicked off by Olga Tcheremenskaia on OpenTox Predictive Toxicology Framework: toxicological ontology and semantic media wiki-based OpenToxipedia (see: http://www.opentox.org/opentox...)
- Scott Edmunds
Keeping on a toxicology/pharmacological theme, Paea Le Pendu next up with "Annotation for Testing Drug Safety signals."
- Scott Edmunds
Enrichment analysis for off-label use of drugs, e.g. Avastin (normally a cancer drug) – can see what other uses people use it for (macular degeneration, etc.).
- Scott Edmunds
No show for the 3rd talk (boo), so extended coffee break to let people put up posters.
- Scott Edmunds
After a refreshing break Chao Pang (EBI) has started the session talking about the Coriell Cell Line ontology: rapidly developing large ontologies.
- Scott Edmunds
>2000 cell lines. 93 organisms map OK to the NCBI taxonomy ontology. 11 cell types and 61 anatomy types map OK to the EFO ontology, but 337 disease types map only partly to OMIM and don’t have direct ontologies.
- Scott Edmunds
Is a large ontology (>28,000), but easy to add new classes/cell-types.
- Scott Edmunds
Now up is James Eales with a kidney disease data mining talk: An exercise in kidney factomics: from article titles to RDF knowledge base.
- Scott Edmunds
Why titles: succinct, easy to collect, hard to lie, your advert to the world.
- Scott Edmunds
Found ~86,000 titles for "renal" or "kidney".
- Scott Edmunds
Keynote talk now from Andrew Su (Scripps) on cultivating and mining the Gene Wiki for crowdsourced gene annotation. One of the few wikis and crowdsourcing efforts that works.
- Scott Edmunds
Of genes in pubmed: 59% have <5 entries, 38% have none. Poorly annotated because sparsely curated.
- Scott Edmunds
Wikipedia best example of how to utilise "long tail" of internet users, and is generally quite accurate. Currently ~10,000 gene “stubs” within Wikipedia.
- Scott Edmunds
For something like Fibronectin – 28,000 articles in pubmed that can be integrated into 1 article. Community writing of review articles for every gene would be v powerful – great example is Reelin: http://en.wikipedia.org/wiki....
- Scott Edmunds
To make it more reliable, need to rank quality of edits/editors. The Novartis Wikipedia entry said the company name "is derived from old Greek and means 'destroyer of birds'" [false].
- Scott Edmunds
Hairball backlash! Argument these are mostly decorative – AS made a better visualisation linking top 100% genes, most active editors and GO terms.
- Scott Edmunds
After lunch, first talk from Mary Shimoyama (Rat Genome Database http://rgd.mcw.edu/) on “Using multiple ontologies to annotate and integrate phenotype records from multiple sources.”
- Scott Edmunds
Now up is Ben Good – linking genes to diseases with a SNPedia-Gene wiki mashup.
- Scott Edmunds
SNPedia has “Medical Condition” category, but used NCBO annotator as more accepted link to disease ontologies. Enhanced disease pages in gene wiki with tables on related genes and SNPs.
- Scott Edmunds
Easy to integrate many of these types of applications as many share the same API
- Scott Edmunds
Paolo Ciccarese next with DOMEO: a web based tool for semantic annotation of online documents. No more notes for now as battery about to die.
- Scott Edmunds
Susanna-Assunta Sansone overcame serious PowerPoint conversion issues to present on the BioSharing network: http://www.biosharing.org/. Discussed its evolution and relationship to ISA-Tab and MIBBI.
- Scott Edmunds
Trish Whetzel next with an update on collaborative development of ontologies using WebProtégé and BioPortal.
- Scott Edmunds
Michael Schroeder talk from the morning session rescheduled, so now presenting: Maximum-Entropy for Annotation
- Scott Edmunds
Works by MESH terms (4078 for disease, anatomy, etc). Ambiguity = more docs in pubmed and web.
- Scott Edmunds
After an interactive session around the ontology tools it’s now time for final flash updates. First up is Astrid Laegreid with Automated Assessment of High Throughput Hypotheses on Gene Regulatory Mechanisms Involved in the Gastrin Response.
- Scott Edmunds
Next is Fidel Ramirez on a new search method to mine biological data. Input gene/protein annotations.
- Scott Edmunds
Can discover new disease annotations by using functional similarity using OMIM data. See http://biomyn.de
- Scott Edmunds
Next is Warren Kibbe on: Coupling disease and disease genes using Disease Ontology (DO), NCBI GeneRIFs and the NCBO annotator service. See: http://doga.nubic.northwestern.edu
- Scott Edmunds
Up next is Anna Zhukova on KiSAO: Kinetic Simulation Algorithm Ontology. Follows MIASE guidelines.
- Scott Edmunds
Now up is Jon Ison (EBI) on EDAM Ontology for bioinformatics tools and data. EDAM = Embrace Data and Methods Ontology (not classifying Dutch cheese).
- Scott Edmunds
Rescheduled talk from Robert Yao on machine learning on a translational biomedical ontology for Alzheimer’s disease.
- Scott Edmunds
Internet problems this morning, but already had talks from Matthew Horridge on the state of biomedical ontologies (from a logic based perspective), Robert Stevens on Exploring Gene Ontology Annotations with OWL, and now up is Stefan Schulz on Records and Situations. Integrating contextual aspects in clinical ontologies.
- Scott Edmunds
First talk of the final afternoon was Nils Grewe with a talk on “Relating Processes and Events for Granularity-neutral Modeling”.
- Scott Edmunds
Janna Hastings next with a talk on "Processes and Properties". Used heart rate modeling as an example.
- Scott Edmunds
Further Flash updates from Janna Hastings (ChEBI) and Julius Jacobsen on an ontology integrating Uniprot-Macie+Catalytic Site Atlas. 5/6 Ontology updates were from the EBI.
- Scott Edmunds
Final invited data-mining talk from Andrew Chatr-Aryamontri and Martin Krallinger on detecting associations between scientific articles and ontology terms – the Molecular Interaction Ontology and BioCreative text mining challenges experience. Work on BioGrid database.
- Scott Edmunds
See: http://thebiogrid.org/. Lots of similar PPI databases – federated in IMEx consortium. Using compatible models allows sharing of curation load.
- Scott Edmunds
Yves Dehouck: Prediction of thermal vs. thermodynamic stability changes upon mutagenesis in proteins. Statistical potential for prediction of stability changes upon mutations in proteins. The potential contains terms based on amino acid distance distribution, solvent accessibility and so-called coupling (residues have different configurations on the protein surface vs. in the core).
- Anne Tuukkanen
Fast method, thousands of mutations evaluated in a second
- Anne Tuukkanen
Developing a temperature-independent version of the potential for more accurate/consistent thermal stability vs. melting temperature prediction
- Anne Tuukkanen
Anne Goupil: Computational scanning mutagenesis of proteins and protein interactions. Design studio 3.1 contains new protocols: Calculation of effect of mutation on binding affinity in protein complexes and effect of mutation on stability.
- Anne Tuukkanen
CHARMM based algorithm using modified CHARMM force field, Generalized Born implicit solvent (also implicit membrane modeling possible)
- Anne Tuukkanen
Keynote 2: Rebecca Wade, Insights into molecular recognition from simulation of protein diffusion. Protein-protein docking: complex structure prediction with the SDA method, using first rigid body docking with BD, then selecting representatives, and in the end flexible docking using MD. Experimental constraints are used in the process.
- Anne Tuukkanen
Multiple protein simulations with SDAMM. Application on hydrophobin I. Found encounter complexes of tetramers that have the same shape as the crystallized tetramer, but not the fully bound conformation.
- Anne Tuukkanen
Prosurf: computational toolbox for protein surface docking. BD, QM, MD and experiments combined.
- Anne Tuukkanen
Roland L. Dunbrack: Identifying biologically relevant interactions in protein crystals.
- Anne Tuukkanen
Biological unit vs. asymmetric unit problem. ProtCID procedure done on whole PDB. Sequences grouped using PFAM. Chain architectures and pair architectures are compared. Interface comparisons inside each group. Clustering of interfaces by hierarchical clustering.
- Anne Tuukkanen
Guido Capitani: Is it biologically relevant? An evolutionary method for distinguishing biological interfaces from crystal contacts.
- Anne Tuukkanen
Biologically relevant interfaces are the result of evolution, but crystal contacts are not. Hence, biological interfaces have a detectable signal. Core-Rim Ka/Ks ratio used as a measure of selection pressure.
- Anne Tuukkanen
Christine Orengo talks about enzyme evolution with specificity changes in their binding regions
- Hedi Hegyi
Keynote 3: Christine Orengo, Sub-classifying relatives in CATH domain structure superfamilies to explore protein function evolution
- Anne Tuukkanen
a lot of effects by a new interacting domain
- Hedi Hegyi
HUP CATH superfamily tree - 6 distinct structural clusters
- Hedi Hegyi
CORA and FugueAli str and seq alignment methods
- Hedi Hegyi
FLORA - identifies conserved and distinct structural elements
- Hedi Hegyi
FLORA algorithm studies common/distinct features within a family
- Anne Tuukkanen
Changes in conserved residues in the active sites result in totally different enzymes
- Hedi Hegyi
Optimization of FunFams with SFLD Structure-Function linkage database
- Hedi Hegyi
FunFam - optimisation and validation using structure function linkage database (SFLD), homology models can be built using this data with low seq. id.
- Anne Tuukkanen
what does domain function mean? good point
- Hedi Hegyi
Next release of CATH will contain also SNPs, conservation, functional site information
- Anne Tuukkanen
Nicholas Furnham: Investigating enzyme evolution in structurally defined superfamilies
- Anne Tuukkanen
FunTree pipeline - domain view and seq. view combined, phylogenetic tree generated and processed as well as annotated with additional data such as KEGG, catalytic site atlas etc.
- Anne Tuukkanen
Similarities based on small molecules (reactions to them or are they metabolites?)
- Hedi Hegyi
Julian Gough, Virosphere-specific protein folds
- Anne Tuukkanen
several viral specific domain families exist (63), found in all major functional classes of viruses and most fold classes
- Anne Tuukkanen
Noah Ollikainen: Structure based prediction of natural residue covariation using computational protein design
- Anne Tuukkanen
Residue covariation is a general property in natural protein sequences, want to use this in computationally designed sequences. Backbone flexibility increases similarity between designed and natural covariation. Amino acid composition is similar in designed and natural sequences.
- Anne Tuukkanen
Andrew J. Bordner, Orientation-dependent backbone-only scoring functions for protein design
- Anne Tuukkanen
Keynote 4: Charlotte Deane, Modelling of Membrane Proteins
- Anne Tuukkanen
Medeller outperforms Modeller on membrane proteins. Using membrane protein specific substitution matrix in scoring.
- Anne Tuukkanen
Loop modelling done with FREAD and using membrane protein specific database.
- Anne Tuukkanen
There are now enough data on membrane protein structures that can be used to parametrize MP specific tools
- Anne Tuukkanen
Alpan Raval, Homology model refinement via long all-atom molecular dynamics simulations. They used all-atom CHARMM FF, well-sampled 100 microsecond simulations, explicit solvent model. Study shows that force field errors are still a limitation.
- Anne Tuukkanen
Alessandro Pandini: Detection of allosteric signal transmission by information-theoretic analysis of protein dynamics
- Anne Tuukkanen
MD simulation used to sample conformational space of a protein, small fragment conformations (structural alphabet) observed at different time points, network of correlated fragment changes studied.
- Anne Tuukkanen
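One information-theoretic way to score how strongly two fragments change together is the mutual information between their structural-alphabet letters over the trajectory. The sketch below only illustrates that idea under this assumption; the notes do not give the speaker's exact measure, and the letter strings are invented.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long state sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical structural-alphabet letters of two fragments over 8 MD frames.
fragment_1 = "AAAABBBB"
fragment_2 = "CCCCDDDD"  # switches state together with fragment_1
fragment_3 = "CDCDCDCD"  # switches independently of fragment_1

print(mutual_information(fragment_1, fragment_2))
print(mutual_information(fragment_1, fragment_3))
```

Pairs of fragments with high mutual information would become the edges of the correlation network; any cutoff for "high" would need to be chosen against a suitable background.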
They found major similarities in dynamics of homologous domains
- Anne Tuukkanen
Keynote 6: Ruth Nussinov, Structural proteome scale prediction of protein-protein interactions using interfaces.
- Anne Tuukkanen
A step towards adding a time dimension to protein-protein interaction networks. Interaction interfaces are predicted on monomers, and which interactions can take place simultaneously is studied.
- Anne Tuukkanen
Hedi Hegyi, The relationship between proteome size, structural disorder and organism complexity. G-value paradox: complexity does not correlate with gene number. Resolved: one should consider I-value (information content). Protein families expanding in evolution are more disordered.
- Anne Tuukkanen
Keynote 7: Torsten Schwede, Sins and Virtues in Protein Structure Homology Modelling
- Anne Tuukkanen
Torsten Schwede's current challenges for homology modelling, to make the models easily usable for end users: 1) absolute local error estimates; 2) models including cofactors and local detail; 3) oligomers
- Andrea
Keynote 7: Torsten Schwede, Sins and Virtues in Protein Structure Homology Modelling. Why use automated homology modelling? Modelling experts do not scale very well, you can't fight the data deluge
- Anne Tuukkanen
Kick off with Keynote 1 - Pamela Hoodless from Department of Medical Genetics, University of British Columbia. Title - "From genomics to embryology"
- Saravanamuttu Gnaneshan
Introduction to embryology and especially on liver development followed by need to generate new liver cells for transplantation.
- Saravanamuttu Gnaneshan
Comparison of hepatoblast and hepatocyte transcriptomes - 14% genes differentially expressed.
- Saravanamuttu Gnaneshan
Bioinformatics challenge from a biologist's point of view - Need effective data integration tools
- Saravanamuttu Gnaneshan
Talk: Ali Mortazavi ChIP-seq regulatory analysis using ChIA-PET
- Shannon McWeeney
Overview - abundance of directed graphs in biology
- Shannon McWeeney
goal: chip-seq to regulatory networks. Can id 1000s of binding sites - which are functional? which genes do they regulate?
- Shannon McWeeney
Simonis 2007 - can we connect distal chip-seq peaks to their target?
- Shannon McWeeney
ChIA-PET (Nature 2009 - estrogen receptor). Ongoing collaboration with Yijun Ruan @ GIS. Same steps as ChIP-seq in beginning. Then add linkers, ligate them, cut enzymatically and map to look for overlaps.
- Shannon McWeeney
ChIA-PET vs 5C vs Hi-C: 5C - high resolution/local; Hi-C - global but low resolution
- Shannon McWeeney
ChIP-seq Regulatory Analysis using connections. Map ChIA-PET between peaks. Convert into graph
- Shannon McWeeney
Graphs: datasets form local graph components - no mega-components. Direct connections clear meaning. Indirect less clear.
- Shannon McWeeney
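Converting paired interactions into a graph and listing its components can be sketched as follows. This is a generic illustration and not the speaker's pipeline: the peak names are invented, and the helpers `build_graph` and `connected_components` are hypothetical.

```python
from collections import defaultdict


def build_graph(interactions):
    """Build an undirected graph (node -> set of neighbours) from peak pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # ignore self-ligation pairs
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def connected_components(graph):
    """Return the connected components of the graph as a list of node sets."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


# Hypothetical ChIA-PET links between ChIP-seq peaks.
links = [("peak1", "peak2"), ("peak2", "peak3"), ("peak4", "peak5")]
graph = build_graph(links)
components = connected_components(graph)
degree = {node: len(neighbours) for node, neighbours in graph.items()}

print(sorted(len(c) for c in components))
print(degree["peak2"])
```

Direct connections are the edges themselves; indirect connections are pairs of nodes that share a component without sharing an edge, such as peak1 and peak3 in this toy input.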
TSS of highly expressed genes are more likely to be in the chromatin interaction graphs
- Shannon McWeeney
Many TSS show high connectivity. Restricting to edges connected only to other TSS: 30.6% are connected to 2 or more other TSS
- Shannon McWeeney
Change in TSS degree correlates with change in expression
- Shannon McWeeney
Example: TSS clique in myogenin locus. 12% of expressed genes in promoter cliques -> transcription factories (genes not transcribed singly but as group)
- Shannon McWeeney
19% of interactions to TSS are more than one TSS away
- Shannon McWeeney
TSS form preferential attachment sites within their chromatin interaction graphs (CIGs)
- Shannon McWeeney
Q: exact or near cliques? A: exact. Q: clarification of attachment site preference? A: based on inference from TSS connectivity patterns
- Shannon McWeeney
Talk: Andrew Roth: JointSNVMix : A Probabilistic Model For Accurate Detection Of Somatic Mutations In Normal/Tumour Paired Sample Sequence Data
- Shannon McWeeney
Germline mutations are often mistaken for somatic due to under-sampling in normal data. Algorithms that analyze the data individually would be more susceptible to this. Rationale for the joint approach
- Shannon McWeeney
Benchmark data set - Metrics: dbSNP concordance (surrogate for true germlines)- can explicitly ask if joint analysis reduces number of germline mutations mistakenly called as somatic
- Shannon McWeeney
2nd metric: ROC - is there a gain / loss in performance using joint methods?
- Shannon McWeeney
model admixture of normal/tumor in tumor sample needs to be addressed.
- Shannon McWeeney
need to address temporal aspect - normal -> primary->metastasis
- Shannon McWeeney
Goal of computational cell map. Review of Systems Biology Pyramid (Cary et al 2005)
- Shannon McWeeney
Collection of pathway resources continually growing (pathguide.org)
- Shannon McWeeney
Increased activity in utilizing community standards. Data integration and sharing other key steps. Example of Pathway Commons (pathwaycommons.org)
- Shannon McWeeney
cpath^2 - next generation demo - www.pathwaycommons.org/pc2-demo (emphasis primarily on web services)
- Shannon McWeeney
Cytoscape 3 is under development. Complete re-architecture. OSGi - everything is a plug-in. Wide developer pre-release in fall.
- Shannon McWeeney
Q: Issue of consensus in community? A: group must have goal in mind that they want consensus. takes time but worth it and will result in standards people are happy with in community
- Shannon McWeeney
Talk: Guanming Wu – “Reactome Functional Interaction (FI) Cytoscape plugin: A network module-based tool for cancer data analysis”
- Shannon McWeeney
Coverage problem major issue. solutions: import pathways from pathway databases and import pair-wise relationships
- Shannon McWeeney
Must address the pair-wise problem. Data model issue: to address it, can convert reactions in pathways into pair-wise relationships ("functional interactions"). False positives: a scoring system is needed to score reliability of protein-protein interactions, using a Naive Bayes classifier.
- Shannon McWeeney
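Converting reactions into pair-wise relationships can be sketched very simply: every two proteins taking part in the same reaction become one pair. The notes do not spell out the actual Reactome FI conversion rules, so this is an assumed simplification; the reaction and protein names are invented.

```python
from itertools import combinations


def reactions_to_pairs(reactions):
    """Turn {reaction id: participants} into a set of unordered protein pairs."""
    pairs = set()
    for participants in reactions.values():
        # sorted() gives every pair a canonical (a, b) order with a < b
        for a, b in combinations(sorted(set(participants)), 2):
            pairs.add((a, b))
    return pairs


# Hypothetical pathway with two reactions.
pathway = {
    "reaction_1": ["ProteinA", "ProteinB", "ProteinC"],
    "reaction_2": ["ProteinB", "ProteinC"],
}

for pair in sorted(reactions_to_pairs(pathway)):
    print(pair)
```

A pair seen in several reactions is kept only once, which is why the result is a set.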
FI network: can significantly increase coverage
- Shannon McWeeney
FI Cytoscape plug-in. Network clustering analysis included (Newman PNAS 2006); Cancer gene index annotation; Survival analysis
- Shannon McWeeney
Need for an overall scoring system. However, the underlying data are fundamentally different. Created a framework to make different scoring systems comparable
- Shannon McWeeney
PSISCORE - registry model similar to PSICQUIC
- Shannon McWeeney
Talk: Ian Donaldson – “iRefScape: A Cytoscape plugin for visualization and data mining of protein interaction data from iRefIndex”
- Shannon McWeeney
IRefIndex -consolidation of approx 10 different databases - feedback to parent databases; lineage and provenance; meta-data regarding transformations
- Shannon McWeeney
Go back to sequence data and can assign keys in order to verify that different accessions are indeed referring to the same entity. Exact sequence matches
- Shannon McWeeney
n-ary data representations - different models: spoke, bipartite, etc. Has gone to lengths to keep this data distinct.
- Shannon McWeeney
Panel: Daniela Nitsch, Esti Yeger-Lotem, John Pinney, Jing Li: “Networks and Gene Lists”
- Shannon McWeeney
John Pinney - Disease gene identification. Use seed gene approach. Specifically random walks with restarts (Kohler et al approach)
- Shannon McWeeney
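The random walk with restart can be sketched as a simple iteration: at each step the walker moves to a random neighbour, or jumps back to a seed gene with a fixed restart probability. This is a minimal illustration; the toy network, the restart value of 0.3 and the function name are assumptions, not values from the talk.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at the seeds.

    adjacency: dict mapping each node to a list of neighbours (undirected,
    every neighbour must also be a key). seeds: iterable of seed nodes.
    """
    seeds = set(seeds)
    if not seeds or not seeds <= set(adjacency):
        raise ValueError("seeds must be a non-empty subset of the network nodes")
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                # isolated node: the walker stays where it is
                nxt[n] += (1 - restart) * p[n]
                continue
            share = (1 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical network: geneA - geneB - geneC, with geneA as the seed.
network = {"geneA": ["geneB"], "geneB": ["geneA", "geneC"], "geneC": ["geneB"]}
scores = random_walk_with_restart(network, seeds=["geneA"])
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Candidate genes are then ranked by their score, so genes close to the seeds in the network come out on top.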
variation on approach- using different data types individually. Use of semantic similarity for annotations
- Shannon McWeeney
Perform "data fusion" - average random walks across data types
- Shannon McWeeney
Given some networks more informative and have seed genes - can use cross-validation to measure how much weight should be given to each source of evidence
- Shannon McWeeney
Use pre-computed random walks. Web interface diana - disease association by network analysis
- Shannon McWeeney
Jing Li - gene relationship measure via diffusion kernel calculation for each network. 3 diffusion kernel matrices are then normalized so they can be compared directly
- Shannon McWeeney
compared with Endeavour (Aerts et al 2006, Tranchevent et al 2008) - multiple sources, Kohler, plus traditional network algorithms
- Shannon McWeeney
Esti Yeger-Lotem - goal is to understand cellular response to stimulus (genetic screens and mRNA profiling)
- Shannon McWeeney
ResponseNet - identify regulatory pathways leveraging PPI data
- Shannon McWeeney
If connected 2 genes lists via PPI = hairball. Used min-cost flow algorithm to maximize the connectivity while maintaining sparse solution
- Shannon McWeeney
Keynote 3: Trey Ideker: Assessing networks that are both structural and functional
- Shannon McWeeney
understand how networks facilitate information flow in addition to how information is structured
- Shannon McWeeney
"Active modules" (originally proposed in 2002) - co-cluster in unsupervised way to pull out enriched subnetworks
- Shannon McWeeney
Example - HIV infection associated active modules (collaboration with Sumit Chanda)
- Shannon McWeeney
supervised network biomarkers approach - Han-Yu Chuang 2007 - subnetworks with average expression predictive of progression
- Shannon McWeeney
Network-guided decision tree induction (protein network biomarker application) - Dutkowski et al PLoS Comp Bio in press (network guided random forests)
- Shannon McWeeney
cross-comparison of networks - conserved regions in +/- stimulus or across different species - network alignment + interaction scores = high scoring dense conserved complexes
- Shannon McWeeney
new data - atlas of combinatorial interactions among TFs; human and mouse; TF-TF interactions; 1200 TF assays; 762 TF-TF interactions in human; 877 TF-TF in mouse; qRT-PCR measurements of TF abundance across 34 adult tissues
- Shannon McWeeney
Highly conserved networks between human and mouse - example: TF-TF in Brain
- Shannon McWeeney
Complementary approach - specifically target differences in genetic networks across conditions
- Shannon McWeeney
measuring genetic interactions with "product" model - if deviation from expectation (product of two) can id significant positive or negative interactions
- Shannon McWeeney
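The "product" model can be written down directly: the expected fitness of a double mutant is the product of the two single-mutant fitnesses, and the interaction score is the deviation from that expectation. A minimal sketch follows; the fitness values and the fixed cutoff of 0.1 are illustrative assumptions, and a real analysis would use a statistical significance test rather than a fixed threshold.

```python
def interaction_score(fitness_a, fitness_b, fitness_ab):
    """Deviation of the observed double-mutant fitness from the product expectation."""
    return fitness_ab - fitness_a * fitness_b


def classify(score, threshold=0.1):
    """Label a score as a positive, negative or absent interaction (toy cutoff)."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "none"


# Hypothetical fitness values relative to wild type (1.0).
single_a, single_b = 0.8, 0.5

for observed in (0.4, 0.1, 0.9):
    score = interaction_score(single_a, single_b, observed)
    print(observed, round(score, 3), classify(score))
```

A double mutant that is sicker than expected gives a negative score, and one that is healthier than expected gives a positive score.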
Challenge with DNA damage - does the network remodel? 418 genes examined: 182 TFs, 111 kinases, 37 phosphatases, 35 chromatin remodelers, as well as the known DNA damage genes
- Shannon McWeeney
Untreated network; treated network and then 3rd network (ratio/subtraction) = differential
- Shannon McWeeney
Noted that differential networks capture DNA damage response networks; Static networks do not.
- Shannon McWeeney
Problem appears to be "housekeeping interactions"
- Shannon McWeeney
Relationship between genetic and physical. Explored idea they are not overlapping but instead orthogonal. Gene A is part of Complex B mindset. Look for sets tightly connected with each other and then pairs across sets. Cytoscape plugin, Srivas et al Nature Protocols in press
- Shannon McWeeney
Network assembly via framework of filters (removing interactions) and integrators
- Shannon McWeeney
Talk: Barbara Mirel – “Gaining knowledge and coherence from complex networks and interactive activity trails”
- Shannon McWeeney
Issues of usability and exploratory analysis -need for coherence and orientation
- Shannon McWeeney
VisTrails - implementation for complex data not well structured www.vistrails.org/
- Shannon McWeeney
Shows picture of stages of cancer progression (ref Vogelstein, colon); poses the question of how metastasis occurs -- does this involve genetic or epigenetic changes?
- Barb Bryant
Tan Ince cultured two kinds of normal human mammary epithelial cells. He transformed them with oncogenes, resulting in different types of tumors.
- Barb Bryant
Concludes that the nature of the normal cell of origin is a strong determinant of the phenotype of the primary tumor, and whether it metastasizes. The playing field is tilted in the beginning.
- Barb Bryant
Self-renewing stem cells produce either more stem cells or transit amplifying cells which in turn lead to post-mitotic differentiated cells. Only the self-renewing stem cell could seed a new tumor.
- Barb Bryant
How do cancer cells acquire all of these capabilities (invasion, intravasation, transport, metastasis...)? Are there additional mutations required? Is it epigenetic?
- Barb Bryant
epithelial-mesenchymal transition -- cells on the perimeter of the tumor are mesenchymal. This may be due to signals from the surrounding stroma.
- Barb Bryant
There are probably 1000 proteins that shift in EMT. Various transcription factors (TFs) induce EMTs.
- Barb Bryant
EMT program highly complex and occurs normally during development.
- Mickey Kosloff
It seems likely that most of the invasion-metastasis program can happen without need for additional mutations; rather use signaling from microenvironment.
- Barb Bryant
P. Gupta transformed human primary melanocytes (pigmentation in the skin) with a cocktail of oncogenes. Found that in contrast to transformed epithelial cells, there was much higher likelihood of metastasis. Again, cell of origin is important in future behavior.
- Barb Bryant
One TF, Slug, was found to enable melanoma metastasis. (Even though the primary tumors grew a little faster.)
- Barb Bryant
Another TF, FOXC2, when expressed in epithelial cells induces migration and invasion. A subset of breast cancers have high levels of nuclear FOXC2, and these are more aggressive breast cancers.
- Barb Bryant
Speculates that different networks of EMT-inducing factors might program metastasis in different cell types.
- Barb Bryant
Stem cells identified by high CD44 and low CD24. (CD's are markers on cell surface which can be assayed fairly easily.)
- Barb Bryant
There are various ways to make cells acquire stem cell characteristics.
- Barb Bryant
Mentions Kornelia Polyak. There are stem-like cells in primary human breast samples. The stem cell program in normal human mammary gland is coopted by cancer cells.
- Barb Bryant
More proof that EMT creates stem cells.
- Barb Bryant
Most current chemotherapies preferentially kill non-cancer-stem-cells. The remaining stem cells can repopulate the tumor and are often more resistant to therapies.
- Barb Bryant
Gupta & Onder tested CSCs and non-CSCs with a bunch of drugs. There are some CSC-targeted agents (Salinomycin, Abamectin). Of 16,000 compounds only about a dozen preferentially killed CSCs as opposed to non-CSCs. Many were the other way round.
- Barb Bryant
This probably won't be the "answer". Christine Chaffer noticed that there were some floating cells in 2D cultured human mammary epithelial cells. She grew these up; these look more like CSCs.
- Barb Bryant
Interestingly, she found that non-CSCs could generate CSCs.
- Barb Bryant
Hm, isn't this kind of pouring cold water on the excitement about CSCs as drug targets? Or maybe you have to target both CSCs and non-CSCs simultaneously.
- Barb Bryant
Q: cancer biologists like to study druggable genome. But transcription factors seem most important. A: expression of TFs is controlled by cytoplasmic factors. Might want to go after those. Drugging the TF itself might be hard, but the signaling pathways might be more druggable.
- Barb Bryant
Q: has it been shown that change in the two forms of cadherins match the change in CD expression, and are these correlated with morphology? A: I showed that: CD44 high cells shut down E-cadherin; they express vimentin, and other mesenchymal markers. I don't know whether CD44 is useful for non-mammary epithelial tissues.
- Barb Bryant
Q: So do normal non-SCs generate SCs? A: Yes. Same differences as in cancer.
- Barb Bryant
Spontaneous de-differentiation into SCs. Interesting phenomenon.
- Steve Chervitz Trutane
motivation is to understand genetic basis of human diseases
- Dawei lin
Genetic basis of human diseases - important disease mechanisms and bio pathways remain unidentified
- Venkata P. Satagopam
gaps in knowledge of human disease biology contribute to high failure rates in drug development
- Dawei lin
Why understand genetic mechanisms? (i) Important mechanisms remain unidentified (ii) Gaps in knowledge cause a high failure rate in drug development
- arne
It will be a long way to know if the two motivating hypotheses are true
- Dawei lin
one of the most extensive research studies on T2D. It scanned 100k people for 10 yrs
- Dawei lin
10 years later 50% progressed to have the disease
- Dawei lin
10 years of diabetes research - the outcome is - 50% of people with good lifestyle improved
- Venkata P. Satagopam
lifestyle has a bigger impact than Metformin
- Dawei lin
Diabetes study with 10-year follow-up of diabetes incidence and weight loss, "T2D". Randomized into treatments: lifestyle, metformin, placebo. Best drug makes relatively little difference in incidence; lifestyle intervention is better than drug but still doesn't help a whole lot.
- Barb Bryant
best prevention was extensive lifestyle changes (50% -> 40% incidence)
- Mickey Kosloff
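To put the 50% -> 40% figure in perspective, here is a small sketch of the usual risk arithmetic. The two incidence values are the ones quoted in the notes; the function name is made up for illustration.

```python
def risk_summary(control_incidence, treated_incidence):
    """Return absolute risk reduction, relative risk reduction and number needed to treat."""
    arr = control_incidence - treated_incidence   # absolute risk reduction
    rrr = arr / control_incidence                 # relative risk reduction
    nnt = 1.0 / arr                               # number needed to treat
    return arr, rrr, nnt

arr, rrr, nnt = risk_summary(0.50, 0.40)
print(f"absolute reduction: {arr:.0%}, relative reduction: {rrr:.0%}, NNT: {nnt:.0f}")
```

Under these numbers, about ten people would need the intervention to prevent one case over the follow-up period.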
Diabetes is not only a matter of life style
- arne
success rate in the current pharma industry is <5% of molecules entering clinical trials
- Venkata P. Satagopam
key attributes of genetic mapping: (1) unbiased by prior assumptions about pathways (2) saturation mutagenesis reveals pathways
- Dawei lin
many mutants -> reveals coherence of pathways
- Ted Laderas
These days we have other methods that are unbiased like expression profiling, but genetic mapping has some unique characteristics relative to these (he’ll explain in a minute).
- Barb Bryant
Drosophila mutations initially looked random; years later almost all of them turned out to relate to pathways.
- Dawei lin
bottleneck is functional determination - biochemical approaches
- Ted Laderas
A lot of current knowledge can be traced back to genetic mapping
- Dawei lin
A slide based on Glazier et al, Science 2002
- Dawei lin
genetic mapping of human single gene disorders ...over 15 years Botstein paper in 1980, first genetic map in 1985 ....
- Venkata P. Satagopam
It took 10 years to find a marker for Huntington disease
- Dawei lin
Once you find a linked region from genetic mapping, it still takes a long time to find the specific gene responsible.
- Barb Bryant
in the 1990's the idea was that common diseases were caused by rare mutations with large effects
- arne
"Chromosome shlepping" - Eric Lander's term for the identification of the very gene in some genomic region.
- Roland Krause
It is robust for finding Mendelian diseases but not for common diseases
- Dawei lin
another approach: population genetics - QTL approach
- Ted Laderas
phenotypic variation is often continuous and may involve variation in many genes
- Dawei lin
Galton invented regression analysis to analyze measurements of phenotypic data (heights of parents and offspring).
- Roland Krause
The biometric unit --- almost nothing was Mendelian
- arne
Most traits are continuously variable
- Ted Laderas
Francis Galton was a cousin of Darwin. Darwin didn’t explain the source of variation. Galton focused on this; he measured the heights of parents and their offspring, and found a relationship. He invented regression analysis to draw the line. The slope of the line is related to the heritability of the trait.
- Barb Bryant
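A minimal sketch of the parent-offspring regression idea, on simulated data. The heights, standard deviations and the value h2 = 0.6 are all made up; the only point is that the least-squares slope of offspring on mid-parent values recovers the factor that was put in, which under the standard additive assumptions is read as heritability.

```python
import random

def regression_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

def simulate_families(n, h2, mean=170.0, sd_parent=5.0, sd_noise=4.0, seed=1):
    """Simulate mid-parent and offspring heights (cm); all parameters are made up."""
    rng = random.Random(seed)
    midparents, offspring = [], []
    for _ in range(n):
        m = rng.gauss(mean, sd_parent)
        # Offspring deviate from the population mean by h2 times the
        # mid-parent deviation, plus unexplained noise.
        child = mean + h2 * (m - mean) + rng.gauss(0.0, sd_noise)
        midparents.append(m)
        offspring.append(child)
    return midparents, offspring

midparents, offspring = simulate_families(n=5000, h2=0.6)
slope = regression_slope(midparents, offspring)
print(f"estimated slope: {slope:.2f}")
```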
It was studied by the cousin of Darwin, Francis Galton (1885)
- Dawei lin
phenotypic variation is often continuous ... some history ... Francis Galton (1885), Ronald Fisher (1918), Hermann Muller (1920)
- Venkata P. Satagopam
This gave rise to the biometric movement – measure every living thing. Traits were related to genetic relatedness; and it wasn’t Mendelian. This led to the biometric-Mendelian debate.
- Barb Bryant
Ronald Fisher was actually a geneticist, who also invented the p-value and the Fisher exact test
- Dawei lin
Ronald Fisher (the one with the exact test) was also a geneticist.
- Roland Krause
Solved by assuming that phenotype often is an effect of several Mendelian genes.
- arne
Fisher: individual genes are mendelian, effects of genes additive
- Ted Laderas
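A toy simulation of Fisher's resolution, assuming the simplest possible setup: every locus is Mendelian with two alleles, and the trait is just the count of "+" alleles (purely additive, equal effects, no environment). The allele frequency and sample sizes are arbitrary choices for illustration.

```python
import random
from collections import Counter

def simulate_trait(n_loci, n_individuals, allele_freq=0.5, seed=1):
    """Trait value = number of '+' alleles summed over n_loci diploid loci."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_individuals):
        # Two allele copies per locus, each '+' with probability allele_freq.
        value = sum(rng.random() < allele_freq for _ in range(2 * n_loci))
        values.append(value)
    return values

one_locus = Counter(simulate_trait(n_loci=1, n_individuals=2000))
many_loci = Counter(simulate_trait(n_loci=50, n_individuals=2000))
print("phenotype classes with 1 locus:", len(one_locus))
print("phenotype classes with 50 loci:", len(many_loci))
```

With one locus there are at most three discrete classes; with many loci the same Mendelian rules give a spread of values that looks continuous.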
Hermann Muller 1920 (Nobel Prize for X-ray induced mutations). PhD thesis not Mendelian trait, but truncate wing. Wasn’t Mendelian. Did genetic mapping.
- Barb Bryant
Hermann Muller decided to use broken wing of fruit fly to study non-Mendelian diseases
- Dawei lin
Muller 1920 paper: 4 chromosomes in fly – 3 contain genes that influence the trait truncate wing. Muller wrote about implications for human traits, like psychological traits. Said that traits were going to be too complicated. Said you could figure out by looking at population, but not looking at Mendelian inheritance in families.
- Barb Bryant
Muller 1920 suggested that one needs to study a population.
- Dawei lin
mendelian fallacy - sub-populations are easily divisible in terms of risk
- Ted Laderas
Prediction will only be useful if there is an intervention that you would not use without the prediction. Otherwise, you should use the intervention anyway.
- Roland Krause
Huntington will not be a representative example - for most diseases/people identified risk will be <<100% even with full genetic information
- Mickey Kosloff
Cautionary tale - PSA prediction results in over-treatment, hasn't been shown that people live longer because of test
- Mickey Kosloff
Very cautious about PSA - no improvements on the mortality but many operations performed.
- Roland Krause
genetics offers a path to discover the underlying biology of human diseases; the great value will derive from pathophysiology and treatment
- Venkata P. Satagopam
When grouping mutations into pathways up to 85% of GBM have a mutation in the most important pathways, while individual genes are down to a few %
- arne
Each oncogene may have relatively low frequency across patients; but when you group genes across pathways, a pathway may explain a large fraction of patients with a given type of cancer.
- Barb Bryant
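The gene-versus-pathway point can be sketched with a toy table. Patients, genes and pathway membership below are all hypothetical placeholders, not data from the talk.

```python
# Hypothetical mutation calls: patient -> set of mutated genes.
patients = {
    "P1": {"GENE_A"},
    "P2": {"GENE_B"},
    "P3": {"GENE_C"},
    "P4": {"GENE_A", "GENE_D"},
    "P5": {"GENE_E"},
    "P6": set(),
}
# Hypothetical pathway membership.
pathways = {
    "PATHWAY_1": {"GENE_A", "GENE_B", "GENE_C"},
    "PATHWAY_2": {"GENE_D", "GENE_E"},
}

def gene_frequency(patients, gene):
    """Fraction of patients carrying a mutation in one gene."""
    hits = sum(1 for genes in patients.values() if gene in genes)
    return hits / len(patients)

def pathway_frequency(patients, members):
    """Fraction of patients with at least one mutated gene in the pathway."""
    hits = sum(1 for genes in patients.values() if genes & members)
    return hits / len(patients)

for gene in sorted(pathways["PATHWAY_1"]):
    print(gene, f"{gene_frequency(patients, gene):.0%}")
print("PATHWAY_1", f"{pathway_frequency(patients, pathways['PATHWAY_1']):.0%}")
```

Each gene alone covers one or two of the six toy patients, while the pathway as a whole covers four of them.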
can see a change in pathway activation between primary tumor and mets
- Mickey Kosloff
Dominant alterations change between cancer types and states.
- Roland Krause
GBM: copy number is rare (and noisier) Ovarian: more regular and higher
- arne
profiles of copy number variations differ between types of cancers
- Mickey Kosloff
Metastatic tumor samples have more copy number changes than primary tumors. Not surprising. But maybe primary samples with more copy number changes than others are more likely to metastasize? Generally, better outcome with fewer somatic copy number changes.
- Barb Bryant
BRCA1 and BRCA2 mutations convey germline inherited cancer risk
- Barb Bryant
These genes act in the homologous repair pathway. Half of all patients have mutations in some homologous repair pathway gene.
- Barb Bryant
and more generally, homologous repair genes are altered in > 50% of ovarian cancer
- Mickey Kosloff
Tumor suppressor genes can be inactivated in various ways: germline mutation, somatic mutation, epigenetic silencing, etc.
- Barb Bryant
There are drugs under development that might work particularly well in patients with defects in this particular pathway.
- Barb Bryant
Cancer genomics portal: www.cbio.mskcc.org/cancergenomics
- Barb Bryant
Instead of going through all the models that are possible, you derive statistical properties across a set of good models for each of the Wij weights in the model.
- Barb Bryant
This is sort of like partition functions in statistical physics
- Barb Bryant
after step 1 - generation of probability distributions then step 2- decimation
- Shannon McWeeney
So you have a probability distribution for each Wij, which represents the interaction between element i and element j. I'm not really getting how you "update" these probability distributions in the iterative steps. I do understand that at the end you take the most "certain" (narrowest) distribution and fix its value (some Wij) at the most probable value, then update all the other Wij's given this fixation. And so on. To get your final model in a sort of greedy fashion.
- Barb Bryant
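A toy sketch of the "compute distributions, then decimate" loop described above. Big caveat: the marginals here are computed by brute-force enumeration over all candidate models, which only works for tiny problems and stands in for whatever approximate inference was actually used in the talk. The three-valued weights, the linear toy data and the exp(-beta * error) weighting are all assumptions for illustration.

```python
import math
from itertools import product

VALUES = (-1, 0, 1)  # allowed values for each weight (an assumption)

# Toy data generated from the "true" weights (1, 0, -1): y = sum_j w_j * x_j
DATA = [((1, 0, 0), 1), ((0, 1, 0), 0), ((0, 0, 1), -1), ((1, 1, 1), 0)]

def error(weights):
    """Squared prediction error of one candidate model on the toy data."""
    return sum((y - sum(w * x for w, x in zip(weights, xs))) ** 2 for xs, y in DATA)

def marginals(fixed, n_weights, beta=1.0):
    """Probability of each value of each free weight, over all models
    consistent with `fixed`, each model weighted by exp(-beta * error)."""
    free = [j for j in range(n_weights) if j not in fixed]
    totals = {j: {v: 0.0 for v in VALUES} for j in free}
    for combo in product(VALUES, repeat=len(free)):
        model = dict(fixed)
        model.update(zip(free, combo))
        weights = [model[j] for j in range(n_weights)]
        p = math.exp(-beta * error(weights))
        for j in free:
            totals[j][model[j]] += p
    result = {}
    for j in free:
        z = sum(totals[j].values())
        result[j] = {v: t / z for v, t in totals[j].items()}
    return result

def decimate(n_weights):
    """Greedy decimation: repeatedly fix the most certain weight."""
    fixed = {}
    while len(fixed) < n_weights:
        dists = marginals(fixed, n_weights)
        # Most certain = highest probability on its single best value.
        j = max(dists, key=lambda k: max(dists[k].values()))
        fixed[j] = max(dists[j], key=dists[j].get)
        # The remaining marginals are recomputed on the next pass,
        # now conditioned on the value that was just fixed.
    return tuple(fixed[j] for j in range(n_weights))

print(decimate(3))
```

On this toy data the loop recovers the weights that generated it. The "update" step is simply recomputing every remaining distribution conditional on what has been fixed so far.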
And by the way, the underlying model is a simple differential equation sort of thing: change of one variable xi is a sigmoidal function of weighted (Wij) sum of all variables xj, less a decay term.
- Barb Bryant
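One way to read that description is dx_i/dt = sigmoid(sum_j W_ij * x_j) - decay * x_i. The sketch below integrates that form with plain forward Euler; the exact equation from the talk is not in the notes, so the sigmoid choice, the decay constant, the step size and the two-node example network are all assumptions.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def step(x, W, decay, dt):
    """One forward-Euler step of dx_i/dt = sigmoid(sum_j W[i][j] * x[j]) - decay * x[i]."""
    new_x = []
    for i in range(len(x)):
        drive = sum(W[i][j] * x[j] for j in range(len(x)))
        new_x.append(x[i] + dt * (sigmoid(drive) - decay * x[i]))
    return new_x

def simulate(x0, W, decay=1.0, dt=0.01, n_steps=2000):
    """Integrate from x0 for n_steps Euler steps and return the final state."""
    x = list(x0)
    for _ in range(n_steps):
        x = step(x, W, decay, dt)
    return x

# Hypothetical two-node network: node 0 activates node 1, node 1 inhibits node 0.
W = [[0.0, -2.0],
     [2.0, 0.0]]
print(simulate([0.0, 0.0], W))
```

Simulating a perturbation then amounts to changing an entry of the start state (or clamping a node) and comparing the trajectories.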
Question: Interacting network tend to be modular, with strongly-interacting subnetworks that interact weakly with each other. ...
- Barb Bryant
Chris: Is the modular approach really useful in confronting the data? [Is that what he said?]
- Barb Bryant
Question: can you get at causal relationships?
- Barb Bryant
Chris: yes - if the network model allows you to predict correctly the result of a particular perturbation applied to a particular node, then you can simulate using that model.
- Barb Bryant
Question: with a big network, how many experiments will you need to model?
- Barb Bryant
Chris: Good question. Could use an entropy measure. Help us figure this out. Help us design the experiments. It's important because of the costs of experiment. This is going to be broadly applicable in cell biology.
- Barb Bryant
bb - he said one should see if approach is useful by confronting with real data
- Shannon McWeeney
Chris gets at the difference between a model that tells a story and a model that is truly predictive.
- Barb Bryant
Question: yes, but, what are the semantics of the graph? What kinds of interaction? Answer: The semantics are in the mathematics of your model.
- Barb Bryant
Question: mean field approach is interesting. Compared to Monte Carlo approach, you are assuming some decoupling. Loss of posterior coupling between weights - is that an issue?
- Barb Bryant
Chris: If you look at a coupled system overall, the extent to which the algorithms work depends on correlations within the system. Long-range (in terms of network distance) correlations are problematic. There are some clever approaches to handle some of this. Mentions non-ergodic space; deal with parts of space separately or iteratively.
- Barb Bryant
Manolis Kellis: Systems level view of transcription. Regulatory networks across species using conservation. Effects on top of nucleosome positioning: The histone code leads to a multitude of combinations.
- Roland Krause
[...] Signatures of transcription factor binding and nucleosomes in different cell lines. Dips in chromatin signal hint at TF sites, associated with conservation. Many cell type specific dips.
- Roland Krause
William Stafford Noble: Segmentation for chromatin states. Very general talk with lots of colorful plots and no formulas (regrettably if you ask me).
- Roland Krause
Chris Bock: Biomarker development from epigenomics
- Roland Krause
Epigenetic aberrations that lead to cancer could be reversed, a handful of examples are in the clinic.
- Roland Krause
Epigenetic biomarkers are detectable earlier than genetic changes and possibly from blood.
- Roland Krause
Showcase example SEPT9 promoter methylation (diagnostic). MGMT promoter methylation as therapy selection. LINE repeat meth to monitor effect of demeth drugs.
- Roland Krause
Search for biomarkers promising. Bioinformatics challenge in distinguishing tissues, not mechanistic inferences. A variety of technologies exist. Four selected for this study: MeDIP, MethylCap, RRBS (sequencing based, bisulfite) and Infinium (adapted microarray, bisulfite).
- Roland Krause
Benchmarking: Good agreement between bisulfite methods. Enrichment methods display sequence biases with low correlations. Repeats show spurious hits.
- Roland Krause
Linear models can correct for sequence biases.
- Roland Krause
Differentially methylated regions detection using Fisher's exact test.
- Roland Krause
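A minimal sketch of how Fisher's exact test could be applied to one candidate region, assuming the input is a 2x2 table of methylated versus unmethylated read counts in two samples. The counts are hypothetical, and this is a plain standard-library implementation of the two-sided test, not the pipeline from the talk.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum all tables that are no more likely than the observed one
    # (small tolerance guards against floating-point ties).
    return sum(p for p in (prob(k) for k in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

# Hypothetical read counts for one region: (methylated, unmethylated) per sample.
p_value = fisher_exact_two_sided(18, 2, 4, 16)
print(f"p = {p_value:.2e}")
```

In a genome-wide scan this would be run once per region, so the resulting p-values would still need a multiple-testing correction.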
Developed in the process of high-throughput screening at the Broad Institute. 300,000 compounds are screened (somehow) to produce a ranked list, a distribution across a response variable.
- Roland Krause
Find threshold to verify the top n compounds on the list.
- Roland Krause
How many hits should be sent to confirmatory experiments?
- Roland Krause
Most commonly, people simply guess, often arbitrary and unfair. FDR is typically preferred but it's argued that it's just as arbitrary.
- Roland Krause
Two flaws of FDR: Need to pick an FDR cutoff, and FDR assumes that everything different than the negative controls is interesting, often not the case in compound testing.
- Roland Krause
Key question: What is the most profitable number of confirmed activities to produce? Think supply-demand curves.
- Roland Krause
Determine the supply cost (how many positives are produced) and the demand curve by the cost of finding one more active compound.
- Roland Krause
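One possible reading of the supply-demand idea, as a sketch only. It assumes each ranked compound has an estimated probability of confirming, that every confirmatory test costs the same, and that the demand side is a single flat value per confirmed active. None of these numbers or names come from the talk.

```python
def hits_to_confirm(confirm_probs, cost_per_test, value_per_active):
    """Walk down the ranked list and stop when one more expected active
    costs more than it is worth."""
    n = 0
    for p in confirm_probs:
        if p <= 0:
            break
        marginal_cost = cost_per_test / p  # expected cost of one more confirmed active
        if marginal_cost > value_per_active:
            break
        n += 1
    return n

# Hypothetical confirmation probabilities for the top-ranked compounds.
probs = [0.9, 0.8, 0.5, 0.2, 0.05, 0.01]
print(hits_to_confirm(probs, cost_per_test=100.0, value_per_active=1000.0))
```

The cutoff then moves with the budget: raise the value of an active or lower the cost of a test, and more of the list gets sent to confirmation.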
(Unpublished work in scaffold discovery)
- Roland Krause
Q: Assumptions on distributions. A: Fewer assumptions than FDR. Only assumption is that there is some signal in the original test.
- Roland Krause
Q: Power comes from the option to go back to the screen. Not easily possible in microarray experiments. A: Testing also happens in microarray analysis; he recommends using similar verification settings there, too.
- Roland Krause
Q: How to find the demand curve? A: Should only be controlled by the costs (funding, importance). Does not need to be explicitly modeled.
- Roland Krause
Main improvement is to tie follow up experiments to real costs rather than some rather abstract statistic.
- Roland Krause
# The late breaking talks seem to be very interesting, room 305 is packed with ~80 people.
- Roland Krause
# Moving the LBR track to 201; constantly been packed before
- Oliver Hofmann
Cameron, contributors cloud done and posted :)
- Lars Juhl Jensen
I see a homologous word in the two Wordles! Latin data = Sanskrit datta, "given". :-)
- Ruchira S. Datta
@Oliver - to answer your question there are a number of "@" names in the wordle, if you look closely, but yeah, it's very nice of @Lars to do both wordles! I can see @allyson and @oliver for instance
- Allyson Lister
Looks like we should ramp up our vocabulary of verbs - use, using and used are all prominent items by themselves.
- Roland Krause
I'm amazed by the difference between the Wordle based on abstracts (http://larsjuhljensen.tumblr.com/post...) and this one. The abstracts seem to focus on methods whereas the FriendFeed comments focus on data.
- Lars Juhl Jensen
At least partially because it is much easier to quickly describe a test data set than a complex method that might require notation or a schema, I think
- Oliver Hofmann
Frank, me too! I wonder if this Wordle excluded the non talk specific things (e.g., FF shut us down)
- Ruchira S. Datta
No it was just a quick'n'dirty hack: download all comments as JSON in batches of 30 topics, extract FriendFeed user names, paste into Wordle, submit to FriendFeed ;-)
- Lars Juhl Jensen
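For the curious, the hack described above might look roughly like this. The JSON layout ("entries", "comments", "from", "id") is a guess made for illustration and may not match what FriendFeed actually returned; the download step is left out and replaced by a made-up batch.

```python
import json
from collections import Counter

def count_commenters(batches):
    """Count comments per user across JSON batches (hypothetical layout)."""
    counts = Counter()
    for raw in batches:
        data = json.loads(raw)
        for entry in data.get("entries", []):
            for comment in entry.get("comments", []):
                counts[comment["from"]["id"]] += 1
    return counts

# Made-up batch standing in for one downloaded page of topics.
batch = json.dumps({
    "entries": [
        {"comments": [{"from": {"id": "alice"}, "body": "first"},
                      {"from": {"id": "bob"}, "body": "second"}]},
        {"comments": [{"from": {"id": "alice"}, "body": "third"}]},
    ]
})
counts = count_commenters([batch])
# One name per comment, so a word cloud sizes each name by comment count.
wordle_text = " ".join(name for name, n in counts.items() for _ in range(n))
print(wordle_text)
```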
well, that would probably explain it then :-)
- Ruchira S. Datta
I don't think so Ruchira - the distance from Ally to you is more than 150 comments. I doubt you made *that* many non-talk comments (but I haven't checked).
- Lars Juhl Jensen
Simon, thanks - I won't pester people with a v3 of this cloud, though ;-)
- Lars Juhl Jensen
Think Allyson (and to some extent I) switched to the blog posts and stopped pasting over comments to ff when coverage was already good -- and Ruchira, your coverage was fast and incredibly thorough :)
- Oliver Hofmann
I demand that the analysis is normalized by comment length! And mean attendance! And complexity of the presented material! And smileys!
- Roland Krause
@Roland you are very funny! If you want to know my hypotheses, then: @Ruchira should be first for FF comments, as she definitely ramped up her commenting over the week, even including the SIG comments (which I guess were included :)); also, as @Oliver says, both of us mainly switched to blog posts as the week went on, especially for talks where other FFers were about. I'm not demanding that Lars re-write to pull down word counts from comments or blog posts, though :) (But it would be interesting! :D)
- Allyson Lister