Venkata P. Satagopam › Comments

ISMB/ECCB
Keynote: Michael Ashburner - From sequences to ontologies - adventures in informatics
"The father of ontologies in biology" speaks without slides. - Roland Krause
One well turned phrase is worth a thousand power points.... - Shannon McWeeney
Almost 50 years after he started his undergrad in Cambridge in geology to venture into paleontology. His developing interest in zoology was not matched by the respective department. Was asked to be the chairman of the department. - Roland Krause
PhD on Drosophila in Cambridge, PostDoc at Caltech. - Roland Krause
Stressed the importance of a godfather figure for young scientists as a role model, etc. - Roland Krause
No problems in funding in the 60s and early 70s. - Roland Krause
Six months in Bruce Alberts' lab. # not so easy to keep up with all the great people that Michael Ashburner worked with - Roland Krause
No knowledge of repetitive DNA, formulation of the C-value paradox. - Roland Krause
Work on Drosophila alcohol dehydrogenase spanning 20 years. - Roland Krause
Sequencing of ADH of 4 species using radionucleotides led to a PhD and a Nature paper. Almost no software available, major hardware incompatibility. Only one ARPAnet node in Europe, at the University of London. - Roland Krause
Initially 120 baud bandwidth. - Roland Krause
1983, first version of the EMBL database, came on magnetic tape 60kB. First thing: Print and read annotations. - Roland Krause
at that time size is 60kb - Venkata P. Satagopam
No relational integrity, lots of integrity. Moaning about it led to a position on the advisory board. - Roland Krause
Promoter of the establishment of the EBI at the Cambridge site. No genomics at the EMBL in the 80s. Raised 30 M GBP to convince the EMBL council. - Roland Krause
Gopher! - Roland Krause
Flybase establishment. Built in Sybase, output in files, distribution via Gopher. Later, contact with Amos Bairoch and Expasy led to use of a webserver. - Roland Krause
AceDB memories. - Roland Krause
Hierarchical, structured language use in FlyBase, extension to other model organisms. - Roland Krause
Presented in 1997 at the ISMB in Greece. - Roland Krause
White paper for the 1998 ISMB in Montreal. Lots of resistance from various groups. - Roland Krause
Gene Ontology sealed in 1999, support from the yeast and mouse databases. - Roland Krause
Incyte had a patent on controlled vocabulary. (Utterance not reproducible here) - Roland Krause
Drosophila genome project, finalized in 6 months, genome annotation jamboree. - Roland Krause
ISMB/ECCB
Keynote: Luis Serrano - M pneumoniae (Towards a full quantitative understanding of a free-living system)
Full understanding of Mycoplasma pneumoniae - Venkata P. Satagopam
Can we successfully analyze and integrate large number of data types in order to predict changes in system in face of any perturbation? Use case: M pneumoniae - Shannon McWeeney
It contains 689 ORFs + 44 RNAs, 10 TFs, kinases and one phosphatase - Venkata P. Satagopam
10 TF ( classical) 2 kinases, 1 phosphatase, full chemical signaling repertoire - Shannon McWeeney
EM tomography used to get the cytoskeleton - Venkata P. Satagopam
Related papers from this work: Yus et al 2009; Güell et al 2009 - Shannon McWeeney
Metabolome - it was difficult to draw the metabolic network; mined the last 20 years of literature, and the final network contains 129 enzymes, 140 genes - Venkata P. Satagopam
Strong argument that structural analysis is key - Shannon McWeeney
Modeled using flux balance analysis (FBA) - Venkata P. Satagopam
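Editor's sketch: flux balance analysis looks for a flux vector v that keeps every internal metabolite at steady state (S·v = 0) while optimizing an objective within flux bounds. The snippet below only checks the steady-state constraint on a made-up three-reaction chain; the network, the matrix `S` and the function names are illustrative assumptions, not the M. pneumoniae model from the talk, and no optimizer is included.

```python
# Toy network (hypothetical): uptake -> A, A -> B, B -> export.
# Rows of S are metabolites (A, B); columns are reactions (v1, v2, v3).
S = [
    [1, -1, 0],   # A: produced by v1, consumed by v2
    [0, 1, -1],   # B: produced by v2, consumed by v3
]


def net_production(stoichiometry, fluxes):
    """Return S.v: the net production rate of each metabolite."""
    return [sum(s * f for s, f in zip(row, fluxes)) for row in stoichiometry]


def is_steady_state(stoichiometry, fluxes, tol=1e-9):
    """True if no metabolite accumulates or is depleted (S.v = 0)."""
    return all(abs(x) <= tol for x in net_production(stoichiometry, fluxes))


balanced = [2.0, 2.0, 2.0]     # every flux equal: nothing accumulates
unbalanced = [2.0, 1.0, 1.0]   # A is made faster than it is consumed
```

A real FBA run would hand `S`, the flux bounds and an objective (for example, biomass production) to a linear-programming solver; this check is only the feasibility half of that problem.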
Metabolomics analysis done using MS and NMR ... developed complete validated map, but missing parameters for reactions, regulatory loops, effect of post-translational modifications, enzymes for reactions - Venkata P. Satagopam
Transcriptome, initially used microarrays, then time-dependent tiling array, then did single strand sequencing - Venkata P. Satagopam
Expression profiling: ncRNA everywhere. expression appears to decay along the operon - "staircase behavior" - Barb Bryant
take home points non-coding RNA abundant; staircase behaviour of operons (steps coincide with end of gene in operon) ? of how this is regulated. - Shannon McWeeney
more than 1 promoter for one operon seems common - Shannon McWeeney
There is a whole huge world of translational regulation, and we "don't have a clue" - Barb Bryant
suggestion to replace operon model - Shannon McWeeney
Cool - there are different RNA polymerase complexes that recognize different operon-start locations; each one also recognizes particular stop signals, and so each may transcribe a different operon. - Barb Bryant
how to explain complexity with so few tf? - Shannon McWeeney
other proteins can act as TF - Shannon McWeeney
possible role of noncoding - Shannon McWeeney
found new putative RNA/DNA binding proteins - Venkata P. Satagopam
Tiny RNAs accurately mark the transcription start sites of genes, about 40 bases long. Have also been found in eukaryotes. Don't know function. - Barb Bryant
Summary - transcriptional complexity in M. pneumoniae could be explained by new TFs, ncRNA, tiny RNAs, new RNA/DNA binding proteins, etc. - Barb Bryant
Missing - who regulates the TFs and DNA binding proteins - Venkata P. Satagopam
Analysis of protein complexes - Sebastian Kühner et al., Science 326, 1235 (2009) - Venkata P. Satagopam
Electron tomography: can visualize large complexes. Allows you to count the number of complexes per cell. - Barb Bryant
Full quantification of proteins and transcripts to get copies per cell, with half-life modeling, and taking into consideration point in growth curve. From this, create full computer model. - Barb Bryant
Statistically, more abundant proteins are more likely to be essential, as in E. coli - Venkata P. Satagopam
Some mRNAs present at (way) less than 1 copy per cell. - Barb Bryant
Poor correlation between mRNA and proteins (~0.5) - Barb Bryant
low copy number of noise - low translation efficiency and long protein half-life eliminates noise - Venkata P. Satagopam
150 ribosomes, 400 promoters, 140 RNA polymerase - Venkata P. Satagopam
There are probably different ribosomes, with different subunit compositions. - Barb Bryant
protein/volume ratio is a magic number - a universal constant: 200 g/l. This is true for 3 bacteria -- is it true for human? - Barb Bryant
They have observed post-translational modifications, but we don't yet know how they are placed, or their function. - Barb Bryant
It's tough to convince students and post-docs to dive into the badly needed research on one or a few proteins when omics experiments are so much faster and often publishable in better journals. - Barb Bryant
ISMB/ECCB
Keynote: Janet Thornton - The Evolution of Enzyme Mechanisms and Functional Diversity
10 year Keynote for ECCB - Shannon McWeeney
Special call-out to Elixir session today at 2:30 Hall F2 www.elixir-europe.org - Shannon McWeeney
Trying to understand life from molecules to systems - Venkata P. Satagopam
She's a "data junkie" -- everything depends on having your data properly organized and being able to extract information from it. - Barb Bryant
Most of our information is still at the parts level, with emerging data on interactions, reactions and pathways - Barb Bryant
at EBI - data doubling every 5 months 12 petabytes of storage currently - Shannon McWeeney
EBI contains presently 12 petabytes of data - Venkata P. Satagopam
We need to look not only at proteins but also at the small molecules, the metabolites. - Barb Bryant
Plants have way more metabolites than we do. - Barb Bryant
Cheminformatics is older but smaller than bioinformatics; largely confined to industry. The tools are not freely available, with notable exceptions. - Barb Bryant
Differences between the proteome and the metabolome, e.g. no evolution and hierarchical structure of metabolites. - Roland Krause
"Way back in the 90s" they were trying to define the reactome - the reactions necessary for life. - Barb Bryant
From the proteome and the metabolome to the reactome: How many reactions are necessary for life? - Roland Krause
Enzymes are an important part of biological molecular reactions - Venkata P. Satagopam
Enzymes are called by name and EC number. - Roland Krause
Handling the reactions computationally is a challenge - Venkata P. Satagopam
Predicting enzyme function automatically: most powerful and most popular method is to recognize a homologue and transfer functional annotation. - Vangelis Simeonidis
EC numbers explained: they conform to the following format: C.SC.SSC.SN - Vangelis Simeonidis
The classification of enzymes is four-part: class, subclass, sub-subclass, serial number (typically the substrate) - Roland Krause
where: C = Class, SC = Sub-class, SSC = Sub-subclass, SN = Serial number - Vangelis Simeonidis
EC numbers do not capture the mechanism of the enzyme. - Vangelis Simeonidis
Capture only the chemical level, no biological dependence such as co-factors - Roland Krause
There is no one to one relationship between EC numbers and protein families - Venkata P. Satagopam
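Editor's sketch: the C.SC.SSC.SN format described above is easy to handle in code. The parser below is a minimal illustration; the names `ECNumber`, `parse_ec` and `shared_levels` are made up for this example, and partial numbers such as "1.1.1.-" are deliberately rejected to keep it short.

```python
from typing import NamedTuple


class ECNumber(NamedTuple):
    enzyme_class: int   # C
    subclass: int       # SC
    sub_subclass: int   # SSC
    serial: int         # SN


def parse_ec(text: str) -> ECNumber:
    """Parse a fully specified EC number such as 'EC 1.1.1.1'."""
    cleaned = text.strip()
    if cleaned.upper().startswith("EC"):
        cleaned = cleaned[2:].strip()
    parts = cleaned.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a complete EC number: {text!r}")
    return ECNumber(*(int(p) for p in parts))


def shared_levels(a: ECNumber, b: ECNumber) -> int:
    """Count how many leading levels (0 to 4) two EC numbers share."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


adh = parse_ec("EC 1.1.1.1")    # alcohol dehydrogenase
ldh = parse_ec("1.1.1.27")      # L-lactate dehydrogenase
```

Here `shared_levels(adh, ldh)` is 3: the two enzymes agree down to the sub-subclass and differ only in the serial number. As the notes say, this agreement describes the overall chemistry and says nothing about mechanism or protein family.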
The reactome contains 4154 reactions - Venkata P. Satagopam
They wanted to build tools that would handle the actual chemistry. - Barb Bryant
There has been a lot of work in the past 10 years on tools to handle the chemistry. Includes Kanehisa 2004, Gasteiger 2008, Aires-de-Sousa 2008, Schomburg 2010. Unfortunately, most of the software isn't freely available, and only tackles part of the problem. - Barb Bryant
There is a huge literature on comparing small molecules to each other. So that's well covered. - Barb Bryant
They also needed to map the atoms from each side of the equation to each other: atom-atom mapping. This works by matching the largest common moiety first, and iterating. The Mesa (?) database of about 300 reactions is a gold standard to check the quality of the mapping. - Barb Bryant
You need to be able to compare reactions to each other - reaction similarity. - Barb Bryant
To describe the changes in the bonds that take place, you use the Dugundji-Ugi model -- you make a matrix showing the bonds for reactants and products; subtracting the matrices gives you the reaction matrix. - Barb Bryant
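Editor's sketch of the matrix subtraction just described, using H2 + Cl2 -> 2 HCl as a toy reaction. In the Dugundji-Ugi formalism the off-diagonal entries are bond orders and the diagonal holds free valence electrons; the atom ordering, the atom mapping and the helper names below are assumptions chosen for illustration, not anything shown in the talk.

```python
ATOMS = ["H1", "H2", "Cl1", "Cl2"]

# Bond-electron matrix of the reactants: H1-H2 and Cl1-Cl2 single bonds.
# Diagonal entries are free valence electrons (6 on each chlorine).
B = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 6, 1],
    [0, 0, 1, 6],
]

# Bond-electron matrix of the products: H1-Cl1 and H2-Cl2 single bonds.
E = [
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 6, 0],
    [0, 1, 0, 6],
]


def reaction_matrix(reactants, products):
    """Return R = E - B, computed element by element."""
    return [[e - b for b, e in zip(row_b, row_e)]
            for row_b, row_e in zip(reactants, products)]


def bond_changes(atoms, r):
    """List (atom, atom, change in bond order) for each altered bond."""
    changes = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            if r[i][j] != 0:
                changes.append((atoms[i], atoms[j], r[i][j]))
    return changes


R = reaction_matrix(B, E)
```

`bond_changes(ATOMS, R)` reports two bonds broken (H1-H2, Cl1-Cl2) and two formed (H1-Cl1, H2-Cl2). The entries of `R` sum to zero because no electrons are created or destroyed. Note that the subtraction only makes sense once the atom-atom mapping has put both matrices in the same atom order.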
EC-BLAST created by Syed Asad Rahman; it allows you to compare reactions by bond similarity, reaction centre similarity or substrate structure similarity. - Barb Bryant
Chemicals have several fingerprints bond change, structure, stereo fingerprint - Venkata P. Satagopam
(See KillerApp talk I think Tues 11:45am) - Barb Bryant
CDK (Chemistry development kit) free software, - Venkata P. Satagopam
They looked into redefining the enzyme classification system. - Barb Bryant
Ligases are in principle simple; most 6.1s are aminoacyl-tRNA synthetases - Venkata P. Satagopam
The EC-BLAST-server (URL above) is in closed beta. - Roland Krause
Compared two reactions using Tanimoto coefficient - Venkata P. Satagopam
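Editor's sketch: the Tanimoto coefficient between two fingerprints is the size of their intersection divided by the size of their union. The example treats a fingerprint as a set of features; the feature strings are invented for illustration and are not EC-BLAST's actual fingerprint encoding.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets, from 0.0 to 1.0."""
    if not a and not b:
        # Convention chosen here (an assumption): two empty
        # fingerprints are treated as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical bond-change fingerprints for two reactions.
reaction_1 = {"C-O:formed", "O-H:broken", "C-H:broken"}
reaction_2 = {"C-O:formed", "O-H:broken", "N-H:broken"}

similarity = tanimoto(reaction_1, reaction_2)   # 2 shared / 4 total = 0.5
```

The same function could be applied separately to bond-change, reaction-centre and structure fingerprints, giving one similarity score per fingerprint type.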
"This heatmap might look good to you, to me it looks fantastic!" Similarity between substrates is now close the EC classification. Differences might be based on the EC classification. - Roland Krause
FunTree - Understanding enzyme families and evolution Poster #Z06 - Venkata P. Satagopam
Why are some structures capable of so many different enzymatic functions? Which are the residues that led to change of function? - Roland Krause
Examples from the Phosphatidylinositol-Phosphodiesterase-Superfamily, a multi-domain protein family. - Roland Krause
They looked at the multi-domain architecture of the phosphatidylinositol-phosphodiesterase superfamily. Adding new domains doesn't add enzyme function to members of this family. - Barb Bryant
One needs to understand the evolution to better understand the EC classification - Venkata P. Satagopam
The tree constructed from structure has three main groups. Branches of the tree are distinguished by differences in substrate, product, presence of a metal co-factor, or mechanism. - Barb Bryant
Matrix showing how frequently there are evolutionary changes within and between classes. Evolution tends to create new enzymes within the same class, having the same mechanism but changing the substrate or product. - Barb Bryant
Most of the enzyme evolution is happening at the last subclass level - Venkata P. Satagopam
Question from the floor: is this an opportunity to abandon the EC classification method and move on to a better one? Answer: no. The EC structure is very sensible. Also, it is powerful because everybody uses it. Also, in the first class we examined, it matches pretty well to the similarity measure we developed. - Barb Bryant
# Best keynote so far - Roland Krause
Question: sometimes you have a huge protein to carry out a single small reaction. Have you noticed any clues to why this happens? A: we have some thoughts related to protein function. First, most proteins are multi-functional. They interact with other proteins and do other sorts of things. Secondly, some of the substrates are quite large. We have a sort of domino theory of enzyme... more... - Barb Bryant
ISMB/ECCB
Keynote: Bonnie Berger - Computational biology in the 21st century: making sense out of massive data
ISMB/ECCB 2011 kicked off, Michal Linial is introducing the first keynote speaker - Venkata P. Satagopam
algorithmic challenges to increasingly massive amounts of data - how to avoid situation becoming intractable? - Shannon McWeeney
Berger showing a graph where data are growing faster than computational power can handle (MIPS vs. bases / day) - Iddo Friedberg
10 fold increase in sequencing vs doubling in computing capacity - Shannon McWeeney
Not just a bigger cloud -- > need better algorithms - Iddo Friedberg
3 challenges areas - compression, signal from noise, patterns across species - Shannon McWeeney
Need to exploit fact that "new" data is similar - utilize redundancy "compressive genomics" - Shannon McWeeney
Work directly on compressed data rather than compress --> decompress - Iddo Friedberg
use case from fly genomes - Shannon McWeeney
Compression accelerated blast caBLAST - Shannon McWeeney
Redundancy in genomics can be exploited --> CaBLAST. Works on compressed data. Size of compressed DB is proportional to the size of non-redundant data - Iddo Friedberg
coarse analysis on compressed data - refined analysis on relevant regions - Shannon McWeeney
Run time much faster - Iddo Friedberg
never have to uncompress - potential for huge gains - Shannon McWeeney
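Editor's sketch: a deliberately tiny illustration of the "search the non-redundant part, then expand" idea from these notes. It is not CaBLAST: it collapses only exact duplicate sequences and uses plain substring matching, whereas the real method handles near-duplicates and alignment. All names and sequences are made up.

```python
def compress_exact_duplicates(database):
    """Store each distinct sequence once, with the records that share it."""
    unique = {}
    for name, sequence in database.items():
        unique.setdefault(sequence, []).append(name)
    return unique


def search_compressed(unique, query):
    """Coarse pass over distinct sequences only; expand hits to all records."""
    hits = []
    for sequence, names in unique.items():
        if query in sequence:      # work is done once per distinct sequence
            hits.extend(names)     # refinement: report every original record
    return sorted(hits)


# Hypothetical database with one redundant entry.
database = {
    "fly1": "ACGTACGT",
    "fly2": "ACGTACGT",   # identical to fly1
    "fly3": "TTTTGGGG",
}

unique = compress_exact_duplicates(database)
hits = search_compressed(unique, "GTAC")
```

The search cost here scales with the number of distinct sequences (2) rather than the number of records (3), which is the property the notes highlight: the work grows with the non-redundant data, not with the raw database size.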
use case 2: signal from noise (medical genomics) - Shannon McWeeney
Can't find cablast online... - Iddo Friedberg
Berger moving to NCBI GEO and medical transcriptomics - Iddo Friedberg
Indexing GEO using UMLS -- Unified Medical Language System from NLM - Iddo Friedberg
UMLS is an ontology of medical concepts. http://www.nlm.nih.gov/researc... - Iddo Friedberg
Concept enrichment in umls tool: concordia - Iddo Friedberg
Using UMLS and Concordia to analyze tumor origin - Iddo Friedberg
Thanks for the coverage! - Ruchira S. Datta
Lab has several interesting tools like IsoRank, IsoRankN, Struct2Net, RNAicut, Mangoose and more - Venkata P. Satagopam
(network connection is bit bad) - Venkata P. Satagopam
ISMB/ECCB
SNP-SIG: Identification and annotation of SNPs in the context of structure, function, and disease
Highlight Speaker: Mauno Vihinen, Tampere University (Finland) Genetic variations: origin, effects and prediction. - Venkata P. Satagopam
Usually we think of variations as harmful, while most of them are not - Venkata P. Satagopam
Genome size or gene number does not directly correlate with the complexity of an organism - Venkata P. Satagopam
Less than 5% of the human genome codes for proteins; the rest was just called junk DNA, but it is not: the majority of the genome is still expressed - Venkata P. Satagopam
Variation increases the total number of genes and gene products - Venkata P. Satagopam
Total no. of variations in genome is not known - Venkata P. Satagopam
Variation type - chromosome set number variation (euploidy) - Venkata P. Satagopam
and chromosome structure variation - Venkata P. Satagopam
Variations at DNA level - binding sites - TFs, promoter, start and stop codons and genome organisation - Venkata P. Satagopam
at RNA level - most important effects on splice sites, generation of novel splice sites, alt splicing, variation destroying splice sites - Venkata P. Satagopam
at protein level -- Many ways to affect function # functional sites, sub-cellular localization, stability effects, changes in disorder, aggregation, electrostatic properties, interactions, steric effects - Venkata P. Satagopam
Large genome projects - Venkata P. Satagopam
the 1000 genomes project www.1000genomes.org - Venkata P. Satagopam
Data management, storage, analysis, access, ethics, integration with other data is a big challenge, but extremely useful - Venkata P. Satagopam
Human Variome Project www.humanvariomeproject.org - Venkata P. Satagopam
Variome - Variation in a genome - Venkata P. Satagopam
a public catalogue of curated variations in each of 20k genes and associated phenotypes/studies - Venkata P. Satagopam
worldwide agreed standard systems, notations - Venkata P. Satagopam
about 2000 LSDBs exist - Venkata P. Satagopam
HGVS www.hgvs.org - Venkata P. Satagopam
Diseases names - OMIM, ICD10 - www.who.int/classifications/icd/en - Venkata P. Satagopam
Variation nomenclature - www.hgvs.org/ - Venkata P. Satagopam
GEN2PHEN project ...pan european effort - Venkata P. Satagopam
Humans have 99.9% identical genomes - Venkata P. Satagopam
Genome consists of about 3 billion bases, this means that there are millions of differences, some of them are harmful, but which ones? - Venkata P. Satagopam
how to interpret variations and their effects? NGS and other methods produce very large datasets. Impossible to investigate all cases with experimental approaches - Venkata P. Satagopam
The majority of variations are not harmful; only a few cause diseases - Venkata P. Satagopam
extensive deletions, frameshift variations, nonsense variations and other cases that clearly destroy the function of the gene or product - Venkata P. Satagopam
paper - Thusberg and Vihinen Hum.Mut.2009 - Venkata P. Satagopam
Performance analysis - test data sets needed, it should be representative, unbiased, large enough, not used for training, and have systematic description - Venkata P. Satagopam
Benchmark datasets available from "VeriBench", other one Variation Ontology (VariO), AmiVario - Venkata P. Satagopam
validated prediction tools and described in the paper- Thusberg et al. Hum Mutat 2011 - Venkata P. Satagopam
Performance of protein localization predictors - Laurila and Vihinen BMC Bioinformatics; Laurila and Vihinen Amino acids 2011 - Venkata P. Satagopam
case study : BTK kinase domain Mao et al . JBC 2002, Khan and Vihinen In silico biol 2009 - Venkata P. Satagopam
PON-P (Pathogenic-Or-Not - Pipeline) - meta predictor for variations - Venkata P. Satagopam
-------------------------------------------------------------------------------------------------------------------------------- - Venkata P. Satagopam
Christian Schaefer. Technische Universität München (Germany) Can we predict structural change upon point mutation? - Venkata P. Satagopam
Checking the effect of a point mutation, for example S7M, on the local structure of the protein - Venkata P. Satagopam
Protein chains collected from PDB; structural similarity measure used: RMSD - Venkata P. Satagopam
Machine learning - seq based features from PredictProtein (www.predictprotein.org/) used - Venkata P. Satagopam
dataset - what constitutes a structural change? structurally neutral: RMSD < 0.2 Å (13,646 mutants); structurally non-neutral: RMSD > 0.4 Å (12,056 mutants) -- 6,409 sequences - Venkata P. Satagopam
some success in learning local structural change; no success yet in correlating that with change in stability or function. But some intrinsic relation pertaining to stability - Venkata P. Satagopam
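Editor's sketch of the RMSD labelling described in these notes. It assumes the two local structures are already superimposed and have their atoms in matching order, which real pipelines have to establish first. The 0.2 and 0.4 Å thresholds come from the notes; treating the range in between as "unlabeled" is this sketch's reading, and the coordinates are invented.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def label_change(value, neutral_below=0.2, non_neutral_above=0.4):
    """Label an RMSD value (in angstroms) using the thresholds from the talk."""
    if value < neutral_below:
        return "neutral"
    if value > non_neutral_above:
        return "non-neutral"
    return "unlabeled"   # assumption: the gap between thresholds is left out


# Hypothetical two-atom fragments, wild type versus mutant.
wild_type = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
mutant = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5)]   # every atom shifted by 0.5
```

Leaving a gap between the two thresholds is a common way to keep ambiguous cases out of a training set, which may be why the two classes in the notes do not meet at a single cutoff.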
----------------------------------------------------------------------------------------------------------------------------------- - Venkata P. Satagopam
Gilad Wainreb. Tel Aviv University (Israel) Protein stability: A single recorded mutation aids in predicting the effects of other mutations in the same amino acid site. - Venkata P. Satagopam
Used the BellKor collaborative filtering model, previously applied to e-commerce applications - Venkata P. Satagopam
3 models used - baseline estimation model, the neighborhood model, the latent factor model - Venkata P. Satagopam
Pro-Maya algorithm: precision is very high compared to other tools; it also performs well in its sequence-based prediction scheme - Venkata P. Satagopam
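Editor's sketch of a baseline estimation model in the collaborative-filtering sense: a prediction is a global mean plus one bias per "user" and one per "item". Mapping users to amino acid sites and items to substituted residues is an assumption made here to fit the talk title, and the simple averaging fit, the function names and the numbers are all illustrative. This is not the Pro-Maya implementation.

```python
from collections import defaultdict


def fit_baseline(observations):
    """Fit mu, per-site bias and per-substitution bias by simple averaging.

    observations: list of (site, substituted_residue, measured_value).
    """
    mu = sum(value for _, _, value in observations) / len(observations)

    site_residuals = defaultdict(list)
    for site, _, value in observations:
        site_residuals[site].append(value - mu)
    b_site = {s: sum(r) / len(r) for s, r in site_residuals.items()}

    sub_residuals = defaultdict(list)
    for site, residue, value in observations:
        sub_residuals[residue].append(value - mu - b_site[site])
    b_sub = {a: sum(r) / len(r) for a, r in sub_residuals.items()}

    return mu, b_site, b_sub


def predict(model, site, residue):
    """Baseline prediction: mu + site bias + substitution bias."""
    mu, b_site, b_sub = model
    return mu + b_site.get(site, 0.0) + b_sub.get(residue, 0.0)


# Hypothetical stability changes: (site, new residue, value).
observations = [(10, "A", -1.0), (10, "G", -3.0), (20, "A", 1.0)]
model = fit_baseline(observations)
```

With these made-up numbers, `predict(model, 20, "G")` returns 0.0 even though no mutation to G was ever recorded at site 20: the single measurement at that site supplies the site bias. That is the intuition in the talk title, where one recorded mutation at a site helps predict others at the same site.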
------------------------------------------------------------------------------------------------------------------------------------- - Venkata P. Satagopam
Piero Fariselli. University of Bologna (Italy) Predicting cancer-associated germline variations in proteins. - Venkata P. Satagopam
Problem - given a residue mutation (variation) in a protein predict if the mutation can be disease associated or neutral - Venkata P. Satagopam
predictors - mutation type and local environment (physical - chemistry); evolutionary information (HMM, profiles); protein function (GO); systemic properties (interactions, regulation, time-dependence...) - Venkata P. Satagopam
protein function has some problems - over-fitting problem (cross-validation of scores is needed) - Venkata P. Satagopam
SNPs & GO - the seq. profile and GO-based predictor - Venkata P. Satagopam
our lab has two predictors - PhD-SNP and SNPs&GO - Venkata P. Satagopam
From UniProt, a first set of 6478 germline variations, associated and listed in OMIM - Venkata P. Satagopam
Using BLAST as a predictor is a very crude method - Venkata P. Satagopam
methods : neural networks and SVMs - Venkata P. Satagopam
GO terms enriched are related to Cancer - Venkata P. Satagopam
Problems and future: Reliability of the annotations; experimental validation on real cases; extension of different diseases; definition of a pipeline - Venkata P. Satagopam
------------------------------------------------------------------------------------------------------------------------------------- - Venkata P. Satagopam
Alain Laederach. University of North Carolina, Chapel Hill (USA) Effects of disease-associated SNPs on the structure of the transcriptome. - Venkata P. Satagopam
alain@unc.edu - Venkata P. Satagopam
Mutations in the 5' UTR of HBB cause beta-thalassemia - Venkata P. Satagopam
disease associated SNPs in UTRs are from HGMD - Biobase - Venkata P. Satagopam
Looked genome-wide for structure-stabilizing haplotypes - Venkata P. Satagopam
------------------------------------------------------------------------------------------------------------------------------------- - Venkata P. Satagopam
Keynote: Burkhard Rost, Technische Universität München (Germany) Trivial step from predicting the effects of SNPs to medicine. - Venkata P. Satagopam
www.rostlab.org - Venkata P. Satagopam
Lingo: change of amino acid: nsSNPs (non-synonymous SNPs), they are central players in diseases - Venkata P. Satagopam
individual medicine : make it connects, genes, epigenetics - Venkata P. Satagopam
get diseases high -throughput manner: GWAS, many times (patients, drugs); env. factors (virus and bacteria) - Venkata P. Satagopam
many challenges: high volumes of data, communication, annotation; function is known for 10-50% human - Venkata P. Satagopam
prediction of nSNPs - some in silico methods SIFT, PolyPhen, SNPs3D - Venkata P. Satagopam
Misfunction/neutral - machine learning methods need a lot of data; collected 80k mutations with known effects on function -- Y Bromberg and B Rost 2007 NAR 35: 3823-35 - Venkata P. Satagopam
Y Bromberg and B Rost 2008 Bioinformatics 24: i207-212 - Venkata P. Satagopam
New directions - in silico alanine scan; comprehensive in silico mutagenesis; prediction of binding hot spots - Venkata P. Satagopam
nsSNP effect: more detail - secondary structure (helix, strand) robust under random mutation, disorder not. Molecular dynamics (MD) also playing an important role. - Venkata P. Satagopam
MD2: new SNPs causing Parkinson's disease - A Zimprich et al. & T Meitinger & Tm Strom (201) - exome seq reveals mutations in the retromer protein vps35 - Venkata P. Satagopam
Disordered regions - eukaryotes dominated disorder (4 - 10x) - Venkata P. Satagopam
prediction methods used MD, IUPred - eukaryota - 36 - 43%, whereas bacteria 10 - 13% - Venkata P. Satagopam
Trivial step from predicting the effects of SNPs to individual medicine - connect many resources, and look at the details, so we have to work together. - Venkata P. Satagopam
------------------------------------------------------------------------------------------------------------------------------------- - - Venkata P. Satagopam
Company Presentation: Frank Schacherer, BIOBASE GmbH. SNP analysis with high quality data - Venkata P. Satagopam
only 56% of known disease mutations are mis/nonsense splice mutations - 22% of mis/nonsense mutations are estimated to affect alt splicing, and 10% of all disease-causing mutations are splice site mutations - Venkata P. Satagopam
Biobase contains manually curated data, the quality of data is very high - Venkata P. Satagopam
HGMD professional - completed inherited disease mutations ... - Venkata P. Satagopam
Nice paper listing different mutation databases Baralle et al EMBO report VOL 10 No 8 2009 - Venkata P. Satagopam
email : frank.schacherer@biobase-international.com - Venkata P. Satagopam
----------------------------------------------------------------------------------------------------------------------------------------- - Venkata P. Satagopam
Highlight Speaker: Atul J. Butte, Stanford University (USA) Data-driven Personalized Medicine - Venkata P. Satagopam
addressing the medical use of personal genome - Venkata P. Satagopam
Relative to the ref genome: 2.8 million SNPs, 752 copy number variants - Venkata P. Satagopam
3 genes TMEM43, DSP, MYBPC3 associated with sudden cardiac death ... Ashley et al (2010), Lancet 375:1525 - Venkata P. Satagopam
Current SNP-disease DBs are too limited for application to a human genome - Venkata P. Satagopam
Due to the current limitations of the existing dbs (mess of IDs), developed a new db from 5k manually curated papers - Venkata P. Satagopam
Curation challenges - most findings are in tables and figures making NLP hard - Venkata P. Satagopam
SNPs were reported as from Positive strand and negative strand - Venkata P. Satagopam
There is quite a difference between Odds ratio (OR) and Likelihood ratio (LR) - Venkata P. Satagopam
Fagan TJ. Nomogram for Bayes theorem. N Engl J Med 1975 Jul 31;293(5):257 - Venkata P. Satagopam
Medical practitioners have been using LR for ages. - Venkata P. Satagopam
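Editor's sketch of why the two ratios differ, using standard textbook definitions on a made-up 2x2 table. The counts and function names are hypothetical. The last function performs the calculation that the Fagan nomogram does graphically: pre-test odds times LR gives post-test odds.

```python
def odds_ratio(cases_with, controls_with, cases_without, controls_without):
    """OR = (odds of carrying the variant in cases) / (odds in controls)."""
    return (cases_with * controls_without) / (controls_with * cases_without)


def positive_likelihood_ratio(cases_with, controls_with,
                              cases_without, controls_without):
    """LR+ = P(variant | disease) / P(variant | no disease)."""
    sensitivity = cases_with / (cases_with + cases_without)
    false_positive_rate = controls_with / (controls_with + controls_without)
    return sensitivity / false_positive_rate


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Bayes' theorem in odds form: post-test odds = pre-test odds * LR."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Hypothetical study: 100 cases (30 carriers) and 100 controls (10 carriers).
table = (30, 10, 70, 90)
or_value = odds_ratio(*table)                  # 2700 / 700, about 3.86
lr_value = positive_likelihood_ratio(*table)   # 0.30 / 0.10, about 3.0
risk = post_test_probability(0.10, lr_value)   # about 0.25
```

The same table gives an OR of about 3.86 but an LR of about 3.0, and only the LR plugs directly into the pre-test to post-test update, which is presumably why clinicians favour it.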
Pacific Biosciences: sequencing human genome in 15 minutes - 20TB in 15 minutes - Venkata P. Satagopam
During famine nobody is obese, so environment plays an important role alongside the genome - Venkata P. Satagopam
Ontology-driven Indexing of Public Datasets for Translational Bioinformatics N. H. Shah, C. Jonquet, A. P. Chiang, A. J. Butte, R. Chen, M. A. Musen BMC Bioinformatics, Volume 10. Published in 2009 - Venkata P. Satagopam
Family history (genomics) is also very important for understanding diseases - Venkata P. Satagopam
human toxome project - Venkata P. Satagopam
He recommended a book - Experimental Man by David Ewing Duncan - Venkata P. Satagopam
----------------------------------------------------------------------------------------------------------------------------------------- - Venkata P. Satagopam
Konrad Karczewski. Stanford University (USA). Assessing Functional and Clinical Significance of Regulatory Variants. - Venkata P. Satagopam
SNPs in coding regions slightly high - Venkata P. Satagopam
In whole genome 7.5% SNPs found in TF binding sites, so SNPs can affect the TF binding site .....Karczewski et al 2011 PNAS in press - Venkata P. Satagopam
SNPs in NFkB regions are associated with more diseases compared to other regions - Venkata P. Satagopam
Disease ~ SNP ~ Binding ~ Expression ~ Disease .. gave example SNP associated to AIDS susceptibility and Asthma - Venkata P. Satagopam
----------------------------------------------------------------------------------------------------------------------------------------- - Venkata P. Satagopam
Joel Dudley. Stanford University (USA). Evolutionary meta-analysis reveals ancient constraints affecting missing heritability and reproducibility in disease association studies. - Venkata P. Satagopam
Varimed : manually curated database - Venkata P. Satagopam
---------------------------------------------------------------------------------------------------------------------------------------- - Venkata P. Satagopam
Adam Frankish. Wellcome Trust Sanger Institute, Hinxton (UK) Is Understanding Alternative Splicing IPotential Functional Effects of SNPs? - Venkata P. Satagopam
GENCODE rich in Alt splicing - Venkata P. Satagopam
LoF project - Loss of function variants - Venkata P. Satagopam
Catalogue of LoF variation created from all three 1000 genome pilot projects - Venkata P. Satagopam
Manual annotation reveals .. ~36% (213/597) putative LoF variants present in AS exons - Venkata P. Satagopam
Artefacts - 44 putative genome seq errors. 243 errors in gene models (<5% pre-existing manually annotated models) - Venkata P. Satagopam
MS data, and published evidence can help to determine where the alt splicing is functional - Venkata P. Satagopam
GENCODE manual annotation provides a rich set of alt splice variants - Venkata P. Satagopam
Loss of function should be considered at the level of transcription, not at the locus level - Venkata P. Satagopam
www.gencodegenes.org version v8 - Venkata P. Satagopam
---------------------------------------------------------------------------------------------------------------------------------------- - Venkata P. Satagopam
Sudhir Kumar. Arizona State University, Tempe (USA). Comparative Genomics as an Evolutionary Telescope for Medicine to Peer into the Universe of Human Mutations. - Venkata P. Satagopam
Phylomedicine .... Kumar et al 2011 - Trends in Genetics coming in September issue - Venkata P. Satagopam
Genomic Medicine - Healthcare tailored to the individual based on genomic information - Green et al (2011) Nature - Venkata P. Satagopam
Interested in Protein alleles (nSNVs) - Venkata P. Satagopam
$100 exome may come soon - Venkata P. Satagopam
Number of alleles are increasing - Ng et al 2010 Human Mol Biol - Venkata P. Satagopam
Green et al 2011 Nature 470, 204 - 213 ... very nice paper related to genomic medicine - Venkata P. Satagopam
At this moment genomic medicine is lost in variation space, so the evolutionary history can help, at least to provide a first glimpse. Phylomedicine can help - Venkata P. Satagopam
Evol. telescope, two pieces of glass - Evol divergence and evol conservation - Venkata P. Satagopam
The evol rate is directly proportional to number of SNPs - Venkata P. Satagopam
Mol evolutionary and phylogenetic information is going to be crucial in improving diagnosis accuracy; Long term evolutionary properties can be estimated with high accuracy a priori - Venkata P. Satagopam
---------------------------------------------------------------------------------------------------------------------------------------- - Venkata P. Satagopam
Keynote: Steven Brenner, University of California, Berkeley (USA). CAGI Experiments - Venkata P. Satagopam
CAGI- Critical Assessment of Genome Interpretation - www.genomeinterpretation.org - Venkata P. Satagopam
CAGI 2010 (http://cagi2010.org ) - 6 datasets, 108 prediction submissions etc - Venkata P. Satagopam
Personal genome project (PGP) - predict individual's phenotypes - Venkata P. Satagopam
riskSNPs: Mechanisms underlying Disease Associated Loci - Venkata P. Satagopam
WTCCC Loci, number of loci related to 7 diseases (Bipolar disorder, coronary artery disease, Crohn's disease, hypertension, Rheumatoid arthritis, Type 1 and Type 2 diabetes) - Venkata P. Satagopam
CAGI 2011 - datasets - Breast cancer cell line pharmacogenomic dataset - 54 cell lines - 54 disease cell lines and several other data sets .... look at www.genomeinterpretation.org - Venkata P. Satagopam
ISMB
Keynote: David Altshuler - Genomic Variation and the Inherited Basis of Common Disease
Altshuler is an expert on diabetes type II. - Dawei lin
It is said that he is also a good dancer. - Dawei lin
Tap, ballroom, or tango? - Ted Laderas
Slide dancing - Dawei lin
motivation is to understand genetic basis of human diseases - Dawei lin
Genetic basis of human diseases - important disease mechanisms and bio pathways remain unidentified - Venkata P. Satagopam
gap in knowledge of human disease biology contribute to high failure rates in drug development - Dawei lin
Why understand genetic mechanisms? (i) Important mechanisms remain unidentified (ii) Gaps in knowledge cause high failure rates in drug development - arne
It will be a long way to know if the two motivating hypotheses are true - Dawei lin
One of the major research studies on T2D. It scanned 100k people for 10 yrs - Dawei lin
10 years later 50% progressed to have the disease - Dawei lin
10 years of diabetes research - the outcome is - 50% of people with a good lifestyle improved - Venkata P. Satagopam
lifestyle has a bigger impact than Metformin - Dawei lin
Diabetes study with 10-year follow-up of diabetes incidence and weight loss, "T2D". Randomized into treatments: lifestyle, metformin, placebo. Best drug makes relatively little difference in incidence; lifestyle intervention is better than drug but still doesn't help a whole lot. - Barb Bryant
best prevention was extensive lifestyle changes (50% -> 40% incidence) - Mickey Kosloff
Diabetes is not only a matter of life style - arne
success rate in the current pharma industry is <5% of molecules entering clinical trials - Venkata P. Satagopam
This is bad !! - arne
mentions well known number of >95% failure rate of new compounds - Mickey Kosloff
because 40% of people still got the disease after the lifestyle change, it seems that we do not know the cause of the disease - Dawei lin
Genetic mapping started in 1913 - Dawei lin
genetic map came in 1913 - Venkata P. Satagopam
Morgan and Sturtevant 1913 - arne
emphasizes he advocates a geneticist's approach (rather than a genomic approach) - Mickey Kosloff
And tells you to skip undergraduate work if you have something better to do - arne
key attributes of genetic mapping - unbiased by prior assumptions about pathways - Venkata P. Satagopam
saturation mutagenesis reveals pathways - Venkata P. Satagopam
key attributes of genetic mapping: (1) unbiased by prior assumptions about pathways (2) saturation mutagenesis reveal pathways - Dawei lin
many mutants -> reveals coherence of pathways - Ted Laderas
These days we have other methods that are unbiased like expression profiling, but genetic mapping has some unique characteristics relative to these (he’ll explain in a minute). - Barb Bryant
Drosophila mutations initially looked random; years later, almost all turned out to be related to pathways. - Dawei lin
bottleneck is functional determination - biochemical approaches - Ted Laderas
A lot of current knowledge can be traced back to genetic mapping - Dawei lin
Botstein and Fink Science 1988 .... - Venkata P. Satagopam
A slide based on Glazier et al, Science 2002 - Dawei lin
genetic mapping of human single gene disorders ...over 15 years Botstein paper in 1980, first genetic map in 1985 .... - Venkata P. Satagopam
It took 10 years to find the marker for Huntington disease - Dawei lin
Once you find a linked region from genetic mapping, it still takes a long time to find the specific gene responsible. - Barb Bryant
in the 1990's the idea was that common diseases were caused by rare mutations with large effects - arne
"Chromosome shlepping" - Eric Lander's term for the identification of a specific gene in some genomic region. - Roland Krause
It is robust for finding Mendelian diseases but not common diseases - Dawei lin
another approach: population genetics - QTL approach - Ted Laderas
phenotypic variation is often continuous and may involve variation in many genes - Dawei lin
Galton invented regression analysis to analyze the measuring of phenotypic data (heights of parents and offspring). - Roland Krause
The biometric unit --- almost nothing was Mendelian - arne
Most traits are continuously variable - Ted Laderas
Francis Galton was a cousin of Darwin. Darwin didn’t explain the source of variation. Galton focused on this; he measured the heights of parents and their offspring, and found a relationship. He invented regression analysis to draw the line. The slope of the line is related to the heritability of the trait. - Barb Bryant
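A rough illustration of Galton's regression idea: the sketch below fits a least-squares line of offspring height on mid-parent height. The heights are made up for illustration (they are not Galton's data), and reading the slope as a heritability estimate is the textbook simplification, not something stated in the talk.

```python
def regression_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov_xy / var_x

# Hypothetical mid-parent and offspring heights in inches (illustrative only).
midparent = [64.0, 66.0, 68.0, 70.0, 72.0]
offspring = [66.0, 67.0, 68.5, 69.5, 71.0]

slope = regression_slope(midparent, offspring)
print(f"slope = {slope:.3f}")  # prints "slope = 0.625"
```

A slope below 1 is the "regression toward the mean" that Galton observed: children of very tall parents tend to be tall, but less extreme than their parents.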
It was studied by the cousin of Darwin, Francis Galton (1885) - Dawei lin
phenotypic variation is often continuous ... some history ... Francis Galton (1885), Ronald Fisher (1918), Hermann Muller (1920) - Venkata P. Satagopam
This gave rise to the biometric movement – measure every living thing. Traits were related to genetic relatedness; and it wasn’t Mendelian. This led to the biometric-Mendelian debate. - Barb Bryant
Ronald Fisher was actually a geneticist, who also invented the p-value and the Fisher exact test - Dawei lin
Ronald Fisher (the one with the exact test) was also a geneticist. - Roland Krause
Solved by assuming that phenotype often is an effect of several Mendelian genes. - arne
Fisher: individual genes are mendelian, effects of genes additive - Ted Laderas
Hermann Muller 1920 (Nobel Prize for X-ray induced mutations). PhD thesis not Mendelian trait, but truncate wing. Wasn’t Mendelian. Did genetic mapping. - Barb Bryant
Hermann Muller decided to use broken wing of fruit fly to study non-Mendelian diseases - Dawei lin
Muller 1920 paper: 4 chromosomes in fly – 3 contain genes that influence the trait truncate wing. Muller wrote about implications for human traits, like psychological traits. Said that traits were going to be too complicated. Said you could figure out by looking at population, but not looking at Mendelian inheritance in families. - Barb Bryant
Muller 1920 suggested that the study needed to be done on a population. - Dawei lin
Muller: Truncate wing - 3 genes influence effect of phenotype - Ted Laderas
Muller's thesis included the notion of surveying complex phenotypes in the population rather than families. - Roland Krause
Muller: traits are too complex to observe in families, but can observe in population - Ted Laderas
characterizing and cataloguing human seq variation took a decade of work .. i.e. the International HapMap Project - Venkata P. Satagopam
Another decade-long failure: the candidate gene approach. Instead, we need a genome-wide, unbiased approach. - Barb Bryant
Testing candidate genes was not successful. Only 10-20 successes. - Dawei lin
779 GWA published for 148 traits - Mickey Kosloff
outcome - 779 published GWA for 148 traits - Venkata P. Satagopam
For common diseases, GWA was needed - Ted Laderas
but "correlation does not imply causality" - Mickey Kosloff
There have been 779 genome-wide association studies (or regions/genes found?) for 148 traits, with p < 5x10^-8 - Barb Bryant
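The p < 5x10^-8 cutoff quoted above is the usual genome-wide significance threshold. A common way to motivate it (an assumption here; the talk notes do not spell it out) is a Bonferroni correction of a 0.05 family-wise error rate over roughly one million independent common-variant tests. A minimal sketch:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

# Assumption: about one million effectively independent tests genome-wide.
threshold = bonferroni_threshold(0.05, 1_000_000)

def is_genome_wide_significant(p_value, cutoff=threshold):
    """True if an association p-value passes the genome-wide cutoff."""
    return p_value < cutoff

print(threshold)
```

Under this assumption an association with p = 1e-9 passes, while p = 1e-6 (impressive for a single test) does not.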
"correlation does not imply causality" .... - Venkata P. Satagopam
But correlation does not imply causality. - Barb Bryant
The reasons of "Correlation does not imply causality": irreproducibility, lack of randomization, confounding, arrow of time. - Dawei lin
If you can't randomize the experiment you can never prove causality as opposed to just being correlated to the underlying cause. - Barb Bryant
FF lag results in all these duplicate posts - Mickey Kosloff
a lot of effort goes into finding correlations between rare variation and diseases - Dawei lin
rare variation is defined as having <5% frequency in the population - Dawei lin
95% of variations is already present in the database - arne
Identified 50 regions that are associated with T2D - arne
within the next few years ... the role of rare and less common variants will be characterized in a variety of diseases - Venkata P. Satagopam
next topic - can we obtain new insights into the basis of disease? - Venkata P. Satagopam
one example - sickle cell anemia - Venkata P. Satagopam
Sankaran et al Science 2008 - Venkata P. Satagopam
Lettre et al PNAS 2008 - Venkata P. Satagopam
Uda et al PNAS 2008 - Venkata P. Satagopam
Crohn's disease: 15 years, no idea what was happening. Now many genes and 3 pathways are identified to be relevant. - Dawei lin
96 loci explain ~25% of cholesterol levels - Mickey Kosloff
Lipid GWAS found 60 loci that were previously unknown. Some of the positives are drug targets already. - Dawei lin
Global lipids consortium, forthcoming Nature paper (Nature paper is mentioned about 20 times !!!) - arne
is there a way to automate validation/function determination? - Ted Laderas
prediction -- will prediction prove useful? -- this depends on the clinical testing and the genetic test - Venkata P. Satagopam
prediction will be useful when there's a proven intervention - Mickey Kosloff
BRCA1/2 risk for cancer as an example - Mickey Kosloff
seq tech will increase the reach of genetic methods - Venkata P. Satagopam
mendelian fallacy - sub-populations are easily divisible in terms of risk - Ted Laderas
Prediction will only be useful if there is an intervention that you would not use without the prediction. Otherwise, you should use the intervention anyway. - Roland Krause
Huntington will not be a representative example - for most diseases/people identified risk will be <<100% even with full genetic information - Mickey Kosloff
Cautionary tale - PSA prediction results in over-treatment, hasn't been shown that people live longer because of test - Mickey Kosloff
Very cautious about PSA - no improvements on the mortality but many operations performed. - Roland Krause
genetics offers a path to discover the underlying biology of human diseases; the great value will derive from pathophysiology and treatment - Venkata P. Satagopam
ISMB
Keynote: Chris Sander - Systems Biology of Cancer Cells
An interview with Chris Sander ... http://www.mskcc.org/mskcc... - Venkata P. Satagopam
Kabsch and Sander paper - over 6000 citations - http://www.ncbi.nlm.nih.gov/pubmed... - Shannon McWeeney
Note the subliminal message in the announcement slide - Iddo Friedberg from Android
Prediction by transparency - no computation necessary story - Shannon McWeeney
Awards should be shared: People working with Chris includes: Burkhard Rost, Alfonso Valencia, Liisa Holm and many more - arne
Announcement of unpublished and new work. A good trend at this ISMB. - Roland Krause
Cancer genome atlas: TCGA - arne
Mapping of molecular alterations (copy number variation) to 200 glioblastoma samples. http://www.ncbi.nlm.nih.gov/pmc... - Roland Krause
Difference between patients is huge - arne
extract network, find relevant modules. - Roland Krause
illustration of netbox algorithm - Shannon McWeeney
When grouping mutations into pathways, up to 85% of GBM have a mutation in the most important pathways, while individual genes are down to a few % - arne
Each oncogene may have relatively low frequency across patients; but when you group genes across pathways, a pathway may explain a large fraction of patients with a given type of cancer. - Barb Bryant
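The gene-versus-pathway point can be shown with a toy calculation. The patient table and the pathway grouping below are hypothetical (they are not TCGA data); the sketch only shows how a pathway can be hit in most patients even when each member gene is altered in few.

```python
# Hypothetical alterations per patient; gene set and grouping are illustrative only.
patient_alterations = {
    "P1": {"EGFR"},
    "P2": {"PTEN"},
    "P3": {"PIK3CA"},
    "P4": {"NF1"},
    "P5": set(),
}
pathway_genes = {"RTK/PI3K": {"EGFR", "PTEN", "PIK3CA", "NF1"}}

def gene_frequency(gene, alterations):
    """Fraction of patients in which this single gene is altered."""
    hits = sum(1 for genes in alterations.values() if gene in genes)
    return hits / len(alterations)

def pathway_frequency(pathway, alterations, pathways):
    """Fraction of patients with at least one altered gene in the pathway."""
    members = pathways[pathway]
    hits = sum(1 for genes in alterations.values() if genes & members)
    return hits / len(alterations)

print(gene_frequency("EGFR", patient_alterations))                        # 0.2
print(pathway_frequency("RTK/PI3K", patient_alterations, pathway_genes))  # 0.8
```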
"Network pharmacology" - Barb Bryant
can see a change in pathway activation between primary tumor and mets - Mickey Kosloff
Dominant alterations change between cancer types and states. - Roland Krause
GBM: copy number is rare (and noisier) Ovarian: more regular and higher - arne
profiles of copy number variations differ between types of cancers - Mickey Kosloff
Metastatic tumor samples have more copy number changes than primary tumors. Not surprising. But maybe primary samples with more copy number changes than others are more likely to metastasize? Generally, better outcome with fewer somatic copy number changes. - Barb Bryant
BRCA1 and BRCA2 mutations convey germline inherited cancer risk - Barb Bryant
These genes act in the homologous repair pathway. Half of all patients have mutations in some homologous repair pathway gene. - Barb Bryant
and more generally, homologous repair genes are altered in > 50% of ovarian cancer - Mickey Kosloff
Tumor suppressor genes can be inactivated in various ways: germline mutation, somatic mutation, epigenetic silencing, etc. - Barb Bryant
There are drugs under development that might work particularly well in patients with defects in this particular pathway. - Barb Bryant
Cancer genomics portal: www.cbio.mskcc.org/cancergenomics - Barb Bryant
mutationassessor.org - Barb Bryant
Topic shift: now, perturbation cell biology. "and belief propagation". (eh?) - Barb Bryant
Perturbation Cell Biology - arne
In recent past, says Chris, you make a few perturbations: overexpress or knock down a gene; inhibit with a compound, etc. - Barb Bryant
use network inference algorithms - Mickey Kosloff
goal = predictive models for therapy - Mickey Kosloff
with only 200 datapoints -> derive validated (known) pathways - Mickey Kosloff
Prediction of networks does not scale to larger networks - arne
Large data generation with the number of perturbations greater than the number of proteins. - Roland Krause
Still prohibitively large number of networks even for small number of nodes. - Roland Krause
Use statistical physics methods to tackle combinatorial explosion of possible networks. - Barb Bryant
Inference using belief propagation known from statistical physics. - Roland Krause
Ah, here is where "belief" comes in. Network inference using belief propagation. Reference Riccardo Zecchina et al. http://users.ictp.it/~zecchi... - Barb Bryant
Instead of going through all the models that are possible, you derive statistical properties across a set of good models for each of the Wij weights in the model. - Barb Bryant
This is sort of like partition functions in statistical physics - Barb Bryant
evolving work on Wij (transition from Nelander et al 2008- http://www.nature.com/msb...) - Shannon McWeeney
Cavity approach - optimize locally on global background iteratively cover all local cavities - Shannon McWeeney
Mm, this is rather opaque to me. - Barb Bryant
"Let me give you some intuition about how this all works." Yes, I'd like that. - Barb Bryant
Nice results on toy experiment - constraints from 10 experiments with 5 interactions (the nodes W in factor graph). - Shannon McWeeney
Almost looks too good - arne from iPhone
after step 1 - generation of probability distributions then step 2- decimation - Shannon McWeeney
So you have a probability distribution for each Wij, which represents the interaction between element i and element j. I'm not really getting how you "update" these probability distributions in the iterative steps. I do understand that at the end you take the most "certain" (narrowest) distribution and fix its value (some Wij) at the most probable value, then update all the other Wij's given this fixation. And so on. To get your final model in a sort of greedy fashion. - Barb Bryant
And by the way, the underlying model is a simple differential equation sort of thing: change of one variable xi is a sigmoidal function of weighted (Wij) sum of all variables xj, less a decay term. - Barb Bryant
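The model described above can be sketched as dx_i/dt = sigmoid(sum_j Wij * xj) - decay * xi. The code below is only an illustration of that form: using tanh as the sigmoid, a single decay constant, simple Euler integration, and the two-node weight matrix are all assumptions, not details given in the talk.

```python
import math

def simulate(W, x0, decay=1.0, dt=0.01, steps=1000):
    """Euler integration of dx_i/dt = tanh(sum_j W[i][j] * x_j) - decay * x_i."""
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        # Sigmoidal response of each node to the weighted sum of all nodes.
        drive = [math.tanh(sum(W[i][j] * x[j] for j in range(n))) for i in range(n)]
        # Each node gains its drive and loses a decay term proportional to itself.
        x = [x[i] + dt * (drive[i] - decay * x[i]) for i in range(n)]
    return x

# Hypothetical 2-node network: node 0 activates node 1, nothing drives node 0.
W = [[0.0, 0.0],
     [1.5, 0.0]]
final = simulate(W, x0=[1.0, 0.0])
print(final)
```

Here node 0 simply decays from its perturbed starting value, while node 1 is transiently pushed up by node 0 before decaying as well. Inferring the Wij from perturbation data is the hard part that the belief propagation step addresses; this sketch only runs the forward model.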
thanks for the summary bb - Michael Jones
Mike! - Barb Bryant
Mentions bunches of other stuff in passing. Like bioPAX: paper in press. - Barb Bryant
bioPAX is community project on pathways, ontology, and exchange format. - Barb Bryant
"no science without people; science for the people; ask good questions" - Shannon McWeeney
Biopax.org - arne from iPhone
Ask good questions !!!!! - arne from iPhone
Question: Interacting network tend to be modular, with strongly-interacting subnetworks that interact weakly with each other. ... - Barb Bryant
Chris: Is the modular approach really useful in confronting the data? [Is that what he said?] - Barb Bryant
Question: can you get at causal relationships? - Barb Bryant
Chris: yes - if the network model allows you to predict correctly the result of a particular perturbation applied to a particular node, then you can simulate using that model. - Barb Bryant
Question: with a big network, how many experiments will you need to model? - Barb Bryant
Chris: Good question. Could use an entropy measure. Help us figure this out. Help us design the experiments. It's important because of the costs of experiment. This is going to be broadly applicable in cell biology. - Barb Bryant
bb - he said one should see if approach is useful by confronting with real data - Shannon McWeeney from BuddyFeed
Ah, thx - Barb Bryant
Chris gets at the difference between a model that tells a story and a model that is truly predictive. - Barb Bryant
Question: yes, but, what are the semantics of the graph? What kinds of interaction? Answer: The semantics are in the mathematics of your model. - Barb Bryant
Question: mean field approach is interesting. Compared to Monte Carlo approach, you are assuming some decoupling. Loss of posterior coupling between weights - is that an issue? - Barb Bryant
Chris: If you look at a coupled system overall, the extent to which the algorithms work depends on correlations within the system. Long-range (in terms of network distance) correlations are problematic. There are some clever approaches to handle some of this. Mentions non-ergodic space; deal with parts of space separately or iteratively. - Barb Bryant
ISMB
Keynote: Svante Pääbo - Analyses of Pleistocene Genomes
This will probably be a very interesting talk. Just can't wait. - Tomasz Puton
Not just interesting, but most likely great. Svante is a fantastic speaker - arne
If you’re interested in human history, the genome is a great source of information. To reconstruct history, we compare sequences of people (and other species) living today. We use models of how DNA changes over time to understand the differences that exist today. This is an indirect way to study history, because we are reconstructing from the present what we think has happened in the past. - Barb Bryant
specimens are highly contaminated, .... - Venkata P. Satagopam
mtDNA - advantage of many copies per cell - Mickey Kosloff
original work from 1984 on egyptian mummy - http://www.sciencedirect.com/science... - Shannon McWeeney
Replacement (out of africa theory) vs assimilation (i.e. geneflow from modern humans) - arne
mtDNA is extracted from a specimen from neanderthal - Venkata P. Satagopam
Started with the original neanderthal specimen - arne
The variation in human population origins before the split (as measured by mtDNA) of modern and neanderthals - arne
extract dna from skull, skip PCR and directly sequence - Mickey Kosloff
only 3.5% actually from neanderthal genome - Shannon McWeeney
Average length 50 nucleotides - arne
Vindija Cave, Croatia .... 3 bones - Venkata P. Satagopam
only about 3.5% of dna came from human - Mickey Kosloff
3 billion fragments - again most from bacteria - Shannon McWeeney
most dna is bacterial contaminants - Mickey Kosloff
avg genome cover is 1.5X - Venkata P. Satagopam
most DNA extracted is female look at Y chrom % as contaminant - Ted Laderas
Three female samples (and therefore Y chromosome contamination can be used to calculate noise). Total risk of contamination is below 1% - arne
at any particular position - 1% chance contamination (broken down by source - 3 measures) - Shannon McWeeney
consistent nucleotide chemical changes at 5' and 3' ends - Mickey Kosloff
try to correct by alignments to human and chimpanzee genomes - Mickey Kosloff
Details on bioinformatics and alignment issues (led by Ed Green) can be found in Science paper - http://www.sciencemag.org/cgi... - Shannon McWeeney
55% chance of seeing a position covered by at least 1 read - Ted Laderas
Divergence to the human reference genome is 12%; the highest among humans is in San, at 10% - arne
typical european (French) 8% divergence to human reference compared with 12% in neanderthal - Shannon McWeeney
78 amino acid substitutions ... a catalog of novel fixed features in the human genome - Venkata P. Satagopam
But this number will change - arne
novel fixed features in human genome - 78 aa substitutions (in paper) - now down to 50 - Shannon McWeeney
Three out of six proteins with 2 changes are skin expressed - arne
next focused on SNPs - Mickey Kosloff
detection of selective sweeps - look for snps in human, chimps, neanderthals - regions where neanderthal looks all ancestral. - Shannon McWeeney
S vs cM plot - visual inspection for widest spread - Shannon McWeeney
Most extreme case in THADA, http://www.genecards.org/cgi-bin... Transport and diabetes related - arne
Thada is risk allele for type 2 diabetes - implications for metabolism - Shannon McWeeney
detection of insertion in intron in Thada (not fixed in humans as initially thought in paper) - Shannon McWeeney
3-4% in europe has the neanderthal version (and are protected against Diabetes Type II) - arne
interesting follow-up research here - positive selection yet cost with risk allele - Shannon McWeeney
RUNX2: Mutations cause CCD (Cleidocranial dysplasia) - arne
annotation of others associated with autism and other diseases including CCD - Shannon McWeeney
CCD of interest due to skull morphology phenotype - Shannon McWeeney
Now comes the most surprising result. - arne
focusing on - Interbreeding with modern humans? - Venkata P. Satagopam
Work by Rasmus Nielsen http://ib.berkeley.edu/researc... - arne
Is Craig Venter a "fully modern human" ? - arne
analysis of self-identified neanderthals who write to Svante - predominantly men. - Shannon McWeeney
Comparisons to genomes of humans from different continents suggest interbreeding occurred in the Middle East, before geographic expansion - Mickey Kosloff
:) - arne
45% men who are neandertals, 1% women are neandertals.... - Venkata P. Satagopam
future 10-20x coverage of genome - Mickey Kosloff
Future: (i) Better coverage (10-20x coverage) (ii) Functional analyses of candidate genes Exemplified by FoxP2 http://en.wikipedia.org/wiki... - arne
next topic - functional analysis of genes - foxp2 - Venkata P. Satagopam
FoxP2 is the same in human and neanderthal. - arne
hope to identify backmutations in humans -cheaper to find these people because of low cost of sequencing - Ted Laderas
easier to check phenotypes in mice - Mickey Kosloff
Human FoxP2 in mouse: The mouse can not speak ! Large scale phenotype study (323 phenotypic traits). -> More cautious in a novel area (stays close to the wall). No difference after 3 minutes. Second phenotype: Altered vocalization !!! - arne
323 phenotypic traits ... studied .. - Venkata P. Satagopam
movement more cautious in humanized mice - Venkata P. Satagopam
next one is altered vocalization - Venkata P. Satagopam
Enard et al Cell 2009 - Venkata P. Satagopam
mice with human foxp2 grew longer neurons - Mickey Kosloff
Other hominid forms........ - arne
Denisova individual 1 Myears (400 diffs in mtDNA) - arne
very good keynote - Mickey Kosloff
ISMB
Keynote: Susan Lindquist - Protein Folding and Environmental Stress REDRAW the Relationship between Genotype and Phenotype
Inherited Environmentally acquired traits - lamarck wasn't so insane - Ted Laderas
Protein folding - environment is very important ... showed videos - Venkata P. Satagopam
experiments in heat shock tolerance -initial small shock allowed for survival -hsp proteins are made in massive amounts - role in protein folding - Ted Laderas
Hsp90 a special chaperone. - John Greene from fftogo
in excess in cell - acts as homeostasis buffer - Ted Laderas
hsp70 helps early stage folding and works with a number of proteins, but not hsp90. - Dawei lin
HSP 90 a special chaperone ... because it is very abundant, it is induced two-fold, it has extra folding capability .... acts as a buffer - Venkata P. Satagopam
Signal transduction networks & HSP90 ... Hanahan and Weinberg, cell 2000 - Venkata P. Satagopam
showed signal transduction network involved by hsp90. It seems pretty spread. - Dawei lin
hsp90's function was found by accident. - Dawei lin
Hsp90 mutations in fruit flies lead to death of flies - Venkata P. Satagopam
Some mutated fruit flies that survived revealed hidden genetic variants. Hsp90 does not destabilize development. - Dawei lin
Raising flies at high temperature can reduce the hsp90 level, which is easier than doing it genetically - Dawei lin
acts as a capacitor for some variation .... it also acts as a potentiator for other variation - Venkata P. Satagopam
both fly and arabidopsis experiments show that hsp90 acts as a buffer for variations. - Dawei lin
hsp90 can complex with inactive hormone receptors and oncogenic kinases - Dawei lin
hsp90 helps mutated kinase, which lost the ability to inhibit itself. - Dawei lin
same thing happened in human diseases. - Dawei lin
so an hsp90 inhibitor can be used as a drug. - Dawei lin
there are a few antifungal drugs available clinically - Dawei lin
Reducing the hsp90 buffer level completely removed the evolution of drug resistance - Dawei lin
Raising the temperature also eliminated the development of drug resistance. Should the patient be put in a fever state? - Dawei lin
talked about some unpublished data - Dawei lin
Where is the bioinformatics ? - arne
when added hsp90 inhibitor, some traits disappeared but some showed up - Dawei lin
arne, people who analyzed the data, :-) - Dawei lin
it must be a huge amount of work to make a genotype-to-phenotype map - Dawei lin
the polymorphism in NFS1, which is required for tRNA modifications, caused the phenotype change - Dawei lin
polymorphisms in 3' UTR of HNI1 is also affected by hsp90, but not directly instead through the proteins binding to that region. - Dawei lin
has Hsp90 left an imprint on genomes that exist today? - Dawei lin
hsp90 affects polymorphisms throughout the genome even non-coding, in combinatorial way - Dawei lin
its benefit to humans is the big reason to continue the research. - Dawei lin
yeast prions - genetic element based on protein conformation - Saravanamuttu Gnaneshan
prion also switches on with environmental stress - Saravanamuttu Gnaneshan
prion has similar behavior of hsp90. It generates new phenotypes. - Dawei lin
ISMB
Keynote: Steven Brenner - Ultraconserved nonsense: gene regulation by splicing & RNA surveillance
ISMB2010 just kicked off - Venkata P. Satagopam
Prof Søren Brunak introducing Steven Brenner, ISCB overton prize winner - Venkata P. Satagopam
Brenner contributed to many fields in bioinformatics, from structural biology through RNA to metagenomics. - Roland Krause
A short biography, summarizing Søren Brunak's kind introduction http://compbio.berkeley.edu/people... - Roland Krause
The morphology of Steve's paper: http://www.improbable.com/airchiv... - Shannon McWeeney
Intro: The ultraconservative (as seen from Berkeley) and nonsense (as found in Through the Looking Glass) - Roland Krause
The jabberwocky poem does have meanings and is elegantly crafted. - Roland Krause
Generally, nonsense in biology is bad. - Roland Krause
Nonsense is generally bad, even in a codon - Venkata P. Satagopam
Truncated proteins might interfere with physiological function (dominant negative). The cell removes such transcripts through nonsense-mediated decay (NMD). - Roland Krause
Good example for NMD: Sox10 - Roland Krause
Mutations early in the gene lead to less severe phenotypes than later ones - Roland Krause
NMD is an mRNA surveillance system - Venkata P. Satagopam
NMD is important to development of the immune system and cleans up other transcriptional errors. - Roland Krause
We do not know how NMD works outside the mammals. - Roland Krause
The mechanism involves the splicing machinery. If a stop codon is found more than about 50 nt upstream of an exon junction complex, the transcript is degraded. - Roland Krause
50 nucleotide rule - translated normally or degraded by NMD - Venkata P. Satagopam
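As usually stated, the 50 nucleotide rule says that a stop codon lying more than roughly 50 nt upstream of the last exon-exon junction marks the transcript for NMD, while a stop codon closer to that junction (or downstream of it) is translated normally. A minimal sketch of that check follows; the function name, the mRNA-coordinate convention, and the example positions are assumptions for illustration.

```python
def is_nmd_target(stop_codon_pos, last_junction_pos, min_distance=50):
    """Apply the 50 nt rule in spliced-mRNA coordinates.

    Returns True when the stop codon lies more than min_distance nt
    upstream of the last exon-exon junction (predicted NMD target).
    """
    return last_junction_pos - stop_codon_pos > min_distance

# Hypothetical transcripts: (stop codon position, last junction position).
print(is_nmd_target(300, 500))  # True: premature stop 200 nt upstream of the junction
print(is_nmd_target(480, 500))  # False: stop only 20 nt upstream
print(is_nmd_target(600, 500))  # False: normal stop in the last exon
```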
brilliant nytimes article title - surviving on low number of genes - Shannon McWeeney
splicing can introduce PTC - premature termination codon - Venkata P. Satagopam
AS as mechanism to introduce PTCs - can lead to unproductive splicing - Shannon McWeeney
these isoforms often have PTC - Venkata P. Satagopam
Humans have fewer genes but better genes, due to AS. - John Greene from fftogo
Are PTC splice forms functional? - Venkata P. Satagopam
Many PTC mRNAs are noise - Venkata P. Satagopam
analogous mechanism to shrinter: http://www.geeky-gadgets.com/cool-ga... - Shannon McWeeney
Alt splicing can yield isoforms differentially subjected to NMD - Venkata P. Satagopam
SR protein - 11 in human which are serine and arginine rich - Venkata P. Satagopam
SR proteins have premature stop codons. - Roland Krause
SR genes have mRNAs with premature termination codons - Venkata P. Satagopam
AS of PTC isoforms is mechanism for autoregulation of proteins - Ted Laderas
NMD has a large effect on isoform abundance - Venkata P. Satagopam
NMD has impact on isoform abundance - example of NMD clearing the major isoform - Shannon McWeeney
minor isoforms are only shared 25% of time - modrek and lee 2003 - Shannon McWeeney
Not just anecdotal stories, splice patterns are conserved in mouse, implying functional significance. - Roland Krause
(Unpublished work) - Roland Krause
All the SR proteins are talking to each other - Venkata P. Satagopam
SR proteins 'compensate' for each other via coupling via AS and NMD - Ted Laderas
SR genes have ultraconserved elements .. Bejerano et al 2004 Science 304: 1321 - Venkata P. Satagopam
Most ultraconserved regions are intergenic; the ones in SR genes lie within the genes. - Roland Krause
question of why conserved - not protein coding, no obvious significant RNA secondary structure - Shannon McWeeney
No proteins are produced from these genes. - Roland Krause
The reason why SR sequences are highly conserved - most of the seq are not protein coding, - Venkata P. Satagopam
no repetitive elements - Venkata P. Satagopam
why conserved part 2 - no overrepresentation of binding / regulatory elements - Shannon McWeeney
No simple explanations e.g. from miRNA binding etc. - Roland Krause
no similarity elsewhere in genome except retropseudogenes - Venkata P. Satagopam
analysis on origin of unproductive splicing - Shannon McWeeney
No sequence similarity between the conserved elements. Seems to have been introduced multiple times. - Roland Krause
mouse and human SRp55 conserved but changing - Venkata P. Satagopam
working on chordate SR proteins - Venkata P. Satagopam
here intron and exon structure is more informative - Venkata P. Satagopam
at this point - he has requested no further blogging - unpublished work - Shannon McWeeney
no blog slides may be over - Burkhard Rost
# Looks like interesting work. - Roland Krause
wonderful talk - Shannon McWeeney
Tells a (hard to blog) story about the successful treatment of collaborator with novel treatment based on genotyping. - Roland Krause
Wow - what a conclusion! Fantastic talk... - John Greene from fftogo
# Certainly great work. The talk was nice too, and he only bitched at other researchers in person once, another step up. - Roland Krause
"Ultraconserved elements in SR genes ONLY show similarity to retropseudogenes" - what does this mean? Any takers? - Saravanamuttu Gnaneshan
Venkata P. Satagopam
Ivica Letunic (http://vizbi.org/2010...) Phylogenetics
Drawing trees ... euclidean space is used by the majority of tree viz tools - Venkata P. Satagopam
tree layouts .... determined by presence of the root and presence of the evolutionary timing info - Venkata P. Satagopam
standard formats : rectangular cladograms and phylograms - Venkata P. Satagopam
which is most commonly used, suitable for small trees, and suitable for extensive annotation - Venkata P. Satagopam
other format is circular cladograms and phylograms ... becoming more popular, suitable for medium to large trees and suitable for extensive annotation - Venkata P. Satagopam
circular formats are much more convenient compared to rectangular format - Venkata P. Satagopam
various other layout: slanted, curved , inverted circular phylograms - Venkata P. Satagopam
drawing trees in hyperbolic space : circumference and area increase exponentially instead of geometrically - Venkata P. Satagopam
trees based on hyperbolic space - limited selection, often not primary phylogenetic tools - Venkata P. Satagopam
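A minimal sketch of the growth claim above, assuming a hyperbolic plane of constant curvature -1 (the talk notes do not state which model the tools use). In that setting a circle of radius r has circumference 2π·sinh(r), which grows exponentially with r, whereas the Euclidean circumference 2π·r grows only linearly. The function names are illustrative.

```python
import math


def euclidean_circumference(r: float) -> float:
    """Circumference of a circle of radius r in the Euclidean plane."""
    return 2 * math.pi * r


def hyperbolic_circumference(r: float) -> float:
    """Circumference of a circle of radius r in the hyperbolic plane
    (assumption: constant curvature -1)."""
    return 2 * math.pi * math.sinh(r)


# Compare how much room each geometry offers for leaves at a given radius.
for r in (1, 2, 4, 8):
    ratio = hyperbolic_circumference(r) / euclidean_circumference(r)
    print(f"r={r}: hyperbolic/euclidean circumference ratio = {ratio:.1f}")
```

The rapidly growing ratio is why a large tree, whose number of leaves also grows roughly exponentially with depth, fits more comfortably in a hyperbolic layout.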
standard tree viewers using euclidean space .. TreeDyn, Dendroscope, FigTree, Archaeopteryx - Venkata P. Satagopam
Annotators : iTOL (http://itol.embl.de) .. - Venkata P. Satagopam
FigTree - small, fast - modern tree viewer ...support basic automatic annotation based on confidence values - Venkata P. Satagopam
Annotating trees with additional data - trees are used as a backbone in various non-phylogenetic studies and scientific fields; iTOL is one such tool - Venkata P. Satagopam
it is a web-based tool ... supports tree annotation with 10 different data types, from simple bar charts to heat maps and protein domain architectures - Venkata P. Satagopam
iTOL provides personal user accounts for free storage and organization - Venkata P. Satagopam
challenges : displaying large trees; annotation with data (iTOL addresses this issue); trees on the web - Venkata P. Satagopam
Venkata P. Satagopam
Julie D. Thompson (http://vizbi.org/2010...) Gene & Function Evolution
alignment-based studies ... identification of genes responsible for BBS, a rare autosomal recessive genetic disease - Venkata P. Satagopam
multiple seq alignment based analysis of the new gene .... BBS10 indicates chaperonin-like fold .. - Venkata P. Satagopam
look Procter et al Nature methods 2010...for MSA viz tools ... - Venkata P. Satagopam
Jaeschke et al Knowledge and visualization 200 - Venkata P. Satagopam
challenges : from interaction network to MSA of each node - Venkata P. Satagopam
alternative viz approaches : partial order alignments - Venkata P. Satagopam
Venkata P. Satagopam
Geoff Barton (http://vizbi.org/2010...) Visualization of Sequence Alignments
combined multiple seq alignments citations 60,000, blast 52,000 - Venkata P. Satagopam
1984 multiple sequence alignment started with writing the sequences on paper - Venkata P. Satagopam
1991 Jalview project started ... widely used Java applet, current version 2.4 comes with a lot of functionality - Venkata P. Satagopam
challenges today ... larger protein families have > 100,000 sequences - Venkata P. Satagopam
whole genome alignments - Venkata P. Satagopam
Jalview used to visualize NGS data - Venkata P. Satagopam
future : printing on very big sheets of paper?; need new viz hardware; fast high resolution screens - Venkata P. Satagopam
Jim Procter
Helen Saibil (http://vizbi.org/2010...) Electron Microscopy & Multi-scale Assemblies
EM maps are stored in EMDatabank.org - Venkata P. Satagopam
registration between em maps and protein structure is a recent issue that has been addressed - Jim Procter
tools for em visualization are very powerful, but extremely complicated for the uninitiated user (scripts and modules are hard to access, and software is also expensive) - Jim Procter
it would be nice to incorporate movies into publications to convey this complicated data to community - Venkata P. Satagopam
Jim Procter
Rebecca Wade (http://vizbi.org/2010...) Molecular Dynamics
md visualization challenges: examples with haloalkane dehalogenase - Jim Procter
large no. of degrees of freedom - Venkata P. Satagopam
long timescales ... i.e. large no. of timesteps - Venkata P. Satagopam
pymol movie - simulation. Demonstrates the protein simulated within its solvent environment. - Jim Procter
viewing -- Stereo-viewing ...with glasses - Venkata P. Satagopam
Virtual reality - CAVE - Venkata P. Satagopam
advantages and disadvantages of 3d projection/immersion for molecular simulation - Jim Procter
some devices to feel ... Haptic devices and 3D printing - Venkata P. Satagopam
haptic devices are difficult to adapt because inter-atomic forces have a very large dynamic range (repulsion is extremely strong, so moving parts of a molecule through itself is very hard) - Jim Procter
MoDEL; Dynameomics for protein molecular dynamics simulation trajectories on the web - Venkata P. Satagopam
also 'dynameomics' ... technical issues need to be addressed for distributing trajectories (MoDEL provides QoS specification for the user) - Jim Procter
DSMM - db of simulated molecular motions - Venkata P. Satagopam
elastic network models -- GNM of DBMM server, ANM server, - Venkata P. Satagopam
Jim Procter
Visualization is insight. Not images. (Hamming) - Jim Procter
every molecule in RNA is negatively charged - Venkata P. Satagopam
sequence alignment annotation showing watson crick base pair as (..) (term representation) - Jim Procter
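The bracket annotation mentioned above can be sketched as follows. This is a hedged illustration of the common dot-bracket convention, where matching "(" and ")" mark a base pair and "." marks an unpaired position; it is not the parser of any tool named in the talk, and `parse_dot_bracket` is a hypothetical name.

```python
from typing import List, Tuple


def parse_dot_bracket(structure: str) -> List[Tuple[int, int]]:
    """Return the 0-based (i, j) positions of each base pair in a
    dot-bracket string such as "((..))"."""
    open_positions: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {i}")
            # A closing bracket pairs with the most recent open one.
            pairs.append((open_positions.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


print(parse_dot_bracket("((..))"))  # [(0, 5), (1, 4)]
```

With a single bracket type, only nested pairs can be written; crossing pairs (pseudoknots) need an extended notation, which is one reason richer representations are of interest.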
colour represents form of secondary structure. hairpin, helix, helix with bulge probability.. etc. - Jim Procter
triangular representation of RNA interactions (hoogsteen, watson-crick or sugar edge) - Jim Procter
6 ways of cis base-pairing and 6 ways of trans base-pairing - Venkata P. Satagopam
enables the annotation of structure alignments with interaction type - eg highlight non-wc interactions - Jim Procter
and reduction of 3d structure to 'condensed' 3d structure representation as a textual diagram (which also properly represents pseudoknots) - Jim Procter
tools ....S2s (http://bioinformatics.org/s2s) and ASSEMBLE : from 1D to 2D to 3D - Venkata P. Satagopam
hypervariable insertions are common in rna structure alignments (loops varying from 6 or 7 bases to 100's) making it difficult to reproduce alignments using conventional sequence alignment methods. - Jim Procter
alignment vis of interactions provided by s2s is a 'wrapped' view - where interactions between positions in different blocks are unwrapped and visualized as lines between successive alignment block columns (eg. block 1 containing cols 1:20 interacts with block 2 containing 21:40) - Jim Procter
Venkata P. Satagopam
Roman Laskowski (http://vizbi.org/2010...) Ligand Binding Sites
we don't know the function for a third of known protein structures - Venkata P. Satagopam
tool AutoLigand used to find binding sites - Venkata P. Satagopam
protein ligand interactions identified with the help of tool LIGPLOT - Venkata P. Satagopam
other tools MOE, PoseView, AstexViewer - Venkata P. Satagopam
comparing binding of diff ligands to same protein ... sometimes binding to different parts of protein ... catalytic site, allosteric site etc ... hence they can function as an inhibitor or an enhancer - Venkata P. Satagopam
one can find ligand clusters from PDBsum - Venkata P. Satagopam
STITCH: protein-drug interaction networks .... nice resource - Venkata P. Satagopam
another important resource DrugBank ... it contains information about the target also ... furthermore it tells which part of the protein (eg domain) the drug is binding - Venkata P. Satagopam
structure information coming from pdb can sometimes be a peptide or domain, may not be the full protein - Venkata P. Satagopam
archSchema - relate protein architecture as defined by pfam domain content to characterised protein structures - Jim Procter
and to viz and query proteins having similar domain architectures, gives information about uniprot seq containing domains and thereby shows related pdb structures - Venkata P. Satagopam
Venkata P. Satagopam
VizBi 3rd day Michael Nilges (http://vizbi.org/2010...) Proteins
NMR structure determination steps - NMR experiment; resonance assignment, structural restraints; structure calculation and validation - Venkata P. Satagopam
difficulty in structure calculation is many degrees of freedom - Venkata P. Satagopam
data integration platform - ARIA - Venkata P. Satagopam
PDB is the primary source of structures currently contains around 65k structures - Venkata P. Satagopam
ensembles in NMR structures reflect incompleteness and inconsistency of data ... - Venkata P. Satagopam
majority of soluble and membrane-bound proteins in modern cells are symmetrical oligomeric complexes with two or more subunits - Venkata P. Satagopam
Some imp tools ... VMD, PyMol, MolMol, gene V, Pilus - Venkata P. Satagopam
Jim Procter
Inna Dubchak (http://vizbi.org/2010...) Comparative Genomics
gallery of tools showing different approaches for visualizing aligned genomes. - Jim Procter
visualization of whole genome alignment ... for this purpose vista is not useful but Pill-shaped ideogram representation of chromosomes became very popular - Venkata P. Satagopam
why dot plots are still useful. Rearrangements. - Jim Procter
VISTA-Dot based on google map technology .. for complete genome alignments - Venkata P. Satagopam
other tools SynView, SynBrowse, Phigs, Ensembl, Apollo from Ensembl, VISTA synteny viewer, Populus genome (science 2006), CIRCOS : http://mkweb.bcgsc.ca/circos/ - Venkata P. Satagopam
populus genome: 2d alignment trace (rather than 1d projection) way of embedding alignment traces with minimum crossing of the syntenic region indications - Jim Procter
Jim Procter
Ting Wang (http://vizbi.org/2010...) Genome Browsers
UCSC genome browser developed from a viewer created to view the draft human genome to something that is capable of querying and displaying genomic annotation and the community's own data - Jim Procter
post genome omics. postomics ?!? - Jim Procter
2000 custom tracks on the UCSC browser model is not useful, but needed for the cancer genomics projects - Jim Procter
challenges for "next-gen" genome browser ... data volume, security, data type and presentation - Venkata P. Satagopam
it contains data tracks like genome browser, the difference here ... the cancer genome browser contains genome heatmap and clinical heatmap - Venkata P. Satagopam
heatmaps as query interfaces - Jim Procter
and as annotated matrices- append t-tests (heatmaps == alignments in this context? or maybe not) - Jim Procter
extending genome browser toward epigenomic browser - Venkata P. Satagopam
'epigenomic landscape' is inherently multidimensional - better filtering approaches are needed to view relevant slices of this space - Jim Procter
Jim Procter
David Gordon (http://vizbi.org/2010...) Sequencing & Assembly Finishing
need to know in visualization - individual base calls are not interesting for next gen sequencing - typically these are automatically filtered. - Jim Procter
NGS challenges - too much data, generation of data is faster than it can be stored to disk, viz programs can't hold all the data in memory at once - Venkata P. Satagopam
1979 ... wrote a program overlap to align reads - Venkata P. Satagopam
Sequencher - first example of 3-frame translation - Jim Procter
then "GCG", then 1991 "Xdap", then "DNA star", later came a software "Sequencher" - Venkata P. Satagopam
some other important tools - Consed, M.A.Q viewer, - Venkata P. Satagopam
packing technique - where reads are collapsed onto one line gives a 2:1 reduction in area required... still not enough for next gen. - Jim Procter
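A minimal sketch of the packing idea above: instead of giving every read its own row, place each read on the first row where it does not overlap the previous read. This is a generic greedy interval-packing routine written for illustration, not the algorithm of Consed or any other viewer named here; `pack_reads` and the toy coordinates are assumptions.

```python
from typing import List, Tuple


def pack_reads(reads: List[Tuple[int, int]], gap: int = 1) -> List[Tuple[int, int, int]]:
    """Assign each read (start, end) to a display row.

    Reads are processed by start coordinate and placed on the first row
    whose last read ends at least `gap` positions before the new start.
    Returns (start, end, row) triples.
    """
    row_ends: List[int] = []  # end coordinate of the last read on each row
    layout: List[Tuple[int, int, int]] = []
    for start, end in sorted(reads):
        for row, last_end in enumerate(row_ends):
            if start >= last_end + gap:
                row_ends[row] = end
                layout.append((start, end, row))
                break
        else:
            # No existing row has room, so open a new one.
            row_ends.append(end)
            layout.append((start, end, len(row_ends) - 1))
    return layout


toy_reads = [(0, 50), (10, 60), (55, 100), (70, 120)]
layout = pack_reads(toy_reads)
rows_used = max(row for _, _, row in layout) + 1
print(f"{len(toy_reads)} reads packed into {rows_used} rows")
```

The saving depends on read length and depth; at next-gen coverage many reads overlap at every position, so the number of rows stays high, which matches the remark that packing alone is not enough.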
common convention from consed ? - topstrand marked by an arrow indicating direction. - Jim Procter
"Finishing" - after assembly, the resulting seq is often not correct, finishing helps to correct it - Venkata P. Satagopam
misassembly is very difficult to manually detect, highlighting incomplete mate pairs can reveal incorrect assembly - Jim Procter
Jim Procter
Chris North (http://vizbi.org/2010...) Usability & Evaluation
Speaker changed his title from Usability to Usomics - Venkata P. Satagopam
Myths about Usability ---- Usability = Voodoo; Usability = Learnability - Venkata P. Satagopam
Science of Usability -- Phenomenon --> Measurement --> Modeling .. analogy to biology - Venkata P. Satagopam
Usability = Simple task performance; Usability = Expensive - Venkata P. Satagopam
suggested to read Sense-Making loop for analysts by Pirolli & Card, PARC - Venkata P. Satagopam
Jim Procter
Hiroaki Kitano (http://vizbi.org/2010...) Biochemical Networks
pathways are visualizations of *curated* data - there is no raw data. Significant difference to the rest of the vis'n. applications in the sysbio session. - Jim Procter
Cancer related pathways .... Hanahan and Weinberg, Cell, 2000 - Venkata P. Satagopam
EGF Receptor pathway ... Oda, et al Mol. sys. biol. 2005 - Venkata P. Satagopam
very nice definition of the multi-resolution vis'n requirements - detailed view (molecular interaction map) vs high-level pathway maps. - Jim Procter
the canonical map: represent a snapshot of knowledge at the time, and omissions are necessary to make points clear - Jim Procter
nuances: abstract block (and arrow) diagrams do not provide a recipe for reconstruction, whereas a circuit diagram encodes the composition of the system precisely. - Jim Procter
kohn map - localisation by box partitions, detailed (complex) routing with labels indicating interaction. - Jim Procter
need a lot of annotations to avoid loss of information - Venkata P. Satagopam
standards of model building - SBML, CellML, SBGN - Venkata P. Satagopam
sbgn.org - standard graphical notation - Jim Procter
sbgn is three faceted (process, entity, activity) - some analogies with UML - Jim Procter
Three languages in SBGN - Process description, Entity relationship, Activity flow - Venkata P. Satagopam
SBGN software supported by around 180 data providers - Venkata P. Satagopam
PANTHER is using cellDesigner to display the pathways - Venkata P. Satagopam
PAYAO - connected PathText - Venkata P. Satagopam
collaborative annotation and curation is essential for pathways. PAYAO provides community annotation support for sbml and reference databases. - Jim Procter
WikiPathways .... another source of pathways - Venkata P. Satagopam
challenges: flexible multiresolution - presumably focus+context approaches have already been applied... - Jim Procter
..and exploiting physical interactions and kinetic information - Jim Procter
Jim Procter
Nitin Baliga (http://vizbi.org/2010...) Rapid Inference and Re-engineering of Biological Circuits
visualization tools are components (interfaces perhaps) to knowledge bases. - Jim Procter
'star clocks' - polar plots indicating fitness for a particular environment (context of tfb gene family expansion and combinatorial control) - Jim Procter
implication ... conditionally active internal promoters - Venkata P. Satagopam
approach to deconstruct biological circuits: perturb, observe and model - Venkata P. Satagopam
cMonkey : to discover co-regulated genes - Venkata P. Satagopam
gaggle based integration enabling manual inspection of many different exploratory analysis tools (and web interfaces) that provide a high-level window on the analysis of the raw data. - Jim Procter
Firegoose another tool from the same group - Venkata P. Satagopam
Firegoose is a firefox extension - Venkata P. Satagopam
Jim Procter
Alexander Goesmann (http://vizbi.org/2010...) Metabolomics Data
placing visualization in context - top level of most analysis software architectures, and have to bridge datasets from different specific fields (genomics, physical observation) - Jim Procter
genome analysis with SAMS, GenDB, Carmen, and EDGAR - Venkata P. Satagopam
GenDB - distributed genome annotation - Venkata P. Satagopam
metabolomics provides a way to validate the presence of a pathway (or alternate pathway). - Jim Procter
CARMEN is useful to compare new genome with already curated genome wide metabolic pathway - Venkata P. Satagopam
Metabolomics field started in 2003, now pubmed contains around 1500 papers related to this topic - Venkata P. Satagopam
Ryals, Metabolon 2007 - Venkata P. Satagopam
greyscale fingerprint of chromatography profile - stacking independent runs demonstrates drift through column aging, indicating re-alignment(registration) is required - Jim Procter
interesting tie in with visible cell - visualization of metabolomic correlations should corroborate pathway localisations (where they are known, or exist) in visible cell. - Jim Procter
ProMeTra --- multiomics data integration platform ... http://prometra.cebitec.uni-bielefeld.de - Venkata P. Satagopam
Neuweger et al BMC Syst biol. 2009 - Venkata P. Satagopam
other tools used -- mzMine2 - Venkata P. Satagopam
XCMS & XCMS2 - Venkata P. Satagopam
The MeltDB software platform - Venkata P. Satagopam
Jim Procter
Oliver Kohlbacher (http://vizbi.org/2010...) From Spectra to Networks - Visualizing Proteomics Data
bottleneck in proteomics has now shifted to the analysis and interpretation of results. - Jim Procter
analysis of data ..... roughly 200GB per run - Venkata P. Satagopam
overview of shotgun proteomics: large raw dataset - boiled down to a small amount of refined quantified identifications. Typical end result that is desired is an excel spreadsheet. - Jim Procter
100GB raw data to 1KB above mentioned spreadsheet - Venkata P. Satagopam
Visualization is used for quality control, manual analysis and also for the validation of automatic analysis. - Jim Procter
Tools ... Peptide ID - Venkata P. Satagopam
typically, instrument software supports basic diagnostic and baseline correction visualization, but are currently not designed for high-throughput data collection. - Jim Procter
Maps - stacking the spectra yields maps - Venkata P. Satagopam
Linsen et al visual analysis of proteomics data - Venkata P. Satagopam
Maps and Features - VIP ... Giannopoulou et al VIP: viz of integrated proteomics data - Venkata P. Satagopam
nice overview of peak cleaning process : demonstrates the labour required to curate spectra via a map, just for a single protein. - Jim Procter
few tools --- STRAP --- Bhatia et al - Venkata P. Satagopam
BiNA - Biological network analyzer - Venkata P. Satagopam
Strap was browsed through a little fast. Better to review video to see how the go and network visualization is achieved by these tools. - Jim Procter
Challenges -- data volume - Venkata P. Satagopam
varying levels of details , usability; Integration with other omics data and networks...etc - Venkata P. Satagopam
questions ... post translational modifications is a problem in proteomics - Venkata P. Satagopam
question from the chair: can web based analysis systems help ? problem is raw data is Gb^n - meaning that it's difficult to ship to distributed facilities (currently) - Jim Procter
Jim Procter
Day2: Matt Hibbs (http://vizbi.org/2010...) Visualization of transcriptomics data
introduction to transcriptomics - what's measured, and how. - Jim Procter
then and now: cDNA 1 and 2 colour microarrays, and RNA-seq - visualization requirements are different for these two technologies. - Jim Procter
uArrays: diagnostics (normalisation, box-whiskers), and then exploration: heatmaps and parallel coordinates. - Jim Procter
clustering is intrinsically necessary for making sense of heatmaps (does this follow for parallel coordinates ?)... and the most important aspect is the distance metric used to cluster. - Jim Procter
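A small illustration of why the distance metric matters for clustering expression profiles. The two metrics below (Euclidean distance and one minus Pearson correlation) are common choices but are picked here as an assumption; the talk notes do not say which metrics were discussed, and the three toy profiles are invented.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance: sensitive to absolute expression level."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - Pearson correlation: compares the shape of the profiles only."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return 1 - cov / (sd_a * sd_b)


gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]  # same trend as gene_a, ten times the level
gene_c = [4.0, 3.0, 2.0, 1.0]      # opposite trend, similar level

for name, metric in (("euclidean", euclidean), ("pearson", pearson_distance)):
    print(name, "a-b:", round(metric(gene_a, gene_b), 2),
          "a-c:", round(metric(gene_a, gene_c), 2))
```

Under Euclidean distance gene_a sits closest to gene_c, while under the correlation-based distance it sits closest to gene_b, so the same heatmap would be ordered differently depending on the metric.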
heatmap software: one example is java treeview. - Jim Procter
Parallel coordinates: great for interacting and subselecting based on specific interval criteria, but they get very cluttered. - Jim Procter
Dimensionality reduction: Nice overview of SVD - Jim Procter
Sequence based visualization - copy number variation - Venkata P. Satagopam
Mentions relative insignificance of principal component (all noise) vs second and third dimensions from SVD... is this true for the other approaches ? - Jim Procter
visualization tools MeV from TIGR - Venkata P. Satagopam
Commercial tools like spotfire, genespring - Venkata P. Satagopam
.. tools for comparative viz - provide multimodal visualization of single or multiple datasets. - Jim Procter
hidra: side by side javatreeview like displays, with built in go enrichment test. - Jim Procter
notes to tool designers and developers: if a wet bench scientist can't access it, why bother ? - Jim Procter
future ... incorporate common statistical analysis techniques with viz ...eg differential expression tests, GO enrichments etc .... isoforms & splice variants ... - Venkata P. Satagopam
and the jackson lab is hiring! - Jim Procter
ISMB/ECCB
Keynote: Webb Miller - Bioinformatics Methods to Study Species Extinctions
10 Steps to Success in Bioinformatics - sebi
has been 40 yrs in computers, 20 yrs in bioinformatics: but doesn't have enough money to retire (tongue-in-cheek) - Michael Kuhn
Step 1: Become a biologist. - sebi
extinction - Venkata P. Satagopam
Extinction: How to save endangered species - Peter Menzel
which species are in trouble? - Venkata P. Satagopam
Here's the article "10 steps to success..." http://www.iscb.org/iscb-pu... - Michael Kuhn
3 stories - Venkata P. Satagopam
(out of batteries, not sure the iPhone is adequate for this. So far a great intro :) ) - Oliver Hofmann from iPhone
3 studies: Tasmanian tiger, Mammoth, ... - Peter Menzel
explains how DNA can be found in hairs. - Peter Menzel
For extinct species, mitochondrial genome is easier to sequence, as it is up to 1,000 times more abundant than genomic DNA - sebi
aDNA = ancient DNA - Peter Menzel
background of ancient DNA ... - Venkata P. Satagopam
what can be learned from aDNA? - Venkata P. Satagopam
1. phylogenetics - Peter Menzel
2. population genetics - Peter Menzel
This has some parallels to the last talk in today's session in T1 by RE Green: Neandertal mitochondrial DNA to place them (and us) in the phylogenetic tree - sebi
3. directly observe evolutionary rates - Peter Menzel
4. observe evolution of function - Peter Menzel
short reads are good enough for aDNA chunks - Peter Menzel
next-generation seq of aDNA - Venkata P. Satagopam
aDNA from hairs can easily be decontaminated - Peter Menzel
aDNA from hair shafts enclosed in a plastic bag like .... - Venkata P. Satagopam
Tasmanian tiger: extinction date 7/9/1936 - Peter Menzel
over 700 known specimens - Venkata P. Satagopam
thylacine = tasmanian tiger - Peter Menzel
attempts to study thylacine DNA - Venkata P. Satagopam
observations - Venkata P. Satagopam
2 mitochondrial genomes in GenBank - Peter Menzel
in GenBank two genes 12S, cytb both were wrong - Venkata P. Satagopam
both wrong at 10% of nucleotides - Peter Menzel
tasmanian devil closest relative to tiger - Peter Menzel
split at ~40m years ago - Peter Menzel
30% of new sequence data are from nuclear genome - Peter Menzel
-> maybe sequence whole genome? - Peter Menzel
one imp question ... did an epidemic contribute to extinction? - Venkata P. Satagopam
Epidemic probable cause of extinction - Peter Menzel
(the quiet comments are hilarious. 'we sequenced one from Stockholm swimming in alcohol. Long way from home, extinct, drowning his sorrows...') - Oliver Hofmann from iPhone
story 2 woolly mammoth - Venkata P. Satagopam
Mammoth: - Peter Menzel
hair sample from eBay - Peter Menzel
we found sample on ebay - Venkata P. Satagopam
for 90$ :-) - Peter Menzel
most samples are from Siberia - Peter Menzel
have now 18 different complete mtDNA sequences - Peter Menzel
what we draw from this analysis ....mammoth separated from living elephants 6m years ago - Venkata P. Satagopam
like chimp and human - Peter Menzel
want to sequence the full nuclear genome - Peter Menzel
already got 0.7-fold coverage - Peter Menzel
= 3.3 billion bp - Peter Menzel
assuming real size of genome is 4.7Gb - Venkata P. Satagopam
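The coverage figure above follows directly from the two numbers quoted (3.3 billion bp sequenced, 4.7 Gb assumed genome size). The second calculation is an added assumption, not something stated in the talk: if reads landed uniformly at random (the usual Lander-Waterman idealisation), the expected fraction of the genome touched at least once would be 1 - e^(-coverage).

```python
import math

sequenced_bp = 3.3e9            # from the talk notes
assumed_genome_size_bp = 4.7e9  # from the talk notes

# Fold coverage is total sequenced bases divided by genome size.
coverage = sequenced_bp / assumed_genome_size_bp
print(f"{coverage:.2f}-fold coverage")  # 0.70-fold coverage

# Assumption: reads are placed uniformly at random across the genome.
expected_fraction_covered = 1 - math.exp(-coverage)
print(f"expected fraction of genome covered: {expected_fraction_covered:.2f}")
```

Under that idealisation roughly half of the genome would be seen at least once; ancient DNA is unlikely to be sampled uniformly, so this is only a rough guide.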
aa identity is nearly 99.8% between mammoth and elephant - Peter Menzel
99.4% overall nucleotide identity - Peter Menzel
This equals 1 mutation per protein - sebi
(all this excluding indels) - Peter Menzel
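A quick check of the "1 mutation per protein" remark. The 99.8% amino acid identity comes from the notes; the protein length of 500 residues is an assumption chosen for illustration, since the notes do not say what length was used.

```python
aa_identity = 0.998            # from the talk notes
assumed_protein_length = 500   # hypothetical average length in residues

# Expected number of differing residues per protein at this identity level.
expected_differences = (1 - aa_identity) * assumed_protein_length
print(round(expected_differences, 2))  # 1.0
```

For a shorter assumed length, say 350 residues, the same identity gives about 0.7 differences per protein, so the remark holds only as an order-of-magnitude statement.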
90% seq from the sample from the woolly mammoth - Venkata P. Satagopam
Mammoths differ from all other vertebrates in some highly conserved genes, such as MRTO - Venkata P. Satagopam
maybe due to living in cold places - Peter Menzel
extinction scenario ...human killed them all - Venkata P. Satagopam
certainly not - Venkata P. Satagopam
extinction scenario: killed by humans.. became too warm... -> but that's not true - Peter Menzel
humans arrived too late in Siberia - Peter Menzel
3rd story Tasmanian devil - Venkata P. Satagopam
tasmanian devil now - Peter Menzel
threatened by extinction due to cancer - Peter Menzel
cancer cells passed around by biting each other - Peter Menzel
cancer is passed through infection (biting other individuals, spreading cancer cells) - sebi
some resistant individuals - sebi
Trying to preserve the species, keeping the gene pool large enough - sebi
compared two individuals: one who is resistant, one who died - Peter Menzel
Finding out about the resistance: considering an ortholog of a human tumor suppressor gene - sebi
the guy who died, has a mutation at an extremely conserved position - Peter Menzel
but not much is known what it means.. - Peter Menzel
want to look at non-coding nuclear SNPs - Peter Menzel
closest sequenced relative is opossum - Peter Menzel
90m years of separation - Peter Menzel
-> computational problem: SNPs without a reference genome - Peter Menzel
produce sequences from many individuals instead - Peter Menzel
and compare SNPs there - Peter Menzel
-> extract population structure - Peter Menzel
seems to work.. but no final data... - Peter Menzel
Quick summary: Bioinformatics very helpful, but much more data needed - Peter Menzel
Role of genetic diversity in extinction still not really known.. no simple story - Peter Menzel
but genetic diversity not necessarily a good indicator for risk of extinction.. - Peter Menzel
e.g. panda has high diversity, but endangered - Peter Menzel
TV documentary from Australia: http://www.abc.net.au/catalys... - Peter Menzel
(only to get the blog on top of the ISCB portal site; the figures messed up our layout) - Reinhard Schneider
Related: 11 Extinct Animals That Have Been Photographed Alive: http://ecoworldly.com/2009... - Peter Menzel
ISMB/ECCB
HL57: Erik Sonnhammer - FunCoup: global networks of functional coupling in eukaryotes
How to reconstruct networks - experimental networks are incomplete. - Roland Krause
Up to 300.000 interactions are proposed for human - Roland Krause
only 35000 known - Ruchira S. Datta
Each experimental method can give you more than 20% of the interactions. - Roland Krause
experiments have high false negative and false positive rates - Ruchira S. Datta
e.g., false positives from in vitro experiments: the interaction may never happen in a living cell - Ruchira S. Datta
Interactions have to be combined and evaluated. - Roland Krause
there are many kinds of evidence for functional coupling - Ruchira S. Datta
Lots of evidence for functional coupling, not only from PPI but also from localization, gene expression, interacting domain, TFBS; miRNAs. - Roland Krause
domain interactions - Venkata P. Satagopam
Integrate different kinds of data from various organisms. - Roland Krause
Some links are continuous, some binary, etc. - Roland Krause
using Naive bayesian training - Venkata P. Satagopam
full Bayesian training would be too computationally heavy - Ruchira S. Datta
Naive Bayesian training, going from continuous data to distinct bins. - Roland Krause
compare with positive and negative reference datasets - Ruchira S. Datta
There is no negative reference data set out there, genes in different compartment might actually interact. - Roland Krause
therefore use random examples as negative set - Ruchira S. Datta
calculate enrichment as likelihood ratio=P(+)/P(-) - Venkata P. Satagopam
Learn log likelihood ratios for each evidence, requires large negative set. - Roland Krause
sum all the log-likelihood ratios to get full bayesian score - Ruchira S. Datta
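The scoring steps described above (bin the continuous evidence, learn a log likelihood ratio per bin from positive and negative reference pairs, then sum over evidence types) can be sketched as below. This is a generic naive Bayes illustration, not FunCoup's code: the evidence names, bin edges, counts and the add-one smoothing are all invented assumptions.

```python
import bisect
import math
from typing import Dict, List


def bin_index(value: float, edges: List[float]) -> int:
    """Map a continuous value to a bin; `edges` are ascending cut points."""
    return bisect.bisect_right(edges, value)


def log_likelihood_ratios(pos_counts: List[int], neg_counts: List[int],
                          pseudocount: float = 1.0) -> List[float]:
    """Per-bin log(P(bin | coupled) / P(bin | not coupled)), smoothed so
    that empty bins do not produce infinite scores."""
    n_bins = len(pos_counts)
    pos_total = sum(pos_counts) + pseudocount * n_bins
    neg_total = sum(neg_counts) + pseudocount * n_bins
    return [
        math.log(((p + pseudocount) / pos_total) / ((n + pseudocount) / neg_total))
        for p, n in zip(pos_counts, neg_counts)
    ]


# Hypothetical training counts per bin for two evidence types.
EVIDENCE = {
    "coexpression": {"edges": [0.3, 0.7], "pos": [10, 30, 60], "neg": [700, 250, 50]},
    "ppi_count": {"edges": [1, 3], "pos": [50, 30, 20], "neg": [950, 45, 5]},
}
LLR = {name: log_likelihood_ratios(e["pos"], e["neg"]) for name, e in EVIDENCE.items()}


def coupling_score(observations: Dict[str, float]) -> float:
    """Naive Bayes score: sum of the per-evidence log likelihood ratios."""
    return sum(
        LLR[name][bin_index(value, EVIDENCE[name]["edges"])]
        for name, value in observations.items()
    )


strong = coupling_score({"coexpression": 0.9, "ppi_count": 4})
weak = coupling_score({"coexpression": 0.1, "ppi_count": 0})
print(round(strong, 2), round(weak, 2))
```

Summing is valid only under the naive assumption that the evidence types are independent given the class, which is what makes the training cheap compared with a full Bayesian model.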
4 different flavors of training sets: metabolic pathway, signaling pathway, physical ppi, and complexes from UniProt - Ruchira S. Datta
Training sets are from KEGG (metabolic and signaling), HPRD and Complexes from Uniprot. - Roland Krause
using the curated data only for training/validation, not as input to the networks - Ruchira S. Datta
No curated data used in the network, only in training. - Roland Krause
7 model organisms and human are combined with 50 individual data sets. - Roland Krause
Convert log scores to confidence scores. - Roland Krause
predict coupling between 2 genes, for each model FC-PI, FC-CM, FC-ML, FC-SL model - Venkata P. Satagopam
convert into confidence scores, which may be different in different models - Ruchira S. Datta
convert the Bayesian score using the probability of functional coupling, which is unknown but which they just set to 1/1000 ad hoc - Ruchira S. Datta
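One way the conversion above could work, as a hedged sketch: treat the summed score as a natural-log likelihood ratio, multiply by the prior odds implied by the 1/1000 prior, and turn the posterior odds into a probability. The notes give only the prior, so the exact formula used by FunCoup is an assumption here, as is the function name.

```python
import math

PRIOR_COUPLING = 1 / 1000  # ad hoc prior quoted in the talk notes


def confidence_from_score(log_likelihood_ratio: float,
                          prior: float = PRIOR_COUPLING) -> float:
    """Posterior probability of coupling from a natural-log likelihood
    ratio, using Bayes' rule in odds form (assumed conversion)."""
    prior_odds = prior / (1 - prior)
    posterior_odds = math.exp(log_likelihood_ratio) * prior_odds
    return posterior_odds / (1 + posterior_odds)


print(round(confidence_from_score(0.0), 4))  # 0.001
for score in (2.0, 5.0, math.log(999), 10.0):
    print(round(score, 2), round(confidence_from_score(score), 3))
```

With this prior a pair needs a likelihood ratio of 999 (a log score of about 6.9) just to reach a confidence of 0.5, which shows how strongly the ad hoc prior shapes the final confidence values.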
algorithmic innovations- we have to develop new evidence scores - Venkata P. Satagopam
Used input from PPI as continuous data, using experimental counts. - Roland Krause
discretization caused problems - Ruchira S. Datta
they figured out how to discretize - Ruchira S. Datta
test significance with chi-squared test - Ruchira S. Datta
integrate evidence from orthologs used by InParanoid - Ruchira S. Datta
using inparanoid orthologs - Venkata P. Satagopam
Used transfer of ortholog information under the same Bayesian framework - Roland Krause
new phylogenetic patterns using orthologs - Ruchira S. Datta
Except for yeast, most species had more information transferred rather than generated for the organism. - Roland Krause
most support comes from other species; otherwise we may miss links - Venkata P. Satagopam
many links would not have been found without evidence from other species - Ruchira S. Datta
only yeast is well supported all by itself - Ruchira S. Datta
validated networks, TCGARN science 2008 - Venkata P. Satagopam
Validation using cancer pathways, recovering 29 of 36 links, found an additional 25. Not entirely independent. - Roland Krause
Independent validation from recovering tumour mutation sets. - Roland Krause
applications - 1 . exploring local networks, 2. analyzing network conservation - Venkata P. Satagopam
Easy exploration of the data sources leading to an edge in the network. - Roland Krause
graph visualization: started with Medusa, but it didn't fulfill needs - Ruchira S. Datta
now developed JSquid (sp?), published last year - Ruchira S. Datta
used jSquid .. Bioinformatics 2008 - Venkata P. Satagopam
New view of the human disease network, build tree of disease interactions. - Roland Krause
conclusions - Venkata P. Satagopam
to discover novel functional coupling between genes - Venkata P. Satagopam
Cancers group together, neurological diseases do not. - Roland Krause
can expand gene sets such as pathways - Venkata P. Satagopam
(only to get the blog on top of the ISCB portal site; the figures messed up our layout) - Reinhard Schneider
ISMB/ECCB
Keynote: Mathias Uhlen - A global view on protein expression based on the Human Protein Atlas
Introduction: Works a lot on affinity reagents. Invented and developed pyrosequencing technology (http://en.wikipedia.org/wiki...) now used in 454 - Allyson Lister
Outline of the talk - 1. systematic biology - introduction, 2. HPR project, 3. The Human Protein Atlas - Venkata P. Satagopam
18th century - biologist. 19th - chemist (1/3 of elements discovered in Sweden in this century). 20th - physicists and at the end, computer scientist. He'd now like to say that the 21st century is the century of medicine. - Allyson Lister
HPR one of the largest projects in Sweden wrt funding, about 100 million euro so far - Oliver Hofmann
An impressive log-scale plot of number of bases sequenced since 1965. - Allyson Lister
developer of sequencing by synthesis via pyrosequencing in late 90s -- basis of 454 technology - Andrew Su
Personalized genomics ... 454 technology developed in our lab - Venkata P. Satagopam
Bioinformatics is the key in the new era of genomics. - Allyson Lister
95% of drugs (still) aimed at proteins - Oliver Hofmann
(Personal opinion: I like how it's not the "post-genomic era", but a new era of genomics :) ) - Allyson Lister
95% of drugs today target proteins. Thus, studying proteins is studying for the future - Diego M. Riaño-Pachón
Systems biology /omics is going to be fantastic in the next 10 years. - Allyson Lister
(That, or we find better ways of interfering with RNA) - Oliver Hofmann
Image of contradictory sign in Paris: you know where you want to go, but not how to get there. - Allyson Lister
We know where we want to go (characterize all proteins), but not sure how to get there (due to a lack of high-throughput methods) - Oliver Hofmann
"The generation game", Nature, July 7, 2007 - Venkata P. Satagopam
antibodies are the core tool for probing proteins - Andrew Su
but they are too cross-reactive - Diego M. Riaño-Pachón
Human antibody initiative HAI - Venkata P. Satagopam
Human Antibody Initiative (HAI) -- Uhlen, M. Snyder, P. Hudson -- generate comprehensive and validated antibody collection - Andrew Su
validation of commercially available antibodies - Venkata P. Satagopam
average success rate of commercial antibodies is 49% - Andrew Su
for some companies 100% of antibodies work fine, for some companies 0%; in general about 50% work fine - Venkata P. Satagopam
Antibodypedia -- a portal for validated antibodies (we need to add a link from Gene Wiki...) - Andrew Su
From the website: "The antibodypedia is a community-based portal showing application-specific validation of publicly available antibodies to human protein targets. Each protein binder (antibody or other affinity reagent) has been scored in an application-specific manner into three main categories (supportive, uncertain and non-supportive)" - Oliver Hofmann
@Oliver - nice! - Allyson Lister
If you have 2 antibodies, you can compare results in various assay platforms so he wants to develop paired antibodies for every protein target. - Allyson Lister
Nat Methods 2008: High-throughput method to identify epitopes - Oliver Hofmann
published a paper in Nature Methods 6 months ago (December 2008) - Venkata P. Satagopam
(bummer, antibodypedia doesn't use mediawiki so can't assess current usage...) - Andrew Su
(I am entirely too short-sighted to read the author lists half of the time. Sigh) - Oliver Hofmann
HPR - the Human Proteome Resource - Venkata P. Satagopam
(grumble grumble, antibodypedia creates YAI -- yet another identifier) - Andrew Su
(++ HPA -- uses ensembl gene IDs...) - Andrew Su
HPR is a multi-disciplinary program - Venkata P. Satagopam
(Proteinatlas seems to be using ENSG/ENSP/UniProt) - Oliver Hofmann
The gene factory does about 200 clones per week, and is in full production. - Allyson Lister
(HPA IDs are also mapped to UniProt) - Venkata P. Satagopam
Close to 34.000 clones in the database - Oliver Hofmann
200 clones per week, 33,925 clones total (all human?) - Andrew Su
(@Andrew - indeed) - Allyson Lister
Open source (but in-house?) LIMS developed - Oliver Hofmann
(would be interesting to compare to the OriGene collection of mammalian clones) - Andrew Su
The antigen design uses PRESTIGE, which is a bioinformatics approach to select antigens using the protein epitope signature tag (PrEST). - Allyson Lister
antigen design -- uses PRESTIGE, a bioinformatics approach to select antigens for antibodies - Venkata P. Satagopam
readout of this project is protein expression profiling - Venkata P. Satagopam
readouts -- immunohistochemistry (IHC) and IF (immunofluorescence) - Andrew Su
Organ, tissue, cellular and sub-cellular expression profiling on a protein basis - Oliver Hofmann
apply antibodies to tissue arrays (cancer focus, I think) - Andrew Su
(140+ human samples, around 200 tissues.. did someone catch the numbers?) - Oliver Hofmann
(FAQ from the website: spatial distribution of proteins in 48 different normal tissues and 20 different cancer types as well as 47 different human cell lines) - Oliver Hofmann
annotation of images taking place in Mumbai in India - Venkata P. Satagopam
image annotation -- difficult problem. automated analysis would be good, but now using Indian pathologists for manual annotation - Andrew Su
$60 / 500 images for annotation? - Andrew Su
Confocal microscopy for subcellular localization, difficult to scale up to high-throughput - Oliver Hofmann
high-throughput subcellular localization in A-431 (squamous cell carcinoma), U-251MG (glioma), ??? - Andrew Su
(I think Bob Murphy talked on a similar project to map subcellular localization of proteins en masse...) (Oh, looks like it's a collaboration between the two...) - Andrew Su
They have an SVM that seems to be able to annotate 28 different parts of the cell. - Allyson Lister
2TB data each week (courtesy of 50.000 images in the same time) - Oliver Hofmann
2/3 of the data is generated in-house and 1/3 comes from different companies - Venkata P. Satagopam
(I wonder how the protein expression compares with our gene expression atlas http://www.ncbi.nlm.nih.gov/pubmed...) - Andrew Su
About 33% of the sample space done (6850 genes) - Oliver Hofmann
progress - started in 2005, last week released version 5. 8,832 antibodies, covering 1/3 of the genes in UniProt - Venkata P. Satagopam
(And I suppose there's always more to do.. check for splice variants, truncated versions...) - Oliver Hofmann
Most recent release: 7 mln images. - Allyson Lister
(@Andrew: if there is overlap in the cell lines that could be an easy correlation analysis) - Oliver Hofmann
The next 5 yrs are also about getting the paired antibodies mentioned earlier. - Allyson Lister
all antibodies available to the public - Venkata P. Satagopam
(good good, HPA is already a BioGPS plugin.. http://biogps.gnf.org/#goto=p... </shameless_plug>) - Andrew Su
central questions in proteomics - Venkata P. Satagopam
(Rodent atlas seems a bit redundant to Allen Brain Atlas? -- Oh, ABA is via in situ / RNA, this is protein. Again, would be interesting to compare...) - Andrew Su
how many proteins are expressed in a given cell? - Venkata P. Satagopam
how many proteins are tissue specific? - Venkata P. Satagopam
@Oliver, probably not directly comparable by exact cell lines, but might be worth comparing by parental tissue. Need to wait until they allow downloading of data though... - Andrew Su
Ensembl "thinks" that there are up to 23,000 genes, but UniProt "thinks" 20,000; the true number is probably within that range (for genes coding for proteins). ("thinks" in scare quotes, as databases don't think - yet) - Allyson Lister
the size of the human membrane proteome .. 5,514 human membrane proteins, covering 26% of protein-encoding genes - Venkata P. Satagopam
proteins expressed in normal cells - 6,800 antibodies (towards >25% of all protein-encoding genes). 65 normal cell types (from 45 different tissue types) - Venkata P. Satagopam
70% of proteins expressed in a given cell, approx even distribution across # of cell lines (not what we observed on gene expression data, which had distinct peaks at tissue-specific and ubiquitous) - Andrew Su
80% of proteins expressed on average in cell lines (surprisingly high to me...) - Andrew Su
(@Andrew: protein selection might be biased towards the ones that are well expressed / had known antibodies / ...) - Oliver Hofmann
9% of proteins cell type specific, 62% expressed across 3 different cell types - Oliver Hofmann
ubiquitous expression, but differing levels - Andrew Su
(interesting cytoscape visualizations of cell type / tissue specificity) - Andrew Su
In the Atlas: < 2% specific to a single cell type (84 proteins), well known ones like insulin - Oliver Hofmann
Includes a number of uncharacterized proteins with no known function - Oliver Hofmann
PROSPECTS: PROteomics SPECification in Time and Space - Allyson Lister
"Complementary technologies, including mass spectrometry, cryoelectron microscopy and cell imaging will be applied in innovative ways to capture transient protein complexes and the spatial and temporal dimensions of entire proteomes." - Oliver Hofmann
MCF-7 data with IHC, Mass spec - Oliver Hofmann
next generation sequencing of cDNAs from the U2-OS human cell line: 76% of genes detected by mRNA seq - Venkata P. Satagopam
76% genes detected by next gen mRNA sequencing in U2-OS - Andrew Su
again -- mostly ubiquitous expression (now on mRNA level), but differing levels - Andrew Su
high fractions of all proteins expressed in human cells, tissues and organs - Venkata P. Satagopam
Lack of specificity is not good news for those looking for good antibody targets for therapeutic purposes - Oliver Hofmann
the quantity of proteins, rather than their presence /absence, is the key to cell identity - Diego M. Riaño-Pachón
few cell-specific proteins (<1%) and group-specific proteins (<10%) - Venkata P. Satagopam
find biomarkers for early detection of disease ... it is very good for mankind - Venkata P. Satagopam
(Proteome 2008) Suspension bead arrays - Oliver Hofmann
mg/ml to pg/ml range (dynamic range of protein concentration in blood 10^12 ) - Oliver Hofmann
working on kidney disease in collaboration with AstraZeneca .... for detection of biomarkers - Venkata P. Satagopam
Developing 'next generation' plasma profiling, scale to one million assays per month - Oliver Hofmann
(has he mentioned availability of these antibodies? Commercially available? antibody-producing lines?) - Andrew Su
They're part of ENGAGE. - Allyson Lister
commercially available @Andrew - Allyson Lister
first draft of the human proteome by 2014 - Venkata P. Satagopam
Aim to have the draft version of the human proteome by... see above - Oliver Hofmann
(@Andrew I think via Prestige Antibodies (I remember the cute advertising slide). Does that mean advertising works?) - Allyson Lister
Nature, "The big ome" - 24 April 2008, editorial - Allyson Lister
tissue-specificity is achieved by precise regulation of protein levels in space and time - Venkata P. Satagopam
Prestige Antibodies through Sigma: http://www.sigmaaldrich.com/life-sc... - Andrew Su
Science, 26 Sep 2008, vol. 321, pages 1758-1761 - Venkata P. Satagopam
"Proteomics Ponders Prime Time", Science, 26 September 2008, in response to the Nature article - Allyson Lister
new lab in Stockholm coming soon, Science for Life Laboratory - Venkata P. Satagopam
New Science for Life laboratory being established, see http://www.newsdesk.se/pressro... - Oliver Hofmann
Q: importance of splice isoforms -- A: complexity that is currently not considered due to technical complexity (to be saved for second phase) - Andrew Su
Q: conclusions on tissue specificity have bias based on antibody availability? A: bias of commercial antibodies possible, but only 1/3 of data. Data they are generating based on walking down chromosomes (I think?), so don't expect bias... Also, some of ubiquitous expression is due to cross-reactivity. (first mention of this...) - Andrew Su
Q: perspective for gene therapy or antisense therapy, more generally non-protein based therapies. A: pharma shifting from small molecules to biologics (not sure about a "shift" rather than "expansion"). Gene therapy problem is getting into all relevant cells. Ubiquitous expression of proteins the root cause of side effects for protein-based targets, possibly... - Andrew Su