ISMB/ECCB Stockholm 2009

17th Annual International Conference on Intelligent Systems for Molecular Biology & 8th European Conference on Computational Biology
The talk-specific feeds will be created each day shortly before the start of the first presentation. Find talk-specific blogs by searching here for the authors, the title of the talk, or the talk identifier as given in the program (like HL03 for the 3rd Highlight paper). The feeds can also be accessed on the conference pages in the corresponding sections (SIGs, Keynotes, Proceedings Track, Technology Track and Highlights), and the last few blogs are shown on our web-portal page.

Happy blogging !!
PT45: Hamid Reza Chitsaz - A Partition Function Algorithm for Interacting Nucleic Acid Strands
First part presented by Raheleh Salari - Roland Krause
Increased interest in RNA-RNA interaction prediction requires computational target prediction. - Roland Krause
ncRNAs bind to mRNA and regulate translation, including the specificity etc. - Roland Krause
Several models have been described, e.g. PairFold, RNAhybrid, RNAup, IRIS, InteRNA. - Roland Krause
The problem is NP-complete. - Roland Krause
None of the current approaches include the probability and stability of the joint secondary structure. - Roland Krause
Interaction energy model and interaction partition function over all SS without pseudoknots or crossing interactions and zigzags. - Roland Krause
The standard model by Mathews et al. (1999) assumes an energy model with independence between hairpins, bulges etc. - Roland Krause
More than one loop-loop interaction in a real example, ignore interhybrid loop, other loops termed kissing loops. - Roland Krause
# that was a little fast - Roland Krause
A kissing loop is an intramolecular loop that interacts with the other strand. - Roland Krause
Give the energy functions for the new structures for the interaction partition function. - Roland Krause
Second part given now by Hamid Chitsaz - Roland Krause
Main interest in the strength and stability of the RNA interaction, which is challenging because all structures must be accounted for exactly once. - Roland Krause
Make use of a dynamic programming approach using divide and conquer. Compute the partition function for bits of the structure. - Roland Krause
McCaskill's algorithm (1990) is introduced, describing a partition function for single strands. - Roland Krause
For two unpaired strands, McCaskill's algorithm can be used directly. - Roland Krause
98 cases need to be considered, not even in the paper but in the suppl. mat. - Roland Krause
# sounds like a complicated model, it's not completely presented - Roland Krause
The no-zigzag assumption is motivated by a shorter running time; one arch describes them all. - Roland Krause
Can predict the equilibrium concentration, the melting temperature and the UV absorption. - Roland Krause
Presented algorithm (piRNA) outperforms existing ones by an order of magnitude, experimental validation with Tm shows little deviation. - Roland Krause
Algorithm is available. - Roland Krause
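The single-strand partition function idea mentioned in the notes above (McCaskill-style dynamic programming, each structure counted exactly once) can be sketched as follows. This is a minimal illustration, not the piRNA algorithm: the energy model is a toy assumption (a fixed energy per base pair, free loops), and the names `partition_function`, `pair_energy`, `rt` and `min_loop` are hypothetical. It does not handle two interacting strands or the 98 cases from the talk.

```python
import math

# Canonical and wobble pairs; a simplification for illustration.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def partition_function(seq, pair_energy=-1.0, rt=0.6, min_loop=3):
    """Sum of Boltzmann weights over all nested structures of one strand.

    Toy energy model (assumption): every base pair contributes pair_energy,
    loops are free. z[i][j] covers the half-open slice seq[i:j].
    """
    n = len(seq)
    w = math.exp(-pair_energy / rt)  # Boltzmann weight of a single base pair
    z = [[1.0] * (n + 1) for _ in range(n + 1)]  # empty slices have Z = 1
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            total = z[i][j - 1]  # case 1: last base seq[j-1] is unpaired
            # case 2: seq[j-1] pairs with seq[k], leaving >= min_loop bases inside
            for k in range(i, j - 1 - min_loop):
                if (seq[k], seq[j - 1]) in PAIRS:
                    total += z[i][k] * w * z[k + 1][j - 1]
            z[i][j] = total
    return z[0][n]
```

Splitting on whether the last base is unpaired or paired with exactly one partner is what guarantees that every pseudoknot-free structure is counted once. With `pair_energy=0` the function simply counts structures, which is a convenient sanity check.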
Andrew Su
Special Session 5: Next Steps in eQTL Analysis: Gaining Insight at the Systems Level
Speakers: Ritsert Jansen, Andrew Su, Andreas Beyer, Rob Williams - Andrew Su
Ritsert Jansen - find xQTLs (eQTLs, pQTLs, mQTL, phQTL) - links both to genetic and epigenetic variation - Leopold Parts
e - expression; p - protein; m - metabolite; ph - phenotype - Leopold Parts
Keurentjes PNAS 2006 - prioritize potential regulators by the number of genes they are associated to - Leopold Parts
Fu (Nature Prot 2007) - find mQTLs and recover glucosinolate pathway - Leopold Parts
[ coverage of Ritsert Jansen's talk is here, -- not in ISMB room by accident ] - Michael Kuhn
Jansen (Curr Opinion Plant Bio 2009) - integrate xQTLs; summary of results for integrating the 4 levels of association. Not all associations are shared across the levels, but some are. - Leopold Parts
Would like to infer causes, not just associations. - Leopold Parts
Can compare causal vs. pleiotropic models. - Leopold Parts
Problems - observed noise not biological; few causal associations compared to pleiotropic ones - Leopold Parts
Need large sample sizes to reliably infer causal relations. 30% variance explained by QTL for both traits required for 200 individuals. - Leopold Parts
Boerjan (Nat Genet 2009) - hotspots; few loci in genome with high number of associations for levels from transcripts to phenotype - Leopold Parts
Johannes (Nat Rev Gen 2008, PLOS Genetics 2009) - chromatin state uncorrelated to DNA sequence correlated to traits? Large part of heritable variability explained by epigenetic variants. - Leopold Parts
@Michael - thanks; any chance to move these? Anyway, next ones will be in more appropriate place :) - Leopold Parts
@Leopold: Unfortunately not, I wish this was easier. I'll make a post for the next talk and link it here (this time in the ISMB room!) - Michael Kuhn
Michael Kuhn
eQTL special session: Andreas Beyer
future will be data integration: eQTL plus other information - Michael Kuhn
Suthram et al, 2008 Mol Sys Biol - Michael Kuhn
eQTL electric diagrams: simulate flow as electric current - Michael Kuhn
need to improve eQTL methods, switch to multi-parametric methods - Michael Kuhn
random forest eQTL mapping, as opposed to traditional regression methods - Michael Kuhn
eQTL should rather be a classification problem - Michael Kuhn
people are worried if they have enough power (individuals) for multivariate regression - Michael Kuhn
Regression trees on genotypes as a multivariate approach - Leopold Parts
random forests (RF) are based on decision trees, search for marker that separates pop. into more homogeneous subgroups - Michael Kuhn
split subgroups again based on different markers, groups become smaller and more homogeneous - Michael Kuhn
randomly pick splits, get forest of decision trees: thus random forest - Michael Kuhn
importance measures for markers: increase in variance when marker is randomized; reduction of RSS when used; number of times marker is used - Michael Kuhn
(last one is simpler, but less established) - Michael Kuhn
should work on real data, rather than simulations - Michael Kuhn
benchmark on known pathways to identify regulators and regulated genes - Michael Kuhn
KEGG enrichment in yeast: RF methods and lasso outperform univariate methods - Michael Kuhn
Lasso 2nd best, beating two of three RFs - Leopold Parts
RF1 > Lasso > RF2 > RF3 > univariate methods (in yeast) - Leopold Parts
similar ranking in mouse data (although more noisy overall) - Michael Kuhn
The different RF approaches best in mouse; marginal differences - Leopold Parts
RF will fall back to univariate if power is sufficient for only one marker (by giving back confidence values) - Michael Kuhn
(metric for real data tests - fraction of locus-gene pairs in same pathway) - Leopold Parts
have 2 different mouse populations BXD, MDP [see Andrew Su's talk]. can data from one population inform studies on the other? - Michael Kuhn
find consistency much above random between the two strains - Michael Kuhn
most agreement on cis-band between the two populations, but also some interesting trans bands - Michael Kuhn
have consistent eQTLs, want to finemap - Leopold Parts
want to use MDP data for fine-mapping BXD. BXD is more homogeneous, thus fewer markers - Michael Kuhn
MDP data: more markers! - Michael Kuhn
but also lots of noise... - Michael Kuhn
find a gene in the same pathway next to 2 high markers - Michael Kuhn
again, validate on pathway membership - Michael Kuhn
also combining eQTLs called with different methodology (RF vs another test) - Leopold Parts
MDP data significantly boosts hit rate: increasing from 15% to 30% - Michael Kuhn
finding genetic interactions with RF - Michael Kuhn
want to know if two significant markers are interacting or just have independent marginal effects - Michael Kuhn
look at split symmetry: significant markers A and B. After split with A, is B a marker in both of the sub-populations or just in one? - Michael Kuhn
look for conditional effects by comparing the relative levels at which two markers were selected in the tree - Leopold Parts
if both sub-populations: no interaction, if one: probably interaction - Michael Kuhn
can create interaction graph - Michael Kuhn
find new regulations for gene Auts2, but know little about the gene - Michael Kuhn
find 4 genes which interact with Auts2 - Michael Kuhn
multiple testing issue? will find these effects in a random dataset.. - Leopold Parts
take home: do second level analysis after multivariate mapping; finding epistatic interactions might be possible even with small pop. sizes if you have high quality data - Michael Kuhn
Confirmation of results on top of simulation studies - consistency in pathways, categories as a quality metric - Leopold Parts
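Two steps from the notes above can be sketched in a few lines: picking the marker that best separates the population into more homogeneous subgroups (measured here as reduction in residual sum of squares, RSS), and the "split symmetry" check for interactions. This is a single tree step on toy data, not a random forest (no bootstrap or random marker subsets), and the function names and the 0/1 genotype coding are assumptions made for illustration.

```python
def rss(values):
    """Residual sum of squares around the mean (0.0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def rss_reduction(genotypes, expression, marker, rows):
    """How much splitting `rows` by `marker` (coded 0/1) reduces the RSS."""
    parent = [expression[r] for r in rows]
    left = [expression[r] for r in rows if genotypes[r][marker] == 0]
    right = [expression[r] for r in rows if genotypes[r][marker] == 1]
    return rss(parent) - rss(left) - rss(right)

def best_marker(genotypes, expression, rows):
    """Marker index giving the most homogeneous subgroups for these rows."""
    markers = range(len(genotypes[0]))
    return max(markers, key=lambda m: rss_reduction(genotypes, expression, m, rows))

def split_symmetry(genotypes, expression, marker_a, marker_b):
    """RSS reduction of marker B inside each sub-population defined by marker A."""
    rows = range(len(expression))
    side0 = [r for r in rows if genotypes[r][marker_a] == 0]
    side1 = [r for r in rows if genotypes[r][marker_a] == 1]
    return (rss_reduction(genotypes, expression, marker_b, side0),
            rss_reduction(genotypes, expression, marker_b, side1))
```

If marker B reduces the RSS in both sub-populations, the two markers look additive; if it matters in only one of them, that hints at an interaction. As the comment on multiple testing above suggests, such asymmetries would need to be compared against what a randomized dataset produces.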
HL46: Li Wang - Prioritizing functional modules mediating genetic perturbations and phenotypic effects
validation: ortholog lethal ratio: the proportion of yeast genes whose orthologs in another species (specifically C. elegans) are lethal - Ruchira S. Datta
hypothesis about ortholog lethal ratio: nonlethal genes in lethal complexes > nonlethal genes in nonlethal complexes - Ruchira S. Datta
hypergeometric enrichment test was not significantly different, but the Bayesian network model was validated. they think this is because the hypergeometric test produced many false positives - Ruchira S. Datta
gene lethality is more conserved at the module level - Ruchira S. Datta
use module-based mapping to transfer phenotypic mapping across species, rather than gene-based - Ruchira S. Datta
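For reference, the hypergeometric enrichment test mentioned above can be written with the standard library alone. This is a generic sketch of the test, not the authors' pipeline; the parameter names are hypothetical, and how genes are assigned to modules is outside its scope.

```python
from math import comb

def hypergeom_enrichment_pvalue(population, successes, draws, observed):
    """P(X >= observed) when drawing `draws` genes without replacement
    from `population` genes, of which `successes` carry the property
    (e.g. having a lethal ortholog)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(comb(successes, k) * comb(population - successes, draws - k)
               for k in range(observed, upper + 1))
    return tail / total
```

For example, with 10 genes of which 4 have a lethal ortholog, a module of 3 genes containing at least 2 of them gives `hypergeom_enrichment_pvalue(10, 4, 3, 2)`, which is 40/120.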
PT43: Tobias Marschall - Efficient Exact Motif Discovery
Unsupervised motif discovery on a string with no previous knowledge in an automated fashion. - Roland Krause
issues: How to measure over-representation and how to find them. - Roland Krause
184 publications in pubmed for "motif discovery algorithm" - Roland Krause
aim to establish an (almost) exact method based on a rigorous motif statistics - Mikhail Spivakov
given: query text, IUPAC motifs, random text model (background) - for now, iid - Mikhail Spivakov
Calculating a p-value of a given query text, a IUPAC motif and a random text model. - Roland Krause
want: a p-value for a motif - Mikhail Spivakov
use a novel device called probabilistic arithmetic automata - won't go into details - Mikhail Spivakov
Need to compute the distribution of occurrences by chance. Not a straightforward task; recently proposed a new approach by building a probabilistic arithmetic automaton. - Roland Krause
an exact calculation - Mikhail Spivakov
The problem is that computing p-values is infeasible due to large number of motifs. - Roland Krause
matches occur in clumps. use compound Poisson approximation (almost exact) to calculate exact distribution of clump sizes. approximate number of clumps by Poisson distribution - Mikhail Spivakov
Use of a Compound Poisson Approximation on a set of clumps (sets of overlapping motifs) - Roland Krause
clump = overlapping occurrences - Marcel Martin
The clump size can be computed using the probabilistic automata. - Roland Krause
nice: clumP SIze is abbreviated with an uppercase psi - Marcel Martin
How to bound the p-value to prune the search space? - Roland Krause
bound no. of occurrences using the no. of clumps - Marcel Martin
p-value: the probability of observing >k occurrences (when found k in the real data) - Mikhail Spivakov
Motifs with the same composition have the same expectation. - Roland Krause
iterate over all possible compositions, not over the motifs themselves to take advantage of the same expectation - Marcel Martin
but iid model for DNA isn't very appropriate - Mikhail Spivakov
Use of a suffix tree of the sequence, iterate over the motifs, use the lower bound for pruning, walk the tree and identify overrepresented motifs. - Roland Krause
so re-evaluate the motifs producing a good p-value with iid on a Markovian text model - Mikhail Spivakov
designing a good benchmark set is hard - Marcel Martin
other tools: Weeder, MEME - Marcel Martin
Benchmark sets are not easy, used a set by Sandve et al. - Roland Krause
Outperforms Weeder and MEME at the cost of higher running time of ~12 hours. - Roland Krause
algorithm is not as fast as the other tools. is easily parallelizable - Marcel Martin
with a 4.4 Mbp genome of M.tuberculosis, found motifs in ~250 CPU hrs in a parallelized setting - Mikhail Spivakov
best motif: AGACSCARAA (or sth like that), found in literature - Marcel Martin
computationally demanding, but possible with modern computers - Mikhail Spivakov
List of models, the first is described, others are under investigation. - Roland Krause
in the future: use modern hardware (eg GPUs) - Mikhail Spivakov
optimise wrt Markovian models directly (rather than iid) - Mikhail Spivakov
Future work could incorporate Markovian models directly or use phylogenetic information. - Roland Krause
Q. Is the implementation available? A: Given in the paper. - Roland Krause
question: is the tool available? yes, URL in the paper - Marcel Martin
you have to take into account overlapping motifs for doing proper statistics - Marcel Martin
Q. Are the data in JASPAR or TRANSFAC? A. Had an expert looking at it. - Roland Krause
# JASPAR and TRANSFAC do not really cover Mycobacterium motifs - Roland Krause
Q: Performance of the algorithm on short motifs. A. Length 10 is the upper bound for the algorithm which is quite dependent on the length. - Roland Krause
Q: applying to protein models? A: problematic because alphabet is larger and indels would need to be modelled - Marcel Martin
Q: how is the iid text model? how do you justify that the text fulfills the model? A: the iid model is estimated from the text. dependencies between characters are incorporated by using the Markovian model - Marcel Martin
Q. (Marcel Schulz) Differences in Markov models of different orders. A. Shorter orders give spurious results. - Roland Krause
Q: why only a part of the motif space? A: tried to come up with a plausible set that includes most motifs - Marcel Martin
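The basic objects from this talk (IUPAC motif occurrences, clumps of overlapping occurrences, and the expected count under an iid text model) can be sketched as below. This only illustrates the definitions; it is not the presented algorithm, and it does not compute the compound Poisson p-value or use probabilistic arithmetic automata. Function names and the `freqs` dictionary are assumptions.

```python
# Standard IUPAC nucleotide codes mapped to the bases they match.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def occurrences(text, motif):
    """Start positions (possibly overlapping) where the IUPAC motif matches."""
    m = len(motif)
    return [i for i in range(len(text) - m + 1)
            if all(text[i + j] in IUPAC[motif[j]] for j in range(m))]

def count_clumps(positions, motif_length):
    """Number of clumps, i.e. maximal runs of mutually overlapping occurrences.

    `positions` must be sorted start positions as returned by occurrences().
    """
    clumps = 0
    last_end = -1
    for p in positions:
        if p > last_end:  # no overlap with the previous occurrence
            clumps += 1
        last_end = p + motif_length - 1
    return clumps

def expected_occurrences(text_length, motif, freqs):
    """Expected number of matches under an iid model with base frequencies freqs."""
    prob = 1.0
    for symbol in motif:
        prob *= sum(freqs[base] for base in IUPAC[symbol])
    return (text_length - len(motif) + 1) * prob
```

The expectation depends only on which symbols the motif contains, not on their order, which is why motifs with the same composition share the same expectation. The clump count is never larger than the occurrence count, which is the relation the pruning bound builds on.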
TT40: Franck Tanoh - BioCatalogue: A Curated Web Service Registry for the Life Science Community
A catalogue where to register web services. REST API, open source codebase. - Gabriele Sales
Estimate of public web services in Life Science: 3000+ - Gabriele Sales
People who have an interest in such services: users, developers, service providers, and tool developers - Allyson Lister
What BioCatalogue adds: free text searches, tags (controlled vocabulary), automatic monitoring and testing. - Gabriele Sales
You can bookmark lots of services, even without signing up - Allyson Lister
Each web service is associated to categories. - Gabriele Sales
An icon captures the state of the service (online, not responding). - Gabriele Sales
Can refine searches (aka "search within results") - Gabriele Sales
lists licensing / costs as well - Allyson Lister
The input and output of the services have their own description. - Allyson Lister
Demo of service description submission. - Gabriele Sales
Supported service types: SOAP, SOAPLAB Server, REST. - Gabriele Sales
Blog post: (This one isn't as long as usual - a lot of the talk was a demo where I didn't take as many notes) - Allyson Lister
HL44: Luis de Figueiredo - Benchmarking tools in Metabolic Pathway Analysis
how to represent a metabolic system? it's a thermodynamically open system, we need to draw a boundary around it - Ruchira S. Datta
we approximate that the system reaches a steady state - Ruchira S. Datta
certain metabolites form a pool and are removed from the representation of the steady state - Ruchira S. Datta
Metabolic system can be represented as stoichiometric matrix - Venkata P. Satagopam
get the stoichiometric matrix; the solution set is a polyhedral convex cone. similarly if using other constrained representations - Ruchira S. Datta
Elementary Flux Mode: minimal set of enzymes that can operate at steady state - Ruchira S. Datta
1st benchmark system - conversion of fatty acids to carbohydrate - Venkata P. Satagopam
how to convert fatty acids to carbohydrate? Weinman et al 1957 showed a certain pathway is not possible - Ruchira S. Datta
model used by Weinman et al in 1957 - Venkata P. Satagopam
Conversion of AcCoA into G6P - Humans - Venkata P. Satagopam
use EFMs to show how conversion can take place - Ruchira S. Datta
2nd model: nucleotide metabolism in human red blood cells. - Venkata P. Satagopam
nucleotide metabolism in human red blood cells: nucleotides are lost to hypoxanthine. Is this loss reversible? - Ruchira S. Datta
de Figueiredo et al (2009) Bioinformatics - Ruchira S. Datta
have 3 metabolic models relevant in biochemistry that can be used for benchmarking metabolic pathway analysis tools - Ruchira S. Datta
find EFMs of interest, K-shortest EFMs; again, see - Ruchira S. Datta
tools that just treat the pathways as a graph are only the first step, but are missing the chemical information - Ruchira S. Datta
this is not the same as Petri nets - Ruchira S. Datta
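The steady-state condition behind the stoichiometric matrix notes above is that the matrix times the flux vector is zero for every internal metabolite. A minimal check on a hypothetical three-reaction chain (not one of the benchmark models from the talk) could look like this; `is_steady_state`, `support` and the toy matrix are illustrative assumptions.

```python
def is_steady_state(stoich, flux, tol=1e-9):
    """True if every internal metabolite is balanced, i.e. S . v = 0.

    stoich: rows are internal metabolites, columns are reactions.
    flux:   one flux value per reaction.
    """
    return all(abs(sum(s * v for s, v in zip(row, flux))) <= tol
               for row in stoich)

def support(flux, tol=1e-9):
    """Indices of the reactions (enzymes) that carry flux."""
    return {i for i, v in enumerate(flux) if abs(v) > tol}

# Toy network: R0 imports A, R1 converts A to B, R2 exports B.
# External metabolites lie outside the boundary and have no row.
TOY_STOICH = [
    [1, -1, 0],   # A
    [0, 1, -1],   # B
]
```

Here `is_steady_state(TOY_STOICH, [1, 1, 1])` holds, while `[1, 2, 1]` would deplete A and accumulate B. An elementary flux mode is a steady-state flux whose support cannot be reduced further; in this chain all three reactions are needed, so the support `{0, 1, 2}` is already minimal.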
PT44: Jacob Joseph - Family Classification Without Domain Chaining
HL42: Curtis Huttenhower - Exploring the human genome with functional maps
HEFalMp: data integration for human genomic data. - Gabriele Sales
Integration of genomic data and prior knowledge using Bayesian integration. - Gabriele Sales
Each resulting network is based on a specific functional context. - Gabriele Sales
Input data is huge: 10,000s of experiments, billions of data points. - Gabriele Sales
But also very sparse. - Gabriele Sales
Our prior knowledge comes from GeneOntology - Venkata P. Satagopam
Human case sources: genomic data (interactions, microarrays, sequences); prior knowledge (229 biological processes from GO and KEGG). - Gabriele Sales
There is so much information about humans that the independence assumption at the base of naive bayesian classification is violated. Used mutual information for regularization. - Gabriele Sales
start with predicted gene or gene product interaction network - Ruchira S. Datta
Functional mapping methodology. Start from gene interaction network. - Gabriele Sales
each edge is an integration of hundreds of datasets, going from low to high confidence - Ruchira S. Datta
Average strength of relationships defines associations. - Gabriele Sales
take subnetworks associated with one process and another, use strengths of edges to quantify how related the two processes are - Ruchira S. Datta
Four measures of functional associations between two gene sets: edges between genes; edges within each set; background edges incident to each set; all edges in the network. - Gabriele Sales
any two sets of genes, 4 measures: 1. edges between their genes, 2. edges within each set, 3. the background edges incident to each set, 4. the baseline of all edges in the network - Venkata P. Satagopam
any sets of genes G1 and G2 can be compared 4 ways: 1. edges between their genes, 2. edges within each set (e.g., BRCA is well-studied, so easy to get high confidence; want to normalize this out). 3. the background edges incident to each set, and 4. the baseline of all edges in the network (i.e., the baseline of the average confidence will vary from place to place within the network) - Ruchira S. Datta
Such measures are combined into a single score. - Gabriele Sales
normalize wrt the background, like TFIDF - Ruchira S. Datta
get bootstrap p-values wrt the null hypothesis: random gene sets - Ruchira S. Datta
null distribution is approximately normal with mean 1 - Ruchira S. Datta
P-value bootstrapping: null distribution approximately normal. - Gabriele Sales
this lets us convert a Functional Association score into a p-value - Ruchira S. Datta
massive data sets and genomes require efficient algorithms and implementations. - Venkata P. Satagopam
C++ library for computational functional genomics (open source, fully documented). - Gabriele Sales
Efficient computation for biological discovery - Venkata P. Satagopam
also includes parallelization; massively speeds up - Ruchira S. Datta
HEFalMp: predicting human gene function - Ruchira S. Datta
HEFalMp Predicting human gene functions - Venkata P. Satagopam
see many predictions, zooming in allows to view the specific data from which the prediction is derived - Ruchira S. Datta
to understand human diseases - Venkata P. Satagopam
Validation: autophagy. - Gabriele Sales
have validated human predictions; missed one for tissue specific - Ruchira S. Datta
have interactive interface to the data, using Grapple - Ruchira S. Datta
current work more species and more interactions - Venkata P. Satagopam
need postdocs - Venkata P. Satagopam
# Can only recommend working with and for Curtis :) - Oliver Hofmann
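The bootstrap step from the notes above (score two gene sets, compare against random gene sets, convert to a p-value via an approximately normal null) can be sketched as follows. The scoring function here is a deliberately simple stand-in, the mean edge weight between the two sets; it is an assumption and not the HEFalMp functional association score, which also normalizes by within-set, background and baseline edges. All names are hypothetical.

```python
import math
import random
from statistics import mean, stdev

def between_score(weights, set_a, set_b):
    """Mean edge weight between two gene sets (stand-in score, an assumption).

    weights maps frozenset({gene1, gene2}) to an edge confidence.
    """
    pairs = [(a, b) for a in set_a for b in set_b if a != b]
    if not pairs:
        return 0.0
    return sum(weights.get(frozenset(p), 0.0) for p in pairs) / len(pairs)

def bootstrap_pvalue(weights, genes, set_a, set_b, n_samples=1000, seed=0):
    """Upper-tail p-value of the observed score against random gene sets,
    using a normal approximation of the null distribution."""
    rng = random.Random(seed)
    observed = between_score(weights, set_a, set_b)
    null = [between_score(weights,
                          rng.sample(genes, len(set_a)),
                          rng.sample(genes, len(set_b)))
            for _ in range(n_samples)]
    mu, sigma = mean(null), stdev(null)
    if sigma == 0.0:  # degenerate null: every random score is identical
        return 0.0 if observed > mu else 1.0
    z = (observed - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # P(N(0,1) >= z)
```

`genes` should be a list, since `random.Random.sample` needs a sequence. The normal approximation is taken from the talk notes; whether it holds for a given score and network is something to check on the null sample itself.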
PT41: Shai Lubliner - Modeling Interactions between Adjacent Nucleosomes Improves Genome-wide Predictions of Nucleosome Occupancy
Nucleosome position affects transcriptional regulation, models for position are required for understanding of TR. - Roland Krause
75-90% of the DNA is associated with nucleosomes, play an important regulatory role. - Oliver Hofmann
Use a recently published model to produce the affinity landscape, showing the suitability of nucleosome occupancy. - Roland Krause
Genomic nucleosome affinity landscape based on a thermodynamic model - Oliver Hofmann
The model is probably published here ( - Roland Krause
Can interactions between nucleosomes be introduced into the model? - Roland Krause
Two kinds of interactions: direct or bridged by transcription factors or other proteins. - Roland Krause
Additional interactions are important for chromatin organization. DNA bending proteins, TF, histone modifications, etc. - Oliver Hofmann
First approach focuses on direct interaction, including the distance between the nucleosomes. - Roland Krause
Trying to capture interactions between adjacent nucleosomes (cooperative effects) - Oliver Hofmann
What types of functions can represent cooperativity types? - Roland Krause
Notable features include a right shifted main peak, followed by exponential decay. - Roland Krause
Tested Exponential, Stepped Exp, S-ES, Stepping-Off and Stepping-On - Roland Krause
Sampled 5000 configurations and compared it to the linker length preference. - Roland Krause
Can the model be properly learned? - Roland Krause
Repeat with data samples, add noise and try to fit different models to the sampled occupancy landscape - Oliver Hofmann
Used synthetic data, including experimental noise, then forgot the algorithm to generate the data and used the previously described models. - Roland Krause
Exponential function shows a good correlation in five-fold cross-validation, with the model incorporating the interaction outperforming the ones without. - Roland Krause
Are the interactions relevant in vivo and in vitro? - Roland Krause
For an in vitro validation the Exp, Step functions work better than the no cooperativity model - Oliver Hofmann
Short linker lengths are favored. - Roland Krause
Interactions are also important using in vivo data, data from several chromosomes and different organisms (yeast and C. elegans) show similar results. - Roland Krause
The interesting part -- biological basis for this preference? - Oliver Hofmann
The in vitro system consists of nucleosomes and DNA only. - Roland Krause
shorter length may allow for interaction of nucleosomes, energetically favoring their shift from otherwise better binding positions - Oliver Hofmann
Hypothesis: electrostatic interaction between nucleosomes, which has been described previously but not shown in the data. - Roland Krause
# Surprised that the very model was not presented in the talk. - Roland Krause
# Wonder whether there is a difference between promoters, other nucleosome covered regions (or other genomic subsets); with the role nucleosome displacement plays in regulation I'd expect some shift - Oliver Hofmann
Q: When you started motivating the model, there was a peak at 10 but the sample data did not have that.[?] A: These are effects of histone H1, which are not modeled. There are a lot of question marks about the role of H1 in yeast. The interaction that we modeled are global and can occur anywhere on the DNA. More complex model would incorporate additional local effects, e.g. other factors. - Roland Krause
Q: Have you finished the model for the higher eukaryotes? A: The model is independent of the organism. There might be differences, although this is debated. Differences between yeast and human are known. - Roland Krause
Roland Krause
Birds of a Feather session: Beyond microblogging, Wednesday, 1pm confirmed.
Thanks for setting this up, @Roland! - Allyson Lister
Some thoughts I've had: Pre-annotation of papers and posters prior to the meeting, as is done by David Shotton in "Adventures in Semantic Publishing: Exemplar Semantic Enhancements of a Research Article" at . Don't think it's automated, but you get the idea - and David Shotton gave a talk here this week ( ; and also Reflect ( - Allyson Lister
Also, how should scientists start with microblogging and social networking? - Allyson Lister
Let's meet for the 4.15 coffee break at the coffee bar in front of K1, @Allyson and other interested people. If you have ideas, it would be good to prepare them. - Roland Krause
sounds good :) - Allyson Lister
might also be of interest (Utopia ) - Allyson Lister
I COMPLETELY forgot about the meetup at coffee today! I'm sooo sorry! Was anything decided? Can you put up any summary points that might be of interest tomorrow? My apologies! - Allyson Lister
@Allyson: Wednesday. It's Tuesday :o) - Oliver Hofmann
@Oliver - I missed the pre-BOF meetup today at 4:15 at coffee - see above... :( - Allyson Lister
Oh. I did not even SEE that comment. Sigh. - Oliver Hofmann
No problem, we can meet over nibbles at the reception now. I can tell you what I have in mind. - Roland Krause
@Roland - I'm afraid I'm not available this evening. Morning? Or just write your ideas here? I'm a dork - sorry! :) - Allyson Lister
This should not take long at all, but please leave your ideas here. This is going to be an interactive session anyway, so we don't have to be all that prepared. My slides are more of a moderation prop than a standalone presentation. - Roland Krause
I would like to have some meta-discussion: why are we live blogging? How is this useful and how can we make it more useful? Does it mainly benefit those not attending the meeting or is it also useful as an archive of what has been said? - Michael Kuhn from iPod
Which room does the BoF take place in? - Peter Menzel
In addition to what Michael Kuhn (@biocs) said, meeting microblogging can also benefit the presenter - giving them valuable feedback. Also links posted to papers, websites, etc. are great, to easily view more info about what someone is/was talking about. - Fiona Brinkman
I'll miss this, due to my time difference (Can. west coast) but I look forward to all your thoughts and thank you for setting this BoF up! - Fiona Brinkman
@Fiona - we'll make sure to post (maybe microblog!) the BoF :) Thanks for your input - benefits to the presenter is a really good point. - Allyson Lister
Last year at ISMB, even though I was already on FF I took notes as I habitually do, with calligraphic fountain pen. During the course of the intervening year I would find myself trying to refer back to those notes while in conversation with others, but had difficulty finding what I wanted in time. - Ruchira S. Datta
Paradoxically, I perceive myself as being slower when taking notes electronically than by hand (I think I tend to spend more time formulating them). But since we're doing this cooperatively, if one of us doesn't catch something another will. The notes will be searchable any time on the web; I won't have to dig up a particular file on a particular computer. The same effort provides the same resource for others. - Ruchira S. Datta
I did revert to the fountain pen at various points this year though! It's more reliable than depending on the availability of power, wireless, and FriendFeed. - Ruchira S. Datta
@Peter - the ISMB website tells me we will be in C8 for this session - Allyson Lister
The BioSysBio 2009 conference report ( just came out (disclaimer, I'm an author). Simon Cockell had a wordle used (, and FF and Twitter get a mention, if a short one: "A new feature for BioSysBio 2009 to extend participation in the conference was to a wider audience by communicating live... more... - Allyson Lister
Roland, where is the BoF going to be? - Michael Kuhn
The location is still given as C8. - Roland Krause
Set up an Etherpad for collaborative note taking: - Oliver Hofmann
I will be there from 12:40 onwards, if you have left suggestions here, please be there a little earlier for a quick roundup. - Roland Krause
Getting started. Mini-overview of past/current/future - Oliver Hofmann
Disclaimer for presentations -- what is allowed and what isn't (mail, blogging, film...) - Oliver Hofmann
Reference to Cameron Neylon's logos for presenters: - Allyson Lister
A need to make presenters aware of what might be distributed how - Oliver Hofmann
Reference to CSHL and recent "controversy": - Allyson Lister
Conference blogging ideas from nature: - Allyson Lister
Brief discussion of CSHL controversy - Oliver Hofmann
Journalists and bloggers should abide by the same rules at a conference - whatever those rules are - Allyson Lister
Classical blog posts, twitter and friendfeed: surprising how effective friendfeed is given its basic nature - Oliver Hofmann
Find it quite difficult to make sense out of Twitter (best - wordle aka Simon Cockell?) - Allyson Lister
Oliver says FF isn't perfect, but it does have a very low barrier to access and start using - Allyson Lister
Where do personal opinions come into it? In general at ISMB we are all being polite. What level is acceptable? - Allyson Lister
Roland: the discussion culture in science has changed a little. He read that 50 years ago they were having much more open discussions - Allyson Lister
Maybe people might be away from the conference and following remotely, or at the conference and shy - such people could pose questions on friendfeed, and then chair (who is monitoring FF) could ask them at the end of the talk - Allyson Lister
Differences to last year: increased depth of coverage (due to ISCB support, raised awareness?) - Oliver Hofmann
We all thanked the organisers from this year - Allyson Lister
ISMB people (Reinhard?) will do some basic stats of the coverage of this conference - Allyson Lister
Many of us bloggers feel that the first to benefit are ourselves - Allyson Lister
First person to benefit from blogging: the blogger - Oliver Hofmann
Live blogging especially useful for keynotes - for recording the presenter's thoughts.. and not always an accompanying paper available.. ? - Peter Menzel
Andrew: heavily blogged talks are more useful for external people than delegates - Allyson Lister
liveblogging is complementary to webcast etc as it takes less time - if you like the liveblogging and the blog posts after scanning, then you go and look at webcast - Allyson Lister
Also, people liveblogging tends to pick out the key points quickly - Allyson Lister
webcasting also has many time and technical issues - Allyson Lister
oliver: Perhaps much of the commenting is too sober. Not enough meta-information, e.g. "here is X's main competitor's paper..." - Allyson Lister
it would be nice to have more background - yes - Jim Procter
Roland: can you have two streams - one light, one serious. Allyson: but it would be difficult in practice, if you have to look at two different things - Allyson Lister
Three levels: factual information/transcript. notes to self, public commentary / context - Oliver Hofmann
Feeds are important feedback for the presenter ? - Peter Menzel
technical issues: there can be issues with internet / wireless - Allyson Lister
(Like right now?) - Oliver Hofmann
@Oliver - yep! Back now, though :) - Allyson Lister
Should we have TLAs or abbrevs to indicate personal comments or uncertainty? Maybe not TLAs (makes barrier for newbies) - how about single symbol at the beginning, e.g. '#' for personal comments - Allyson Lister
We should write up a boilerplate: what it means to liveblog; if you don't want it, then... (e.g. logo from Cameron); why scientists should liveblog; *suggest* that your FF account should link to something that identifies you; you might want to consider only posting things that you would be happy saying to people's face. - Allyson Lister
no legislation, though - don't want to force people to do anything - Allyson Lister
ISMB will certainly do it next year - Allyson Lister
imho: Official Facebook group etc. should be more advertised.. - Peter Menzel
@Peter - I agree - the embedded FF threads within the ISMB site were advertised, but more should have been made of them - it's quite cool to see! - Allyson Lister
Some great points made here - thank you! - Fiona Brinkman
see here for the Google Doc to make a write-up out of this discussion: - Michael Kuhn
Jim Procter
BOF: VizBi - Visualization of Biological Data 1pm Wednesday K1
we have had two questions and calls from the floor so far : - Jim Procter
13-14 people interested in genomes and sequence visualization - Jim Procter
9 interested in alignments and phylogenies - Jim Procter
systems biology (high-throughput (omics) and pathways) - *everyone is interested in this session* - Jim Procter
Will there be any discussion on types of indexes and containers for visualizing very large datasets? For example ways to very quickly grab the features in a small genomic region for display in genome browser? - Jan Aerts
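One common answer to the question above is a binned index: features are filed under fixed-width bins so that a region query only inspects the bins it touches. The following is a minimal sketch of that idea, not something presented in the session; the class name, the bin width and the half-open coordinate convention are all assumptions made for illustration.

```python
from collections import defaultdict


class BinnedFeatureIndex:
    """Hypothetical fixed-width bin index for genomic features."""

    def __init__(self, bin_size=10_000):
        # bin_size is an assumed bin width in base pairs
        self.bin_size = bin_size
        self.bins = defaultdict(list)  # (chrom, bin number) -> features

    def _bin_range(self, start, end):
        # Bins touched by the half-open interval [start, end)
        return range(start // self.bin_size, (end - 1) // self.bin_size + 1)

    def add(self, chrom, start, end, name):
        # Register the feature in every bin it overlaps
        for b in self._bin_range(start, end):
            self.bins[(chrom, b)].append((start, end, name))

    def query(self, chrom, start, end):
        # Only look at the bins the query region touches
        hits = set()
        for b in self._bin_range(start, end):
            for f_start, f_end, name in self.bins.get((chrom, b), ()):
                if f_start < end and f_end > start:
                    hits.add((f_start, f_end, name))
        return sorted(hits)
```

A genome browser would call `query` with the visible window on each pan or zoom; the cost then depends on the number of bins in view rather than on the size of the whole dataset.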
"Jan- I'll ask your question in a second - we'll go through the sessions once again - Jim Procter
LM session description- visualization of final processed data either still images or trajectories - ~ 6 people - seems less than expected - Jim Procter
3-4 people interested in mri - Jim Procter
and around 30 people have raised their hands more than once. - Jim Procter
What about visualization of features that have more than 1 position on a genome? E.g. readpairs. => circular genome browser? - Jan Aerts
@Jan - we hope to cover that as well. I also hoped to deal with genome assembly visualization - but we couldn't find anyone willing to review it (I don't actually believe this myself!) - Jim Procter
usability keynote yet to be determined (we have invited someone and await their reply)- but we aim to discuss how best to create usable biological software - Jim Procter
@Jim: regarding the circular browser: something like Circos, but for browsing rather than just presentation; not static. - Jan Aerts
@Jan - yes. interactive and analytical visualization is definitely the focus - Jim Procter
@Jan - sorry I haven't raised your question yet - do you want to ask yourself at the appropriate moment ? - Jim Procter
we have over 70 people registered at - and we will be opening registration soon. - Jim Procter
question about how do we visualize textual semantic relationships - this is still a challenge in information visualization - Jim Procter
sean explains that we really had to rationalise the sessions to make things strongly focused (as a result, his tools - as an example - may not be reviewed in the specific talks) - Jim Procter
@Jim - I'm still in Hinxton, UK.... - Jan Aerts
Can someone also ask if people would see value in a biohackathon focussed on visualization? Not a meeting, but actually getting things done? - Jan Aerts
@Jan - ah - ok. Sean has really answered your question implicitly. We really wanted to get the *interactive* tools covered - so a biologist would have an easy way of discovering the best tools for each type of focus. - Jim Procter
@Jan - I will. I can also raise this via OBF - Jim Procter
suggestion of having a bioinformatics web and application usability interest group that can provide guidelines on usability for our applications - Jim Procter
often all we need is a place to start for this.. not a complete service - Jim Procter
userbase/tool disconnect - often tools are developed by the person who originated the database/method/etc without any idea of what the most productive user is actually going to want to do - and often without any real general knowledge of how the rational biologist works - Jim Procter
Am working on a genome viewer that looks like Circos, but is completely interactive (i.e. zooming in/out of loci). Bumping into both design and technical issues, though. - Jan Aerts
@Jan - can you contact Cydney Nielsen ? make sure she knows about your tool - we are also setting up Wikipedia pages (see - Jim Procter
sean -> says user interface designer must be completely in the mindset of the end user - Jim Procter
floor: new instruments drive new tools to visualize data - and there are several levels on how users need to be enabled to visualize - Jim Procter
suggest there is a lack of cognitive science and psychology in this meeting - Jim Procter
poster deconstruction: images with sharp and bright colours are great for 12 year olds but not necessarily intuitively informative about the underlying information. - Jim Procter
i.e. there could be standard coloursets for types of data (minimum numbers of colours) - Jim Procter
Nils follows on - there are a number of visualization fields where this has been an issue and very reliable guidelines have been developed to deal with colour confusion - Jim Procter
this conference is a first step - we hope that there will be a much stronger accent on practical visualization theory and attract some of the experts from the more general visualization community - Jim Procter
floor: it's often about finding out what you want to ask of some data, and then finding a visualization that can give them an answer. - Jim Procter
my response: this used to be the aim of the visualization toolkits but these are often not quite appropriate for bioinformatics (scaling, deployment, licensing) - Jim Procter
floor: very important to actually watch the user to see how they use your application - great understanding about how you can improve it - Jim Procter
floor#2 point: pure html web applications - and a 2d page is not sufficient for completely integrated visualization - Jim Procter
Sean considers it's too early to start defining standards (I feel standards are actually defined elsewhere, we just need to start to see how they can apply to us) - Jim Procter
Do people think we might be able to end up with some "design patterns" such as in software development (once the field has matured a bit more)? - Jan Aerts
Jan et al - I'm moving this discussion to its own friendfeed room - Jim Procter
ah. possibly. I definitely think there are analogous patterns in user interface design that map to the kinds of cognitive processes that a biologist operates - Jim Procter from iPhone
Nils is talking about this - systems biology visualization is all about abstract information models - networks, matrices, multidimensional time series, etc - these have all got standard (and new/emerging) models for their visualization - Jim Procter from iPhone
Sean discusses Occam's razor - don't create a million categories - Jim Procter from iPhone
around 10 or so are interested in contributing to the wiki pages - Jim Procter from iPhone
how do we stop the wikipedia list being 'just a list of biological tools' - Jim Procter from iPhone
Sean comments that there are many ways of cutting a slice out of this area - we've started at the very simplest and simply list tools with some indication of the types of methods they are relevant to and types of visualization that they accomplish - Jim Procter from iPhone
developer comment: visualization tools require significant R&D on behalf of the developer - Jim Procter from iPhone
how to link ecological data with molecular data - Jim Procter from iPhone
Sean reveals the hidden agenda - enabling different types of visualization tool to be easily combined. We hope that bringing the developers of these tools together might happen - Jim Procter from iPhone
practically everyone here is a developer, around ten or so are users, and a scattering of people came because they were just interested - Jim Procter from iPhone
offline now - please continue here or at the other friendfeed room : - Jim Procter from iPhone
Allyson Lister
Special Session 4: PLoS: Panel Session with previous speakers plus Philip Bourne, Editor-In-Chief PLoS CB
AA: the problem is reward. The reward for publishing a model as SBML is zero, or even negative right now. Until it stops preventing us from doing science, you won't see as much an uptake as you should. Even now, it's somewhat limited in what it can capture. - Allyson Lister
AA: even if he forced his lab to do MIAME stuff, they'd find ways to circumvent it. Need to try harder to have a reward. Linux worked because there is a dictatorial and charismatic leader at the center that started out with a core group - Allyson Lister
PB: We should start having DOIs and PubMed IDs to things that are a little less than traditional. - Allyson Lister
PB: Very few people sit and look through an entire journal anymore: they go to specific papers. So, you should be able to navigate between papers, abstracts and other components of the paper. - Allyson Lister
HL40: Gary Hon - Histone modifications at human enhancers reflect global cell-type-specific gene expression
Different cells are different because they express different genes. - Cass Johnston
Gene expression profiles from microarray studies can be used to identify cell types - Cass Johnston
Regulation of transcription still not really understood. Coding regions are well studied, but most of the genome is non-coding. Presumably these regions contain regulatory elements. - Cass Johnston
Some regulatory elements are known. Promoters, Enhancers, Insulators. - Cass Johnston
Enhancers are the least understood elements in transcriptional regulation. - Gabriele Sales
How to find enhancers genome wide? - Cass Johnston
Enhancers can be found by: sequence elements (>2500 TF motifs). - Gabriele Sales
sequence based searches? >2500 transcription factor motifs... - Cass Johnston
can look at motifs, but they are highly degenerate - Mikhail Spivakov
But such motifs are short and highly degenerate. - Gabriele Sales
still need to look at in vivo data, since enhancers are often tissue-specific - Mikhail Spivakov
look at the "epigenetic code": histone modifications - Mikhail Spivakov
Chromatin modifications have been associated with activation and repression. - Gabriele Sales
previous studies: chromatin signatures of promoters - Mikhail Spivakov
Do histone modifications mark enhancers? - Gabriele Sales
ChIP-chip - Cass Johnston
are there signatures of enhancers that are specific to their function? - Mikhail Spivakov
Previous study: Schübeler, Nature Genetics, 2007. - Gabriele Sales
use chip-chip data, focus on ENCODE regions - 1% of the human genome - Mikhail Spivakov
(Schübeler Nat Gen 2007) - Mikhail Spivakov
H3K4me1 high at enhancers but not promoters. H3K4me3 is high at promoters but not enhancers - Cass Johnston
Study of histone modification for 414 promoters. - Gabriele Sales
ChIP-chipped 5 cell lines for K4Me1, K4Me3, K27ac - Leopold Parts
Epigenetic dynamics of promoters, insulators and enhancers? How do they change between cell types? Which play the biggest role in determining cell type differences? - Cass Johnston
CTCF insulator binding sites are invariant between cell types - Cass Johnston
chromatin signatures at promoters and insulators (CTCF-bound regions) are invariant - Mikhail Spivakov
in general, enhancer histone modifications are cell-type specific - Leopold Parts
while the profiles at enhancers are tissue-type specific - Mikhail Spivakov
Using this signature, prediction of 36589 enhancers genome-wide. - Gabriele Sales
moved to genome-wide analysis of H3K4me and H3K27ac + DNase I hypersensitivity and co-activators p300 and MED1 - predicted a large number of enhancers - Mikhail Spivakov
Validation: recovery of well-studied enhancers. - Gabriele Sales
7/9 of cloned predicted enhancers showed activity - Mikhail Spivakov
Found many motifs specifically enriched in enhancers - Cass Johnston
Enhancers are enriched near HeLa-specific genes. - Gabriele Sales
enhancers are enriched at cell-type-specific genes - Mikhail Spivakov
(HeLa vs K562) - Mikhail Spivakov
is the same true in "real" cells (as opposed to immortalized cell lines) - Mikhail Spivakov
look at ES cells vs a mixture of ES-derived differentiated cells (on treatment with BMP4) - Mikhail Spivakov
H3K27ac is also a mark of enhancer activity - Cass Johnston
H3K4me3 and H3K27me3 are depleted at enhancers; H3K4me1 and H3K27ac are enriched - Mikhail Spivakov
poised enhancers contain only H3K4me1 and gain H3K27ac on activation - Mikhail Spivakov
Conservation only slightly above random for found enhancers - Leopold Parts
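The chromatin signatures noted in this thread (H3K4me1 high at enhancers, H3K4me3 high at promoters, H3K27ac gained on enhancer activation) can be turned into a toy labelling rule. This is only a sketch of the logic as blogged, not the authors' prediction method; the function name, the single shared `threshold` and the label strings are hypothetical.

```python
def classify_region(h3k4me1, h3k4me3, h3k27ac, threshold=1.0):
    """Label a region from three ChIP enrichment values.

    threshold is an assumed cut-off separating "high" from "low" signal;
    a real analysis would calibrate it per mark and per experiment.
    """
    k4me1_high = h3k4me1 >= threshold
    k4me3_high = h3k4me3 >= threshold
    k27ac_high = h3k27ac >= threshold

    if k4me3_high and not k4me1_high:
        return "promoter-like"
    if k4me1_high and not k4me3_high:
        # Poised enhancers carry only H3K4me1 and gain H3K27ac on activation
        return "active enhancer-like" if k27ac_high else "poised enhancer-like"
    return "unclassified"
```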
HL41: Roland Dunbrack - Comparative analysis of crystal interfaces of homologous proteins
two papers: Xu JMB 381, 487-507 2008: , missed 2nd one - Michael Kuhn
homooligomers: 45-60% of all proteins in the PDB are homooligomers - Michael Kuhn
is a protein an oligomer in solution? - Michael Kuhn
can determine using gel filtration, analytical ultracentrifugation, cross-linking + SDS-PAGE - Michael Kuhn
sources of information: PDB/PQS/PISA - Michael Kuhn
need to distinguish between asymmetric and biological units - Michael Kuhn
Xu, Canutescu, Dunbrack, Bioinformatics, 2006: look at overlap of the 3 info sources - Michael Kuhn
66% total intersection - Michael Kuhn
see different annotations for the same protein family - Michael Kuhn
shows an example where a dimer is formed by a small interface, the biounit dbs often don't get this correct - Michael Kuhn
look for crystal forms for each family; look at interfaces for each crystal form (building cube out of 3x3x3 unit cells) - Michael Kuhn
look at common interfaces in multiple crystal forms and calculate backbone structure overlap (# of matching residues) - Michael Kuhn
observation: different space groups will sometimes have nearly the same crystal form - Michael Kuhn
e.g. 1yak and 1yaf - Michael Kuhn
176 families, 3139 entries with interfaces in N/N crystal forms (N >= 4) - Michael Kuhn
try to see which interfaces are correct - Michael Kuhn
benchmarking: check performance of the 3 data sources using reference set - Michael Kuhn
get false positives, try to estimate how many crystal forms do you need to identify false positives - Michael Kuhn
small distortion of the interface is enough to make a monomer out of a dimer - Michael Kuhn
TT36: Younghoon Kim - MONET: A Cytoscape plugin for genome-scale network inference from expression profiles using modularization and parallel processing techniques with supercomputing resources
Motivation: insufficient information in gene expression data alone for the efficient analysis of thousands of genes - Oliver Hofmann
Improving the sample to gene ratio by modularization - Oliver Hofmann
GO annotation for the functional description. Divide/conquer approach, identify seed genes in global network, build local networks, parallel Bayesian network learning - Oliver Hofmann
Available via the cytoscape plugin manager - Oliver Hofmann
Barb Bryant
PLoS session will feature, in the 4 time slots, (1) Abigail Morrison (neuroscience), (2) Adam Arkin (synthetic biology), (3) Donna Slonim (human development / TM), (4) panel discussion. Themes include sharing, collaboration, and areas outside the mainstream of ISMB.
All three speakers are editors at PLoS Comp Biol - Diego M. Riaño-Pachón
TT34: Mark Clement - GNUMAP: Unbiased Probabilistic Mapping of Next-Generation Sequencing Reads
Workflow: hash the genome into short k-mers. Map reads on the genome using a probabilistic algorithm. - Gabriele Sales
Defaults to 10-mers currently - Oliver Hofmann
GNUMAP uses 10-mers as seeds. - Gabriele Sales
Alignments: probabilistic Needleman-Wunsch to compensate for the variable quality of base calls. - Gabriele Sales
Takes base call quality into account - Oliver Hofmann
Provides a quick overview of single read variation - Oliver Hofmann
The algorithm uses a PWM. - Gabriele Sales
What happens to reads matching multiple loci in the genome? Options: discard (repeats); map to all; pick one. - Gabriele Sales
Uses probabilistic assignment (proportional) for PWMs matching multiple genomic locations - Oliver Hofmann
Relation to other tools via simulation studies - Oliver Hofmann
More tests on real data: ETS1 binding domain. - Gabriele Sales
Planned future support for paired-end reads, SOLiD - Oliver Hofmann
Question: run time -- aim for balance. Higher accuracy, but not as fast as some others - Oliver Hofmann
No limit on the sequence size, should work on 454 - Oliver Hofmann
Requires all four base call probabilities, can't work on FASTQ alone - Oliver Hofmann
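Two steps from this thread lend themselves to a short sketch: hashing the genome into k-mers to find seed hits, and sharing a read proportionally among several matching loci. The code below is an illustration under stated assumptions, not GNUMAP's implementation. The function names are hypothetical, the probabilistic Needleman-Wunsch step is omitted, and the per-locus scores passed to `proportional_weights` stand in for whatever alignment score the real tool computes.

```python
from collections import defaultdict


def build_kmer_index(genome, k=10):
    """Map every k-mer in the genome to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def candidate_positions(read, index, k=10):
    """Candidate read start positions implied by exact k-mer seed hits."""
    starts = set()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            if pos - offset >= 0:
                starts.add(pos - offset)
    return sorted(starts)


def proportional_weights(scores):
    """Split one read across loci in proportion to its score at each locus."""
    total = sum(scores.values())
    return {locus: score / total for locus, score in scores.items()}
```

With the default `k=10` this matches the seed length mentioned above; a shorter `k` is convenient for toy sequences.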
Allyson Lister
Special Session 4: PLoS: Donna Slonim on human development / Translational medicine
Should be from Bench to Bytes to Bedside and Back, not from Bench to Bedside - Allyson Lister
What might be helpful in closing the loop: strongly-interdisciplinary collaborations, availability of clinical data, and standards to ensure that data are shared in useful ways. - Allyson Lister
Translational Development Genomics: while there has been progress in screening, is there anything more that we can get out of genomics data and help with diagnoses? - Allyson Lister
So they did a pilot study for Down's Syndrome (caused by trisomy 21). - Allyson Lister
There is quite a range of expression - very little is significant. BUT there's huge dysregulation of the genome. - Allyson Lister
The connectivity map further implicates oxidative stress: the top compounds (positive correlation) relate to oxidation and ion transport. - Allyson Lister
HL39: Francisco Melo - Evolutionary potentials for protein structure and function prediction
protein structure prediction through: fold recognition and comparative modeling, or ab initio structure prediction - Ruchira S. Datta
after making a model, its quality is assessed to decide whether to use it or discard it - Ruchira S. Datta
usually have several iterations of comparative modeling - Ruchira S. Datta
will speak about model quality assessment, an important ingredient in this process - Ruchira S. Datta
below 40% sequence identity, have larger errors in predicted structure, mostly due to sequence alignment error - Ruchira S. Datta
one way of dealing with this is to produce many alignments and use model quality assessment to assess the errors of the produced models - Ruchira S. Datta
detect errors due to incorrect template and misalignment, which can occur with template-target identities of 20%, 25%, or 30% id - Ruchira S. Datta
knowledge-based potentials, mean force potentials, or statistical potentials are scoring functions for model assessment - Ruchira S. Datta
states are represented using geometrical descriptors - Ruchira S. Datta
obtained scores represent pseudo-energies - Ruchira S. Datta
we don't have an unbiased sample of the folded proteins, but use the known native folded proteins - Ruchira S. Datta
very important to set the parameters of the potentials - Ruchira S. Datta
may have interactions between amino acids that are close in 3d space - Ruchira S. Datta
matrix = [a][b][k][r], interactions between C-alpha and C-betas; matrix dimensions are [40][40][10][30] - Ruchira S. Datta
traditionally derive statistical potential by taking whole pdb, then clustering it and taking representatives, in order to have unbiased sample of fold space - Ruchira S. Datta
train on features of these to get relative frequencies, derive relative statistical potential - Ruchira S. Datta
here instead they derive evolutionary or structure-specific potential - Ruchira S. Datta
but homologous protein sequences contain valuable information - Ruchira S. Datta
important or key residues for fast folding can be derived using information from known homologs - Ruchira S. Datta
some key interactions that provide stability can be derived from homologs - Ruchira S. Datta
PSI-BLAST, then MSA, then comparatively model each sequence using the template - Ruchira S. Datta
use these structures to derive family-specific statistical potential - Ruchira S. Datta
have to duck out for a meeting now :-( - Ruchira S. Datta
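For readers unfamiliar with how relative frequencies become pseudo-energies, a common formulation is the inverse Boltzmann relation, E = -kT ln(f_observed / f_reference). The sketch below applies it to a single residue-pair type over a few distance bins. It is an assumption that the potentials in this talk follow this form; the function name and the choice to return 0.0 for empty bins are illustrative only, and the full [a][b][k][r] matrix from the talk is not modelled.

```python
import math


def distance_potential(pair_counts, reference_counts, kT=1.0):
    """Pseudo-energy per distance bin from observed and reference counts.

    pair_counts: counts of one residue-pair type in each distance bin.
    reference_counts: counts of all pair types in the same bins.
    Bins with no data get 0.0 (an arbitrary neutral value for this sketch).
    """
    pair_total = sum(pair_counts)
    ref_total = sum(reference_counts)
    energies = []
    for obs, ref in zip(pair_counts, reference_counts):
        if obs == 0 or ref == 0:
            energies.append(0.0)
            continue
        f_obs = obs / pair_total
        f_ref = ref / ref_total
        # Over-represented bins get negative (favourable) pseudo-energy
        energies.append(-kT * math.log(f_obs / f_ref))
    return energies
```

A family-specific potential of the kind described above would use counts gathered from models of homologs instead of a clustered sample of the whole PDB.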
Allyson Lister
Special Session 4: PLoS: Adam Arkin on Synthetic Biology
Running the Net: Finding and Employing the Operating Principles of Cellular Systems: the need for scientific standards and cooperation - Allyson Lister
synthetic biology, make simulation of cell, cheaper, more reliable and faster - Diego M. Riaño-Pachón
Again, need for standards and cooperation! - Diego M. Riaño-Pachón
synthetic biology is data driven - Allyson Lister
Dogs have been bred for almost anything (poor dogs) - Diego M. Riaño-Pachón
For dogs, that such differences would cause survival effects in the "wild" doesn't bother many people. - Allyson Lister
Next is the classic example of the cane toad, which destroyed environmental diversity. - Allyson Lister
TRANSPARENCY is the key to synthetic biology - the focus of making things as predictable as they can - Allyson Lister
About the Cane toad in Australia: - Diego M. Riaño-Pachón
Engineering is all about well-characterized, standard parts and devices. - Allyson Lister
The meta-data about microarrays in GEO is so poor that it makes those microarrays almost useless - Diego M. Riaño-Pachón
But how to achieve those standards? - Diego M. Riaño-Pachón
If you put two attenuators on the same transcript, it behaves about as you expect: it functions as a NOT-OR gate. - Allyson Lister
single antisense RNA-mediated transcription attenuator: NOT gate - Allyson Lister
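The gate logic in the two comments above can be checked with a truth table. This is a purely Boolean sketch and says nothing about the underlying biology or kinetics; the function names are hypothetical. Each attenuator blocks transcription when its antisense RNA is present, so two on the same transcript only let it through when neither input is present, which is a NOR.

```python
def attenuator_output(antisense_present):
    """Single attenuator: transcription proceeds only without antisense RNA (NOT)."""
    return not antisense_present


def tandem_attenuators(antisense_a, antisense_b):
    """Two attenuators in series: both must let transcription through (NOR)."""
    return attenuator_output(antisense_a) and attenuator_output(antisense_b)


def truth_table():
    """All input combinations and the resulting output of the tandem device."""
    return [(a, b, tandem_attenuators(a, b))
            for a in (False, True) for b in (False, True)]
```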
Bacteria engineered as pathogens to target particular human tissue (e.g. tumors). To do that, you have to build many different modules, each with its own computational and culture unit tests. - Allyson Lister
Absolute requirements: openness, transparency, standards, team-science approaches. - Allyson Lister
what we need: openness, publication of dead ends and multiple failures. - Diego M. Riaño-Pachón
PT36: Yang Huang - Graph Theoretical Approach To Study eQTL: A Case Study of Plasmodium Falciparum
P.falciparum is the most deadly human malaria pathogen - Oliver Hofmann
Little information about gene regulation so far, eQTL might be able to shed some light on this regulation and drug resistance - Oliver Hofmann
SNPs might affect gene expression. Consider expression as a quantitative trait like height, weight. Identify the associated locus by statistical methods. - Oliver Hofmann
Traditional tests between multiple loci, all expression. Comprehensive and without bias, but does not use the inherent data structure, computationally expensive and a problem of statistical power. - Oliver Hofmann
Alternative approach GeD, Graph-based eQTL decomposition. Include strain data in the association graph - Oliver Hofmann
Graph structure: Three types of vertices: gene linked to strain linked to locus - Oliver Hofmann
Find cliques to reduce data complexity - Oliver Hofmann
Each clique has 3 vertices (G/S/L) that are fully connected, in addition each clique is a maximal subgraph that cannot be extended further - Oliver Hofmann
Represent inherent data structures - Oliver Hofmann
Heuristic approach on eQTL cliques to look for (Locus,gene) pairs with certain patterns; refer to graph/diagram in paper - Oliver Hofmann
Cliques help to detect eQTLs, avoiding a large number of tests; integration of strain information provides a new framework for eQTL studies - Oliver Hofmann
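The gene/strain/locus cliques described in this thread can be enumerated directly once the three edge sets are known. The sketch below assumes the graph is given as three lists of pairs (gene-strain, strain-locus, gene-locus); how the paper actually defines those edges is not stated in the notes, so the inputs and the function name are hypothetical. Because edges only run between different vertex types, a fully connected G/S/L triple cannot be extended further.

```python
def gsl_cliques(gene_strain, strain_locus, gene_locus):
    """Return all (gene, strain, locus) triples that are fully connected."""
    strain_to_loci = {}
    for strain, locus in strain_locus:
        strain_to_loci.setdefault(strain, set()).add(locus)

    gene_locus_edges = set(gene_locus)
    cliques = set()
    for gene, strain in set(gene_strain):
        for locus in strain_to_loci.get(strain, ()):
            # The third edge closes the triangle
            if (gene, locus) in gene_locus_edges:
                cliques.add((gene, strain, locus))
    return sorted(cliques)
```

Grouping the resulting triples by (locus, gene) gives the candidate pairs that the heuristic step then inspects for patterns.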
PT35: Pradipta Ray - DISCOVER: A Feature-Based Discriminative Method for Motif Search in Complex Genomes
The problem: find transcription factor binding sites. - Gabriele Sales
In higher organisms, binding sites operate in clusters. - Gabriele Sales
Two approaches: supervised / unsupervised. They used a supervised search. - Gabriele Sales
Traditionally used PWMs to model motifs - Cass Johnston
PWM: position weight matrix. Used to score sequence windows. - Gabriele Sales
Problem: high false positive rate. - Gabriele Sales
An evolution of this idea: hidden Markov models. - Gabriele Sales
Two states: motif and background states / distributions. You learn parameters of these models from known data. - Gabriele Sales
The performance of such models has saturated in recent years - Gabriele Sales
problems with HMM / generative models: may tune to noise, rather than the signal; Seem to have hit peak performance; Difficult to incorporate other sources of information to improve predictions - Cass Johnston
New approaches try to integrate other sources of evidence: multi-species phylogenetics, distance from TSS and between TFBSs, epigenetic data. - Gabriele Sales
DISCOVER allows you to integrate multiple sources of evidence into your motif model - Cass Johnston
They use conditional random fields, a discriminative model. - Gabriele Sales
The estimation phase reduces to a convex maximization problem. - Gabriele Sales
What features correlate with TFBSs? Motifs, for example, have high PhastCons scores. Background is linked to GC content. - Gabriele Sales
and check these features are discriminatory in the context of the model. - Cass Johnston
Leave-one-out cross validation on Drosophila data - Gabriele Sales
DISCOVER is 20% better than other algorithms on the F1 score (harmonic mean of precision and recall). - Gabriele Sales
Precision (TP/(TP+FP)) / Recall (TP/(TP+FN)) curves. DISCOVER balances the Precision/Recall trade-off better than other tools. - Cass Johnston
is abstracized really a word? - Cass Johnston
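The evaluation measures quoted in this thread are easy to compute from raw counts. The sketch below implements precision, recall and the F1 score (their harmonic mean) as defined in the comments above; the function name and the convention of returning 0.0 when a denominator is zero are assumptions, and the counts in the usage note are invented rather than taken from the talk.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, `precision_recall_f1(8, 2, 8)` gives a precision of 0.8 and a recall of 0.5, showing how a tool can look strong on one measure while the F1 score exposes the imbalance.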
HL36: Patrick Bradley - Leveraging the context-specific coordination of transcript and metabolite concentrations to discover gene-metabolite interactions.
[apologizes to bloggers for the long title we'd have to write down :) ] - Michael Kuhn
want to find missing edges in metabolic maps, especially for non-model organisms - Michael Kuhn
e.g. pathogens which have evolved novel p/ws - Michael Kuhn
want to find different types of edges (e.g. regulation) - Michael Kuhn
gene-metabolite interactions: AMP kinase senses concentrations of AICAR, AMP and ATP - Michael Kuhn
how do transcription and metabolism influence each other on a global scale? - Michael Kuhn
paper: Bradley PH, Brauer MJ et al, PLoS Comp Bio, Jan 2009. - Michael Kuhn
starve yeast for either carbon or nitrogen, look at transcriptome and metabolomic response - Michael Kuhn
SVD (singular value decomposition) suggests that transcr. and metab. responses are coordinated - Michael Kuhn
4 broad classes of metabolites: glycolysis; TCA cycle; biosynthetic intermediates; amino acids - Michael Kuhn
look at GO terms of correlated metabolites and genes; show expected behavior. e.g. amino acids -- protein biosynthesis - Michael Kuhn
to identify dependencies in the data, look at: direction and strength of correlation, environmental perturbation, class of metabolite - Michael Kuhn
some genes and metabolites display context-specific patterns, e.g. methionine vs. MET6 - Michael Kuhn
in some cases positive corr. under one environmental condition and negative corr. under another condition - Michael Kuhn
train Bayesian network on pathway data from KEGG - Michael Kuhn
nitrogen starvation seems to be more informative - Michael Kuhn
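The context-specific patterns mentioned in this thread (positive correlation under one condition, negative under another) can be detected by computing the correlation separately per condition. This is a minimal sketch with hypothetical function names; it uses a plain Pearson coefficient and assumes each condition has at least two samples with non-constant values. It does not reproduce the Bayesian network from the talk.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_by_condition(samples):
    """samples: iterable of (condition, transcript_level, metabolite_level)."""
    grouped = {}
    for condition, transcript, metabolite in samples:
        levels = grouped.setdefault(condition, ([], []))
        levels[0].append(transcript)
        levels[1].append(metabolite)
    return {cond: pearson(ts, ms) for cond, (ts, ms) in grouped.items()}
```

A gene-metabolite pair whose coefficient changes sign between the carbon and nitrogen entries of the returned dictionary would be a candidate for the context-specific behaviour described above.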
Allyson Lister
Special Session 4: PLoS Abigail Morrison on Neuroscience
First of three speakers at the PLoS special session - Allyson Lister
All three speakers are editors at PLoS Comp Biol - Diego M. Riaño-Pachón
In computational neuroscience, the key ideas to be communicated are mathematical and computational models as well as data analysis methods - Allyson Lister
Very little standardization in many areas of computational neuroscience - Allyson Lister
Lack of standardization is hindering progress and building on others' work - Diego M. Riaño-Pachón
Abigail Morrison can think of only one model, in all the times she's worked on it, that they've been able to reproduce without going back to the authors - Allyson Lister
Is it science, or is it travel reporting? - Allyson Lister
Approach to solve the problem: work together to create tools to facilitate reproducibility - Diego M. Riaño-Pachón
International Neuroinformatics Coordinating Facility - Diego M. Riaño-Pachón
Japan node: Visiome (and we have another -omics) - Diego M. Riaño-Pachón
The Japanese node focuses on the visual side of things, and has produced Visiome, which attempts to collect both papers and figures separately as well as model parameters, simulation scripts and figure-generation scripts. This can all be downloaded, and then hopefully run it on your own system - Allyson Lister
Also the Simulation Server Platform, which uses a VM to reproduce the environment the original model was run in so others can test and run it - Allyson Lister
German node's goal is to provide open source tools for data sharing and analysis - Diego M. Riaño-Pachón
The problem is that there are many different recording devices and analysis tools, and no standardization - Allyson Lister
German node developing unified data format + associated tools - Allyson Lister
They also want to design and implement a machine-readable declarative language to describe neural network model (like SBML) - first meeting in March 2009 so still new. - Allyson Lister
PyNN, Python Neural network simulation package - Diego M. Riaño-Pachón
facilitates cross-checking, as it can run on any simulator that implements the common API - Diego M. Riaño-Pachón
Simulation-code written in PyNN, can use several underlying simulators, instead of using simulator-specific languages - Diego M. Riaño-Pachón
Gregory Wilson, American Scientist 2006 "Where's the real bottleneck in scientific computing?" - Allyson Lister
if nobody can understand your ideas and reproduce your results, have you really removed a bottleneck or just made it somebody else's problem? - Diego M. Riaño-Pachón
HL35: Johannes Soeding - Context-specific BLAST detects twice as many homologous proteins as BLAST
Not only about BLAST, but also about models of protein evolution. - Gabriele Sales
Importance of BLAST searches: 400,000 searches run at NCBI per day. - Gabriele Sales
BLAST (1990) and PSI-BLAST (1997) cited 45,000 times. - Gabriele Sales
Protein bioinformatics relies heavily on sequence searching. - Gabriele Sales
Sequence alignments are a special case of profile alignments. - Gabriele Sales
The score of an alignment can be thought of as the log of the ratio of the probability of the mutations needed to go from sequence X to Y over the average probability of Y. - Gabriele Sales
Context specific substitution matrices used with success, among others, for protein structure prediction (Rice & Eisenberg 1997, Huang & Bystroff 2006). - Gabriele Sales
Their approach: take 6 neighbours to the left and to the right of a residue. - Gabriele Sales
The search uses a sliding window over the sequence; the mutation probability is computed by looking at a precalculated library of profiles. - Gabriele Sales
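The sliding window described above can be sketched as a generator that yields, for each sequence position, the surrounding context of up to 6 neighbours on either side. This only illustrates the windowing step; the lookup against the precalculated profile library is not shown, and the function name and the choice to truncate windows at the sequence ends are assumptions.

```python
def context_windows(sequence, flank=6):
    """Yield (position, centre character, context window) for each position.

    The window spans up to `flank` neighbours on each side and is simply
    truncated at the ends of the sequence.
    """
    for i, centre in enumerate(sequence):
        left = max(0, i - flank)
        yield i, centre, sequence[left:i + flank + 1]
```

With `flank=6` an interior position gets a 13-character window, which is the context that would be compared against the profile library to obtain position-specific mutation probabilities.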
Profiles are built out of homology relations found with BLAST. The library contains 1 million profiles. - Gabriele Sales
Performance: CS-BLAST finds ~2 times more homologs than BLAST. - Gabriele Sales
PSI-BLAST: iterative search. - Gabriele Sales
Extension with context specific pseudocounts -> CSI-BLAST - Gabriele Sales
Performance does not increase as much as in the BLAST case, but sensitivity is still higher. - Gabriele Sales
PSI-BLAST 5th iteration is similar to the 2nd of CSI-BLAST. - Gabriele Sales
Some problems in the E-value calculations. Repeat proteins could cause high-scoring false positives. They modified the calculation and removed the bias. - Gabriele Sales
Bioinformatics is now driven by information, no longer by algorithms. - Gabriele Sales
To tame information you need a statistical model. - Gabriele Sales
Poster: U08 - Gabriele Sales
Keynote: Eugenia del Pino - The comparative analysis reveals independence of developmental processes during early development in frogs
Has studied "marsupial" frogs, found in the gardens of her university. - Allyson Lister
Brief description of Ecuador, and the Galápagos islands - Diego M. Riaño-Pachón
And, there are no native frogs in the Galapagos. - Allyson Lister
Most frog diversity is in Brazil, Colombia and Ecuador - Diego M. Riaño-Pachón
Why not just study Xenopus laevis? To discover developmental differences and for training of local researchers - Diego M. Riaño-Pachón
X. laevis comes from S. Africa; the early development of this frog is better known than that of humans. - Allyson Lister
Beautiful pictures of foam-nesting frog embryos - Diego M. Riaño-Pachón
Embryos are white for camouflage in the foam-nesting frog - Allyson Lister
Dendrobatid frogs include the poison arrow frogs, though they don't study the poisonous ones. - Allyson Lister
The males find an appropriate nest spot, then call the females. Males care for the embryos for 20 days. - Allyson Lister
Beautiful drawings of the early development of the dendrobatids studied by O. Perez - Allyson Lister
Marsupial frogs (Gastrotheca): the female has a pouch located on her back where she carries the embryos - Diego M. Riaño-Pachón
The mother transports the embryos for 4 months, and then puts them in the water by opening the pouch with her hind legs. - Allyson Lister
The eggs of marsupial frogs are larger than average - Diego M. Riaño-Pachón
The reproductive physiology of the frog resembles that of mammals. - Allyson Lister
Now a video of early embryo development in X. laevis (from Jim Smith) - Diego M. Riaño-Pachón
Gastrulation is a common process in vertebrates - Diego M. Riaño-Pachón
A major problem of early development is the change from a sphere to an elongated tadpole. The important movements are called dorsal convergence and extension (CE). - Allyson Lister
An important gene in CE is Brachyury. - Allyson Lister
The dorsal blastopore lip (DBL) is also called the "organizer" due to the work of Spemann and Mangold in 1924. - Allyson Lister
Organizer genes are conserved across vertebrates. - Allyson Lister
The cavity formed during gastrulation is the archenteron ("the primitive gut") - Allyson Lister
In the slowly-developing frogs there is delayed elongation of the archenteron. - Allyson Lister
Polyclonal antibodies were made for both Lim1 and Brachyury. These are TFs and accumulate in nuclei. - Allyson Lister
Notochord elongation and therefore dorsal CE begins in the midgastrula in rapidly-developing frogs. - Allyson Lister
Conversely, notochord elongation and CE occur after blastopore closure in slowly-developing frogs. - Allyson Lister
Implications of this work: frog gastrulation is modular, and CE is not essential for gastrulation. - Allyson Lister
Also, the head and trunk organizers are separable from each other - Allyson Lister
There are different ways to make a frog - pierenry
Thanks for that guys, takes me back to my developmental biology days working on defective organiser function in KO mice.. - Daniel Swan
Michael Kuhn
BioPathways SIG: Tomer Shlomi, Predicting metabolic biomarkers of human inborn errors of metabolism
study metabolic networks because they contain fewer errors than other types of networks - Michael Kuhn
hundreds of in-born errors of metabolism (IEM), affecting 1/5000 of babies - Michael Kuhn
there's a scale of methods from kinetic models to topological analysis, in between: constraint-based modeling - Michael Kuhn
known method: predict steady-state flux using stoichiometric matrix - Michael Kuhn
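A small sketch of the steady-state constraint behind the comment above: with stoichiometric matrix S and flux vector v, every internal metabolite must satisfy S·v = 0. The three-reaction toy pathway and the function names are assumptions for illustration; a real constraint-based model would additionally optimise over the fluxes with a linear-programming solver.

```python
def net_production(stoich, fluxes):
    """Return S·v: the net production rate of each metabolite (one per row)."""
    return [sum(coef * v for coef, v in zip(row, fluxes)) for row in stoich]

def is_steady_state(stoich, fluxes, tol=1e-9):
    """True if no internal metabolite accumulates or is depleted."""
    return all(abs(rate) <= tol for rate in net_production(stoich, fluxes))

# Toy pathway (hypothetical): R1: -> A, R2: A -> B, R3: B ->
# Rows are metabolites A and B; columns are reactions R1, R2, R3.
S = [
    [1, -1, 0],   # A is produced by R1 and consumed by R2
    [0, 1, -1],   # B is produced by R2 and consumed by R3
]

balanced = [2.0, 2.0, 2.0]    # same flux through the whole chain
unbalanced = [2.0, 1.0, 1.0]  # R2 is too slow, so A accumulates

balanced_ok = is_steady_state(S, balanced)
unbalanced_rates = net_production(S, unbalanced)
```

In the unbalanced case the first entry of the result is positive, meaning metabolite A accumulates, so that flux vector violates the steady-state assumption.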
modeling human metabolic network is harder than microbes: can't assume exponential growth, can't control nutrient intake - Michael Kuhn
goal: predict metabolites whose concentration is altered upon mutation - Michael Kuhn
find out which metabolites are processed when the enzyme works / does not work. predict changing concentration of chemicals which aren't taken up / secreted - Michael Kuhn
apply to metab. diseases in OMIM, get 223 metabolites whose conc. changes - Michael Kuhn
most disorders have few biomarkers, about half have a distinct pattern of biomarkers - Michael Kuhn
benchmark: text-mining on OMIM data (noisy): get moderate correlation w/ predictions - Michael Kuhn
also manually extract data for amino-acid metabolism. enrichment over random: between 6 and 15.8 - Michael Kuhn
won't work if there are other metabolic routes. need stoichiometric info and network topology for correct modelling - Michael Kuhn
HL34: Tomer Shlomi - Network-based prediction of human tissue-specific metabolism
so far similar to Sunday's biopathways SIG talk: - Michael Kuhn
want to predict metabolic activity of human genes - Michael Kuhn
problem: expression level does not reflect metabolic activity: there is post-transcriptional regulation - Michael Kuhn
(4% up-regulated, 16% down-regulated) - Michael Kuhn
large-scale validation using tissue-specific literature data - Michael Kuhn
tissue-specificity of disease genes: if a gene causes a disease in a certain tissue, it is likely to be active in this tissue - Michael Kuhn
method is capable of predicting tissues where the genes are post-transcriptionally up-regulated - Michael Kuhn
highly expressed neighboring genes give hints that a certain gene will be active - Michael Kuhn
MEAT: metabolic expression analysis tool - Michael Kuhn
follow-up work: biomarkers from genetic disorders, see SIG talk linked at the top and the paper: - Michael Kuhn
PT33: Thomas Abeel - Towards a Gold Standard for Promoter Prediction Evaluation
definition of a promoter. Multiple transcription start sites in a given transcriptional unit. - Cass Johnston
1997: first paper evaluating promoter prediction. 9 prediction programs. 15-30% success - Cass Johnston
2004 - whole genome. Precision 5-87% and recall 27-29%. - Cass Johnston
2006 - CAGE tags provide known results to test predictions - Cass Johnston
Need a gold standard as there is no consensus method at the moment, and different tools all claim to be the best. - Cass Johnston
They use data from CAGE (clustered into transcription start regions) and RefSeq (5' end as true promoter site) to test predictors - Cass Johnston
17 programs. All free for academic use and capable of being run on the whole human genome. Ran all programs over various thresholds - Cass Johnston
Plot precision vs. recall and calculate AUC over the threshold range for each of the programs. - Cass Johnston
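A minimal sketch of the evaluation described in the comment above: sweep a score threshold, record precision and recall at each setting, and integrate the curve with the trapezoid rule. The scores, labels and thresholds are invented toy data, and treating each candidate as simply true or false is an assumption; the talk's actual matching of predictions to CAGE and RefSeq sites is not reproduced here.

```python
def precision_recall_points(scores, labels, thresholds):
    """Return (recall, precision) pairs, one per usable threshold, sorted by recall."""
    total_positives = sum(labels)
    points = []
    for t in thresholds:
        predicted = [s >= t for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y)
        fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
        if tp + fp == 0:
            continue  # nothing predicted, precision is undefined
        points.append((tp / total_positives, tp / (tp + fp)))
    return sorted(points)

def area_under_curve(points):
    """Trapezoid-rule area under the precision-recall points."""
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2
    return area

# Toy data: 1 marks a true promoter, 0 a false candidate.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
thresholds = [0.85, 0.65, 0.45]

points = precision_recall_points(scores, labels, thresholds)
auc = area_under_curve(points)
```

Note that the area only covers the recall range actually reached by the chosen thresholds, so AUC values are comparable between programs only when the same range is used.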
Four programs scoring over 20% in prediction: ARTS, EP3, Eponine, ProSOM - sebi
Considered the 4 best-scoring programs. Found classes of promoters: some with a single peak of TSSs, some with a couple, a few with many TSSs all over the region - Cass Johnston
Many of these programs predict unique promoters the others don't provide -- merge prediction to increase information? Majority vote? - sebi
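One way the merging question in the comment above could be explored is a simple majority vote over the programs' predicted positions. This sketch is only an illustration: the positions are invented, predictions are matched by exact coordinate (a real comparison would allow some distance tolerance), and the talk did not state how merging would be done.

```python
from collections import Counter

def majority_vote(prediction_sets):
    """Keep positions predicted by more than half of the programs."""
    counts = Counter(pos for preds in prediction_sets for pos in set(preds))
    needed = len(prediction_sets) // 2 + 1
    return sorted(pos for pos, n in counts.items() if n >= needed)

def union_of_predictions(prediction_sets):
    """Keep every position predicted by at least one program."""
    return sorted(set().union(*prediction_sets))

# Hypothetical predicted TSS positions from four programs.
predictions = [
    {100, 250, 900},
    {100, 260, 900},
    {100, 400},
    {900, 400},
]

consensus = majority_vote(predictions)
everything = union_of_predictions(predictions)
```

The union keeps the unique predictions each program contributes, at the likely cost of precision, while the majority vote keeps only sites that several programs agree on.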
Current promoter prediction: 1/3 sites can be predicted and 2/3 predictions are correct. - Cass Johnston