17th Annual International Conference on Intelligent Systems for Molecular Biology & 8th European Conference on Computational Biology
The talk-specific feeds will be created each day shortly before the start of the first presentation. Find talk-specific blogs by searching here for the authors, the title of the talk or the talk identifier as given in the program (like HL03 for the 3rd Highlight paper). The feeds can also be accessed on the conference pages in the corresponding sections: SIGs, Keynotes, Proceedings Track, Technology Track and Highlights, and the last few blogs are shown on our web-portal page.
A kissing loop is an intramolecular loop that interacts with the other strand.
- Roland Krause
Give the energy functions for the new structures for the interaction partition function.
- Roland Krause
Second part given now by Hamid Chitsaz
- Roland Krause
Main interest in the strength and stability of the RNA interaction, which is challenging because all structures must be accounted for exactly once.
- Roland Krause
Make use of a dynamic programming approach using divide and conquer. Compute the partition function for bits of the structure.
- Roland Krause
McCaskill's algorithm (1990) is introduced, describing a partition function for single strands.
- Roland Krause
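A minimal sketch of the idea behind a McCaskill-style partition function, for readers following along. The energy model is a deliberate simplification and an assumption of this note, not the model from the talk: every allowed base pair contributes the same energy `pair_energy`, and a hairpin needs at least `min_loop` unpaired bases. The recursion splits on whether the last base is unpaired or paired with some position k, so every structure is counted exactly once.

```python
import math

# Watson-Crick and wobble pairs (simplified, illustrative pairing rule).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_function(seq, pair_energy=-1.0, rt=1.0, min_loop=3):
    """Partition function of one strand under a toy per-pair energy model.

    q[i][j] holds the partition function of seq[i:j] (j is exclusive).
    """
    n = len(seq)
    weight = math.exp(-pair_energy / rt)  # Boltzmann weight of one base pair
    q = [[1.0] * (n + 1) for _ in range(n + 1)]  # empty subsequence -> 1
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            last = j - 1
            total = q[i][last]  # case 1: base `last` is unpaired
            # case 2: base `last` pairs with k, leaving >= min_loop bases inside
            for k in range(i, last - min_loop):
                if (seq[k], seq[last]) in PAIRS:
                    total += q[i][k] * weight * q[k + 1][last]
            q[i][j] = total
    return q[0][n]
```

With `pair_energy=0.0` every structure has weight 1, so the same function simply counts the allowed structures.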
For two unpaired strands, McCaskill's algorithm can be used directly.
- Roland Krause
98 cases need to be considered, not even in the paper but in the suppl. mat.
- Roland Krause
# sounds like a complicated model, it's not completely presented
- Roland Krause
Explanation for the no-zig zag assumption motivated by less running time, one arch describes them all.
- Roland Krause
Can predict the equilibrium concentration, the melting temperature and the UV absorption.
- Roland Krause
Presented algorithm (piRNA) outperforms existing ones by an order of magnitude, experimental validation with Tm shows little deviation.
- Roland Krause
Jansen (Curr Opinion Plant Bio 2009) - integrate xQTLs; summary of results for integrating the 4 levels of association. Not all associations are shared across the levels, but some are.
- Leopold Parts
Would like to infer causes, not just associations.
- Leopold Parts
Can compare causal vs. pleiotropic models.
- Leopold Parts
Problems - observed noise not biological; few causal associations compared to pleiotropic ones
- Leopold Parts
Need large sample sizes to reliably infer causal relations. 30% variance explained by QTL for both traits required for 200 individuals.
- Leopold Parts
Boerjan (Nat Genet 2009) - hotspots; few loci in genome with high number of associations for levels from transcripts to phenotype
- Leopold Parts
Johannes (Nat Rev Gen 2008, PLOS Genetics 2009) - chromatin state uncorrelated to DNA sequence correlated to traits? Large part of heritable variability explained by epigenetic variants.
- Leopold Parts
@Michael - thanks; any chance to move these? Anyway, next ones will be in more appropriate place :)
- Leopold Parts
@Leopold: Unfortunately not, I wish this was easier. I'll make a post for the next talk and link it here (this time in the ISMB room!)
- Michael Kuhn
eQTL electric diagrams: simulate flow as electric current
- Michael Kuhn
need to improve eQTL methods, switch to multi-parametric methods
- Michael Kuhn
random forest eQTL mapping, as opposed to traditional regression methods
- Michael Kuhn
eQTL should rather be a classification problem
- Michael Kuhn
people are worried if they have enough power (individuals) for multivariate regression
- Michael Kuhn
Regression trees on genotypes as a multivariate approach
- Leopold Parts
random forests (RF) are based on decision trees, search for marker that separates pop. into more homogeneous subgroups
- Michael Kuhn
split subgroups again based on different markers, groups become smaller and more homogeneous
- Michael Kuhn
randomly pick splits, get forest of decision trees: thus random forest
- Michael Kuhn
importance measures for markers: increase in variance when marker is randomized; reduction of RSS when used; number of times marker is used
- Michael Kuhn
(last one is simpler, but less established)
- Michael Kuhn
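A minimal sketch of the single step these notes describe: search for the marker whose split makes the subgroups most homogeneous, scored by the reduction in residual sum of squares (RSS). The function names and the 0/1 genotype coding are assumptions made for illustration. A random forest would repeat this on random subsets of individuals and markers and then aggregate the importance measures across trees.

```python
def rss(values):
    """Residual sum of squares around the group mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(genotypes, expression):
    """Return (marker index, RSS reduction) of the most informative marker.

    genotypes: one list of 0/1 marker values per individual.
    expression: one expression value per individual.
    """
    total = rss(expression)
    best_marker, best_reduction = None, 0.0
    for m in range(len(genotypes[0])):
        left = [y for g, y in zip(genotypes, expression) if g[m] == 0]
        right = [y for g, y in zip(genotypes, expression) if g[m] == 1]
        reduction = total - rss(left) - rss(right)
        if reduction > best_reduction:
            best_marker, best_reduction = m, reduction
    return best_marker, best_reduction
```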
should work on real data, rather than simulations
- Michael Kuhn
benchmark on known pathways to identify regulators and regulated genes
- Michael Kuhn
KEGG enrichment in yeast: RF methods and lasso outperform univariate methods
- Michael Kuhn
Lasso 2nd best, beating two of three RFs
- Leopold Parts
RF1 > Lasso > RF2 > RF3 > univariate methods (in yeast)
- Leopold Parts
similar ranking in mouse data (although more noisy overall)
- Michael Kuhn
The different RF approaches best in mouse; marginal differences
- Leopold Parts
RF will fall back to univariate if power is sufficient for only one marker (by giving back confidence values)
- Michael Kuhn
(metric for real data tests - fraction of locus-gene pairs in same pathway)
- Leopold Parts
have 2 different mouse populations BXD, MDP [see Andrew Su's talk]. can data from one population inform studies on the other?
- Michael Kuhn
find consistency much above random between the two strains
- Michael Kuhn
most agreement on cis-band between the two populations, but also some interesting trans bands
- Michael Kuhn
have consistent eQTLs, want to finemap
- Leopold Parts
want to use MDP data for fine-mapping BXD. BXD is more homogeneous, thus fewer markers
- Michael Kuhn
find new regulations for gene Auts2, but know little about the gene
- Michael Kuhn
find 4 genes which interact with Auts2
- Michael Kuhn
multiple testing issue? will find these effects in a random dataset..
- Leopold Parts
take home: do second level analysis after multivariate mapping; finding epistatic interactions might be possible even with small pop. sizes if you have high quality data
- Michael Kuhn
Confirmation of results on top of simulation studies - consistency in pathways, categories as a quality metric
- Leopold Parts
validation: ortholog lethal ratio: the proportion of yeast genes whose orthologs in another species, specifically C. elegans, are lethal
- Ruchira S. Datta
hypothesis about ortholog lethal ratio: nonlethal genes in lethal complexes > nonlethal genes in nonlethal complexes
- Ruchira S. Datta
hypergeometric enrichment test was not significantly different, but the Bayesian network model was validated. they think this is because the hypergeometric test produced many false positives
- Ruchira S. Datta
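For reference, a minimal sketch of a one-sided hypergeometric enrichment test of the kind mentioned above. The argument names are hypothetical and the talk did not show its exact test setup; this is just the standard upper-tail probability.

```python
from math import comb


def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) when drawing `draws` items without replacement
    from `population` items of which `successes` are of interest."""
    total = comb(population, draws)
    upper = min(successes, draws)
    hits = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return hits / total
```

Here `population` could be all genes considered, `successes` the genes with lethal orthologs, `draws` the genes in one group, and `observed` the lethal-ortholog genes seen in that group.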
gene lethality is more conserved at the module level
- Ruchira S. Datta
use module-based mapping to transfer phenotypic mapping across species, rather than gene-based
- Ruchira S. Datta
use a novel device called probabilistic arithmetic automata - won't go into details
- Mikhail Spivakov
Need to compute the distribution of occurrences by chance. Not a straightforward task; they recently proposed a new approach that builds a probabilistic arithmetic automaton.
- Roland Krause
The problem is that computing p-values is infeasible due to large number of motifs.
- Roland Krause
matches occur in clumps. use compound Poisson approximation (almost exact) to calculate exact distribution of clump sizes. approximate number of clumps by Poisson distribution
- Mikhail Spivakov
Use of a Compound Poisson Approximation on a set of clumps (sets of overlapping motifs)
- Roland Krause
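A minimal sketch of how a compound Poisson approximation yields a p-value, assuming the clump rate and the clump size distribution are already known (in the talk these come from the automaton, which is not reproduced here). The number of clumps is Poisson with rate `rate`, each clump contributes an independent number of matches, and the distribution of the total follows from the standard Panjer recursion. All names are illustrative.

```python
import math


def compound_poisson_pmf(rate, clump_size_probs, max_count):
    """P(total matches = n) for n = 0..max_count.

    clump_size_probs[k - 1] is the probability that a clump has k matches.
    """
    pmf = [math.exp(-rate)]  # no clumps at all
    for n in range(1, max_count + 1):
        acc = 0.0
        for k in range(1, min(n, len(clump_size_probs)) + 1):
            acc += k * clump_size_probs[k - 1] * pmf[n - k]
        pmf.append(rate / n * acc)
    return pmf


def p_value(rate, clump_size_probs, observed):
    """P(total matches >= observed) under the compound Poisson model."""
    if observed == 0:
        return 1.0
    return 1.0 - sum(compound_poisson_pmf(rate, clump_size_probs, observed - 1))
```

If every clump has size 1 (`clump_size_probs=[1.0]`), the result reduces to an ordinary Poisson distribution, which is a quick sanity check.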
Use of a suffix tree of the sequence, iterate over the motifs, use the lower bound for pruning, walk the tree and identify overrepresented motifs.
- Roland Krause
so re-evaluate the motifs producing a good p-value with iid on a Markovian text model
- Mikhail Spivakov
designing a good benchmark set is hard
- Marcel Martin
Future work could incorporate Markovian models directly or use phylogenetic information.
- Roland Krause
Q. Is the implementation available? A: Given in the paper.
- Roland Krause
question: is the tool available? yes, URL in the paper
- Marcel Martin
you have to take into account overlapping motifs for doing proper statistics
- Marcel Martin
Q. Are the data in JASPAR or TRANSFAC? A. Had an expert looking at it.
- Roland Krause
# JASPAR and TRANSFAC do not really cover Mycobacterium motifs
- Roland Krause
Q: Performance of the algorithm on short motifs. A. Length 10 is the upper bound for the algorithm which is quite dependent on the length.
- Roland Krause
Q: applying to protein models? A: problematic because alphabet is larger and indels would need to be modelled
- Marcel Martin
Q: how is the iid text model? how do you justify that the text fulfills the model? A: the iid model is estimated from the text. dependencies between characters are incorporated by using the Markovian model
- Marcel Martin
Q. (Marcel Schulz) Differences in Markov models of different orders. A. Shorter orders give spurious results.
- Roland Krause
Q: why only a part of the motif space? A: tried to come up with a plausible set that includes most motifs
- Marcel Martin
Human case sources: genomic data (interactions, microarrays, sequences); prior knowledge (229 biological processes from GO and KEGG).
- Gabriele Sales
There is so much information about humans that the independence assumption at the base of naive bayesian classification is violated. Used mutual information for regularization.
- Gabriele Sales
start with predicted gene or gene product interaction network
- Ruchira S. Datta
each edge is an integration of hundreds of datasets, going from low to high confidence
- Ruchira S. Datta
Average strength of relationships defines associations.
- Gabriele Sales
take subnetworks associated with one process and another, use strengths of edges to quantify how related the two processes are
- Ruchira S. Datta
Four measures of functional associations between two gene sets: edges between genes; edges within each set; background edges incident to each set; all edges in the network.
- Gabriele Sales
any two sets of genes, 4 measures: 1. edges between their genes, 2. edges within each set, 3. the background edges incident to each set, 4. the baseline of all edges in the network
- Venkata P. Satagopam
any sets of genes G1 and G2 can be compared 4 ways: 1. edges between their genes, 2. edges within each set (e.g., BRCA is well-studied, so easy to get high confidence; want to normalize this out). 3. the background edges incident to each set, and 4. the baseline of all edges in the network (i.e., the baseline of the average confidence will vary from place to place within the network)
- Ruchira S. Datta
Such measures are combined into a single score.
- Gabriele Sales
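A minimal sketch of the four measures listed above, computed as average edge confidences on a weighted network. The data layout (a dict from node pairs to confidence), the function names, and the reading of "background" as every edge touching either set are assumptions of this note. How the talk combines the four values into a single score was not captured, so that step is left out.

```python
def mean(values):
    """Average of a list, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def association_measures(edges, set_a, set_b):
    """edges: dict mapping (gene_u, gene_v) -> confidence in [0, 1]."""
    members = set_a | set_b
    between, within, background = [], [], []
    for (u, v), conf in edges.items():
        if (u in set_a and v in set_b) or (u in set_b and v in set_a):
            between.append(conf)  # 1. edges linking the two sets
        if (u in set_a and v in set_a) or (u in set_b and v in set_b):
            within.append(conf)  # 2. edges inside either set
        if u in members or v in members:
            background.append(conf)  # 3. edges incident to either set
    return {
        "between": mean(between),
        "within": mean(within),
        "background": mean(background),
        "baseline": mean(list(edges.values())),  # 4. all edges in the network
    }
```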
Repeat with data samples, add noise and try to fit different models to the sampled occupancy landscape
- Oliver Hofmann
Used synthetic data, including experimental noise, then forgot the algorithm to generate the data and used the previously described models.
- Roland Krause
Exponential function shows a good correlation in five-fold cross-validation, with the model incorporating the interaction outperforming the ones without.
- Roland Krause
Are the interactions relevant in vivo and in vitro?
- Roland Krause
For an in vitro validation the Exp, Step functions work better than the no cooperativity model
- Oliver Hofmann
Interactions are also important using in vivo data, data from several chromosomes and different organisms (yeast and C. elegans) show similar results.
- Roland Krause
The interesting part -- biological basis for this preference?
- Oliver Hofmann
The in vitro system consists of nucleosomes and DNA only.
- Roland Krause
shorter length may allow for interaction of nucleosomes, energetically favoring their shift from otherwise better binding positions
- Oliver Hofmann
Hypothesis: electrostatic interaction between nucleosomes, which has been described previously but not shown in the data.
- Roland Krause
# Surprised that the very model was not presented in the talk.
- Roland Krause
# Wonder whether there is a difference between promoters, other nucleosome covered regions (or other genomic subsets); with the role nucleosome displacement plays in regulation I'd expect some shift
- Oliver Hofmann
Q: When you started motivating the model, there was a peak at 10 but the sample data did not have that.[?] A: These are effects of histone H1, which are not modeled. There are a lot of question marks about the role of H1 in yeast. The interactions that we modeled are global and can occur anywhere on the DNA. A more complex model would incorporate additional local effects, e.g. other factors.
- Roland Krause
Q. Have you finished the model for the higher eukaryotes? A. The model is independent of the organism. There might be differences, although this is debated. Differences between yeast and human are known.
- Roland Krause
Some thoughts I've had: Pre-annotation of papers and posters prior to the meeting, as is done by David Shotton in "Adventures in Semantic Publishing: Exemplar Semantic Enhancements of a Research Article" at http://dx.doi.org/10... . Don't think it's automated, but you get the idea - and David Shotton gave a talk here this week (http://ff.im/4xwI9) ; and also Reflect (http://ff.im/4yVO1)
- Allyson Lister
Let's meet for the 4.15 coffee break at the coffee bar in front of K1, @Allyson and other interested people. If you have ideas, it would be good to prepare them.
- Roland Krause
I COMPLETELY forgot about the meetup at coffee today! I'm sooo sorry! Was anything decided? Can you put up any summary points that might be of interest tomorrow? My apologies!
- Allyson Lister
@Oliver - I missed the pre-BOF meetup today at 4:15 at coffee - see above... :(
- Allyson Lister
Oh. I did not even SEE that comment. Sigh.
- Oliver Hofmann
No problem, we can meet over nibbles at the reception now. I can tell you what I have in mind.
- Roland Krause
@Roland - I'm afraid I'm not available this evening. Morning? Or just write your ideas here? I'm a dork - sorry! :)
- Allyson Lister
This should not take long at all, but please leave your ideas here. This is going to be an interactive session anyway, so we don't have to be all that prepared. My slides are more of a moderation prop than a standalone presentation.
- Roland Krause
I would like to have some meta-discussion: why are we live blogging? How is this useful and how can we make it more useful? Does it mainly benefit those not attending the meeting or is it also useful as an archive of what has been said?
- Michael Kuhn
In addition to what Michael Kuhn (@biocs) said, meeting microblogging can also benefit the presenter - giving them valuable feedback. Also links posted to papers, websites, etc. are great, to easily view more info about what someone is/was talking about.
- Fiona Brinkman
I'll miss this, due to my time difference (Can. west coast) but I look forward to all your thoughts and thank you for setting this BoF up!
- Fiona Brinkman
@Fiona - we'll make sure to post (maybe microblog!) the BoF :) Thanks for your input - benefits to the presenter is a really good point.
- Allyson Lister
Last year at ISMB, even though I was already on FF I took notes as I habitually do, with calligraphic fountain pen. During the course of the intervening year I would find myself trying to refer back to those notes while in conversation with others, but had difficulty finding what I wanted in time.
- Ruchira S. Datta
Paradoxically, I perceive myself as being slower when taking notes electronically than by hand (I think I tend to spend more time formulating them). But since we're doing this cooperatively, if one of us doesn't catch something another will. The notes will be searchable any time on the web; I won't have to dig up a particular file on a particular computer. The same effort provides the same resource for others.
- Ruchira S. Datta
I did revert to the fountain pen at various points this year though! It's more reliable than depending on the availability of power, wireless, and FriendFeed.
- Ruchira S. Datta
@Peter - the ISMB website tells me we will be in C8 for this session
- Allyson Lister
The BioSysBio 2009 conference report (http://genomebiology.com/2009...) just came out (disclaimer, I'm an author). Simon Cockell had a wordle used (http://www.flickr.com/photos...), and FF and Twitter get a mention, if a short one: "A new feature for BioSysBio 2009 to extend participation in the conference was to a wider audience by communicating live...
- Allyson Lister
Roland, where is the BoF going to be?
- Michael Kuhn
Journalists and bloggers should abide by the same rules at a conference - whatever those rules are
- Allyson Lister
Classical blog posts, twitter and friendfeed: surprising how effective friendfeed is given its basic nature
- Oliver Hofmann
Find it quite difficult to make sense out of Twitter (best - wordle aka Simon Cockell?)
- Allyson Lister
Oliver says FF isn't perfect, but it does have a very low barrier to access and start using
- Allyson Lister
Where do personal opinions come into it? In general at ISMB we are all being polite. What level is acceptable?
- Allyson Lister
Roland: the discussion culture in science has changed a little. He read that 50 years ago they were having much more open discussions
- Allyson Lister
Maybe people might be away from the conference and following remotely, or at the conference and shy - such people could pose questions on friendfeed, and then chair (who is monitoring FF) could ask them at the end of the talk
- Allyson Lister
Differences to last year: increased depth of coverage (due to ISCB support, raised awareness?)
- Oliver Hofmann
We all thanked the organisers from this year
- Allyson Lister
ISMB people (Reinhard?) will do some basic stats of the coverage of this conference
- Allyson Lister
Many of us bloggers feel that the first to benefit are ourselves
- Allyson Lister
First person to benefit from blogging: the blogger
- Oliver Hofmann
Live blogging especially useful for keynotes - for recording the presenter's thoughts.. and not always an accompanying paper available.. ?
- Peter Menzel
Andrew: heavily blogged talks are more useful for external people than delegates
- Allyson Lister
liveblogging is complementary to webcast etc as it takes less time - if you like the liveblogging and the blog posts after scanning, then you go and look at webcast
- Allyson Lister
Also, people liveblogging tend to pick out the key points quickly
- Allyson Lister
webcasting also has many time and technical issues
- Allyson Lister
oliver: Perhaps much of the commenting is too sober. Not enough meta-information, e.g. "here is X's main competitor's paper..."
- Allyson Lister
it would be nice to have more background - yes
- Jim Procter
Roland: can you have two streams - one light, one serious. Allyson: but it would be difficult in practice, if you have to look at two different things
- Allyson Lister
Three levels: factual information/transcript. notes to self, public commentary / context
- Oliver Hofmann
Feeds are important feedback for the presenter ?
- Peter Menzel
technical issues: there can be issues with internet / wireless
- Allyson Lister
Should we have TLAs or abbrevs to indicate personal comments or uncertainty? Maybe not TLAs (makes barrier for newbies) - how about single symbol at the beginning, e.g. '#' for personal comments
- Allyson Lister
We should write up a boilerplate: what it means to liveblog; if you don't want it, then... (e.g. logo from Cameron) ;why scientists should liveblog; *suggest* that your FF account should link to something that identifies you; you might want to consider only posting things that you would be happy saying to people's face.
- Allyson Lister
no legislation, though - don't want to force people to do anything
- Allyson Lister
imho: Official Facebook group etc. should be more advertised..
- Peter Menzel
@Peter - I agree - the embedded FF threads within the ISMB site were advertised, but more should have been made of them - it's quite cool to see!
- Allyson Lister
Some great points made here - thank you!
- Fiona Brinkman
13-14 people interested in genomes and sequence visualization
- Jim Procter
9 interested in alignments and phylogenies
- Jim Procter
systems biology (high-throughput (omics) and pathways) - *everyone is interested in this session*
- Jim Procter
Will there be any discussion on types of indexes and containers for visualizing very large datasets? For example ways to very quickly grab the features in a small genomic region for display in genome browser?
- Jan Aerts
@Jan - I'll ask your question in a second - we'll go through the sessions once again
- Jim Procter
LM session description- visualization of final processed data either still images or trajectories - ~ 6 people - seems less than expected
- Jim Procter
and around 30 people have raised their hands more than once.
- Jim Procter
What about visualization of features that have more than 1 position on a genome? E.g. readpairs. => circular genome browser?
- Jan Aerts
@Jan - we hope to cover that as well. I also hoped to deal with genome assembly visualization - but we couldn't find anyone willing to review it (I don't actually believe this myself!)
- Jim Procter
usability keynote yet to be determined (we have invited someone and await their reply)- but we aim to discuss how best to create usable biological software
- Jim Procter
@Jim: regarding the circular browser: something like Circos, but for browsing rather than just presentation; not static.
- Jan Aerts
@Jan - yes. interactive and analytical visualization is definitely the focus
- Jim Procter
@Jan - sorry I haven't raised your question yet - do you want to ask yourself at the appropriate moment ?
- Jim Procter
we have over 70 people registered at www.vizbi.org - and we will be opening registration soon.
- Jim Procter
question about how do we visualize textual semantic relationships - this is still a challenge in information visualization
- Jim Procter
sean explains that we really had to rationalise the sessions to make things strongly focused (as a result, his tools - as an example - may not be reviewed in the specific talks)
- Jim Procter
Can someone also ask if people would see value in a biohackathon focussed on visualization? Not a meeting, but actually getting things done?
- Jan Aerts
@Jan - ah - ok. Sean has really answered your question implicitly. We really wanted to get the *interactive* tools covered - so a biologist would have an easy way of discovering the best tools for each type of focus.
- Jim Procter
@Jan - I will. I can also raise this via OBF
- Jim Procter
suggestion of having a bioinformatics web and application usability interest group that can provide guidelines on usability for our applications
- Jim Procter
often all we need is a place to start for this.. not a complete service
- Jim Procter
userbase/tool disconnect - often tools are developed by the person who originated the database/method/etc without any idea of what the most productive user is actually going to want to do - and often without any real general knowledge of how the rational biologist works
- Jim Procter
Am working on a genome viewer that looks like Circos, but is completely interactive (i.e. zooming in/out of loci). Bumping into both design and technical issues, though.
- Jan Aerts
@Jan - can you contact Cydney Nielsen? make sure she knows about your tool - we are also setting up Wikipedia pages (see http://en.wikipedia.org/wiki...)
- Jim Procter
sean -> says user interface designer must be completely in the mindset of the end user
- Jim Procter
floor: new instruments drive new tools to visualize data - and there are several levels on how users need to be enabled to visualize
- Jim Procter
suggest there is a lack of cognitive science and psychology in this meeting
- Jim Procter
poster deconstruction: images with sharp and bright colours are great for 12 year olds but not necessarily intuitively informative about the underlying information.
- Jim Procter
i.e. there could be standard coloursets for types of data (minimum numbers of colours)
- Jim Procter
Nils follows on - there are a number of visualization fields where this has been an issue and very reliable guidelines have been developed to deal with colour confusion
- Jim Procter
this conference is a first step - we hope that there will be a much stronger accent on practical visualization theory and attract some of the experts from the more general visualization community
- Jim Procter
floor: it's often about finding out what you want to ask of some data, and then finding a visualization that can give you an answer.
- Jim Procter
my response: this used to be the aim of the visualization toolkits but these are often not quite appropriate for bioinformatics (scaling, deployment, licencing)
- Jim Procter
floor: very important to actually watch the user to see how they use your application - great understanding about how you can improve it
- Jim Procter
floor#2 point: pure html web applications - and a 2d page is not sufficient for completely integrated visualization
- Jim Procter
sean considers it's too early to start defining standards (I feel standards are actually defined elsewhere, we just need to start to see how they can apply to us)
- Jim Procter
Do people think we might be able to end up with some "design patterns" such as in software development (once the field has matured a bit more)?
- Jan Aerts
ah. possibly. I definitely think there are analogous patterns in user interface design that map to the kinds of cognitive processes that a biologist operates
- Jim Procter
Nils is talking about this - systems biology visualization is all about abstract information models - networks, matrices, multidimensional time series, etc - these have all got standard (and new/emerging) models for their visualization
- Jim Procter
sean discusses Occam's razor - don't create a million categories
- Jim Procter
around 10 or so are interested in contributing to the wiki pages
- Jim Procter
how do we stop the wikipedia list being 'just a list of biological tools'
- Jim Procter
sean comments that there are many ways of cutting a slice out of this area - we've started at the very simplest and simply list tools with some indication of the types of methods they are relevant to and types of visualization that they accomplish
- Jim Procter
developer comment: visualization tools require significant R&D on behalf of the developer
- Jim Procter
how to link ecological data with molecular data
- Jim Procter
sean reveals the hidden agenda - enabling different types of visualization tool to be easily combined. We hope that bringing the developers of these tools together might happen
- Jim Procter
practically everyone here is a developer, around ten or so are users, and a scattering of people came because they were just interested
- Jim Procter
AA: the problem is reward. The reward for publishing a model as SBML is zero, or even negative right now. Until it stops preventing us from doing science, you won't see as much an uptake as you should. Even now, it's somewhat limited in what it can capture.
- Allyson Lister
AA: even if he forced his lab to do MIAME stuff, they'd find ways to circumvent it. Need to try harder to have a reward. Linux worked because there is a dictatorial and charismatic leader at the center that started out with a core group
- Allyson Lister
PB: We should start having DOIs and PubMed IDs to things that are a little less than traditional.
- Allyson Lister
PB: Very few people sit and look through an entire journal anymore: they go to specific papers. So, you should be able to navigate between papers, abstracts and other components of the paper.
- Allyson Lister
Gene expression profiles from microarray studies can be used to identify cell types
- Cass Johnston
Regulation of transcription still not really understood. Coding regions are well studied, but most of the genome is non-coding. Presumably these regions contain regulatory elements.
- Cass Johnston
Some regulatory elements are known. Promoters, Enhancers, Insulators.
- Cass Johnston
Enhancers are the least understood elements in transcriptional regulation.
- Gabriele Sales
H3K4me1 high at enhancers but not promoters. H3K4me3 is high at promoters but not enhancers
- Cass Johnston
Study of histone modification for 414 promoters.
- Gabriele Sales
ChIP-chipped 5 cell lines for K4Me1, K4Me3, K27ac
- Leopold Parts
Epigenetic dynamics of promoters, insulators and enhancers? How do they change between cell types? Which play the biggest role in determining cell type differences?
- Cass Johnston
CTCF insulator binding sites are invariant between cell types
- Cass Johnston
chromatin signatures at promoters and insulators (CTCF-bound regions) are invariant
- Mikhail Spivakov
in general, enhancer histone modifications are cell-type specific
- Leopold Parts
while the profiles at enhancers are tissue-type specific
- Mikhail Spivakov
Using this signature, prediction of 36589 enhancers genome-wide.
- Gabriele Sales
moved to genome-wide analysis of H3K4me and H3K27ac + DNase I hypersensitivity and co-activators p300 and MED1 - predicted a large number of enhancers
- Mikhail Spivakov
Validation: recovery of well-studied enhancers.
- Gabriele Sales
7/9 of cloned predicted enhancers showed activity
- Mikhail Spivakov
Found many motifs specifically enriched in enhancers
- Cass Johnston
Enhancers are enriched near HeLa-specific genes.
- Gabriele Sales
enhancers are enriched at cell-type-specific genes
- Mikhail Spivakov
TT36: Younghoon Kim - MONET: A Cytoscape plugin for genome-scale network inference from expression profiles using modularization and parallel processing techniques with supercomputing resources
Improving the sample to gene ratio by modularization
- Oliver Hofmann
GO annotation for the functional description. Divide/conquer approach, identify seed genes in global network, build local networks, parallel bayesian network learning
- Oliver Hofmann
PLoS session will feature, in the 4 time slots, (1) Abigail Morrison (neuroscience), (2) Adam Arkin (synthetic biology), (3) Donna Slonim (human development / TM), (4) panel discussion. Themes include sharing, collaboration, and areas outside the mainstream of ISMB. http://www.iscb.org/ismbecc...
What might be helpful in closing the loop: strongly-interdisciplinary collaborations, availability of clinical data, and standards to ensure that data are shared in useful ways.
- Allyson Lister
Translational Development Genomics: while there has been progress in screening, is there anything more that we can get out of genomics data and help with diagnoses?
- Allyson Lister
So they did a pilot study for Down's Syndrome (caused by trisomy 21).
- Allyson Lister
There is quite a range of expression - very little is significant. BUT there's huge dysregulation of the genome.
- Allyson Lister
The connectivity map further implicates oxidative stress: the top compounds (positive correlation) relate to oxidation and ion transport.
- Allyson Lister
after making a model, its quality is assessed to decide whether to use it or discard it
- Ruchira S. Datta
usually have several iterations of comparative modeling
- Ruchira S. Datta
will speak about model quality assessment, an important ingredient in this process
- Ruchira S. Datta
below 40% sequence identity, have larger errors in predicted structure, mostly due to sequence alignment error
- Ruchira S. Datta
one way of dealing with this is to produce many alignments and use model quality assessment to assess the errors of the produced models
- Ruchira S. Datta
detect errors due to incorrect template and misalignment, which can occur with template-target sequence identities of 20%, 25%, or 30%
- Ruchira S. Datta
knowledge-based potentials, mean force potentials, or statistical potentials are scoring functions for model assessment
- Ruchira S. Datta
states are represented using geometrical descriptors
- Ruchira S. Datta
we don't have an unbiased sample of the folded proteins, but use the known native folded proteins
- Ruchira S. Datta
very important to set the parameters of the potentials
- Ruchira S. Datta
may have interactions between amino acids that are close in 3d space
- Ruchira S. Datta
matrix = [a][b][k][r], interactions between C-alpha and C-betas; matrix dimensions are [40][40][10][30]
- Ruchira S. Datta
traditionally derive statistical potential by taking whole pdb, then clustering it and taking representatives, in order to have unbiased sample of fold space
- Ruchira S. Datta
train on features of these to get relative frequencies, derive relative statistical potential
- Ruchira S. Datta
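A minimal sketch of the "relative frequencies to potential" step noted above, assuming the usual inverse-Boltzmann form E = -kT ln(f_obs / f_ref). The function name, the composition-based reference state, the pseudocount and the toy contact list are illustrative assumptions, not the speaker's method, and the sequence-separation and distance dimensions of the [a][b][k][r] matrix are left out.

```python
import math
from collections import Counter


def statistical_potential(observed_pairs, kT=1.0, pseudocount=1.0):
    """Inverse-Boltzmann pair potential from observed residue contacts.

    Hypothetical helper: E(a, b) = -kT * ln(f_obs(a, b) / f_ref(a, b)),
    where f_ref assumes contacts form at random given residue composition.
    """
    # Count unordered contacts and how often each residue takes part in one.
    pair_counts = Counter(tuple(sorted(p)) for p in observed_pairs)
    single_counts = Counter(res for pair in observed_pairs for res in pair)
    total_pairs = sum(pair_counts.values())
    total_singles = sum(single_counts.values())
    residues = sorted(single_counts)
    n_types = len(residues) * (len(residues) + 1) // 2

    potential = {}
    for i, a in enumerate(residues):
        for b in residues[i:]:
            # Pseudocount keeps unseen pairs from producing log(0).
            f_obs = (pair_counts[(a, b)] + pseudocount) / (
                total_pairs + pseudocount * n_types
            )
            fa = single_counts[a] / total_singles
            fb = single_counts[b] / total_singles
            f_ref = fa * fb if a == b else 2 * fa * fb
            potential[(a, b)] = -kT * math.log(f_obs / f_ref)
    return potential


# Toy contact list (made up for illustration only).
contacts = [("L", "V"), ("L", "V"), ("L", "L"), ("K", "E"), ("K", "E"), ("V", "L")]
pot = statistical_potential(contacts)
print(pot[("L", "V")], pot[("K", "L")])
```

Pairs seen more often than composition alone predicts get a negative (favourable) energy; pairs seen less often get a positive one.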
here instead they derive evolutionary or structure-specific potential
- Ruchira S. Datta
but homologous protein sequences contain valuable information
- Ruchira S. Datta
important or key residues for fast folding can be derived using information from known homologs
- Ruchira S. Datta
some key interactions that provide stability can be derived from homologs
- Ruchira S. Datta
PSI-BLAST, then MSA, then comparatively model each sequence using the template
- Ruchira S. Datta
use these structures to derive family-specific statistical potential
- Ruchira S. Datta
Running the Net: Finding and Employing the Operating Principles of Cellular Systems: the need for scientific standards and cooperation
- Allyson Lister
If you put two attenuators on the same transcript, it behaves about as you expect: it functions as a NOT-OR gate.
- Allyson Lister
single antisense RNA-mediated transcription attenuator: NOT gate
- Allyson Lister
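The logic of the two notes above can be sketched as Boolean functions, assuming an idealised on/off attenuator (real attenuators are leaky and graded). The function names are hypothetical.

```python
def attenuator(antisense_present: bool) -> bool:
    """Single attenuator: transcription reads through only when the
    antisense RNA is absent, i.e. a NOT gate."""
    return not antisense_present


def tandem_attenuators(antisense_a: bool, antisense_b: bool) -> bool:
    """Two attenuators on one transcript: output is on only if both
    let transcription through, i.e. NOT (a OR b)."""
    return attenuator(antisense_a) and attenuator(antisense_b)


truth_table = {
    (a, b): tandem_attenuators(a, b)
    for a in (False, True)
    for b in (False, True)
}
for inputs, output in truth_table.items():
    print(inputs, "->", output)
```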
Bacteria engineered as pathogens to target particular human tissue (e.g. tumors). To do that, you have to build many different modules, each with its own computational and culture unit tests.
- Allyson Lister
Little information about gene regulation so far, eQTL might be able to shed some light on this regulation and drug resistance
- Oliver Hofmann
SNPs might affect gene expression. Consider expression as a quantitative trait like height, weight. Identify the associated locus by statistical methods.
- Oliver Hofmann
Traditional tests between multiple loci, all expression. Comprehensive and without bias, but does not use the inherent data structure, computationally expensive and a problem of statistical power.
- Oliver Hofmann
Alternative approach GeD, Graph-based eQTL decomposition. Include strain data in the association graph
- Oliver Hofmann
Graph structure: Three types of vertices: gene linked to strain linked to locus
- Oliver Hofmann
Each clique has 3 vertices (G/S/L) that are fully connected, in addition each clique is a maximal subgraph that cannot be extended further
- Oliver Hofmann
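A rough sketch of finding fully connected gene/strain/locus triples in such a graph. How the edges are actually defined in GeD is not given in these notes, so the three edge sets, the function name and the toy identifiers are all assumptions for illustration.

```python
from itertools import product


def gsl_triangles(gene_strain, strain_locus, gene_locus):
    """Return every (gene, strain, locus) triple whose three vertices
    are pairwise connected. Each argument is a set of edge tuples."""
    genes = {g for g, _ in gene_strain} | {g for g, _ in gene_locus}
    strains = {s for _, s in gene_strain} | {s for s, _ in strain_locus}
    loci = {l for _, l in strain_locus} | {l for _, l in gene_locus}
    return sorted(
        (g, s, l)
        for g, s, l in product(genes, strains, loci)
        if (g, s) in gene_strain
        and (s, l) in strain_locus
        and (g, l) in gene_locus
    )


# Toy edges (hypothetical identifiers).
gene_strain = {("g1", "s1"), ("g1", "s2"), ("g2", "s1")}
strain_locus = {("s1", "l1"), ("s2", "l1")}
gene_locus = {("g1", "l1")}

triangles = gsl_triangles(gene_strain, strain_locus, gene_locus)
print(triangles)
```

Brute-force enumeration like this is only workable for small graphs; it is meant to show the structure, not the talk's algorithm.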
Heuristic approach on eQTL cliques to look for (Locus,gene) pairs with certain patterns; refer to graph/diagram in paper
- Oliver Hofmann
Cliques help to detect eQTLs, avoiding a large number of tests; integration of strain information provides a new framework for eQTL studies
- Oliver Hofmann
An evolution of this idea: hidden Markov models.
- Gabriele Sales
Two states: motif and background states / distributions. You learn parameters of these models from known data.
- Gabriele Sales
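A minimal two-state sketch of the motif/background idea, using standard Viterbi decoding in log space. All probabilities below are invented toy values (a GC-rich "motif" state against a uniform background), not parameters learned from data as in the talk.

```python
import math


def viterbi(seq, states, start, trans, emit):
    """Most likely state path for seq under a discrete HMM (log space)."""
    scores = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    back = []
    for sym in seq[1:]:
        prev = scores
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor state for landing in s at this position.
            best = max(states, key=lambda p: prev[p] + math.log(trans[p][s]))
            scores[s] = (
                prev[best] + math.log(trans[best][s]) + math.log(emit[s][sym])
            )
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy parameters (assumptions): B = background, M = motif.
states = ("B", "M")
start = {"B": 0.9, "M": 0.1}
trans = {"B": {"B": 0.9, "M": 0.1}, "M": {"B": 0.2, "M": 0.8}}
emit = {
    "B": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "M": {"A": 0.05, "C": 0.45, "G": 0.45, "T": 0.05},
}

print("".join(viterbi("ATATGCGCGCGCATAT", states, start, trans, emit)))
```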
The performance of such models has saturated in recent years
- Gabriele Sales
problems with HMM / generative models: may tune to noise, rather than the signal; Seem to have hit peak performance; Difficult to incorporate other sources of information to improve predictions
- Cass Johnston
New approaches try to integrate other sources of evidence: multi-species phylogenetics, distance from TSS and between TFBSs, epigenetic data.
- Gabriele Sales
DISCOVER allows you to integrate multiple sources of evidence into your motif model
- Cass Johnston
They use conditional random fields, a discriminative model.
- Gabriele Sales
The estimation phase reduces to a convex maximization problem.
- Gabriele Sales
What features correlate with TFBSs? Motifs, for example, have high PhastCons scores. Background is linked to GC content.
- Gabriele Sales
and check these features are discriminatory in the context of the model.
- Cass Johnston
Leave-one-out cross validation on Drosophila data
- Gabriele Sales
DISCOVER is 20% better than other algorithms on the F1 score (harmonic mean of precision and recall).
- Gabriele Sales
Precision (TP/(TP+FP))/Recall (TP/(TP+FN)) curves. DISCOVER balances the Precision/Recall trade-off better than other tools.
- Cass Johnston
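For reference, the three measures mentioned above computed from raw counts. The counts in the example are arbitrary and are not results from the talk.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN),
    F1 = harmonic mean of the two. Returns 0.0 where undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Arbitrary example: 8 true positives, 2 false positives, 8 false negatives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(precision, recall, round(f1, 4))
```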
HL36: Patrick Bradley - Leveraging the context-specific coordination of transcript and metabolite concentrations to discover gene-metabolite interactions.
In computational neuroscience, the key ideas to be communicated are mathematical and computational models as well as data analysis methods
- Allyson Lister
Very little standardization in many areas of computational neuroscience
- Allyson Lister
Lack of standardization is hindering progress and building on others' work
- Diego M. Riaño-Pachón
Abigail Morrison can think of only one model, in all the times she's worked on it, that they've been able to reproduce without going back to the authors
- Allyson Lister
Is it science, or is it travel reporting?
- Allyson Lister
Approach to solve the problem: work together to create tools to facilitate reproducibility
- Diego M. Riaño-Pachón
The Japanese node focuses on the visual side of things, and has produced Visiome, which attempts to collect both papers and figures separately as well as model parameters, simulation scripts and figure-generation scripts. This can all be downloaded and then, hopefully, run on your own system
- Allyson Lister
Also the Simulation Server Platform, which uses a VM to reproduce the environment the original model was run in so others can test and run it
- Allyson Lister
German node's goal is to provide open source tools for data sharing and analysis
- Diego M. Riaño-Pachón
The problem is that there are many different recording devices and analysis tools, and no standardization
- Allyson Lister
German node developing unified data format + associated tools
- Allyson Lister
They also want to design and implement a machine-readable declarative language to describe neural network models (like SBML) - first meeting in March 2009 so still new.
- Allyson Lister
facilitates cross-checking, as it can run on any simulator that implements the common API
- Diego M. Riaño-Pachón
Simulation-code written in PyNN, can use several underlying simulators, instead of using simulator-specific languages
- Diego M. Riaño-Pachón
Gregory Wilson, American Scientist 2006 "Where's the real bottleneck in scientific computing?"
- Allyson Lister
if nobody can understand your ideas and reproduce your results, have you really removed a bottleneck or just made it somebody else's problem?
- Diego M. Riaño-Pachón
Protein bioinformatics relies heavily on sequence searching.
- Gabriele Sales
Sequence alignments are a special case of profile alignments.
- Gabriele Sales
The score of an alignment can be thought of as the log of the ratio of the probability of the mutations needed to go from sequence X to Y over the average probability of Y.
- Gabriele Sales
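The log-ratio description above can be sketched as a per-position log-odds sum for an ungapped alignment: score = sum over i of log2(P(y_i | x_i) / P(y_i)). The two-letter alphabet and all probabilities below are toy assumptions chosen only to make the arithmetic visible.

```python
import math


def log_odds_score(x, y, cond_prob, background):
    """Ungapped alignment score: sum of log2(P(y_i | x_i) / P(y_i))."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length (no gaps here)")
    return sum(
        math.log2(cond_prob[a][b] / background[b]) for a, b in zip(x, y)
    )


# Toy substitution model over two residue types (made-up values).
cond_prob = {
    "L": {"L": 0.8, "K": 0.2},
    "K": {"L": 0.2, "K": 0.8},
}
background = {"L": 0.5, "K": 0.5}

match_score = log_odds_score("LK", "LK", cond_prob, background)
mismatch_score = log_odds_score("LK", "KL", cond_prob, background)
print(match_score, mismatch_score)
```

A positive score means Y is more likely to have arisen from X than to appear by chance; a negative score means the opposite.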
Context specific substitution matrices used with success, among others, for protein structure prediction (Rice & Eisenberg 1997, Huang & Bystroff 2006).
- Gabriele Sales
Their approach: take 6 neighbours to the left and to the right of a residue.
- Gabriele Sales
The search uses a sliding window over the sequence; the mutation probability is computed by looking at a precalculated library of profiles.
- Gabriele Sales
Profiles are built out of homology relations found with BLAST. The library contains 1 million profiles.
- Gabriele Sales
Performance: CS-BLAST finds ~2 times more homologs than BLAST.
- Gabriele Sales
Extension with context specific pseudocounts -> CSI-BLAST
- Gabriele Sales
Performance increases less than in the BLAST case, but sensitivity is still higher.
- Gabriele Sales
PSI-BLAST 5th iteration is similar to the 2nd of CSI-BLAST.
- Gabriele Sales
Some problems in the E-value calculations. Repeat proteins could cause high-scoring false positives. They modified the calculation and removed the bias.
- Gabriele Sales
Bioinformatics is now driven by information, no longer by algorithms.
- Gabriele Sales
To tame information you need a statistical model.
- Gabriele Sales
A major problem of early development is to perform the change from sphere to elongated tadpole. The important movements are called the dorsal convergence and extension.
- Allyson Lister
hundreds of inborn errors of metabolism (IEM), affecting 1 in 5000 babies
- Michael Kuhn
there's a scale of methods from kinetic models to topological analysis, in between: constraint-based modeling
- Michael Kuhn
known method: predict steady-state flux using stoichiometric matrix
- Michael Kuhn
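The steady-state condition behind this method is S · v = 0: for every internal metabolite, production and consumption fluxes cancel. A small sketch with a hypothetical three-reaction chain (uptake -> A -> B -> secretion); the network and flux values are invented for illustration.

```python
def net_production(S, v):
    """Net production rate of each metabolite: the matrix product S . v."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_steady_state(S, v, tol=1e-9):
    """True if no metabolite accumulates or depletes, i.e. S . v = 0."""
    return all(abs(rate) <= tol for rate in net_production(S, v))


# Rows: metabolites A, B. Columns: r1 (-> A), r2 (A -> B), r3 (B ->).
S = [
    [1, -1, 0],
    [0, 1, -1],
]

balanced = [2.0, 2.0, 2.0]
blocked = [2.0, 1.0, 1.0]  # r2 slowed, e.g. a partly deficient enzyme

print(is_steady_state(S, balanced), net_production(S, blocked))
```

In the second flux vector A accumulates, which is the kind of concentration change the talk aims to predict.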
modeling human metabolic network is harder than microbes: can't assume exponential growth, can't control nutrient intake
- Michael Kuhn
goal: predict metabolites whose concentration is altered upon mutation
- Michael Kuhn
find out which metabolites are processed when the enzyme works / does not work. predict changing concentration of chemicals which aren't taken up / secreted
- Michael Kuhn
apply to metab. diseases in OMIM, get 223 metabolites whose conc. changes
- Michael Kuhn
most disorders have few biomarkers, about half have a distinct pattern of biomarkers
- Michael Kuhn
benchmark: text-mining on OMIM data (noisy): get moderate correlation w/ predictions
- Michael Kuhn
also manually extract data for amino-acid metabolism. enrichment over random: between 6 and 15.8
- Michael Kuhn
won't work if there are other metabolic routes. need stoichiometric info and network topology for correct modelling
- Michael Kuhn
2006 - CAGE tags provide known results to test predictions
- Cass Johnston
Need a gold standard as there is no consensus method at the moment, and different tools all claim to be the best.
- Cass Johnston
They use data from CAGE (clustered into transcription start regions) and RefSeq (5' end as true promoter site) to test predictors
- Cass Johnston
17 programs. All free for academic use and capable of being run on the whole human genome. Ran all programs over various thresholds
- Cass Johnston
Plot precision v recall and calc AUC over threshold range for each of the programs.
- Cass Johnston
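One simple way to get an area under a precision/recall curve from per-threshold points is the trapezoid rule. Whether the authors integrated this way is not stated in these notes, so treat the method and the toy points as assumptions.

```python
def pr_auc(points):
    """Trapezoid-rule area under (recall, precision) points."""
    pts = sorted(points)  # order by recall
    return sum(
        (r1 - r0) * (p0 + p1) / 2
        for (r0, p0), (r1, p1) in zip(pts, pts[1:])
    )


# Toy curve: one (recall, precision) point per score threshold.
curve = [(0.0, 1.0), (0.5, 0.8), (1.0, 0.4)]
area = pr_auc(curve)
print(area)
```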
Four programs scoring over 20% in prediction: ARTS, EP3, Eponine, ProSOM
- sebi
considered 4 best scoring programs. Found classes of promoters - some with a single peak of tss, some with a couple, a few with many tss all over the region
- Cass Johnston
Many of these programs predict unique promoters the others don't provide -- merge prediction to increase information? Majority vote?
- sebi
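The majority-vote idea raised above could look like this, assuming each program's predictions have already been mapped to shared genomic bins. The binning, the program labels and the toy calls are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Keep bins called by more than half of the programs.

    predictions: dict mapping program name -> iterable of bin ids.
    """
    votes = Counter(b for bins in predictions.values() for b in set(bins))
    needed = len(predictions) // 2 + 1
    return sorted(b for b, n in votes.items() if n >= needed)


# Toy calls from four placeholder programs.
calls = {
    "prog_a": [10, 11, 42],
    "prog_b": [10, 42, 77],
    "prog_c": [10, 42],
    "prog_d": [11, 42, 90],
}

consensus = majority_vote(calls)
print(consensus)
```

A strict majority raises precision but throws away the unique predictions each program contributes, which is the trade-off the question above points at; a union would do the reverse.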
Current promoter prediction: 1/3 sites can be predicted and 2/3 predictions are correct.
- Cass Johnston