Neil Swainston › Likes

Neil Saunders
Has our quest for completeness made things too complicated? - http://nsaunders.wordpress.com/2009...
Neil, great post. And you're right, we do make things too complicated sometimes, but do we do that at the level at which we ask questions, or at the software implementation level? My take is the latter, cause you need to ask questions the way you want to, but that doesn't mean what makes it all come together has to be one complex mess - Deepak Singh
Glad you like it. One of those that bubbled up out of frustration at inability to achieve! I feel that science is the business of turning complex (real-world) things into simple models - and that we've moved away from that idea. - Neil Saunders
I'm a sucker for this kind of ambitious thinking. Go Neil! - Bill Hooker
I think it's a good sign that things like this are now obvious. Things start out as a complex mess of disconnected things, overlapping complicated ways of connecting them are devised, then it becomes obvious what the simpler thing to do is. - Mr. Gunn
Great ! But aren't you re-inventing something like RDF Neil ? feature/probe/value is nothing but a RDF statement... - Pierre Lindenbaum
No, I don't want to reinvent anything. If RDF will work for me, I'll use it. I'll also use SQL, NoSQL, key-value pairs, document-oriented or whatever it takes. I just think that trying to integrate data by combining other people's large, complex representations is not working. We need to simplify the whole business. - Neil Saunders
I think there is a middle road here - we need high level generic descriptions like what Neil is proposing (and like my "We have stuff, we do stuff to it, which makes stuff"), but also a way of pointing to more sophisticated information that might be useful in specific contexts. I think we can have the best of both worlds as long as the data representation is separated from the metadata and the organization of each can be described in a machine readable (and agreed!) form - Cameron Neylon
I'm too old school, leaving comments on blogs... who does that any more. I’m sure you’re aware that you’ve just described a model using *triples*. Which means you could start storing these kinds of simple relationships in a triple store like virtuoso etc. As you say, you don't have to reinvent anything, just simplify the use (conventions) of existing approaches (e.g. RDF). I would like... more... - Greg Tyrelle
@Greg about the web interface, one cool interface for adding RDF statements/triples is freebase/Acre: http://www.freebase.com/apps/ - Pierre Lindenbaum
I like blog comments :-) Yes, my example looks like RDF triples. No, that was not really my intention. Let's ask these questions: (1) what data relationships would make sense to a biologist? (2) what are the commonalities in the data, which a biologist may not have considered at an abstract level? As I wrote in the post, many datasets that look different are really different ways of looking at the same thing. - Neil Saunders
The joys of data modelling :-) For (1): I'm afraid asking for a definition of some data relationships is building an(other?) ontology. - Pierre Lindenbaum
Let's put it another way. What we have, presently, are quite complete, often large and complex, but useful and usable descriptions of individual experiment types. "Integration" essentially means "parse them individually and mash-up the results". That's what makes it difficult. Perhaps we need an "ontology of integration" :-) But let's keep it really, really minimal. - Neil Saunders
I actually think you will struggle to find data commonalities across bioscience. Even the simple proposal of target, measurement, value could break down in many cases e.g. we tried ages ago to get some intensity data from a bunch of microarray experiments and we gave up because we couldn't get across what we needed. What are you really measuring? Does it mean the same thing to different... more... - Cameron Neylon
I think there's a good case for storing, in the first instance, raw values. Figure out how to process them later (that's statistics). Focus on trends (up, down, stayed the same). Focus on well-defined variables that do mean the same to everyone (intensity, in theory = amount of transcript, regardless of the very real difficulties). And I think more experiments fall into... more... - Neil Saunders
@Pierre freebase is exactly what I had in mind, however the web client (the best part) is not open. @Neil Store the data first, ask questions later. Nice. One of my hopes for semantic web technology was that it could become a universal mashup system (RDF+ontologies+triplestores). But you start down that path, and you suddenly realise that the semweb is asking you to get your data... more... - Greg Tyrelle
But for me your example of a gel isn't raw data. The raw data is the image. Which might have several targets or assays on it. Up/down stayed the same is only really of interest in particular types of science. And I challenge you to find any well defined variables :-) Intensity to me is a measure of optical density but questions of background, object size, masking, averaging algorithm... more... - Cameron Neylon from twhirl
But agree with what you and Greg are saying, first thing get the data somewhere, with all the metadata you can automatically collect. Then worry about capturing more metadata as people do stuff with the data. Writing this grant proposal right at the moment. - Cameron Neylon from twhirl
And in microarrays, "raw" data is the image of the slide. But aside from a cursory inspection to ensure that it isn't complete rubbish, nobody much cares about that. I'd argue that there's a point in the preprocessing at which a numerical value emerges which could be called "useful" and which encapsulates the object being measured. It needs more work (e.g. normalization) to get information from it, but it's the "value" in feature/reporter/value. - Neil Saunders
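The feature/reporter/value idea discussed above could be sketched roughly as follows. This is only an illustration, not Neil's actual design: the `Measurement` type, the `by_feature` helper, and the example rows are all hypothetical, and a real system would still need to settle normalization and metadata.

```python
from collections import defaultdict
from typing import NamedTuple


class Measurement(NamedTuple):
    """One minimal record: what was measured, what reported it, and the number."""
    feature: str   # the biological entity, e.g. a gene (hypothetical names below)
    reporter: str  # what reported on it, e.g. an array probe or a gel band
    value: float   # the "useful" numerical value, before normalization


# Hypothetical rows from two different experiment types.
measurements = [
    Measurement("geneA", "array_probe_17", 812.0),
    Measurement("geneA", "gel_band_3", 0.42),
    Measurement("geneB", "array_probe_22", 95.5),
]


def by_feature(rows):
    """Group (reporter, value) pairs by feature, whatever experiment they came from."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.feature].append((row.reporter, row.value))
    return dict(grouped)


if __name__ == "__main__":
    for feature, readings in by_feature(measurements).items():
        print(feature, readings)
```

The point of the sketch is that once every experiment type is reduced to the same three fields, "integration" becomes a simple grouping step rather than one parser per format.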
To me this is about finding something a bit like an upper ontology that describes the general category that objects (targets, assay, value, inputs, outputs, data, process, sample) fall into. That lets you do the general integration, and the more detailed local data structures become more useful as you can agree more and more on what details are important. So I absolutely agree with what... more... - Cameron Neylon
Heh heh It was exactly that image that we did care about - which was the problem :-) I will admit to being an edge case, but in some ways we're all edge cases, they're just different edges... - Cameron Neylon
Neil, may I link to this FF thread from Book of Trogool? - D0r0th34
It's in a rough state but - http://dl.dropbox.com/u... - Cameron Neylon
:-) Sure, different questions, different "levels" of data. I guess my angle is more a statistical one: how do I compare (seemingly) quite different datasets - what numbers can I extract and crunch? Less interested in the capture and description of data at every stage in the process. - Neil Saunders
Dorothea, sure, not a problem. - Neil Saunders
"This is a gel", "this is a sample",.... AFAIK all those kinds of statements are part of the OBI ontology: http://obi-ontology.org/page... e.g. "Agarose Gel" http://bioportal.bioontology.org/ajax_co... - Pierre Lindenbaum
Sure, and those are very complete descriptions of experimental components. But what I want is: "I saw A on my gel, B in my LC/MS, C on my expression array and D on my SNP array and when I plug all that into some Bayesian predictor, it says cancer" :-) - Neil Saunders
Ontologies are not the issue, it's more low level than that. I also work with microarrays, proteomics, metabolomics, and numerous physiological data sets. To keep all the data in one place I use a relational database, in this case postgresql because I like to store raw intensity values in array datatypes, along with pylons based web interfaces to display various views of the data to my... more... - Greg Tyrelle
My argument would be that the reason you're less productive is not because of the RDF and ontologies per se, but because the ontologies aren't really built for what we want to do. They're for describing certain types of outcomes, not for integrating data in a discovery phase. But Neil's (entity, probe, value) is still an ontology of sorts. It is just a higher level one. My belief is... more... - Cameron Neylon
But keep the discussion going - this is exactly the problem that e.g. the SAGE project will have - http://sagebase.org - and as a notional member of the data working group I could do with all the ideas and help that's out there... - Cameron Neylon
We are thinking too much in terms of data representation here. In the end what you are looking at is a data warehousing problem. You have different front end systems and you want to be able to pull data in for offline processing into a warehouse. That's pretty much what you do at any company doing a lot of analytics/business intelligence. Different types of data being collected in... more... - Deepak Singh
This reminds me of the type of approach we were considering a while back - with a focus on each observable event during an experiment. http://usefulchem.blogspot.com/2008... - Jean-Claude Bradley
Neil, I was under the impression that normalization across arrays and labs wasn't actually a solved problem, yet. Surely that would have to come first before stripping things down to just assay-key-value? - Mr. Gunn
Normalization ... aaargh! Most definitely not a solved problem - Rajarshi Guha
Normalizing within your own experiments is hard enough, never mind across unrelated datasets. It's something we have to solve though, to make the most of public data. - Neil Saunders
Neil, you may be interested in looking at the Ontology-Based eXtensible Data Model (OBX) that was developed by Richard Scheuermann's group at UT Southwestern. It is being used for the ImmPort database (www.immport.org). The OBX model utilizes the BFO / OBI ontology as guides in creating a data model that is robust to new datatypes. You can see a presentation about it here:... more... - Burke Squires
Thanks Burke. ImmPort looks very impressive, I must say. - Neil Saunders
This reminds me of what the TCGA is starting to do, by defining "data levels". For microarray data, Level 1 might be the raw images, Level 2, the intensity calls, Level 3, the normalized intensities, and Level 4 information on whether it's up or down regulated across multiple samples. For people like me, doing integrative analyses, it's easy to focus just on the higher level data and... more... - Chris Miller
which is exactly why you need separation of the layers and tools to bring data together for the downstream stuff - Deepak Singh from IM
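The "data levels" Chris describes, and the separation of layers Deepak argues for, might be sketched like this. The level names follow Chris's microarray example only loosely; `DataLevel`, `at_or_above`, and the dataset names are hypothetical and are not TCGA's actual definitions or API.

```python
from enum import IntEnum


class DataLevel(IntEnum):
    """Processing levels, loosely following the microarray example in the thread."""
    RAW_IMAGE = 1             # scanned slide image
    INTENSITY_CALL = 2        # per-probe intensity values
    NORMALIZED_INTENSITY = 3  # intensities after normalization
    REGULATION_CALL = 4       # up/down call across multiple samples


def at_or_above(datasets, minimum):
    """Return the names of datasets processed to at least the given level."""
    return [name for name, level in datasets if level >= minimum]


# Hypothetical catalogue of (dataset name, level) pairs.
catalogue = [
    ("slide_001.tiff", DataLevel.RAW_IMAGE),
    ("slide_001_calls.txt", DataLevel.INTENSITY_CALL),
    ("batch_7_normalized.txt", DataLevel.NORMALIZED_INTENSITY),
    ("batch_7_updown.txt", DataLevel.REGULATION_CALL),
]

if __name__ == "__main__":
    # An integrative analysis might only ask for Level 3 and above.
    print(at_or_above(catalogue, DataLevel.NORMALIZED_INTENSITY))
```

Tagging each file with its level lets a downstream tool ignore the lower layers without discarding them.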
Neil, I think you have just explained why tab-delimited files are often more useful than complex XML representations of the same data ;-) - Lars Juhl Jensen
Tab-delimited files would be grrrreat for me in my lab. If any of the rest of you would like to share our data, however, then you're completely screwed. Is the problem not that we're all duplicating each other's work by writing the same kind of parsers for the same kind of data? Proteomics (for example) has a standard (http://www.ebi.ac.uk/pride/). Is it really so hard to use / develop the community-based tools that are being generated around this standard?!? - Neil Swainston
Well, the ratio of usable tools to schemas/ontologies is a whole other debate :-) But sure, in principle the tools are there - for individual types of data. What I highlight in the post is the difficulty of genuine data integration, as opposed to the current "write a parser for everything and mash it up" approach. - Neil Saunders
#1 rule of data integration - if a format exists, it will be used - Deepak Singh
...and if it doesn't exist there is a 70% chance someone will create it :-) - Cameron Neylon
Chris M makes an important point wrt data levels, analogous to trace archives vs sequence dbs. Extending the sequence analogy, obsoleting levels will become important (it will rapidly become cheaper to resequence rather than store sequence). - Chris Cotsapas
Björn Brembs
Put all scientists in jail! - http://bjoern.brembs.net/news...
You think 'Open Access' means changing the way we do science? You think "Open Notebook Science" is the ultimate way to do modern science? Let me show you some people who want to *really* change the way we do science. - Björn Brembs
At all costs do not tell this group about ONS, lest they actually start supporting it. The last thing Open Anything needs is to be associated with these double-barreled douchebags. - Bill Hooker
oh boy - well that kind of writing is automatically self-destructing anyway - Jean-Claude Bradley
I already know one person who thinks we need to get behind these guys and 'clean up' science. I told him exactly what Bill wrote - we want to change the way science is done, but not into what they want! - Björn Brembs
In case some people missed it this is from the Discovery Institute - Jean-Claude Bradley
How sad. - Andrew Lang
Andrew Spong
Let the internet replace scholarly journals http://www.theaustralian.com.au/higher-... (The Australian)
Nice quote: "Academic publishing has a bizarre business model. Academics and scientists at government-funded universities and institutions carry out research and write papers about it. These papers are then reviewed by other state-funded scientists and handed to the editors of academic journals, who also happen to be on the government payroll. Funding for all the academics involved in... more... - Björn Brembs
Nice quote: "Academic publishing has a bizarre business model." Think that pretty much encapsulates it. - Neil Swainston
Hey Elsevier, you listening? Mene mene tekel, assholes... - Bill Hooker
Egon Willighagen
dereferenceable identifier - Egon Willighagen
Nice. Works for UniProt and ChEBI, too. Any chance of a web service equivalent?!? - Neil Swainston
Yes, very likely... it's going to be integrated with sadiframework.org - Egon Willighagen
Duncan Hull
Annotation and merging of SBML models with semanticSBML - http://www.citeulike.org/user...
Summary: SBML is the leading exchange format for mathematical models in Systems Biology. Semantic annotations connect model elements with external knowledge via unique database identifiers and ontology terms, enabling software to check and process models by their biochemical meaning. Such information is essential for model merging, one of the key steps toward the construction of large kinetic models. The tool semanticSBML helps users to check and edit MIRIAM annotations and SBO terms in SBML models. Using a large collection of biochemical names and database identifiers, it supports modellers in finding the right annotations and in merging existing models. Initially, an element matching is derived from the MIRIAM annotations and conflicting element attributes are categorised and highlighted. Conflicts can be resolved automatically or manually, allowing the user to control the merging process in detail. Availability: SemanticSBML is free software written in Python and released under the... more... - Duncan Hull
Wladimir Labeikovsky
Tailor-Made Mass Spec: The Scientist [2009-11-01] - http://www.the-scientist.com/article...
Like the look of Decision Tree. Is it the hardware that needs improving or the software? - Neil Swainston
Mike Chelen
PathVisio / WikiPathways tool for creating and analysing biological pathway diagrams - http://www.pathvisio.org/
I'd really like to see that refactored as a collaborative Google Wave gadget. - Dan Hagon
Me too, re: Google Wave gadget. Then add SBML support, use SBGN in the display, support MIRIAM annotations and I can retire penniless. - Neil Swainston
@Neil, there are a few other tools which support SBML and SBGN (see http://sbgn.org/Communi...). Wikipathways seem to be inventing yet another pathway format and don't provide a conversion to any other existing "standard". Shame, as it would benefit everyone if they did. - Frank
Thinking about it a little more, I'd really like to see the above refactored as a collaborative Google Wave Gadget. I've been involved in about five network reconstruction "jamborees" now, which involve flying loads of people around the World to sit in a room and discuss things that they could do with PathVisio (if it supported SBML...) or Payao. Anyways, this costs a fortune (see the... more... - Neil Swainston
@Neil: WikiPathways is intended exactly for that type of collaborative pathway creation. WikiPathways pathway format is based on, and developed in cooperation with, http://www.genmapp.org. So admittedly it's not a widely supported standard, but at least it wasn't a complete new invention. SBML / SBGN support is on its way. Re Google Wave: unfortunately, all this work predated Google Wave by several years... - Martijn van Iersel
Mike Chelen
Jan Aerts
Cameron, Mendeley and Google Wave on BBC News http://news.bbc.co.uk/2...
Wot no http://www.citeulike.org ? Which one is betamax, which one VHS? http://en.wikipedia.org/wiki... - Duncan Hull
because they do not have a PR machine ... - Paulo Nuin
Cool article - looks really similar to the Channel 4 one though! - Euan
Lars Juhl Jensen
Michael Barton
Is there a best practice for microbial genome annotation?
Nope. There are a variety of pipelines that perform similar tasks. Good starting point might be IMG documentation - http://img.jgi.doe.gov/w.... - Neil Saunders
Worth remembering that there is very little "best practice" in any bioinformatics. For a long time, we made it up as we went along. It's only this new generation of bioinformaticians that have any formal software engineering education and bandy around fancy terms like "best practice" to make us feel bad ;-) - Neil Saunders
I think it's more like the Perl culture: "There is more than one way to do it!!" Best practice in bioinformatics is currently in an ad-hoc state. Just as Damian Conway's Perl Best Practices is one of the best guides to good coding practices for Perl, I hope we will also have a book on "Best Practices in Bioinformatics" soon, maybe by a group of authors from the LifeScientists room - what say? - Khader Shameer
@Khader that's why we need flexible guidelines and not constrained best practice. Several minimal guidelines have already been worked out for different aspects of the life science domain. MIBBI (http://www.mibbi.org/index...) can be a good starting point in this case. - Abhishek Tiwari
I think very often in bioinformatics, TIMTOWTDI. It's not like software development, with a "task" and an "optimal solution". What I think matters most is that however you do it, it's documented and repeatable. - Neil Saunders
I completely agree with you Neil, but some efforts towards developing well defined, documented workflows / protocols (can we call this as "Best Practices") to perform generic tasks (eg. annotation) will be useful for the community. I think several 'standards' (eg. MIRIAM/MIBBI) are developed to bring in a common frame work for routine tasks. I believe TLS is an ideal place to get a consensus about such practices and work on a wikibook of best practices in bioinformatics. - Khader Shameer
And I agree with you. I'm all for standards and best practice. I'm also a realist and a practical bioinformatician :-) - Neil Saunders
@Abhishek: Best practices are not always "constrained", and constrained practices are impossible due to the complexity of biological systems - flexibility should be there. But my point is that even though MIBBI / other standards (http://www.mibbi.org/index...) have been available for a long time, I've never seen them in research papers - is it due to poor visibility of such projects or no interest in promoting such initiatives? - Khader Shameer
Khader, that's a good question. There seems to be a disconnect between standards developers and the people who should be using the standards. I think it's a publishing problem. Developers publish in computational journals and use computational jargon; users don't read those journals or understand the jargon. - Neil Saunders
Khader, in my opinion the main motive of guidelines is to avoid disagreement, while best practices try to bring agreement in the community. Also, people are using these guidelines. It's just a lack of awareness; otherwise more and more people would adopt them. Take any Biomodels database model or CellML repository model, they are well annotated according to MIRIAM guidelines. Allyson... more... - Abhishek Tiwari
I find the line "it's not like software development" to pretty much sum up some of the problems in bioinformatics. Why isn't it?!? - Neil Swainston
It's complicated :-) In part, it's because researchers are more interested in quick answers (= quick fixes) than good code. In part because it's only in recent times that bioinformaticians receive formal software training. In part, because biological problems are more complex than input -> process -> output and you don't always know exactly what you want to achieve when you start. And I guess, biological information has a lot of "context", not easily captured by simple routines. - Neil Saunders
Hi Neil. Yep, all that you say is true. Just from a personal perspective, I've found that being "disciplined" in writing code (making nice, clean, interfaces to modules, unit testing, documenting) means that in the middle-to-long-run, quick answers are easier to come by. By building up a reasonably reliable library of classes (I'm a Java-geek), sticking the bits of Lego together is... more... - Neil Swainston
Neil, I absolutely agree. It took me some time to get to the point of trying to "do things right" from the outset - libraries, documentation etc. and I'm glad I got there. I think a lot of the problems stem from how academic research is conducted. "Can you just give me a table by tomorrow?" "Sure, let me write a library." "No, I just want a table." Hack together perl script, deliver table, discard, move on. Rinse and repeat, until contract expires. Leave mess behind. - Neil Saunders
Couldn't put it better myself! I guess I'm lucky in so far as that I do have the luxury of longer timescales... until my contract expires. - Neil Swainston
Thanks Abhishek for the pointers to the application of different standards. My point is that the goal of both best practices and standards is the same - getting a consensus on how to do repetitive experiments / workflows. But as the Neils are discussing - the choice in individual bioinformatics projects is mainly to get a good fix, rather than an excellent code base. But I hope some degree of consensus can be obtained if people follow standards as a first step. - Khader Shameer
Science isn't set up to reward coding standards. Funding agencies reward quick biological results, not infrastructure and software development. I'd argue that for every 5 biological grants, the NIH should be funding one software/database/computational infrastructure grant. The amount of data is only getting bigger. - Chris Miller
I'd agree with that, Chris. Career wise, it's pretty much immaterial whether I churn out a hack or something "good" and reusable. It's quite annoying. Grrrr!! - Neil Swainston
@Michael / Neil : I am agreeing with "Science isn't set up to reward coding standards", but as a subject in the interface of science and technology - it is high time that bioinformatics should embrace the standards. For Michael's question I was trying to make a point that if there is a standard/best practice/generic protocol for microbial genome annotation - he could have just followed... more... - Khader Shameer
I think genome annotation is an excellent example of how bioinformatics is not like software development. You don't just run a program and annotate a genome. There are lots of biological features: protein-coding genes, non-protein coding genes, motifs - all with their own associated metadata, all with various, disparate tools written specifically for each type of feature. Annotation is... more... - Neil Saunders
too right Neil. is there a best practice for violin-making, vision quests, or coming-of-age experiences? ;) - Ian Holmes
:-) Exactly. The end result is what matters. - Neil Saunders
srsly tho -- there are plenty of papers describing microbial genome annotation. it's still an open research area, but there are commonalities (repeats, transposons, genes, typical errors, ...) so I guess the rough union of those vague concepts would constitute the current best practice. not exactly a recipe... - Ian Holmes
:D best practice for violin-making, vision quests, or coming-of-age experiences :D - Neil, in the current era of bioinformatics with web services and workflows, having an SOP/BP always helps you to kick-start the work in minimal time, rather than going through every genome project paper for the annotation flowcharts. - Khader Shameer
@Ian: OK, finally that's something that Michael / anyone interested in annotation can take from this thread. - Khader Shameer
Khader, what we're saying is that in this case, there isn't an SOP/BP, because it just isn't that kind of procedure. But there is, as Ian says, plenty of advice available. I guess, in terms that CS people might understand, it's not agile. You actually have to put some work into understanding what's going on and what you want to do. - Neil Saunders
@Neil - ^(chicken|egg)? - It could and should be that kind of procedure though. All the advice in the world isn't going to help the people that actually *use* your annotations. The current 'system' for annotating anything is so mindlessly broken I'm surprised it works at all. Now all it needs is a catchy name. Blight of Bioinformatics maybe? - Paul J. Davis
Thanks for the comments everyone. I'm going to read as many genome papers as possible and try and put what I read together. - Michael Barton
Just remembered this article: http://www.nature.com/nbt... which is a good look at current annotation practices. I also finally found http://www.ncbi.nlm.nih.gov/genomes... which describes the actual parameters that NCBI uses for gene prediction. - Paul J. Davis
Neil Saunders, I agree a lot of advice is available and it is definitely helpful. For example, I was not aware of something like MIARE (thanks to Abhishek), and am now implementing it in our RNAi screen. But I can't agree with you if you define bioinformatics projects as non-agile. Everything from a simple BLAST-based sequence analysis to large-scale data analysis follows an agile approach. Think of n... more... - Khader Shameer
Thanks Paul, for the links to the articles. - Khader Shameer
Khader, your very use of the word "agile" sums up what this is all about. Clearly you are "new school" bioinformatics and appreciate software development. "Old school" bioinformatics would never even use the word :-) As I keep saying, I don't disagree with anyone here who calls for better practices, standards or "agility". Just be aware that there are still plenty of old-timers around for whom bioinformatics means "hack together something that works." - Neil Saunders
Here's a paper that describes how microbes are annotated in Swiss-Prot: http://dx.doi.org/10... - Eric Jain
Neil : Just loved the definition "hack together something that works" :) - Khader Shameer
Ruchira S. Datta
ICSB 2009 Keynote: Bernhard Palsson: Microbial Systems Biology: From genome annotation to in silico models
Will be interesting to compare with last year's keynote at ISMB http://ff.im/7p2fL - Ruchira S. Datta
Will talk about: 1. the grand scheme of things: towards a mechanistic genotype-phenotype relationship. - Ruchira S. Datta
2. Network reconstruction: a common denominator. - Ruchira S. Datta
3. Biological research in the area of systems biology. 4 key application areas, from microbes to human. - Ruchira S. Datta
4. The paradigm, and emerging fundamentals. - Ruchira S. Datta
Mendel established discrete genes => genotype/phenotype. - Ruchira S. Datta
Pauling: single mutation of beta-globin resulted in sickle-cell anemia. - Ruchira S. Datta
Craig Venter: first whole genome sequence. - Ruchira S. Datta
1997-2000: Palsson joined UCSD, lab members Christophe Schilling and Jeremy Edwards started making metabolic networks for E. coli, H. influenzae, H. pylori. - Ruchira S. Datta
Iman Famili & Jochen Förster, 2001-2003: Eukaryotes. - Ruchira S. Datta
Many lab members: RECON 1: first human metabolic reconstruction (2007). - Ruchira S. Datta
for metabolic networks we now have genotype/phenotype maps - Pedro Beltrao
Network reconstruction is a BiGG knowledge base, encoding chemistry into a mathematical format. Birth of genome-scale (metabolic) systems biology. Now have mechanistic basis for the genotype-phenotype relationship at genome scale. - Ruchira S. Datta
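As a rough illustration of "encoding chemistry into a mathematical format", a reconstruction can be written as a stoichiometric matrix with one row per metabolite and one column per reaction. The toy network below is entirely hypothetical, and the steady-state check is the usual constraint-based modelling assumption rather than something stated in these notes.

```python
# Toy network (hypothetical): R1: -> A, R2: A -> B, R3: B ->
metabolites = ["A", "B"]
reactions = ["R1", "R2", "R3"]

# S[i][j] is the net number of molecules of metabolite i produced by reaction j
# (negative when the metabolite is consumed).
S = [
    [1, -1, 0],   # A: made by R1, used by R2
    [0, 1, -1],   # B: made by R2, used by R3
]


def net_production(S, fluxes):
    """Compute S * v: the net production rate of each metabolite."""
    return [sum(coeff * v for coeff, v in zip(row, fluxes)) for row in S]


def is_steady_state(S, fluxes, tol=1e-9):
    """True if no metabolite accumulates or is depleted under these fluxes."""
    return all(abs(rate) <= tol for rate in net_production(S, fluxes))


if __name__ == "__main__":
    print(is_steady_state(S, [2, 2, 2]))   # balanced flow through the chain
    print(net_production(S, [2, 1, 1]))    # A accumulates here
```

Genome-scale reconstructions have thousands of rows and columns, but the structure is the same.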
how do we build metabolic networks ? - Pedro Beltrao
Unlike physics 100 years ago, need to account for dual causation (history vs function) - Ruchira S. Datta
2. Network reconstruction: building the genotype-phenotype relationship. - Ruchira S. Datta
Check metastructure of genomes. M matrix (stoichiometric), E (expression) matrix, and O (operon) matrix, then integrate them. - Ruchira S. Datta
"Meta-structure" of E. coli genome: higher level than operon structure. - Ruchira S. Datta
Want to know where genes are, transcription factors, transcription start sites, promoters, etc. - Ruchira S. Datta
Collate 4 'omics datatypes on a genome scale. - Ruchira S. Datta
Have high-quality ChIP-chip data. 250bp resolution peaks. Have expression profiling using tiled arrays. Solexa for first ~30bp of transcripts. 1bp resolution. Proteomics by mass-spec. - Ruchira S. Datta
the genome, ChIP-chip data, transcription data, proteomics - Pedro Beltrao
Thus have multiple genome-scale measurements of different kinds along the whole genome. - Ruchira S. Datta
doing this for several bacteria ... done in E. coli - Pedro Beltrao
Found >100 new transcripts. Some of them are quite small, candidate small RNAs. - Ruchira S. Datta
35% of operons have multiple start sites with multiple active in given condition - Pedro Beltrao
Integrate the four kinds of data to characterize different aspects of a module. - Ruchira S. Datta
Compare these with current annotations. - Ruchira S. Datta
looks like the aim will be to go for a full model of E. coli - Pedro Beltrao
M matrix is nice as it's just binary, so no errors in the matrix. - Ruchira S. Datta
4-step process for metabolism: 1. Draft reconstruction, 2. Curated reconstruction, 3. Genome-scale metabolic model, 4. Validation and iterative improvement. Then ready for use as platform for design and discovery. - Ruchira S. Datta
60-step process will be detailed in Nature Protocols. - Ruchira S. Datta
recent review in metabolic network reconstruction http://www.nature.com/nrmicro... - Pedro Beltrao
Have exponential growth in available reconstructions and their uses. - Ruchira S. Datta
by 2008 around 90 apps based on E. coli metabolic reconstruction - Attila Csordas
Had reconstruction jamborees. Conceived in 2006. Yeast metabolism: Nature Biotech Nov 2008; salmonella has just been submitted, human Recon 1.1 in June 2009, Recon 2.0 in March 2010, will have Staph. and TB. - Ruchira S. Datta
the metabolic reconstruction is well established ... what about the transcriptional network - Pedro Beltrao
4 step matrix for the E matrix as well - Ruchira S. Datta
Lots of ChIP-chip data shows role of metastructure. - Ruchira S. Datta
4-step process for O matrix is still under development. - Ruchira S. Datta
different transcriptional units depending on condition - Pedro Beltrao
Merge into ME matrix: Metabolism and Expression. In particular, will contain all antibiotic targets. - Ruchira S. Datta
Putting it all together. See Feist et al, Nature Rev Micro 02/2009 - Ruchira S. Datta
Markus Covert has done this for Mycoplasma genitalium, and has a simulator which he will present at ICSB. - Ruchira S. Datta
Inclusion of protein structure: Thermotoga maritima. Forthcoming in Science (2009). Put together in integrated reconstruction. Can see how folds travel through the pathways and infer how they might have come up through gene duplication. - Ruchira S. Datta
upcoming paper in Science: Thermotoga maritima structures for many proteins (approx. 100) that were studied in the context of the metabolic network - Pedro Beltrao
If drug has off-target binding site (=> side effect), one of the few ways to analyze that is with reconstruction: perturb in two places at once. - Ruchira S. Datta
Toll-like Receptors in human macrophages has appeared. Large reconstruction w/ ~950 reactions. These reconstructions can be merged to make scalable models. - Ruchira S. Datta
i really like this connection between structure function at a large network level - Pedro Beltrao
3. Biological sciences in the era of systems biology. - Ruchira S. Datta
Map omics data to get context. - Ruchira S. Datta
Can replace expression profile with computed metabolic functions to analyze drug response phenotypes. - Ruchira S. Datta
studying drug candidates using these network models and gene expression changes - Pedro Beltrao
Lots of metabolomics data, e.g., Rabinowitz at Princeton. Use with multi-scale kinetic models. - Ruchira S. Datta
Exo-metabolomics: easy to measure concentrations of metabolites *outside* of cell. If genetic defect leads to difference in secreted metabolite, have candidate biomarker. - Ruchira S. Datta
Gap-filling: systematically finding missing parts. PNAS 103:17480 2006 Use model to find positive growth environments not explained by the model. Use model to hypothesize what is missing. Use bioinformatics to find it. - Ruchira S. Datta
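The gap-filling loop in the note above can be sketched in a few lines. This is a minimal illustration under loud assumptions: it uses simple reachability (network expansion) instead of the constraint-based model from the talk, and every reaction, medium, and candidate name below is hypothetical.

```python
def reachable(reactions, nutrients):
    """Metabolites producible from `nutrients` by repeatedly firing every
    reaction whose substrates are all available (simple network expansion)."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


def unexplained_growth(reactions, target, observed_growth):
    """Environments where growth was observed but the model cannot make `target`."""
    return [env for env, nutrients in observed_growth.items()
            if target not in reachable(reactions, nutrients)]


def candidate_fills(reactions, candidates, target, nutrients):
    """Candidate reactions that, added one at a time, restore production of `target`."""
    fills = []
    for name, rxn in candidates.items():
        trial = dict(reactions)
        trial[name] = rxn
        if target in reachable(trial, nutrients):
            fills.append(name)
    return fills


# Hypothetical toy model: reaction name -> (substrates, products)
model = {
    "r1": (["glucose"], ["pyruvate"]),
    "r2": (["pyruvate"], ["biomass"]),
}
# Hypothetical experimental observations: growth seen on both media
observed_growth = {
    "glucose medium": ["glucose"],
    "lactate medium": ["lactate"],
}
# Hypothetical pool of reactions suggested by bioinformatics
candidates = {
    "ldh": (["lactate"], ["pyruvate"]),
    "unrelated": (["citrate"], ["acetate"]),
}

gaps = unexplained_growth(model, "biomass", observed_growth)
fills = candidate_fills(model, candidates, "biomass", observed_growth["lactate medium"])
```

Here `gaps` lists the growth conditions the model fails to explain, and `fills` names the hypothesized missing reactions that would close the gap.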
Understanding complex biological processes, e.g., adaptive laboratory evolution to optimal phenotypes. E.g., Nature 420(6912) (2002), and Nat Genet 36(10) (2004). Now it's possible to resequence the genomes and find all the mutations that appeared during the adaptive evolution. We can infer their causality using allelic replacement. - Ruchira S. Datta
Unexpected: mutations to global regulators and RNAP are more frequent than mutations to specific transcription factors. - Ruchira S. Datta
Mutations in RNAP are in-frame deletions in the jaw region of the enzyme where it binds to the DNA. Studied w/ R. Landick, U. Wisconsin: leaves initiation site much faster. Adaptive mutations in jaw region have consistent effect on its kinetics. - Ruchira S. Datta
Systems Level Metabolic Engineering: Current Opinions in Biotech, 2008, 19:454-460. From random mutations to targeted mutations, to system-level. - Ruchira S. Datta
Review: Feist and Palsson Nature Biotech 2008. Another one forthcoming in Molec. Sys. Biol. 2009, which includes work on communities. - Ruchira S. Datta
4. The Systems Biology Paradigm. - Ruchira S. Datta
The uses of reconstructions are many, and were hard to predict 10 years ago. - Ruchira S. Datta
Systems biology paradigm: 1. Database 2. Knowledge Base 3. Mathematical Model 4. Validation and Use - Ruchira S. Datta
some publicity to many of their computational tools - Pedro Beltrao
In silico analysis: constraint-based reconstruction and analysis (COBRA) has been successful. http://systemsbiology.ucsd.edu/downloa.... Alternatively KAIST: MetaFluxNet. - Ruchira S. Datta
John Doyle, Caltech, ICSB 2005, was flabbergasted by how informative the stoichiometric matrix is - Ruchira S. Datta
4 fundamental subspaces: Row space: thermodynamics; Null space: steady state. - Ruchira S. Datta
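To make the "null space: steady state" point concrete, here is a minimal sketch that computes a null-space basis of a stoichiometric matrix with exact rational arithmetic. The three-reaction pathway and the names `S` and `null_space` are assumptions for illustration, not a network from the talk.

```python
from fractions import Fraction


def null_space(matrix):
    """Basis for the null space of `matrix` (a list of rows), found by
    exact Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0])
    pivot_cols = []
    r = 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        pv = rows[r][c]
        rows[r] = [x / pv for x in rows[r]]
        # Clear column c in every other row.
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
        if r == len(rows):
            break
    # One basis vector per free (non-pivot) column.
    basis = []
    for f in (c for c in range(n_cols) if c not in pivot_cols):
        vec = [Fraction(0)] * n_cols
        vec[f] = Fraction(1)
        for row_idx, p in enumerate(pivot_cols):
            vec[p] = -rows[row_idx][f]
        basis.append(vec)
    return basis


# Hypothetical toy pathway:  v1: -> A,  v2: A -> B,  v3: B ->
# Rows are metabolites (A, B); columns are reactions (v1, v2, v3).
S = [
    [1, -1, 0],
    [0, 1, -1],
]
basis = null_space(S)
```

For this linear pathway the basis has a single vector with equal flux through all three reactions, which is exactly the steady-state condition S·v = 0.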
Have emerging axioms. - Ruchira S. Datta
Co-founder of Genomatica - Ruchira S. Datta
the small scale modelling/design principles people must be dying by the end of this talk ;) - Pedro Beltrao
Q: Axiom of mass conservation in cell? The cell grows? A: There is mass balance. - Ruchira S. Datta
Q: What about non-model organisms? 99% of microbial universe is non-culturable. What would you do given only a genome? A: There are very few model organisms known in great detail and we tend to extrapolate from them. Metabolic maps for e.g. Geobacter showed that even though the organism was poorly known, the reconstruction was surprisingly useful. - Ruchira S. Datta
an interesting question about how applicable is network reconstruction to novel bacteria that cannot be grown in the lab - Pedro Beltrao
One of the unknown frontiers is how to deal with communities. Now we can sequence the entire metagenome, and culture some of the cells. We can subtract those genomes from the metagenome and make a synthetic community. There's a lot of talk at the federal level in this country on how to deal with communities effectively. - Ruchira S. Datta
Q: How to enhance production of specific compounds? A: Metabolic reconstruction allows computation of good genetic manipulations. In many cases these are not intuitive ahead of time. Can predict what genes to introduce and delete for a desired phenotype, and predict whether it is evolutionarily stable and its growth. - Ruchira S. Datta
Q: Have mostly been talking at single cell level; how about tissues and organs? A: Human reconstruction has been tailored based on tissue-specific profiles, though not curated yet. So should be able to get organelle-specific and cell-specific models. No model yet has >1 cell type. Unpublished model of liver also models adipose tissue and muscle. - Ruchira S. Datta
Q: How to assess completeness and correctness of reconstructed network? A: E. coli has about 4400 genes in the genome, but 1200 of those genes are in the metabolic reconstruction. 95% have biochemical data associated, so high-quality reconstruction. Assess completeness using gap analysis. Conversely, look at metabolomic data: metabolites in reconstruction? 20-30 metabolites were absent... - Ruchira S. Datta
Thinks 90% of metabolic functions of E. coli are in the reconstruction. - Ruchira S. Datta
Abhishek Tiwari
Is blogging bad for my academic career? « Myrmecos Blog - http://myrmecos.wordpress.com/2009...
If you remain professional in tone and content blogging can really only help your career I think - Jean-Claude Bradley
Jean-Claude: fully agreed. "Is public speaking bad for your academic career?" Yes, if you talk nonsense, it sure is! - Egon Willighagen
... which reminds me... reading a bad paper makes me think twice about reading other papers of that same author... - Egon Willighagen
@Egon and @Jean-Claude +1 . From the above blog post (and I am in agreement there, too): "I’m the only person keeping me from looking like an idiot in public....So, mindful of the risk that I’m only broadcasting my own shortcomings, I have no immediate plans to change course." - Allyson Lister
Ruchira S. Datta
Birds of a Feather session: Semantic Web-Linked Data, organized by Eric Neumann, in T5
linked data is: a simple set of 4 guidelines for publishing RDF data on the Web (over HTTP), developed by Tim Berners-Lee in 2006 - Ruchira S. Datta
1. Use URIs as names for things (globally unique identity). 2. Use HTTP URIs (everyone has a web browser/client) 3. When someone looks up a URI, provide useful information...in the form of RDF data. 4. Include links to other URIs (foster discovery of additional information). - Ruchira S. Datta
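The four guidelines can be illustrated with a toy, in-memory stand-in for the Web. This is only a sketch: the `example.org` URIs, the vocabulary terms, and the `WEB` dictionary are all hypothetical, and no real HTTP request is made.

```python
# Hypothetical in-memory "web": looking up an HTTP URI returns triples
# (subject, predicate, object) describing the thing it names.
WEB = {
    "http://example.org/protein/P1": [
        ("http://example.org/protein/P1",
         "http://example.org/vocab/name", "Protein one"),
        ("http://example.org/protein/P1",
         "http://example.org/vocab/encodedBy", "http://example.org/gene/G1"),
    ],
    "http://example.org/gene/G1": [
        ("http://example.org/gene/G1",
         "http://example.org/vocab/name", "Gene one"),
    ],
}


def lookup(uri):
    """Guideline 3: dereferencing a URI yields useful data as triples."""
    return WEB.get(uri, [])


def discover(start_uri):
    """Guideline 4: follow object links to discover further URIs.
    Returns the URIs visited, in breadth-first order."""
    seen, queue = [], [start_uri]
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.append(uri)
        for _subject, _predicate, obj in lookup(uri):
            if obj.startswith("http://") and obj not in seen:
                queue.append(obj)
    return seen


visited = discover("http://example.org/protein/P1")
```

Starting from the protein URI alone, a client reaches the gene record purely by following links in the returned data.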
Context-independent identifiers (URIs) would make things so much more useful and interoperable - like Lego pieces. - Ruchira S. Datta
Some want to get the semantics exactly right and use formal logic and OWL, but here we're emphasizing just the linkability of things. - Ruchira S. Datta
A URI can only refer to one thing, but one thing can have several URIs, unfortunately. - Ruchira S. Datta
several years ago, tried to use LSIDs (life science ids): thing:something:something:identifier. But this can only be recognized by some particular software, not a web browser. Strong influence from W3C to use HTTP URIs, per the law of least power: do what requires the least technology. Even Mark Wilkinson who was touting LSIDs has come around to HTTP URIs. - Ruchira S. Datta
A commenter says LSIDs still exist, it's just that they can extract them from HTTP URIs. - Ruchira S. Datta
There are other proposals, e.g., shared names; Neumann prefers even less constraint than shared names. - Ruchira S. Datta
So, now if you put in a URI you get something back. You should be able to get RDF back. UniProt does this: if you put .rdf on the end of the URI, you'll get the data back as RDF. - Ruchira S. Datta
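The ".rdf on the end" convention described above amounts to a one-line URL transformation. A minimal sketch, assuming a hypothetical record URI on `example.org`; the helper name `rdf_url` is an illustration, not part of any UniProt client.

```python
def rdf_url(record_uri):
    """Build the RDF address for a record by appending '.rdf',
    following the convention described in the note above."""
    return record_uri.rstrip("/") + ".rdf"


# Hypothetical record URI used only for illustration.
example = rdf_url("http://example.org/records/ABC123")
```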
Now colleagues can just use the URIs in order to reuse the data; don't need to copy the data. - Ruchira S. Datta
someone says the UniProt accession is an identifier, whereas the URI is a way to get at the thing through the web - Ruchira S. Datta
identifiers can overlap, but HTTP URIs put things in unique namespaces - Ruchira S. Datta
it needs to be stable: when you put this out, you're establishing a contract with the community that it's not going to change - Ruchira S. Datta
currently, the url, e.g., http://www.uniprot.org/uniprot... is also the URI. the second part is the identifier of the record and the part before the slash is the namespace. At http://purl.bioontology.org, we separate the namespace and the url. So going there we have a PO box that can eternally forward it. - Ruchira S. Datta
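The "PO box that forwards" idea can be sketched as a small resolver that maps a stable namespace to whatever location currently hosts the records. All names here (`PurlResolver`, the `purl.example.org` namespace, the target hosts) are hypothetical; this is not how purl.bioontology.org is actually implemented.

```python
def split_uri(uri):
    """Split a URI into (namespace, identifier) at the last slash."""
    namespace, _, identifier = uri.rpartition("/")
    return namespace + "/", identifier


class PurlResolver:
    """Maps stable namespaces to the current location of the records."""

    def __init__(self):
        self._targets = {}

    def register(self, namespace, target_prefix):
        # Re-registering a namespace moves every record under it at once.
        self._targets[namespace] = target_prefix

    def resolve(self, purl):
        namespace, identifier = split_uri(purl)
        return self._targets[namespace] + identifier


resolver = PurlResolver()
resolver.register("http://purl.example.org/protein/",
                  "http://db-a.example.org/records/")
before = resolver.resolve("http://purl.example.org/protein/X1")

# The hosting database moves; the published PURL stays the same.
resolver.register("http://purl.example.org/protein/",
                  "http://db-b.example.org/entries/")
after = resolver.resolve("http://purl.example.org/protein/X1")
```

The design point is that only one registration changes when the data moves, while every URI already published keeps working.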
Problem: this assumes http://purl.bioontology.org may go away. Thus the Banff Manifesto. - Ruchira S. Datta
Reduce the likelihood of catastrophic failure by consolidating it into an institution, e.g., Stanford University, with longevity. - Ruchira S. Datta
This just pushes the problem onto purl.bioontology.org. - Ruchira S. Datta
The domain name can be transferred, so why do we need purl.bioontology.org? - Ruchira S. Datta
Transferring a zillion domain names is a pain, transferring one domain name is easy. The institution commits to maintaining that domain. - Ruchira S. Datta
It should be not just the institution, but the community--the community will continue to live on. - Ruchira S. Datta
Knowledge should be monotonic: it grows and doesn't disappear. Even if a particular effort dries up, the URIs should still be valid so we can still see what was there. - Ruchira S. Datta
The Linking Open Data Project: A community project started within the W3C Semantic Web Education & Outreach group in 2007 - Ruchira S. Datta
The LOD (Linking Open Data) "cloud", May 2007: many projects with various links between them, e.g., MusicBrainz, FOAF, DBpedia, etc. - Ruchira S. Datta
By March 2008, had tripled - Ruchira S. Datta
you can put any kind of data up and make it available to SPARQL queries - Ruchira S. Datta
By September, WordNet and various other dbs had come in - Ruchira S. Datta
March 2009: life sciences comes in, with Bio2RDF - Ruchira S. Datta
now you can find the data that is in NCBI and UniProt in RDF format, but not the experimental data yet - Ruchira S. Datta
to make this useful for interesting research, will need URIs, and to figure out what are the rules that are important for life sciences - Ruchira S. Datta
when you publish using this data, how is your data that builds on top of it going to be linked from it? - Ruchira S. Datta
we don't really have this concept in life sciences yet, people don't know about it - Ruchira S. Datta
suppose one looks for a concept in the LOD cloud, like "heart"; how do we know which thing to query? BioOntology, DBpedia, etc? - Ruchira S. Datta
one can't do the Google on it yet - Ruchira S. Datta
bioontology guy hates Google analogy; Google gives millions of hits, but we want the contextual query - Ruchira S. Datta
i protest, have to have indexing before ranking - Ruchira S. Datta
this doesn't solve the problem of redundancy: we want the facts about a protein, regardless of their source - Ruchira S. Datta
bioontology guy says you don't need the index to answer the query, just to answer it fast - Ruchira S. Datta
someone else says, need the indexing in order to do the clustering - Ruchira S. Datta
she says you need semantic web overlays. we need hierarchical indexing environment in order to do this at scale - Ruchira S. Datta
one needs to be able to query on an abstraction - Ruchira S. Datta
bioontology guy: what's more important, query or browsing? - Ruchira S. Datta
someone else: even browsing, if something is 3 links away, may not even go there - Ruchira S. Datta
bio2rdf guy says: first we ask everyone simultaneously: do you know about this? then we ask what do you know about it? we have implemented Shared Names. But the URI just goes to the original record. Many people have said many things about the same entity. - Ruchira S. Datta
nobody wants to have to read all the papers in MedLine. the punchline is the links: how does this protein relate to others. if we don't trust a link, *then* we want to drill down - Ruchira S. Datta
if there are 5 million sources of "A is related to B", we don't want to read all of them, we just want to know that there are 5 million of them. we also want to know the kind of evidence, e.g., particular kind of experiment. Then the user can decide whether to trust it. - Ruchira S. Datta
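Summarizing many assertions of the same link by count and evidence type could look like the following. A minimal sketch with made-up data: the tuple layout `(subject, relation, object, evidence)` and the evidence labels are assumptions.

```python
from collections import Counter


def summarize(assertions):
    """Group (subject, relation, object, evidence) assertions by link and
    count how many sources support each link, per kind of evidence."""
    summary = {}
    for subj, rel, obj, evidence in assertions:
        summary.setdefault((subj, rel, obj), Counter())[evidence] += 1
    return summary


# Hypothetical assertions gathered from several sources.
assertions = [
    ("A", "related_to", "B", "yeast two-hybrid"),
    ("A", "related_to", "B", "yeast two-hybrid"),
    ("A", "related_to", "B", "text mining"),
    ("A", "related_to", "C", "text mining"),
]

summary = summarize(assertions)
ab = summary[("A", "related_to", "B")]
total_ab = sum(ab.values())
```

The user sees one line per link with a total and a breakdown by evidence, and drills down to individual sources only when the link looks doubtful.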
At this conference, enormous number of people mining data. We should be able to see their results as easily as the original sources. - Ruchira S. Datta
Great coverage! Thanx! It's like being there... would have loved to sneak in on this BoF... - Egon Willighagen
Egon: feel free to wander in... - Ruchira S. Datta
we want just the local subnetwork, not the text. we want the facts - Ruchira S. Datta
people want question answering - Ruchira S. Datta
Microsoft bought Powerset (now part of Bing) for this purpose - Ruchira S. Datta
someone says this turned out to be crap, e.g., "Psoriasis causes arms" - Ruchira S. Datta
the question can be a small subgraph, not necessarily an English sentence - Ruchira S. Datta
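A question posed as a small subgraph can be sketched as a list of triple patterns with variables, matched against a set of facts. This is a hand-rolled illustration, not SPARQL and not any tool mentioned in the session; the facts and the `?x` variable convention are assumptions.

```python
def match(patterns, triples, binding=None):
    """Return every variable binding under which all `patterns` occur in
    `triples`. Terms starting with '?' are variables."""
    binding = binding or {}
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    results = []
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            results.extend(match(rest, triples, new))
    return results


# Hypothetical facts.
facts = [
    ("P1", "interactsWith", "P2"),
    ("P2", "locatedIn", "nucleus"),
    ("P1", "interactsWith", "P3"),
    ("P3", "locatedIn", "cytoplasm"),
]

# "Which partners of P1 are located in the nucleus?" as a two-edge subgraph.
query = [
    ("P1", "interactsWith", "?x"),
    ("?x", "locatedIn", "nucleus"),
]
answers = match(query, facts)
```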
when we want information about a protein, there are only a limited number of kinds of things we can be interested in, so the software can guide the query context-sensitively - Ruchira S. Datta
we need to distinguish the problem of document retrieval from query formulation - Ruchira S. Datta
we shouldn't just think of scientists, but also other kinds of users - Ruchira S. Datta
this will all be possible, but many people are currently just reinventing RDF over and over again - Ruchira S. Datta
how many here are producers of RDF? roughly 8 - Ruchira S. Datta
put the things that we create in RDF, e.g., if you make the intersection of this fact with this paper, you are in charge of minting that URI - Ruchira S. Datta
if everyone does this, then this facilitates cross-references and exchanges - Ruchira S. Datta
Nophar Geifman has been working with Eytan Ruppin on finding cliques in GO around different diseases. A thing like that should have an URI, so other people can use it. - Ruchira S. Datta
can we do micro-experiments so by ISMB next year, we can prove this concept - Ruchira S. Datta
paradigm shift between hypothesis-driven query versus, the data throws the hypothesis at you - Ruchira S. Datta
what's important is the use case: what is the question you can answer that would make them go "wow"? - Ruchira S. Datta
David Huynh is working with Freebase and has developed Parallax, a faceted browser - Ruchira S. Datta
he also developed Exhibit - Ruchira S. Datta
faceted browsing makes more sense to biologists - Ruchira S. Datta
look at Google's Wonder Wheel - Ruchira S. Datta
Jane McGonigal at SciFoo camp designs games, first slide was World of Warcraft: if you harnessed the collective brainpower that youngsters spend on WoW every day, you could rewrite Wikipedia every day! - Ruchira S. Datta
we need to figure out how to pull people in - Ruchira S. Datta
I mentioned during the session, but forgot to link here (hard to talk and type at the same time!), Marti Hearst's new book _Search User Interfaces_ http://searchuserinterfaces.com - Ruchira S. Datta
How to pull people in - that's the challenge to get this working. Could we get some seed money into this scientific effort and have it distributed with mechanisms similar to Google Ads? If so, we could have students and scientists putting effort into this rather than into some obscure webservers, blogs etc. with Google Ads. But how to generate the seed money? Government grants, donations or pay for usage? - Bo Servenius
Egon Willighagen
MetFlow: Taverna workflows in a web browser (for metabolomics) - http://msbi.ipb-halle.de/MetFlow...
Cool to see a Taverna workflow run in a browser window! Cheers to the Halle team! - Egon Willighagen
Meeting at #taverna on irc.freenode.net today at 16:30 CEST. - Egon Willighagen
Duncan Hull
Daily Mail-o-matic | qwghlm.co.uk - http://www.qwghlm.co.uk/toys...
A new Daily Mail headline every time you click the button. Now updated to include 2009 bogeymen! I was going to give the generator a sophisticated grammar for more varied sentences, until I realised the Mail’s grammar is nearly always the same. If you like this, you might also like The (New) Daily Mail Oncological Ontology Project - tracking Daily Mail’s classification of inanimate objects into two types: those that cause cancer, and those that cure it. - Duncan Hull
NICE ONE, Duncan. After 6 spins my pers. fav was:- "WILL DUMBING-DOWN KILL CLIFF RICHARD?" - Graham Steel
"HAS THE METRIC SYSTEM GIVEN HOMEOWNERS CANCER?" Scarily realistic, these. - Matthew Todd
Brilliant. "Could Gordon Brown turn the countryside gay?" You be the judge. - Neil Saunders
WILL TEACHERS GIVE BRITAIN'S FARMERS DIABETES? - Duncan Hull
Double-plus like. Great Friday afternoon site. - Frank Norman
HAS GORDON BROWN GIVEN CLIFF RICHARD SWINE FLU? (should have noticed this website to brighten up my mood when discussing unethical marketing...) - Egon Willighagen
"Has the MMR jab impregnated your pension" ... lol - Deepak Singh
"Is Jacqui Smith Burgling You?"...hang on...that's not really very funny is it...? - Cameron Neylon
"WILL THE E.U. STEAL THE IDENTITY OF THE CHURCH?" yikes! thanks :) - Allyson Lister
The Daily Mail Ontology project is pretty good too http://dailymailoncology.tumblr.com/ they've classified everything in the world into just two classes, things that cause cancer and things that don't :) - Duncan Hull
Duncan... yes, excellent suggestion! We should RDF-ify that for the LODD effort! Try aspirin! A wealth of information! :) - Egon Willighagen
Oh, and don't get tempted to install that GreaseMonkey script on your colleagues machines :) - Egon Willighagen
The utterly absurd ones are my favourites. "WILL LESBIANS HAVE SEX WITH HOUSE PRICES?" - Neil Swainston
Duncan Hull
YMCA: Just a little bit of GTCA! - http://duncan.hull.name/2009...
I wonder how many squillionaires in the marketing department had to put in endless nightshifts to dream this up? - Neil Swainston
@Neil some people have all the fun... - Duncan Hull
Jonathan Eisen
Yet another reason for more openness in science - Cancer Patients Challenge the Patenting of a Gene - NYTimes - http://www.nytimes.com/2009...
Duncan Hull
German comedy ambassador Henning Wehn on the european sense of humour (youtube.com) - http://www.cs.man.ac.uk/~hulld...
Roll on Sunday. Ausgezeichnet! - Neil Swainston
quote "British always say we Germans don't have a sense of humour. I don't find that funny." --Henning Wehn - Duncan Hull
@Neil Ja! - Duncan Hull
Dave Munger
Today's infographic: Historical distribution of songs with word "boogie" http://wordmunger.com/?p=1134
Pierre Lindenbaum
100 Publications Every Graduate Student Should Read - http://compgen.blogspot.com/2009...
I have been wanting for several years now to make a list of 100 important publications that every one of my graduate students should read before they graduate. I will make this list here and will slowly add to it and edit it over the coming months. I will attempt to organize these by discipline. Please email me your suggestions or comments. - Pierre Lindenbaum
For books, Worldcat.org can give a list of editions and what libraries have them: http://www.worldcat.org/oclc... - John Dupuis
My favorite was already mentioned in the comments of the blog post: "Why Most Published Research Findings Are False", John P. A. Ioannidis, PLoS Medicine, http://is.gd/xyKV - Konrad Förstner
Abhishek Tiwari
Present and future of proteomics data curation at the PRIDE database - http://dx.doi.org/10...
Duncan Hull
Michel Dumontier on Representing Biochemistry - http://duncan.hull.name/2009...
Seminar title: Increasingly Accurate Representation of Biochemistry, Speaker: Michel Dumontier, dumontierlab.com, Abstract: Biochemical ontologies aim to capture and represent biochemical entities and the relations that exist between them in an accurate manner. A fundamental starting point is biochemical identity, but our current approach for generating identifiers is haphazard and consequently integrating data is error-prone. I will discuss plausible structure-based strategies for biochemical identity whether it be at molecular level or some part thereof (e.g. residues, collection of residues, atoms, collection of atoms, functional groups) such that identifiers may be generated in an automatic and curator/database independent manner. With structure-based identifiers in hand, we will be in a position to more accurately capture context-specific biochemical knowledge, such as how a set of residues in a binding site are involved in a chemical reaction including the fact that a key... more... - Duncan Hull
Allyson Lister
Informal Knowledge Sharing in Science via Social Networking - http://themindwobbles.wordpress.com/2009...
This is a cross-posted item available both from this, my home blog, and http://biosharing.org, a new blog specifically concerned with “news and information about activities related to the development of data policies and standards in the biological domain, in particular for the area of ‘omics”. You can find the post on biosharing.org at: http://biosharing.org/2009... . Recently, [...] - Allyson Lister
Liked the piece, Allyson. I can see the point regarding reporting from conferences and such-like (something you did with consummate professionalism in Cambridge, for example). Just looking at FriendFeed, though, I'm beginning to get the impression that 99% of users are, in the nicest possible way, informatics geeks like me. Why are my lab-based colleagues on Facebook and not on Twitter? - Neil Swainston