Anil Thomas
"Big question is not whether data is open or not, but if data is well structured and annotated so that one can reproduce the results and re-use it without going out of context." - Madhu Pandey
But without the data around, how do you decide how to structure and annotate it? We can't assume we know everything about the data - Deepak Singh
Annotation using raw data is of little use; annotation requires extracted knowledge such as ontologies (Gene Ontology, SBO), domain dictionaries and knowledge models. Just attaching every numerical or string value we have does not make sense. We already have every kind of data out there, which is sufficient to standardize the formats. - Madhu Pandey
Standards (open or otherwise) that exist detached from actual data or implementations (or that are too attached to one specific implementation) often end up being not so useful (or worse). - Eric Jain
Don't know the context of the original quote about open data being more important than open source. But most people here will probably agree that it's more important that GO remains open than that it is produced with open source tools, just to give an example. - Eric Jain
Check out this post, "How nature blog aggregates the top stories": http://www.abhishek-tiwari.com/2009... - Anil Thomas
Eric, I quoted Ian Davis from Talis who was talking about Semantic Web/Linked Data stuff .. http://www.slideshare.net/iandavi... - Deepak Singh
"Communicate first. Standardize second" - Jean-Claude Bradley (but communicate doesn't just mean dump either, it means communicate) - Cameron Neylon
Left my take on this in my blog: http://chem-bla-ics.blogspot.com/2009... - Egon Willighagen
BTW, does anyone know how I can make FF aware that my blog item, which will show up on FF too, is actually a comment on this item? - Egon Willighagen
I have to +1 the importance of standards. A related post from Frank sums up my opinions on that one: http://peanutbutter.wordpress.com/2008... Of course, you can't work in isolation from the data itself. That's where community standards come in: standards that are developed by the community, for the community. It takes much longer that way, but you end up with happier people and greater uptake. - Allyson Lister
Allyson, standards are very important, especially for data sharing, but they are best developed when people have access to the same data and a community can form around it. Private data usually ends up leading to many different standards (which is an oxymoron if there ever was one). Completely agree on the community standard bit - Deepak Singh
What Deepak said: before you can have community standards you need a community. And the best way to build a scientific community is around shared data. - Cameron Neylon
Makes complete sense, Deepak. Agreed :) - Allyson Lister
@Cameron - yikes, I think I see a chicken-and-egg situation. How else do you get good shared data except by everyone using the same format? :) - Allyson Lister
Allyson, it is, but the whole "get it out there and don't try to make it perfect" idea applies here as well - Deepak Singh
You don't *need* standards to share or use other people's data. Standards can (but not all do...) make doing so a lot less painful. The pain in fact appears to be a major motivator for bothering with standards. That, and grant money (for academics) / the "no-vendor lock-in" marketing message (for companies)... - Eric Jain
@Allyson start small? Seriously though, I can transfer data in a spreadsheet without it being in a standard format. This is good enough to communicate with interested people. Once we have more experience of transferring data between ourselves, and we have a community, it makes sense to talk about community standards. Like Eric says, as long as humans are involved, you don't need standards, but you do need someone to do lots of cutting and pasting in Excel (running away now...) - Cameron Neylon
Let's say you have some interesting data that could be quite useful to a lot of people, but it's all non-standard and ugly. Should you let it languish in a closet until you have time to give it a makeover, or do you publish it anyway? If the former, we'd still be waiting for the first release of Swiss-Prot :-) - Eric Jain
I should point out that most of the data I generate is essentially two-column, but with no agreed standard format (or one at an early stage of adoption) - Cameron Neylon
Are we talking about community data exchange standards like these? http://www.mibbi.org/index... - Mr. Gunn
Well, I'm talking about standards like this: http://www.smallangles.net/wgwiki... It's a good start, but at the moment only a few instruments write it and virtually no analysis software reads it - Cameron Neylon
Interesting. - Mr. Gunn
There's nothing at MIBBI that really helps me at the moment. Most of it is way too heavyweight and it doesn't really cover the kind of experiments we do. If we had tools that automatically pumped it out, I'd be much happier. - Cameron Neylon
Yeah, I guess that's the problem with any data standard. If you don't have devices that generate it, it's not much use. I'm trying to convince the engineers at my job of the need for something other than a wodge of .csv files, but I'm still a long way from being able to have this kind of conversation with them. - Mr. Gunn from IM
The key to getting attention is to observe how much time is spent loading CSV into Excel, moving columns around and then graphing, as opposed to: load data, graph. There is something about seeing columns of data in a spreadsheet that makes people comfortable, though. But the repetitive actions are what get to me and are pushing me towards more sophisticated tools (at least as soon as I can figure out why my #$&%! python variables are in the wrong scope) - Cameron Neylon
LOL. They've written dedicated software for handling our data, but there's something about engineers writing software. They're not exactly UI experts, and I still end up doing a lot of messing about with columns of data after "processing". - Mr. Gunn from IM
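A minimal sketch of the "load data, graph" step in Python, assuming the instrument writes a CSV file with a header row. The function name `load_columns` and the column names `time` and `signal` are hypothetical. Picking columns by header name is what replaces the manual rearranging of columns in Excel.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_columns(stream: TextIO, names: List[str]) -> Dict[str, List[float]]:
    """Read the named columns from a CSV stream that has a header row.

    Columns are selected by header name, so their order in the file does
    not matter and nothing has to be moved around by hand.
    """
    reader = csv.DictReader(stream)
    header = reader.fieldnames or []
    missing = [n for n in names if n not in header]
    if missing:
        raise KeyError(f"columns not found in header: {missing}")
    columns: Dict[str, List[float]] = {n: [] for n in names}
    for row in reader:
        for n in names:
            columns[n].append(float(row[n]))
    return columns


if __name__ == "__main__":
    # Hypothetical instrument output: the column we want first is stored last.
    sample = io.StringIO("signal,temperature,time\n0.5,20.1,0\n0.7,20.3,1\n0.9,20.2,2\n")
    data = load_columns(sample, ["time", "signal"])
    print(data["time"], data["signal"])
    # With matplotlib installed (import matplotlib.pyplot as plt),
    # graphing is then a single call: plt.plot(data["time"], data["signal"])
```

Once this is a script, the same load-and-plot step runs identically on every new file instead of being repeated by hand.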
I guess we are concentrating very much on data standards vs. open data; how about data provenance? You collected some data and put it in Excel because you don't need any standard or format. Now the next person doesn't know how you generated the data or what the context was. It gives him or her the freedom to use it any way he or she wants. - Abhishek Tiwari
So many community efforts are ongoing; of course they are not doing it without any data. - Abhishek Tiwari
Provenance is important but can be managed by a container if necessary (blog, repository, wiki, whatever). The data is just an object at the end of the day; the context doesn't have to be stored within it, it just needs to point to it. This takes us back to recording what was done versus what was generated. - Cameron Neylon
Actually I think that is a central problem with many data standards. They try to record both the experiment and the results, without making a clear separation between them. This can lead to all sorts of nightmares such as: how do you represent the average of five independent experiments; I realised a parameter was set wrong when I processed the data, but now if I duplicate the file it looks like I've done the experiment again... - Cameron Neylon
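The separation described above can be sketched in Python as two kinds of object: a record of what was done, which only points to its context, and a dataset of what was generated, which refers to the record that produced it and to any datasets it was derived from. This is a hypothetical model, not any existing standard; the class names, fields and example.org URIs are assumptions for illustration. It shows how both "nightmares" resolve: an average is a derived dataset with five inputs, and reprocessing creates a new derived dataset rather than something that looks like a new experiment.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Record:
    """What was done. `uri` points to wherever the context lives
    (blog post, wiki page, repository entry); it is not embedded in the data."""
    uri: str
    description: str


@dataclass(frozen=True)
class Dataset:
    """What was generated: values plus pointers to their context."""
    values: Tuple[float, ...]
    produced_by: Record                        # step that produced these values
    derived_from: Tuple["Dataset", ...] = ()   # empty for raw measurements

    def is_raw(self) -> bool:
        return not self.derived_from


def average(datasets: List[Dataset], record: Record) -> Dataset:
    """Point-by-point mean of equally long datasets, kept as a derived dataset."""
    n = len(datasets[0].values)
    if any(len(d.values) != n for d in datasets):
        raise ValueError("datasets must have the same length")
    values = tuple(mean(d.values[i] for d in datasets) for i in range(n))
    return Dataset(values, record, tuple(datasets))


def raw_inputs(ds: Dataset) -> Set[Dataset]:
    """Walk back through derivations to the raw measurements underneath."""
    if ds.is_raw():
        return {ds}
    found: Set[Dataset] = set()
    for parent in ds.derived_from:
        found |= raw_inputs(parent)
    return found


if __name__ == "__main__":
    runs = [Dataset((1.0 + i, 2.0 + i),
                    Record(f"http://example.org/wiki/run-{i}", f"run {i}"))
            for i in range(5)]
    avg = average(runs, Record("http://example.org/wiki/avg", "mean of five runs"))
    print(avg.values)            # (3.0, 4.0)
    print(len(raw_inputs(avg)))  # 5
    # Reprocessing run 0 with a corrected parameter: still one experiment.
    fixed = Dataset(tuple(v * 2 for v in runs[0].values),
                    Record("http://example.org/wiki/reprocess-0", "corrected parameter"),
                    (runs[0],))
    print(len(raw_inputs(fixed)))  # 1
```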
Well, I don't see that you need to store metadata in the data either, but most standards try to document both what was done (metadata) and what was generated (data), and I guess that is not bad. - Abhishek Tiwari
My problem is that they generally try to do this within one file - which in my experience breaks much more often and creates more dependencies. If I was to take the extreme position I would say that there should be separate standards for recording the process and the different types of results. But in practice, especially with modern instrumentation, it is helpful to pull some of it together. - Cameron Neylon
Without metadata, it's just numbers. It's the context that makes it data. It definitely makes sense to break things out. - Mr. Gunn
I recently faced this problem of metadata. I had to store a set of spreadsheets and I wanted to annotate each file (this is a tab-delimited document, this is a statistical result...) and each column (this is a marker, this is a genomic position, etc...). Frank Gibson pointed me to the information-ontology http://tinyurl.com/3n6lcu . (Not easy for me to use.) - Pierre Lindenbaum
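One lightweight way to attach that kind of file-level and column-level annotation without touching the spreadsheet itself is a JSON sidecar file that maps each column name to an ontology term. This is a minimal sketch under stated assumptions: the term IRIs below are hypothetical example.org placeholders, not real identifiers from the information ontology, and the function names are made up for illustration. The check at the end reports any column that has not been annotated yet.

```python
import csv
import io
import json
from typing import Dict, List, TextIO

# Hypothetical placeholder IRIs; real ones would be looked up in an ontology.
TAB_DELIMITED = "http://example.org/term/tab_delimited_document"
MARKER = "http://example.org/term/genetic_marker"
GENOMIC_POSITION = "http://example.org/term/genomic_position"


def make_annotation(file_type: str, columns: Dict[str, str]) -> str:
    """Serialise file-level and per-column annotations as a JSON sidecar."""
    return json.dumps({"file_type": file_type, "columns": columns}, indent=2)


def unannotated_columns(tsv: TextIO, sidecar: str) -> List[str]:
    """Return the header columns of a TSV file that have no annotation."""
    header = next(csv.reader(tsv, delimiter="\t"))
    annotated = json.loads(sidecar)["columns"]
    return [name for name in header if name not in annotated]


if __name__ == "__main__":
    table = io.StringIO("marker\tposition\tp_value\nrs123\t10500\t0.01\n")
    sidecar = make_annotation(TAB_DELIMITED,
                              {"marker": MARKER, "position": GENOMIC_POSITION})
    print(unannotated_columns(table, sidecar))  # ['p_value']
```

Keeping the annotation in a separate file means the spreadsheet stays readable by any tool, while the sidecar only points to the context, in the same spirit as the discussion above.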