Frank
The Triumvirate of Scientific Data - http://peanutbutter.wordpress.com/2008...
I wanted to take the article a step further and highlight three significant properties of scientific data that I believe to be fundamental in considering how to curate, standardize or simply represent scientific data, from primary data, to lab books, to publication. These significant properties of scientific data are the content, syntax, and semantics, or more simply put: What do we want to say? How do we say it? What does it all mean? These three significant properties of data are what I refer to as the Triumvirate of scientific data. - Frank
Frank, don't you think that RDF contains those three properties? - Pierre Lindenbaum
I need to have a good think about this. But this seems like a useful basis to talk about things. I think your triumvirate is about scientific assertions rather than data though. The data itself is the "what", the underlying file, surely? - Cameron Neylon
@Pierre, certain syntaxes can represent both content and semantics. Yes, RDF would be one of those; even some proprietary file formats can capture and record additional content, i.e. metadata about the data. The metadata itself can also have semantics. - Frank
@Cameron, it's not necessarily "scientific". It could also be described as the three aspects to consider when curating digital objects rather than just scientific data. The content can be the data itself, the "what", but typically the content has to be described in some way, either in a way amenable to humans or computers, in other words metadata. - Frank
Scientific assertions about the world are no different than factual statements about the world in general. We need to agree upon the most simple syntactic rules for human beings to express those assertions that are machine-readable and machine-processable by Semantic Web technologies. From the standpoint of the average user, RDF is not simple enough. - Sean McBride
Frank, agreed - but you called it the "Triumvirate of Data" - not disagreeing with anything you wrote (yet), just that perhaps the terminology you've used has the potential to confuse us wet scientists of very little brain :-) - Cameron Neylon
@Cameron, You don't understand my semantics? :) - Frank
What comes to mind when I think of threes, triples, triumvirates and threesomeness in general: all propositions about the world consist of objects, properties and values. For instance "France [capital] Paris" where "*object [*property] *value". If there is any proposition about the world, scientific or otherwise, that can't be captured with this simple model and syntax, I haven't discovered it yet. (Some folks like to refer to objects, properties and values as subjects, predicates and objects.) - Sean McBride
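A minimal sketch of how the "object [property] value" notation in the comment above might be read by a program. The regular expression and the names `Triple` and `parse_opv` are assumptions made for illustration; the thread does not define a formal grammar.

```python
import re
from typing import NamedTuple


class Triple(NamedTuple):
    obj: str
    prop: str
    value: str


# Hypothetical pattern: text, then a bracketed property, then the rest as the value.
OPV = re.compile(r"^\s*(?P<obj>[^\[\]]+?)\s*\[(?P<prop>[^\[\]]+)\]\s*(?P<value>.+?)\s*$")


def parse_opv(text: str) -> Triple:
    """Split 'object [property] value' into its three parts."""
    m = OPV.match(text)
    if m is None:
        raise ValueError(f"not an 'object [property] value' statement: {text!r}")
    return Triple(m["obj"], m["prop"].strip(), m["value"])


print(parse_opv("France [capital] Paris"))
```

In RDF terms the same three slots would be called subject, predicate and object, as the comment notes.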
@Sean: what you're describing is RDF - Pierre Lindenbaum
@Sean McBride: and here is your description of Paris in DBPedia/RDF http://dbpedia.org/page/Paris - Pierre Lindenbaum
Pierre - I would like to see semantic markup conventions on the user end that are more stripped down, minimal and simple than RDFa. Average users should be able to embed semantic assertions in plain text and regular communications on the fly, with as little effort as possible. For instance {.RDFa /c semantic markup language} - I just declared that RDFa is a semantic markup language, where {.*instance /c *category} and "/c" is a formal property tag for "category." Friendfeed should do the hard work of plucking out, processing, organizing and managing these statements from the total information flow. - Sean McBride
Sometimes I think that doubles -- category/instance pairs -- could do 90% of the work of organizing semantic content. Category assertions are infinitely flexible. {.Forbes billionaire 2008 rank 1 /i Warren Buffett} (I suppose this is really a triple: {.*category /i *instance} in opv mold.) Most of the knowledge in anyone's head can be expressed as a single plain text list of category/instance pairs. - Sean McBride
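The "plucking out" step for the two brace forms quoted above, {.*instance /c *category} and {.*category /i *instance}, could look roughly like this. It is only a sketch: the pattern and the function name `extract_pairs` are assumptions, and the proposal in the thread may have intended other tags or nesting rules.

```python
import re

# Hypothetical pattern for "{.left /c right}" and "{.left /i right}".
PAIR = re.compile(r"\{\.(?P<left>[^{}]+?)\s+/(?P<tag>[ci])\s+(?P<right>[^{}]+?)\}")


def extract_pairs(text):
    """Return (category, instance) pairs found anywhere in free text."""
    pairs = []
    for m in PAIR.finditer(text):
        left, right = m["left"].strip(), m["right"].strip()
        if m["tag"] == "c":
            # {.instance /c category}
            pairs.append((right, left))
        else:
            # {.category /i instance}
            pairs.append((left, right))
    return pairs


message = (
    "I just declared {.RDFa /c semantic markup language} and "
    "{.Forbes billionaire 2008 rank 1 /i Warren Buffett} in one message."
)
for category, instance in extract_pairs(message):
    print(f"{category!r} has instance {instance!r}")
```

Normalizing both forms to the same (category, instance) order means the surrounding prose can use whichever direction reads more naturally.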
One difficulty is that science (at least in my field) is not made up of facts (like Paris is in France) but of measurements with large probabilities of being significantly off - we have to apply fuzzy logic with confidence values based on the evidence and the interpretation of the evidence - Jean-Claude Bradley
Propositions with various degrees of certainty and probability can easily be asserted as opv (object, property, value) statements in semantic markup schemes, as can theories, speculations, questions, etc. - Sean McBride
Sean - yes I think to automate the scientific method you need fuzzy logic - it is too easy to come up with one experiment that invalidates a theory - Jean-Claude Bradley
This might work: "x [property?] y" where one isn't certain about whether x has the property y. Perhaps "x [property?7] y" where one is 70 percent certain that x has the property y. - Sean McBride
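A sketch of how the "?" and "?7" markers suggested above might be parsed. The mapping is an assumption for illustration: no marker is read as certainty 1.0, a bare "?" as "uncertain but unquantified" (`None`), and a single digit N as N/10. The names `HedgedTriple` and `parse_hedged` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class HedgedTriple(NamedTuple):
    obj: str
    prop: str
    value: str
    certainty: Optional[float]  # None means uncertain, but not quantified


PATTERN = re.compile(
    r"^\s*(?P<obj>[^\[\]]+?)\s*"
    r"\[(?P<prop>[^\[\]?]+)(?P<mark>\?(?P<level>\d)?)?\]"
    r"\s*(?P<value>.+?)\s*$"
)


def parse_hedged(text: str) -> HedgedTriple:
    """Parse 'x [property] y', 'x [property?] y' or 'x [property?7] y'."""
    m = PATTERN.match(text)
    if m is None:
        raise ValueError(f"cannot parse statement: {text!r}")
    if m["mark"] is None:
        certainty = 1.0          # assumption: no marker means fully asserted
    elif m["level"] is None:
        certainty = None         # bare "?"
    else:
        certainty = int(m["level"]) / 10
    return HedgedTriple(m["obj"], m["prop"].strip(), m["value"], certainty)


for statement in ("x [property] y", "x [property?] y", "x [property?7] y"):
    print(parse_hedged(statement))
```

This records only a number; it does not say who holds the belief, when, or on what evidence.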
Without wishing to get back into the quad and quint argument again... who is 70% certain? And when? And based on what evidence? Any assertion can probably be written as some form of triple - but in isolation those assertions have almost no value beyond some form of idealised machine reasoning. - Cameron Neylon
Triples have enormous meaning when they are well-grounded in reality (truthful) and gathered together in large collections (tens of thousands, hundreds of thousands, millions or even billions of triples). The set of all triples created by particular persons, organizations or groups will be highly meaningful in terms of profiling the creators. Why would anyone -- historians, biographers, scientists, financial analysts, medical researchers, etc. -- look at semantic assertions about the world in isolation? The value and power comes from integration and inferencing. - Sean McBride
My point is that from an experimental perspective any description of what happens in my hands in the lab will have flat out contradictions for any collection of more than about 20 statements. Sometimes things work and sometimes they don't. I don't yet see how we make the jump from reasoning over straightforward sets of assertions that are very widely agreed (Paris is a city) to the very messy world of experimental science. Believe me, I want to, I just haven't seen a convincing demonstration yet! - Cameron Neylon
Statistical patterns among contradictory scientific data are interesting. They should be data mined for possible useful inferences. Also, false or contradictory beliefs among agents (persons, organizations, groups, etc.) are interesting and offer a window on the real world. Get this stuff into triples, and one can play games with the data. - Sean McBride
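One simple way to start looking for such patterns, sketched under assumptions: group statements by (object, property) and count how often each value was asserted. The sample statements are invented placeholders, not real lab records, and `contradictions` is a hypothetical helper name.

```python
from collections import Counter, defaultdict


def tally_values(triples):
    """Count how often each value was asserted for every (object, property)."""
    tallies = defaultdict(Counter)
    for obj, prop, value in triples:
        tallies[(obj, prop)][value] += 1
    return tallies


def contradictions(triples):
    """Keep only the (object, property) keys that have more than one value."""
    return {
        key: dict(counts)
        for key, counts in tally_values(triples).items()
        if len(counts) > 1
    }


# Placeholder statements, invented purely for illustration.
statements = [
    ("experiment A", "outcome", "worked"),
    ("experiment A", "outcome", "failed"),
    ("experiment A", "outcome", "worked"),
    ("Paris", "is a", "city"),
]
print(contradictions(statements))
```

Counting is only a first pass; it says nothing about why the repeats disagree.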
Sean - well, we are accumulating some simple data that we are reasonably confident of being close to correct, in the form "this compound in this solvent has a solubility of X" - maybe somebody can help with converting to triples? http://spreadsheets.google.com/ccc... - Jean-Claude Bradley
@JC Bradley, I'm working on this, give me a few minutes :-) - Pierre Lindenbaum
@Jean-Claude: I put an RDF description of your spreadsheet here: http://anybody.cephb.fr/perso... . I'm not sure of 'what is what' and 'who is who' but you get the idea. - Pierre Lindenbaum
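For readers who cannot open the linked file, a row of the form "this compound in this solvent has a solubility of X" could be turned into triples roughly as follows. This is not Pierre's file or vocabulary: the `http://example.org/solubility/` namespace, the property names and the sample row are all placeholders, and the output only approximates N-Triples syntax.

```python
from urllib.parse import quote

BASE = "http://example.org/solubility/"  # hypothetical namespace, not from the thread


def iri(local_name):
    """Wrap a percent-encoded local name in the placeholder namespace."""
    return f"<{BASE}{quote(str(local_name), safe='')}>"


def literal(text):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = str(text).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def measurement_to_triples(row_id, compound, solvent, solubility):
    """Describe one spreadsheet row as three statements about a measurement node."""
    subject = iri(f"measurement-{row_id}")
    return [
        f"{subject} {iri('compound')} {literal(compound)} .",
        f"{subject} {iri('solvent')} {literal(solvent)} .",
        f"{subject} {iri('solubility')} {literal(solubility)} .",
    ]


# Placeholder row: the names and the value "X" stand in for real spreadsheet cells.
for line in measurement_to_triples(1, "compound A", "solvent B", "X"):
    print(line)
```

Giving each row its own measurement node leaves room to attach who measured it and how, which a single compound-solvent-value triple cannot carry.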
@Pierre - wow that was quick :) I added a link to your RDF file on the front page http://onschallenge.wikispaces.com/ - Jean-Claude Bradley
@Pierre - by the way, solvent has an e, not an a - this isn't French :) - Jean-Claude Bradley
@Jean-Claude: ah sacré nom de saperlipopette !! :-) OK, fixed. You can also validate this RDF file with the W3C RDF validator (http://www.w3.org/RDF... ) and visualize the graph (works with copy+paste, but the engine failed to load 'by URI' (?)). - Pierre Lindenbaum
Can I now like this again? I will accept this is a real space representation of experimental data - will need to think about how the people and the methods need to map onto this... - Cameron Neylon