FF doesn't meet all your requirements but it does seem to work well compared to the specialized services - at least in some fields
- Jean-Claude Bradley
Well I guess that's not surprising given my biases - at some level I'm more interested in what people think I've missed than my own prejudices though. FWIW I think a clever combination of DropBox, FriendFeed and some of the elements from StackOverflow, with perhaps a bit of the coordination ability of posterous would go very close to the mark. Still need better network and filter management tools though - somehow they need more configurability but less configuration...
- Cameron Neylon
OpenWetWare is looking to make a major overhaul in the next couple months, and has a bit over 1 year of funding left. I feel like this is an opportunity to at least try to do some of the things that most people think are necessary for SS4S. Not perfect, but better so that we'd have a better idea of what is really needed. I think the time frame (now; already funded) makes "not perfect" a...
- Steve Koch
I really like what you said in point 10. It's something that I've seen far too many scientists being cavalier about. Federation, open protocols and specifications, along with open source, are very important to science.
- Christopher Granade
Might be worth seeing how far sourceforge meets your criteria. Certainly it's totally based around objects, i.e. software projects, and there are lots of high quality open source science projects whose code is hosted there. Although it has community/social networking tools I've personally never really used these and most visits I've had to sf have either been fleeting (to download...
- Dan Hagon
Steve, absolutely we need to keep evolving with the resources available. OWW is a great place to do that.
- Cameron Neylon
Dan, there was a conversation around using Github in a similar way some months ago and I think these things have a lot of potential as a back end. I think federation is important enough that you'd want to use a DVCS rather than SVN as a back end though.
- Cameron Neylon
Sourceforge has several DVCS options in addition to svn these days. Although github is great I would be wary of anything that requires scientists to learn the intricacies of git. hg and bzr are much more friendly to non-developer types that don't need the full flexibility of git. I've had some success using them to collaboratively author LaTeX documents.
- Matt Leifer
Matt, ok, I'm behind the times (nothing new there!). The intricacies are less of an issue as this would only be a back end. No SS4S that any significant proportion of scientists use is going to look _anything_ like a code repository. To start with your average scientist is never going to touch a command line. If you're dealing in LaTeX you're already talking about a minority I'm afraid....
- Cameron Neylon
There are several wikis that use DVCS as a backend. This could be a starting point for developing the type of thing you are interested in.
- Matt Leifer
LaTeX isn't the minority in whole areas of math, CS, physics....I guess that brings up the same old complaint: "science" is defined as all biomed, all the time. I'll try to come up with some more substantive comments though
- Christina Pikas
Christina, didn't mean to say it should be excluded, just that a non-command line system is non-negotiable, so most online VCS aren't going to be good enough as a front end. Support for Word, Excel, video, images, XML and LaTeX is a non-negotiable characteristic of any such system.
- Cameron Neylon
Matt, not sure that a wiki is the right starting point - the document model doesn't seem right to me, although I'm way behind on the most recent developments in wikis so I may be out of date on that as well. What is in my head is a DVCS back end with APIs providing access from e.g. document authoring systems, databases, publishers, whatever. A feed system that looks a bit like friendfeed...
- Cameron Neylon
I wasn't suggesting actually using one of the wikis, just that they have already done a reasonable job of abstracting the version control functionality (in fact, some of them support more than one DVCS in this way) so there may be some things in the codebase that are useful. It is also an example of taking a command-line DVCS and giving it a more user friendly interface. In addition, if...
- Matt Leifer
Ah good to know - which do you think are the best examples of these wikis? I should take a look. In any case at this stage I'm just throwing ideas out. Have no resources to actually build anything at the moment.
- Cameron Neylon
Is there actually a need for social software for scientists? Or should scientists use and customize the existing social networking tools (FriendFeed, Facebook, LinkedIn, etc.)?
- Martin Fenner
I'm beginning to think the main issue will be that business models for consumer services are incompatible with what researchers need. So yes, customise might be better than build, but if we have to go down that route we may as well have a good idea of what's required. One person's customisation is another person's build.
- Cameron Neylon
I'd be curious what you think of HubZero, Cameron.
- D0r0th34
Depends a bit on server setup. For Mercurial I like Hatta, but it requires persistent python processes, i.e. no good for most shared hosts that only allow CGI. There is a list of RCS backed wikis here: http://hatta.sheep.art.pl/Similar projects
- Matt Leifer
Cameron, I love and absolutely agree with the necessity of "scientific objects". If you lack those, then (as Martin points out) just use the general purpose sites. In that principle, I think there are some viable networks -- DVCS systems around scientific code, Mendeley around scientific publications, (eventually our BioGPS around genes). But I think we should be developing specific networks appealing to specific groups of researchers, rather than trying to serve the needs of all scientists...
- Andrew Su
Andrew agreed, but if these are federated then they can all still talk to each other. I'm thinking more framework than site or single service. Ideally all of these things can be plugged in or wired up together...my concern with general purpose sites is primarily that they don't provide the level of trust and stability that we would expect for "research enterprise"
- Cameron Neylon
Just one comment. There are protocols out there that allow different social networks to talk to each other. There are protocols out there that allow web resources to talk to each other. It's not really that hard if everyone supports some basic standards. RESTful APIs, OAuth, OpenID/Facebook Connect/Friend Connect, etc. IMO what's more important is that any sites we design have the...
- Deepak Singh
@D only really had a chance to have a quick look. First impressions are that it is very slick but looks as though everything has to be on the inside - I don't see much mention of pulling stuff in and out. The multimedia talks are nice but why not pull them in from e.g. slideshare to pick an example.
- Cameron Neylon
completely agreed, federation through standards...
- Andrew Su
Twitter is far from perfect, but look at the infrastructure that has evolved around it (e.g. 3rd party apps, services). You don't get that kind of traction around a social networking site just for scientists. Imagine what email or the WWW would look like if there were separate versions just for scientists.
- Martin Fenner
Absolutely but that actually means we can build something better, and as long as it hooks into Twitter (RSS/OAuth...Deepak's list basically) we get all the benefits and all of the functionality we want - as well as a way of drawing people in. Assuming this framework is any good of course. Imagine PubMed if it had been built for the consumer web (actually maybe not such a good example...
- Cameron Neylon
Sort of responding to Deepak a few comments earlier. Something like a social network is useful for at least one reason: recruiting scientists who aren't ready for open science, or cannot communicate openly for one reason or another. So, a reasonably secure way of making data private and shared with a limited network is a good thing, I think. I think ultimately that will lead to much more open science (my own lab started out with a private wiki before doing ONS)...
- Steve Koch
Steve, but does it have to be a social network per se, or a site for say sequencing geeks (I am looking at you SeqAnswers) with the appropriate features built in. Social networks don't have to be all in the open. Facebook is a social network. 90% of my communication on there is private and you should see how much of my Twitter usage is DM's
- Deepak Singh
Deepak, I think I was just using terminology incorrectly. I was assuming Facebook = social networking.
- Steve Koch
"I mean, what's more likely -- that I have uncovered fundamental flaws in this field that no one in it has ever thought about, or that I need to read a little more? Hint: it's the one that involves less work."
- John Dupuis
Just noticed that we used different notions of "public funding environments" in the mind map so far. What I had in mind was to have "funding environments" in public, much like what fundscience.org plan to do. Some of the added comments seem to have used the term in the sense of environments for "public funding". Both notions are certainly valid, and we should think of ways to keep them apart.
- Daniel Mietchen
good point re making this difference clear(er) in the map
- Claudia Koltzenburg
Yes, Jean-Claude, contests and prizes with a competitive element are definitely on the list. If you have good examples from the recent past, please post them here.
- Daniel Mietchen
"More money for science is always good. Or is it? Six experts tell Nature what concerns them most about the US stimulus spending and suggest ways to ensure that it benefits research and society in the long term." - http://www.nature.com/nature...
- Daniel Mietchen
Daniel - the thing I like about contests is that barrier to participation is orders of magnitude lower than traditional funding - there is no need to convince anyone that what you are attempting will actually work before doing anything. Of course this limits the type of projects that can be run but it still applies to a large number.
- Jean-Claude Bradley
Daniel, not that I have anything against HHMI, but that mantra is not exclusive to them. For example Max-Planck Society has exactly the same approach (and I would say that at 10% of HHMI's budget and having twice as many Nobel prize winners, MPG looks a bit more effective ;) ).
- Pawel Szczesny
Didn't mean this to be exclusive, and I am well aware of MPG approaches (been there for a while).
- Daniel Mietchen
"I wish there was a universal format for submitting grant proposals; authors could post proposals (once!) & then the funders bid on them." (rephrased from http://ff.im/5VwEI ). I would add that the process should be public. fundscience.org plan to go this way.
- Daniel Mietchen
How do funders and scientists rank "more attention to technological shifts" against the "scientific expertise they have"? One says "change", the other "keep doing what you know"! Are those two things not disagreeing with each other? In other words, who would you fund first, the "crazy new idea" or the "conservative stuff"?
- joergkurtwegner
I would think funders should have (as they do now) the liberty of choosing their priorities, and in many cases this will be a mixture of many incremental projects and some revolutionary ones. The main shift in the system would thus be to have just ONE avenue for proposals, and to make it public.
- Daniel Mietchen
No mention of friendfeed, so what about writing a correspondence piece on this? It could be based on http://ff4s-paper.wikidot.com/start and perhaps also put the recent NIH grant for a "Facebook for Scientists" ( http://ff.im/beKk7 ) in perspective by providing an overview over existing tools along these lines and why they are not widely used.
- Daniel Mietchen
http://www.cell.com/authors... / Correspondence: "The Correspondence format provides our readers with the opportunity to respond to an article in Cell—either a research article or Leading Edge article—that has been published within the last 2 months. Correspondence should be no more than 900 words in length with up to five references and should be of interest to the broad...
- Daniel Mietchen
Now that sounds like a good idea! I'm all for it - especially mention the gazillion "Facebook for scientists" already out there.
- Björn Brembs
333 words so far, and once the generic FF description and some highlights from the spreadsheet are in, we will be near the limit. So probably no time to dwell on fb4sci, though I would still like to mention the NIH grant in the hope that those people will build on the ideas we lay out.
- Daniel Mietchen
Maybe steer away from a "but we want to talk about friendfeed" towards more "there is a much richer set of tools out there...and here is a good example..."? Might mean the Fb4Sci stuff can get squeezed in?
- Cameron Neylon
I would actually prefer the Fb4Sci stuff in there, and the article would be more balanced if we were to name a few more services that offer microblogging (I listed some in the Organization part of the document). FF can then be described in two sentences as a particularly useful example because it provides hierarchies of threaded conversations in which the most current and the most popular entries compete for the top of attention.
- Daniel Mietchen
Correspondence has to be submitted within two months, so we got four weeks to go if we are to submit something on the matter. Perhaps we can indeed expand this into a general overview on the potential of web 2.0 stuff for science. To this end, I just started a vote on the "open science breakthrough of the year" at http://ff.im/cidKG .
- Daniel Mietchen
thanks guys - a very interesting read (the paper, these responses, the etherpad document). I've added a couple of possibly-relevant points to the etherpad doc. :)
- Allyson Lister
...bumping to remind me to try and do something about this before deadline...
- Cameron Neylon
To those coordinating this: let me know if you need any extra help with anything...
- Allyson Lister
Allyson, help with shortening the FF part and with adding in something on the non-FF alternatives would certainly do something good to push things forward at this stage. Thanks!
- Daniel Mietchen
Edited a bit and tried to merge the new contributions into the draft. The word count for the FF part now stands at ~570 excluding FF real science examples. I still don't see how we can give an overview of more than one of these services and accomplish anything better than a boring enumeration without spirit. On the contrary, people will just get the impression that scientists can't make...
- Björn Brembs
Thanks, Pierre, was already mentioned. Just added some examples from this spreadsheet. Word count is now at 760. Tasks remaining (if you agree on the general structure): polishing and final, concluding paragraph. Tasks remaining if you don't agree: re-write :-)
- Björn Brembs
have removed a few words, tightened things up. will do more as time permits
- Allyson Lister
953, so some trimming needed. Mentioned the NIH grant in the roundup section. Which references to take?
- Daniel Mietchen
Good job, Daniel! I think the references are fairly clear, most of them are in the text already (i.e., papers from FF). We have until December 30 to get it all finalized, so we have some time, but I'd rather get it there sooner than later. I think a few more runs of polishing and honing and we should get the final author list together and submit. I suggest everybody who wants to be an author leave the URL to their FF feed at the end; that way readers get an idea of what FF looks like.
- Björn Brembs
What about signing with a group pseudonym (something like D H J Polymath; http://arxiv.org/find... ) and a link to this thread or the etherpad?
- Daniel Mietchen
I have inquired with them whether links count as references.
- Daniel Mietchen
What about the title? "Should you be sharing science online?" would be my favourite but it is not reflective of the current emphasis. Any suggestions?
- Daniel Mietchen
Pierre - good one. Perhaps add FF as initials?
- Daniel Mietchen
BTW, the doi does not resolve - anybody has the correct one?
- Björn Brembs
I like Clay's idea for a title: "It's not information overflow, it's filter failure" :)
- Allyson Lister
884 words, and a few more slight tweaks. This means we could probably fit an entire sentence about other approaches' existence, if we wanted :)
- Allyson Lister
Right now this sentence is a mixture of DOIs & links: which to use? : "Such conference coverage has even received direct (e.g. ISMB09 http://www.iscb.org/ismbecc..., BioSysBio09 http://dx.doi.org/10...) or indirect (e.g. ISMB08) support from the conference organizers, see e.g. http://friendfeed.com/ismbecc... ." We can convert them all to links, & save some of the 5 publications, but all three examples here have papers associated with them (well, ISMB09 paper is accepted)
- Allyson Lister
Ah - actually it looks like the ref we would use for ISMB08 is actually ref 1 - am I correct? There isn't much detail in ref 1 yet. That could solve part of the problem
- Allyson Lister
I'd also like to find that out, but the DOI does not resolve (for me?). Haven't looked at ref1 yet, to determine if it's redundant.
- Björn Brembs
Sorry - yes, @Daniel, the DOI seems broken, but the genomebiology link is the correct one. If we're limited for references, we could just link to the FF room, which is http://friendfeed.com/biosysb...
- Allyson Lister
We have 5 references and thus I added Allyson's to make it 5 :-)
- Björn Brembs
Question as to whether it's advisable to include reference to the RW room. I think someone raised this somewhere but I can't see the discussion now.
- Cameron Neylon
Otherwise made a few very minor changes
- Cameron Neylon
@Cameron - yep, a few of us have brought up that point (me and michael and some others I think in the etherpad doc). I'm happy to go with whatever the owners of the room, or the general consensus, wants :)
- Allyson Lister
RW room discussion is in the header of the document. IMHO there are several crucial reasons for finally going public: it's a grey area probably still fair use; more subscribers mean more access; readers will see the usefulness of this room, even if they don't get any of the other features; the kinds of hoops we have to jump through to get access need to be made public and the room has a significant record now.
- Björn Brembs
I think we need to drop ref 6 since we only have 5 and it's not a journal article, correct?
- Björn Brembs
With Etherpad deleting everything by March 31, we should think of ways to archive existing pads - particularly relevant for this one, as it was meant to be citable. As far as I can tell, none of the currently available options preserves the version history, so if we want to have that, we should do a screencast.
- Daniel Mietchen
Indeed, we need to think of something!
- Björn Brembs
Incidentally, the threat of such services disappearing certainly contributes to the hesitation of people to adopt social networks, and the best way I see to cope with that problem is to have either open standards on data portability, or - better still - social networks (or at least one of the most suitable ones) that are built entirely on open source platforms, with open configuration (and of course data portability too). Any suggestions on whether and how this could fit into the concluding paragraph?
- Daniel Mietchen
Isn't it already in there, sort of? Where we write that these tools are in development and NIH funded?
- Björn Brembs
Haven't seen mention of open source and open standards in the news on these NIH grants, so it may be worth making more clear that this is needed.
- Daniel Mietchen
Upon feedback from Graham, I took the RW reference out. Still think some mention of Open Source would be good. http://www.nih.gov/news... does not mention it. 816 words.
- Daniel Mietchen
Can we be part of that feedback, please? I find the RW functionality so convincing for non-social web users that I fear the whole article might be wasted, i.e, preaching to the converted, without this component.
- Björn Brembs
It was in a DM that I just forwarded to you (dunno whether that works), and I asked him to comment here too.
- Daniel Mietchen
Did anyone manage to do a screencast? I could try and do that today if it's useful? But maybe better to wait until you feel it is finished?
- Cameron Neylon
I think we should wait until it's basically submitted.
- Björn Brembs
Nothing wrong in testing, otherwise I'd also wait till it's submitted. @Björn - sent you screenshot.
- Daniel Mietchen
I'll comment once I get back from work (only have internet access here during lunch hour).
- Graham Steel
Right. 1) Having consulted with Bill, we have (the same) mixed views vis a vis raising the visibility of the RW room. 2) We don't feel that we "own" the room though, it belongs to everyone who uses it. 3) We agree that a poll should be set up for subscribers of the RW room to vote on the issue of whether or not they feel it appropriate to raise visibility of the room outwith FF. 4) The poll is http://www.micropoll.com/akira... and I'll post a link to it in the RW room shortly.
- Graham Steel
Apart from inclusion of the RW room, the title has not been decided yet. Two suggestions are in there now (I threw away my older one).
- Daniel Mietchen
Also, what about the "like=bookmark" discussion? I would like to see that paragraph go back in.
- Daniel Mietchen
I thought that like=bookmark was clear from the context? If not, then it should be easy to add a sentence to make it explicit.
- Björn Brembs
Björn - see chat bar - Michael was not comfortable with the notion. Any other opinions? Also turned Shirky quote from title to quote and set the title to "Social filtering of scientific information - a view beyond Twitter".
- Daniel Mietchen
Besides, FF search has now been unusably slow for weeks, so I wonder whether we should take this formerly excellent feature off the draft. See also http://ff.im/cO3Jw .
- Daniel Mietchen
Two weeks left to submit. I plan to do it on Sat (Dec 19) around noon UTC. Still to address: RW room and perhaps ephemerality of non-Open Source services like FF. I think I saw somewhere that FF have released (part of) their source code, or plan to do so. Anyone know details?
- Daniel Mietchen
Cyndy Parr tweets ( http://twitter.com/cydparr... ) "This is kind of what PLOS One envisions -- it goes up there, and then it could get chosen to be part of a hub". Iz true?
- Karen James
Thanks, Graham. Having just had a paper rejected by two journals in a row, I'm fed up to here *points to own eyebrows* with spending hours if not days re-formatting to meet the ridiculously precise but in no way substantive guidelines of different journals. It's not even rewriting, it's just pointless fiddling and a silly waste of time. If the taxpayers only knew...
- Karen James
There are two issues at hand here. One, a universal format for submission, Two, a bidding process on papers. The Neuroscience Peer Review Consortium points to how the second part of this kind of deal is working right now in some disciplines (http://nprc.incf.org/), the really really sad part about the first issue here is that the big publishers don't care what format you submit in (let...
- Ian Mulvany
+1 for more standards for paper submissions, starting with reference styles. And for allowing submissions in the NLM DTD format.
- Martin Fenner
Ian, you may say they don't care, but when one is submitting a manuscript, one is trying to do everything one can not to give the publisher any possible little excuse to reject your paper without review.
- Karen James
Second time in a week that someone stated "publishers don't care about format of submissions". Again I ask: if that's the case, why do all journals make a huge deal about it in their instructions to authors?
- Neil Saunders
As for universal format: easily solved by writing our papers on the web. Imagine a simple forms-based interface with fields for title, authors, abstract, introduction... Imagine a button in Google Docs that says "submit this document to <insert journal here>" !! But currently, we all like to use our own word-processing software on our own machines, then upload a document in a multitude of formats. It's going to take a big shift in thinking and work practices.
- Neil Saunders
What Neil said: if journals don't care, why do they make such a damn song and dance about it? Why not explicitly say you can *submit* in any basic AIMRAD format? Worry about format after acceptance: either the journal can send it to India per Ian above, or if they make the authors do it at least they only have to do it once. My next paper (quit laughing) is going out in basic AIMRAD...
- Bill Hooker
Note: this is easier for me to do than many, because I've basically given up on an academic career as currently constructed.
- Bill Hooker
"it's just pointless fiddling and a silly waste of time. If the taxpayers only knew" I think they should / deserve to know!
- Björn Brembs
+1 Neil "why do all journals make a huge deal about it in their instructions to authors?" and +1 Björn "I think [the taxpayers] should / deserve to know!"
- Karen James
Neil & Bill: maybe "don't care" is too strong a phrase. A manuscript does need to be structured correctly to fit into the journal's content management system (an application note looks different to a letter looks different to a research paper), have images properly resized and references in the right format so that they can be processed by systems that convert them to links etc.
- Euan
Also: what happened to that Wolfram word processor for papers that was supposed to do what Neil mentioned above with Google Docs?
- Euan
the publishers i know would be delighted to standardise to NLM DTD for submissions -- would save lots of editorial time and production costs -- the publisher i know best sends accepted papers to be manually turned into xml which can then be used for PMC deposition and the semi-automatic generation of the HTML and PDF versions. But things like the Publicon app have taught publishers that implementing the technology to do something doesn't mean that it will happen in significant quantities! :)
- Joe Dunckley
I used Publicon when it was released a few years ago. Essentially a dead product now. Lemon8-XML does what Neil describes as "Imagine a simple forms-based interface with fields for title, authors, abstract, introduction... ": http://network.nature.com/people....
- Martin Fenner
There is a nascent version of that working in neuroscience http://nprc.incf.org/. Journals have formed a consortium where if an author submits to one journal and it gets rejected, the author can specify that the reviews follow the paper to another journal so that it doesn't need to be re-reviewed. This was viewed as a way for papers that have nothing wrong with them, but which don't fit the scope of the journal, to be published more quickly and easily.
- Maryann Martone
What about replacing "papers" and "journals" in the subject line with "proposals" and "funders"?
- Daniel Mietchen
What if journals said here's our LaTeX template. Put the right text in the indicated field, lotion in the basket, and anything else won't be accepted.
- Mr. Gunn
@Daniel Mietchen: Yes, that too! @Mr. Gunn: What I'm advocating is that there's a single LaTeX (or whatever) template - not that you'd re-paste for each journal.
- Karen James
karen, yes. The idea being you give them the text and they do whatever they like with the formatting.
- Mr. Gunn
Aside from making life easier for authors, it would allow sane computational use of papers. With PDF, you don't even know which image a figure legend refers to, except by guesswork. The difficulty is that the journals don't see it as their problem. The solution is for the authors to make it the journals' problem.
- Phil Lord
I like the idea, Karen. Publishing an exciting paper should not be a torture (for us!)
- Betül
Getting access to research papers is already too expensive. Wouldn't it just be more so if we invited a bidding war on each paper? Write good papers, and submit them to PLoS.
- Ted Slater
I'd like something similar for the review process. Instead of having to register for each journal/publisher managing logins and passwords for each, have a clearing house that manages reviewer information that the journals subscribe to.
- John Hogenesch
Crud - sorry for the re-post folks, there was something in that version that shouldn't have been (and if anyone's downloaded the file and figured out what it is I'd appreciate it if you didn't spread it around - some details to be ironed out yet)
- Cameron Neylon
darn. now I wish I'd downloaded it! *curious*
- D0r0th34
Object lesson in not trying to bowdlerize documents too late at night. And that it is so much easier when _everything_ is public. But it's not my thing to make public as yet...and no, I'm not moving jobs :-)
- Cameron Neylon
We're thinking along similar lines to this (without Wave as yet) and trying to generate interest in Aus for a linkup with Jeremy Frey/PMR. Cameron if you wanted a remote experimental Chemistry partner let me know - our open science project officially starts tomorrow (!) for 3 years, so we'll start to generate data in Sydney we want to share.
- Matthew Todd
Cool, JGF mentioned he'd been talking to you and I meant to follow up. If you're up for writing a letter of support that you're interested in the project that would be very cool. I can do the same obviously for anything going in on your side. The more (diversity) the merrier, although we can't stump up for endless disk space for everyone so we might need to find a re-charging mechanism if...
- Cameron Neylon
Sure, OK. Could you email me an address for the letter? m.todd@chem.usyd.edu.au. Data would be small size but lots of little files - mainly NMR/IR/MS. I'm guessing what you're proposing though would need installation of a little software on local computers attached to spectrometers. The institutional inertia is the problem there.
- Matthew Todd
Yes, there has to be a widget installed on the computer from which the file gets copied one way or another and this could clearly be a problem in many cases. But one problem at a time.
- Cameron Neylon
Side musing: How long does it take a researcher using method X to generate 1Tb of data? In microimaging, this may easily be a day, and some particle or astrophysics may be even faster.
- Daniel Mietchen
Not long in many cases - images stack up pretty quickly.
- Cameron Neylon
Looking good, Cameron, albeit judging from a really quick skim-through. Makes me want to work on this, in fact.
- 'Mummi' Thorisson
I don't get out of bed for less than 1TB per day. An impending deluge of >1 PB that I'll have to manage does give me sleepless nights though :)
- Dan Hagon
By the way - is AtomPub (http://en.wikipedia.org/wiki...) being considered for publishing & editing web resources via an Atom store back end? I don't know much about the Amazon cloud storage and apparently can't be asked to find out (!). RSS/Atom is a no-brainer for monitoring for updates obviously. But is content create/update perhaps all supposed to be done via Wave / XMPP?
- 'Mummi' Thorisson
One practical issue which might be important to clarify is whether or not you'll be able to develop the Wave components on a local Wave federation server and/or robot server (I forget the exact terms used in the draft spec.). If you're looking at Wave for rapid prototyping, especially of robots, having these in place from the start could be crucial to how far and how fast you can get with the tools you intend to develop.
- Dan Hagon
@Matthew interested to see how you get on; we've put our undergrad OS lab on hold, but my projects (equivalent to honours)/postgrads are waving whether they like it or not (and actually most of them seem to like it so far :>)
- Anna Croft
@Mummi, AtomPub is an obvious approach but we don't want to tie ourselves down in advance. Certainly most likely contender so I should put it in though. @Dan, almost certainly on the central Wave server, which does mean we may end up with a performance hit I admit. Assuming that increases in performance combined with these being small waves in most cases will mean things are ok....
- Cameron Neylon
That's had various bits taken out of it, specifically support from commercial partners and the financial details, as well as the names of suggested referees.
- Cameron Neylon
I'm going to do a round of looking at some of the Science Social Networking sites again. Is anyone active on ResearchGate, Epernicus etc. and interested in testing functionality?
I'm willing to keep an open mind but so far FF surpasses these in terms of networking and ease of use. But if you want to experiment I have accounts in many of these and I would be willing to try.
- Jean-Claude Bradley
I'm really just looking to make sure that things haven't moved on and improved significantly, particularly in the light of the NIH projects.
- Cameron Neylon
I tend to migrate to social networking sites based on "pull" - virtually the only time I go on LinkedIn or Facebook is when I get an email alert to something relevant to my interests. I would assume that if there was anything really cool going on in these new sites I would get these alerts generated by actions by you and my other friends.
- Jean-Claude Bradley
BTW Cameron - that is one of the issues I'm finding with Wave - I tend not to check it because I don't get alerts that there are updates - is there a way to get an email alert for Wave updates?
- Jean-Claude Bradley
Yes, there is an email alerter. I'll add you and it to Wave...
- Cameron Neylon
Agreed to the general point though - if there isn't a pull, I'm not going there really. And I think that is a big issue with Wave - people just aren't checking in.
- Cameron Neylon
@Jean-Claude I don't think there's currently a way of doing this with the current interface without adding a robot but I saw there's a robot on the Haskell public wave which has similar support http://wave-xmpp.appspot.com/public...
- Dan Hagon
I'd be interested in testing (I recently started looking over Epernicus for an article on NGS). Where is the email alerter for Google Wave? Currently, I'm using Waveboard (Mac), which alerts you when there's activity. However, it needs to be running in order to do so.
- Walter Jessen
Just added you to a Wave with the email notifier Walter...
- Cameron Neylon
I have accounts on Epernicus, SciLink, Laboratree, and maybe could consider BenchFly a social networking site too, but like JC, I don't go to any sites besides FF and Twitter (and those are typically through 3rd-party apps), not even Facebook or LinkedIn, unless I get some alert. But I would be happy to see if anything's changed in those science-oriented sites I mentioned
- Shirley Wu
I do get alerts that new people have joined the organic chemistry group in Research Gate but there is no discussion and my questions have not been answered there by anyone so not much motivation to check in.
- Jean-Claude Bradley
I have accounts at NN, Epernicus, BioCrowd and SciLink. I have begged for account deletion at the latter for months, to no avail and have not visited most of the others for as long as I can recall. So: active - no, interested - no. It's all FF/Twitter for me.
- Neil Saunders
It's alright - this is a benefit of the doubt exercise - making sure that things haven't changed or that we've missed something. My brief look around yesterday suggested that nothing much has but I wanted to make sure I'm not missing something.
- Cameron Neylon
What about the criteria for comparison other than some "pull" functionality (which they all seem to have, to different extents)? Does usability boil down to feed import/export and (hierarchically) threaded conversations ordered by novelty and importance, as at FF?
- Daniel Mietchen
It would be worth doing a compare and contrast - also things like Math Overflow and even some of the chemistry blogs act more like community sites. Seems particularly apposite with respect to Pawel's blog post yesterday about the idea to set up a next generation sequencing community site.
- Cameron Neylon
I have a ResearchGate account but don't actively use it. I currently do some FriendFeed, Nature Network (where my blog is hosted) and Google Wave, but mostly Twitter.
- Martin Fenner
The last issue (November 23) of the German computer magazine c't has an article on social networking for scientists. They like ResearchGate and Mendeley, but also include ResearcherID, Scholarz (a German network), Nature Network, SciLink and Scientist Solutions: http://www.heise.de/ct...
- Martin Fenner
That c't article (which shall come out in some OA fashion soon) may serve as guidance but I found the choice of networks therein rather arbitrary, and the comparison between sites was done on a more general level rather than on the basis of specific criteria.
- Daniel Mietchen
The article makes two obvious omissions: a) no mention of CiteULike (or Connotea), b) no mention of the recent $12 million social networking NIH grant to U of Florida/Cornell University. There are some more things in it I don't like, so I wrote a letter to c't magazine.
- Martin Fenner
Cameron, what criteria were you thinking of using?
- Mr. Gunn
Key questions: a) What is the immediate impression on signing up? Is there a pull for people to come back? b) What functionality is being offered? Is it immediately available? How dependent is it on having a network in place? c) Funding model and stability d) User numbers, ideally active users and accounts, but whether we can get those is another question. Those aren't very objective criteria and they are built on my biases but nonetheless
- Cameron Neylon
Chris - when you talk about "credit" are you expecting tenure and promotion committees to count it or do you have some other system in mind? If you set something up I have content that might be suitable to play with. As for citability - in our last few papers we have used blog posts and wiki pages as references and have not had any problems with that - so I think the system is quite flexible and can accommodate the types of activities you are proposing.
- Jean-Claude Bradley
I think Chris means system credit or karma. The idea as I understand it is somewhere between Friendfeed and Stack Overflow
- Cameron Neylon
Thanks Cameron, yes, that's what I meant by 'credit' - however, by quantifying and metricising that credit, there is a possibility that one day tenure and promotion committees may want to use it as another measure of a scientist's influence in a field. Apologies to Cameron for hijacking his thread. There is another discussion on this blog post here: http://friendfeed.com/chrisle...
- Chris Leonard
That's fine, it's not my thread, it's the community's thread :-) Pointers are good, they link up the information.
- Cameron Neylon
Blog postings to replace (journal) papers and (in-depth) peer review a luxury that can only be acquired if paid for and to be replaced by blog comments instead? Weakening both readability and certification? That does not sound like a healthy idea.
- Wobbler
Wobbler: why should blogs lack any aspect of peer review? The standard of any publication depends on how editorial powers are used.
- Mike Chelen
...and we already pay for peer review. It just isn't a cost transferred as actual cash.
- Cameron Neylon
But blogs do not have any editorial powers? What advantage do blog postings have over (journal) papers? They lack format = lack of consistency = lack of efficiency = lack of scalability. Are you seriously suggesting that blogging/blog posts have the potential to replace journal publishing/ (journal) papers as the primary scholarly communication model/channel? Upgrading the traditional...
- Wobbler
@Cameron: that's true, but now peer review is at least mandatory for the primary scholarly communication model i.e. scholarly publishing. Replacing that with something else and having peer review only on request/payment is a very different story.
- Wobbler
Wobbler - there is a difference between requiring the peer review to be performed before making some information public and allowing it to take place after that. I do not see why the latter option would generally fare worse than the former. In fact, we already practice it here at FF, with numbers of likes and comments roughly indicating the popularity of a topic, while the quality has to be sought in the individual comments (and of course the source item that started the thread).
- Daniel Mietchen
... it isn't a cost transferred BY YOU as actual cash. Yet. It should be, in my not-terribly-humble opinion, however, because the market disconnect in the current system has proven ridiculously unsustainable. Wobbler, some of my blog posts have had more measurable impact than anything I've ever written. Sure, it's a lightning-strike sort of thing, and most of my blog posts languish in...
- D0r0th34
@Daniel: I'm not talking about post-"publication" peer review. That's still different from random blog commentary on blog posts. There's no evidence that what we're doing here isn't just a "niche" thing that works well because we're a niche. There's certainly no consistency in quality in our blog postings (well, at least not in mine :p ). Not to mention a lack of consistency in...
- Wobbler
@D0r0th34: No, we should absolutely not ignore lightning strikes. But we should see them as lightning strikes and consider them to be an exception more than a rule and focus our attention on something that provides that level of quality more as a rule than an exception. Blogs as a complement to (journal) papers is great. But once you start to see it as a primary source, a replacement for...
- Wobbler
We don't know about our OA bets. As for slow-and-steady, a well-run blog isn't? Lightning strikes aside, building a reputation and a readership is hardly an immediate thing.
- D0r0th34
@D0r0th34: That's one more reason why blogging as the primary scholarly communication model is a broken idea. "Popularity" and "building a readership" will be important for blogs (and other post publication peer review models) to be visible/significant. But aren't we going after journals for using their JIF to attract peeps to read their stuff? How is "blog (poster) popularity" to get a...
- Wobbler
I think the most important property of non peer-reviewed scientific communication is that the content be easily indexed and searchable. Relying on comments and rankings can be very misleading indicators for utility in long tail systems. For example we get over 100 searches a day for our solubility data via Google and Wikipedia but we have never had a comment or any type of feedback from the people who searched for and found information.
- Jean-Claude Bradley
Shrug. System-gaming goes on everywhere; there are a number of studies of citation-impact gaming, if you look. Also, why is connectivity a bad thing? We are talking about scholarly *communication* after all, right? Restricting "what counts" only to what goes through the baroque serials-publishing process is IMO an extraordinarily blinkered and limiting view of how knowledge really advances. Sure, it's not easy to come up with more inclusive views -- but that doesn't mean it's not worthwhile.
- D0r0th34
The problem is that I'm not sure we can talk about "gaming the system" rather than "an intrinsic part of the system that everybody will be forced to play or greatly risk invisibility" when it comes to blogs and other models relying on postpublication "peer review". PLoS ONE is, intentionally or not, already trying to stake their claim on as large a readership/community as possible....
- Wobbler
@D0r0th34: And connectivity can be unfair if your serious/scientific works are getting more attention than others simply because you've managed to draw a bigger crowd through non serious/scientific stuff. On a slightly more personal note: for someone who occasionally complains about the (lack of) readability of (journal) articles, I had expected that you, of all people, would appreciate...
- Wobbler
I have to say reading down this I am unsure of whether the complaints apply to blogs or journal articles. Consistent structure and copy editing would be nice but it is rare for both blogs and journal articles. Quality is an issue across the board. Going back to peer review - it's only mandatory for the author, refusal rates for reviewers are going through the roof and unless we acknowledge that cost the system will collapse sometime soon.
- Cameron Neylon
@Cameron: Consistent structure and copy editing are rare for journal articles? They are? Not entirely sure about copyediting, but surely most, if not all, journal papers have a recognizable structure? And I don't think they're as rare or rarer than for blog postings. I also think the issue is with peer review, and not with the (journal) paper (format). As such, we should find ways to...
- Wobbler
Of my recent papers, only one received close copy editing by anyone but me. And that was the Nature piece for which to be honest I would have been happier if the editor had got a co-credit. And formats are all over the place - maybe consistent for a single journal but that's no use to me. The costs of both peer review and publication are so high we need to find a way to lower them -...
- Cameron Neylon
@Cameron: I'm not sure that's a convincing enough argument for me. Maybe your other papers were written clearly enough already? You're a prolific blogger/writer, Cameron. It's not weird to assume that your ability to communicate concepts clearly is higher than the average scholar. Maybe high enough to not warrant copyediting (in a lot of journals)? My impression of journals is that...
- Wobbler
Well others can pitch in but perhaps a different anecdote. Until I started getting into arguments with Maxine Clarke I didn't even realise that journals might do copy editing. Nature and similar are very different beasts to the average of course.
- Cameron Neylon
So, generally speaking, only the high profile/impact journals provide copyediting services? Hmm, that is definitely not what I expected. If you had to estimate the % of journals that provide copyediting services, what % would that be? The (top) 10% of all journals?
- Wobbler
I have the same experience as Cameron - the only time my manuscript was copyedited was when I published in Nature
- Jean-Claude Bradley
So far as I'm aware, no-one here wants to replace peer-reviewed journals entirely by blogs. Yet that seems to be what you're arguing against, Wobbler. For some functions, journals are a lot better than blogs. But for other functions, blogs are a lot better than journals. At the least, I really can't imagine how, say, DHJ Polymath or Galaxy Zoo or the Open Dinosaur Project or [fill in...
- Michael Nielsen
Most of this is in response to an FF comment by Chris Leonard on the 23rd of November in this thread, who is arguing for exactly that.
- Wobbler
Cameron, any progress on the roundup? Is there any information I can provide from Mendeley?
- Mr. Gunn
Right - getting there slowly! Have set up a wiki page (ignore the state of the rest of the site I am working on it!) at http://wiki.cameronneylon.net/index... You should be able to login with openids, any problem give me a yell. I would suggest a week by week schedule to dive into and try and use a specific site, give it a good shot and then report as we go. I...
- Cameron Neylon
Cameron, what do you mean by "stability" - things like a service being bought/shut down vs. server outages? What about one week to agree on parameters and sites to check? I added data portability.
- Daniel Mietchen
I was thinking more of medium to long term financial stability - but technical stability is a good criterion in terms of functionality. Data portability is a good point!
- Cameron Neylon
Cameron, I spoke with Drew Endy, Bill Flanagan, and a couple other PIs that use OpenWetWare (Maureen, Pam) last week about the future of OWW. There are two major issues (a) funding and (b) overhauling the platform. I think funding will work out, if we can figure out what is the best way to do (b). Bill and Drew have some good ideas at this point, but in my gut I think we're still not...
- Steve Koch
I guess my easy question for everyone who's familiar with OWW: Do you think with the resources we have (one full-time excellent lead developer) we can transform OWW into a killer openscience resource for many more people going forward? One thought that keeps coming to me is that something could be (needs to be) done to tap into the energy of the user base. I.e., obsessed students who...
- Steve Koch
Another thing that keeps coming into my head since the conference call last week: FriendFeed is quite possibly very similar to what many people need for OpenScience. As far as science goes, we generate information from all kinds of different sources (Machine-specific data; gel photos; microsoft word; evernote; scratch paper; blogging; etc.). This needs to be aggregated and shared in a...
- Steve Koch
Oh, and to clarify a bit: I don't want to replace FriendFeed with OWW. I want to use the FriendFeed model as a starting point for the new OWW. As an OpenScienceAggregator / Networking tool. As others have pointed out, much of the value of friendfeed is that it's not limited to scientists generating data.
- Steve Koch
Steve, that's a great way of asking the question. I'd go one step further and say how can we make it the framework in which we can integrate all the other things we do on other services. It's never going to be a no-brainer to move from what you use to something else - there is always the simple problem of the activation barrier to change - it's a question of balance. But my guess is...
- Cameron Neylon
Cameron, I agree with you exactly: I don't want people to switch, and indeed I want to think "one level above." Do you think there's a real possibility for doing that?
- Steve Koch
If we could coordinate a series of activities and get proper funding then yes. Quite a lot of interest in the pieces of this (including the grant I'm currently rushing to finish), Chris's ideas further up this thread, OWW obviously, Mendeley/Citeulike/Zotero. But coordination is the hard bit - and getting agreement that it's what enough of us want. Do I think we have a clear idea of what...
- Cameron Neylon
Should we include some discipline-specific ones or are we going for general-purpose only?
- Daniel Mietchen
thank you for sharing, sounds very interesting, Cameron, Q: is this 'Impact summary' just one summary among others for this project proposal or is it the only summary?
- Claudia Koltzenburg
ah :-) on text one, some more points emerge: -- it is unclear to me what CRU refers to; -- in 1) "The proposal will also work towards" - I guess this is intended to say "The proposed project..."; -- in 2) "Will will" should maybe read "We will"; -- does the structuring 1) 2) 3) correspond to the work packages of the proposal? if not, how *does* the summary structure relate to the main...
- Claudia Koltzenburg
re text two "easy to use tools that get out of the researcher’s way" - do you mean "tools that are so good that researchers do not realize they are using these tools"?
- Claudia Koltzenburg
Re: text 2 absolutely - that is the aim - probably not quite attainable but a worthy goal. Re: text 1 - yes organization needs improving as to what the point of the three points is. And my sentences are too long. CRU is the Climate Research Unit at East Anglia of leaked emails fame which I will actually take out as a direct reference because it will just get people's backs up for the wrong reason
- Cameron Neylon
Should also point out that we're aiming to make stuff available, not necessarily completely tackling the usability question. Might be best framed as "how many research objects and how much metadata can be collected without bothering the user?"
- Cameron Neylon
agreed, well framed, sounds to the point, it seems to me. Yet: "the user" - what threshold level of experience are you implying for each of the areas you are addressing? is there a phase aspect to what "the user" is, e.g., is this entity made up of other sub-entities at the start of the project than by the time it ends? (am referring to "driving adoption and uptake" in text 2)
- Claudia Koltzenburg
The user in this context is the individual research scientist. Was using it interchangeably with "researcher" to try and keep some variety but it's a good point that it's ambiguous, particularly when I talk about crowd-sourcing at the end. The "audience" for this document is a panel of primarily biological scientists but the person "handling" the proposal can be expected to have some reasonable IT or information management expertise, as should the referees.
- Cameron Neylon
Neil, great post. And you're right, we do make things too complicated sometimes, but do we do that at the level at which we ask questions, or at the software implementation level? My take is the latter, cause you need to ask questions the way you want to, but that doesn't mean what makes it all come together has to be one complex mess
- Deepak Singh
Glad you like it. One of those that bubbled up out of frustration at inability to achieve! I feel that science is the business of turning complex (real-world) things into simple models - and that we've moved away from that idea.
- Neil Saunders
I'm a sucker for this kind of ambitious thinking. Go Neil!
- Bill Hooker
I think it's a good sign that things like this are now obvious. Things start out as a complex mess of disconnected things, overlapping complicated ways of connecting them are devised, then it becomes obvious what the simpler thing to do is.
- Mr. Gunn
Great! But aren't you re-inventing something like RDF, Neil? feature/probe/value is nothing but an RDF statement...
- Pierre Lindenbaum
No, I don't want to reinvent anything. If RDF will work for me, I'll use it. I'll also use SQL, NoSQL, key-value pairs, document-oriented or whatever it takes. I just think that trying to integrate data by combining other people's large, complex representations is not working. We need to simplify the whole business.
- Neil Saunders
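For readers who want to see the feature/probe/value idea from the exchange above in concrete form, here is a minimal Python sketch. It is an illustration only, not anything posted in the thread: the `Observation` type, the `index_by_feature` helper, and the gene and assay names are all hypothetical placeholders. It shows how observations from different experiment types could be grouped once they are reduced to simple triples.

```python
from collections import defaultdict
from typing import NamedTuple, Union


class Observation(NamedTuple):
    """One feature/probe/value statement (hypothetical structure)."""
    feature: str               # the thing being measured, e.g. a gene
    probe: str                 # the assay or reporter that measured it
    value: Union[float, str]   # the raw value, stored as reported


def index_by_feature(observations):
    """Group (probe, value) pairs under the feature they describe."""
    by_feature = defaultdict(list)
    for obs in observations:
        by_feature[obs.feature].append((obs.probe, obs.value))
    return dict(by_feature)


# Made-up example data; none of these values come from the thread.
observations = [
    Observation("geneA", "expression_array", 2.4),
    Observation("geneA", "western_blot", "band present"),
    Observation("geneB", "expression_array", 0.7),
]

integrated = index_by_feature(observations)
```

The same triples could equally be loaded into an RDF store or a relational table; the sketch only assumes that each experiment can be flattened to this shape, which later comments in the thread question.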
I think there is a middle road here - we need high level generic descriptions like what Neil is proposing (and like my "We have stuff, we do stuff to it, which makes stuff"), but also a way of pointing to more sophisticated information that might be useful in specific contexts. I think we can have the best of both worlds as long as the data representation is separated from the metadata and the organization of each can be described in a machine readable (and agreed!) form
- Cameron Neylon
I'm too old school, leaving comments on blogs... who does that any more. I’m sure you’re aware that you’ve just described a model using *triples*. Which means you could start storing these kinds of simple relationships in a triple store like virtuoso etc. As you say, you don't have to reinvent anything, just simplify the use (conventions) of existing approaches (e.g. RDF). I would like...
- Greg Tyrelle
I like blog comments :-) Yes, my example looks like RDF triples. No, that was not really my intention. Let's ask these questions: (1) what data relationships would make sense to a biologist? (2) what are the commonalities in the data, which a biologist may not have considered at an abstract level? As I wrote in the post, many datasets that look different are really different ways of looking at the same thing.
- Neil Saunders
The joys of data modelling :-) For (1): I'm afraid asking for a definition of some data relationships is building an(other?) ontology.
- Pierre Lindenbaum
Let's put it another way. What we have, presently, are quite complete, often large and complex, but useful and usable descriptions of individual experiment types. "Integration" essentially means "parse them individually and mash-up the results". That's what makes it difficult. Perhaps we need an "ontology of integration" :-) But let's keep it really, really minimal.
- Neil Saunders
I actually think you will struggle to find data commonalities across bioscience. Even the simple proposal of target, measurement, value could break down in many cases e.g. we tried ages ago to get some intensity data from a bunch of microarray experiments and we gave up because we couldn't get across what we needed. What are you really measuring? Does it mean the same thing to different...
- Cameron Neylon
I think there's a good case for storing, in the first instance, raw values. Figure out how to process them later (that's statistics). Focus on trends (up, down, stayed the same). Focus on well-defined variables that do mean the same to everyone (intensity, in theory = amount of transcript, regardless of the very real difficulties). And I think more experiments fall into...
- Neil Saunders
@Pierre Freebase is exactly what I had in mind, however the web client (the best part) is not open. @Neil Store the data first, ask questions later. Nice. One of my hopes for semantic web technology was that it could become a universal mashup system (RDF+ontologies+triplestores). But you start down that path, and you suddenly realise that the semweb is asking you to get your data...
- Greg Tyrelle
But for me your example of a gel isn't raw data. The raw data is the image. Which might have several targets or assays on it. Up/down stayed the same is only really of interest in particular types of science. And I challenge you to find any well defined variables :-) Intensity to me is a measure of optical density but questions of background, object size, masking, averaging algorithm...
- Cameron Neylon
But agree with what you and Greg are saying, first thing get the data somewhere, with all the metadata you can automatically collect. Then worry about capturing more metadata as people do stuff with the data. Writing this grant proposal right at the moment.
- Cameron Neylon
And in microarrays, "raw" data is the image of the slide. But aside from a cursory inspection to ensure that it isn't complete rubbish, nobody much cares about that. I'd argue that there's a point in the preprocessing at which a numerical value emerges which could be called "useful" and which encapsulates the object being measured. It needs more work (e.g. normalization) to get information from it, but it's the "value" in feature/reporter/value.
- Neil Saunders
To me this is about finding something a bit like an upper ontology that describes the general category that objects (targets, assay, value, inputs, outputs, data, process, sample) fall into. That lets you do the general integration, and the more detailed local data structures become more useful as you can agree more and more on what details are important. So I absolutely agree with what...
- Cameron Neylon
Heh heh It was exactly that image that we did care about - which was the problem :-) I will admit to being an edge case, but in some ways we're all edge cases, they're just different edges...
- Cameron Neylon
Neil, may I link to this FF thread from Book of Trogool?
- D0r0th34
:-) Sure, different questions, different "levels" of data. I guess my angle is more a statistical one: how do I compare (seemingly) quite different datasets - what numbers can I extract and crunch? Less interested in the capture and description of data at every stage in the process.
- Neil Saunders
Sure, and those are very complete descriptions of experimental components. But what I want is: "I saw A on my gel, B in my LC/MS, C on my expression array and D on my SNP array and when I plug all that into some Bayesian predictor, it says cancer" :-)
- Neil Saunders
Ontologies are not the issue, it's more low level than that. I also work with microarrays, proteomics, metabolomics, and numerous physiological data sets. To keep all the data in one place I use a relational database, in this case PostgreSQL because I like to store raw intensity values in array datatypes, along with Pylons-based web interfaces to display various views of the data to my...
- Greg Tyrelle
My argument would be that the reason you're less productive is not because of the RDF and ontologies per se, but because the ontologies aren't really built for what we want to do. They're for describing certain types of outcomes, not for integrating data in a discovery phase. But Neil's (entity, probe, value) is still an ontology of sorts. It is just a higher level one. My belief is...
- Cameron Neylon
But keep the discussion going - this is exactly the problem that e.g. the SAGE project will have - http://sagebase.org - and as a notional member of the data working group I could do with all the ideas and help that's out there...
- Cameron Neylon
We are thinking too much in terms of data representation here. In the end what you are looking at is a data warehousing problem. You have different front end systems and you want to be able to pull data in for offline processing into a warehouse. That's pretty much what you do at any company doing a lot of analytics/business intelligence. Different types of data being collected in...
- Deepak Singh
Neil, I was under the impression that normalization across arrays and labs wasn't actually a solved problem, yet. Surely that would have to come first before stripping things down to just assay-key-value?
- Mr. Gunn
Normalization ... aaargh! Most definitely not a solved problem
- Rajarshi Guha
Normalizing within your own experiments is hard enough, never mind across unrelated datasets. It's something we have to solve though, to make the most of public data.
- Neil Saunders
Neil, you may be interested in looking at the Ontology-Based eXtensible Data Model (OBX) that was developed by Richard Scheuermann's group at UT Southwestern. It is being used for the ImmPort database (www.immport.org). The OBX model utilizes the BFO / OBI ontologies as guides in creating a data model that is robust to new datatypes. You can see a presentation about it here:...
- Burke Squires
Thanks Burke. ImmPort looks very impressive, I must say.
- Neil Saunders
This reminds me of what the TCGA is starting to do, by defining "data levels". For microarray data, Level 1 might be the raw images, Level 2, the intensity calls, Level 3, the normalized intensities, and Level 4 information on whether it's up or down regulated across multiple samples. For people like me, doing integrative analyses, it's easy to focus just on the higher level data and...
- Chris Miller
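The "data levels" described in the comment above can be sketched as a small ordered type. This is a rough illustration, assuming the four microarray levels exactly as Chris lists them; the `DataLevel` names, the `at_or_above` helper, and the record names are hypothetical and are not taken from any TCGA specification.

```python
from enum import IntEnum


class DataLevel(IntEnum):
    """Microarray data levels as described in the comment (names are assumed)."""
    RAW_IMAGE = 1             # Level 1: raw images
    INTENSITY_CALL = 2        # Level 2: intensity calls
    NORMALIZED_INTENSITY = 3  # Level 3: normalized intensities
    REGULATION_CALL = 4       # Level 4: up/down regulation across samples


def at_or_above(records, minimum):
    """Keep only records whose level is at least `minimum`."""
    return [r for r in records if r["level"] >= minimum]


# Made-up records for illustration only.
records = [
    {"name": "slide_scan", "level": DataLevel.RAW_IMAGE},
    {"name": "spot_intensities", "level": DataLevel.INTENSITY_CALL},
    {"name": "normalized_matrix", "level": DataLevel.NORMALIZED_INTENSITY},
    {"name": "up_down_table", "level": DataLevel.REGULATION_CALL},
]

# An integrative analysis that only wants the higher-level data.
high_level = at_or_above(records, DataLevel.NORMALIZED_INTENSITY)
high_level_names = [r["name"] for r in high_level]
```

Keeping the level explicit on each record is one way to get the separation of layers that the following comment asks for, since downstream tools can filter on it without knowing how the lower levels were produced.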
which is exactly why you need separation of the layers and tools to bring data together for the downstream stuff
- Deepak Singh
Neil, I think you have just explained why tab-delimited files are often more useful than complex XML representations of the same data ;-)
- Lars Juhl Jensen
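As a small illustration of why tab-delimited files are so convenient for this kind of data, the sketch below parses one with nothing but the Python standard library. The column names (`feature`, `probe`, `value`) and the sample rows are assumptions made for the example, not a format proposed in the thread.

```python
import csv
import io

# Hypothetical tab-delimited content; in practice this would be a file on disk.
TSV_TEXT = (
    "feature\tprobe\tvalue\n"
    "geneA\texpression_array\t2.4\n"
    "geneB\texpression_array\t0.7\n"
)


def read_tsv(text):
    """Parse tab-delimited text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


rows = read_tsv(TSV_TEXT)

# Values arrive as strings, so numeric columns need an explicit conversion.
values = [float(row["value"]) for row in rows]
```

Note that the file itself carries no information about units, types, or what each column means, which is the sharing problem raised in the next comment.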
Tab-delimited files would be grrrreat for me in my lab. If any of the rest of you would like to share our data, however, then you're completely screwed. Is the problem not that we're all duplicating each other's work by writing the same kind of parsers for the same kind of data? Proteomics (for example) has a standard (http://www.ebi.ac.uk/pride/). Is it really so hard to use / develop the community-based tools that are being generated around this standard?!?
- Neil Swainston
Well, the ratio of usable tools to schemas/ontologies is a whole other debate :-) But sure, in principle the tools are there - for individual types of data. What I highlight in the post is the difficulty of genuine data integration, as opposed to the current "write a parser for everything and mash it up" approach.
- Neil Saunders
#1 rule of data integration - if a format exists, it will be used
- Deepak Singh
...and if it doesn't exist there is a 70% chance someone will create it :-)
- Cameron Neylon
Chris M makes an important point wrt data levels, analogous to trace archives vs sequence dbs. Extending the sequence analogy, obsoleting levels will become important (it will rapidly become cheaper to resequence rather than store sequence).
- Chris Cotsapas
I think, to be an OA challenge, you should not have phrased it as "list of publications on PubMed or" but "list of publications freely accessible via PubMedCentral or". Anyway, mine are not all biomedical, so I go for the "or" option: http://dbm.neuro.uni-jena.de/people... .
- Daniel Mietchen
NB deleted a comment because I don't know what Paulo is up to here, but I don't want to spoil it inadvertently.
- Bill Hooker
My only OA cred here is paying the OA fee for my dissertation, and pushing for that on an upcoming publication. Also doing a lot of repository work behind the scenes.
- Mr. Gunn
I'm afraid to get a virus on this link you just posted.
- Paulo Nuin
Ok I'll bite. Still working to get mine up on a single website but put a list in reverse chronological order on Paulo's blog. Basically of 12 papers since 2008, five are in OA journals, one is a CC-BY chapter in a book, and two others can be put online in final form six months after publication (which I haven't done in one case where I could). The ones that aren't proper OA I wasn't...
- Cameron Neylon
"if Nature asks you for a piece you don't readily turn them down" - hard to disagree. At the same time, it's probably the best summary of the issues we are facing. Most of the discussions I've had about the future of science and the impact of the "2.0" meme on academia ended with a sentence along these lines :/
- Pawel Szczesny
Well there are two sides to it - one is the "Nature paper gives you credibility in grant proposals" element, the other is that it remains a more effective way of getting the message out than writing online or in OA journals.
- Cameron Neylon
Nature has been pretty accommodating - they agreed to make my book review OA http://www.nature.com/nature... - the other great thing about Nature is that they will publish work appearing on their open Nature Precedings
- Jean-Claude Bradley
That's true - I probably should have asked about that - didn't occur to me at the time...
- Cameron Neylon
Terrific. Are we still maintaining that list of "outputs resulting from FriendFeed"?
- Neil Saunders
I was planning on doing a demo of annotation at PLoS before the end of the year - perhaps this article would be a good candidate. As always, anyone willing to join is welcome.
- Daniel Mietchen
i added a note once, but now it won't let me add any other notes :( I don't see a rule about one note per person. I should have held off for a good one.
- Christina Pikas
I also just noticed that my "annotation" - provided the link to StackOverflow - shows up in the general discussion, where the title "Link" certainly is not helpful, and there is no way I can edit it.
- Daniel Mietchen
maybe something is broken, my note appears in general comments but also in that portion of the text as a comment. maybe that's why I couldn't add other notes?
- Christina Pikas
Not sure why you can't add more notes. Certainly been able to in the past. I see both notes where they are supposed to be I think. But they will also appear in the general comments as well I think.
- Cameron Neylon
Great article! I really need to add some comments or notes, just to prove the authors' point :-)
- Björn Brembs
BTW, when does PLoS finally get karma? I've been asking for proper 'show off' userprofiles for like ever :-)
- Björn Brembs
Cameron, et al. - What's the most useful thing I could do to nurture and support this renewed interest in article level metrics? (not from a competing data product point of view, but a let's get some good technologies out there with good visibility)
- Mr. Gunn
@Cameron: Exactly! I even think having a profile where you can post a pic and see how many papers and comments were published, papers edited, etc. was the very first thing I asked for when I signed up :-)
- Björn Brembs
But it needs to be federated across publishers... :-)
- Cameron Neylon
if authors put in their 'customer' weight, this will go faster, so why not go syndicate :-)
- Claudia Koltzenburg
I think I'll use this paper in my spring thesis class -- this is the main one where I discuss publishing models -- and maybe I'll demo Diigo with this as a class project next to an article that discusses IF.
- Mickey Schafer
While we're on the subject of functionality wish lists, I would also like an embed functionality for PLoS papers. Collecting my publications together but don't want to duplicate copies and reduce googlejuice for the journal - at least not for the OA papers anyway...
- Cameron Neylon
BTW, why isn't there a way to register this thread with the article? Why are we posting here and not on the article? There's got to be a lesson to be learned from this :-)
- Björn Brembs
I've included a link to this thread in a blog post: Article-level metrics getting attention http://ff.im/bGuNY
- Jim Till
+1 Bjoern :-) another question along these lines would be: why does Cameron's initial FF message link to CiteULike and not to http://www.plosbiology.org/article..., or plainly doi:10.1371/journal.pbio.1000242 ?
- Claudia Koltzenburg
Because that was the way I brought the link in. I think that pointer is appropriate. It is a pointer to the fact that I bookmarked it. Other people linked to the paper directly. Perhaps the issue is that we accidentally aggregated around the "wrong" item to talk about the paper. I'm not sure this is a problem as long as the referral works - it's a UI irritation not a problem with...
- Cameron Neylon
well, not directly, maybe in this ff-thread we're just providing some material for what you say in your paragraph "Technical Solutions to Social Problems", namely: "approaches that gather information from processes that are already part of the typical research workflow are also much more likely to succeed." - even though ff may not be part of 'the typical research workflow' (yet?) - and...
- Claudia Koltzenburg
That's true, and certainly conversation sparked by the paper. But how to capture that in a way that is useful further down the line might be tough...
- Cameron Neylon
"It would be in the interest of PLoS to combine their article-level metrics with an author identifier as soon as possible, most likely the proposed CrossRef ContributorID, rather than the Elsevier Scopus Author Identifier or the Thomson Reuters Researcher ID."
- Mr. Gunn
if the metrics are easily copied, then each repository can include and mirror data generated by the others
- Mike Chelen
The TELSTAR project is pleased to announce an event to showcase and discuss innovative ideas and developments in the use of Reference/Bibliographic Management software. The event, to be held in Milton Keynes on the 14th January 2010, will be free to attend, although we are limited to 50 places which will be allocated on a 'first come, first served' basis. Once all the places have gone, we will maintain a waiting list and contact people if extra places become available. As well as presentations from a variety of perspectives, there will be plenty of time on the day for networking and discussion.
- AJCann
Sadly, I can't go to this, so that's a place freed up for you ;-)
- AJCann
4th Conference on Open Access to Knowledge (4.ª Conferência sobre o Acesso Livre ao Conhecimento) #ConfOA09 LIVE: on Linux, open a Konsole and run mplayer mms://streaming.uminho.pt/OpenAccess (some talks in English and some in Portuguese)
"AcaWiki is like "Wikipedia for academic research" designed to increase the impact of scholars, students, and bloggers by enabling them to share summaries and discuss academic papers online. AcaWiki turns research hidden in academic journals into something more dynamic and accessible."
- Daniel Mietchen
Good idea, but unlikely to take off, IMHO.
- Björn Brembs
To my understanding it all rests on the willingness of people to write summaries of articles far more substantial than the already available abstracts.
- Jean-Claude Bradley
Why do you think academic PIs would prefer to put summaries on AcaWiki rather than on blogs?
- Alexey
Alexey - I can see a situation where people who don't have blogs would post on AcaWiki - but I don't know if there is a critical mass of people willing to do that
- Jean-Claude Bradley
I think there would need to be some sort of reward. I suggested perhaps for literature reviews (plenty of thesis chapters out there that never get published) that providing a doi and submission to the new Rapid Research Notes database might be interesting in that regard.
- Cameron Neylon
Interesting. DOIs cost money, but a fund for assigning DOIs would be good to have.
- Jodi Schneider
@Jodi, they do, but not that much. If it's worth the CV fodder then I don't think a small payment for a doi is a big problem (if it's around e.g. the $10 mark)
- Cameron Neylon
"4th Conference on Open Access to Knowledge (4.ª Conferência sobre o Acesso Livre ao Conhecimento), Universidade do Minho - Braga. Date: 26 and 27 November 2009. Venue: Anfiteatro B1, CPII - Campus Gualtar"
- paula simoes ☃
Looking forward to this but will only be there on the Wed night and Thursday unfortunately.
- Cameron Neylon
Some day all science journals will contain scientific knowledge which is as verifiable and as immediately reusable as this.
- Dan Hagon
It would have been great if I could easily have found one... one(!)... example of supplementary data in a re-usable format. But time and again I found PDFs, with the odd Word document!
- Chris Rusbridge
The example I quoted was a chemical one (http://www.rsc.org/suppdat...); do any of you have any suggestions on what might have been better ways of coding those supplementary materials? There are some chemical method descriptions; CML? Fitting the data to a mathematical model, not sure how it might be best described. And some NMR spectra and other images I don't personally recognise.
- Chris Rusbridge
No criticism of these authors intended, but what would be better than PDFs in such a case?
- Chris Rusbridge
Almost anything :-( Seriously the problem is that people think of the online space as a dumping ground, not just a place for data. The assumption is that you have to dump everything into one file. What would be helpful would be good upload systems that let you put the raw files in and help you organize and index them. But even then, when you are in the final stages of manuscript preparation are people really going to send around a directory of files?
- Cameron Neylon
Someone told me recently that Nature have appointed a staff member to make supplementary data more relevant, which is encouraging if true.
- Chris Rusbridge
I suppose it IS "supplementary MATERIALS". But I would hope there is something a little higher than "raw data files" to recommend?
- Chris Rusbridge
I think Nature are doing some good work in this space but I don't know the details. The problem is that it has always been seen as the dumping ground for things "not important enough" rather than an opportunity to include more of the important stuff. But it all comes back to attitudes about data sharing. If people thought it was important they'd make the effort. The fact that they don't...
- Cameron Neylon
There's a structural problem with expecting publishers to deal with usable supplementary (or even primary!) data: publishers enter the research process too late. By the time a publisher sees anything, the data may well have been ruined by a well-meaning researcher. While admitting my own bias -- I still think this is (for now, at least) an institutional problem.
- D0r0th34
Even the institution possibly comes in a bit late really - we're back to needing to re-educate scientists again aren't we?
- Cameron Neylon
yes, I think we are. so for the new breed, we can start when they're in school. the oldsters? not much left besides their library. if they'll listen to it. which many won't.
- D0r0th34
This is, of course, a culture change >> than asking people to put their papers in IRs. So it won't get fixed any time soon. Doesn't mean don't start, however. And there are some beacons of data sharing out there. Now one of them needs to get a big prize & claim that data sharing helped!
- Chris Rusbridge
I don't think there is a general solution here - the most convenient data formats vary between fields. As I've said many times for chemistry the JCAMP-DX format is vastly superior to the common way of submitting NMR spectra as PDFs in Supporting Info if the intent is transparency. There is no new technology to invent here - it requires the researcher to take a few seconds to do the...
- Jean-Claude Bradley
@D0r0th34, not clear that the structural problem is avoidable. If well-meaning researchers are gonna destroy their data, the institution can't do anything about it! "Sheer curation" (just means managing your data well while you're creating & using it, I think) might help, but that's for the good guys anyway. If publishers (or reviewers) start saying no to articles where the data are mangled up in PDFs, then maybe we'd get somewhere? At least an incentive where none exists?
- Chris Rusbridge
There are some rays of hope in the work going on for instrumental data standards like AniML. If instruments default to open standards we will get further much faster. Also currently writing a grant that attempts to link a "just save it all somewhere as blobs" approach to helping the user connect things up together to try and provide metadata. Doesn't solve the format problem of course but it could be a start.
- Cameron Neylon
Cameron - my understanding is that AniML is not up to speed with storing and rendering NMR spectra using Open Standards and Open Source Software - do you have different info on that?
- Jean-Claude Bradley
Chris, it depends on the reasons for the data destruction. If it's apathy, the answer is incentives, as has been noted. If it's just ignorance, though, the answer has to be education and collaboration -- and crucially, as early in the research process as possible. That to me screams "institutional involvement."
- D0r0th34
No, I don't think AniML is up to NMR yet but what I have heard is that some instrument manufacturers are getting interested in open formats. Not necessarily NMR but big players in the instrument world who are sick of having to develop their own complete software suites to handle proprietary data formats.
- Cameron Neylon
I think the AniML format has been discussed for many years. JCAMP already solves the Open Standard and Open Source issues and it does not require compliance on the part of instrument manufacturers - so far we've been able to find simple ways of converting NMR data from Varian and Bruker, IR, UV, MS, etc.
- Jean-Claude Bradley