What a snippy little rant! The closest thing I could find to an argument buried in there was "OA Journals aren't wildly successful right now, so they won't ever be." This is the closest I've seen someone come yet to Godwin-ing an OA discussion.
- Mr. Gunn
I try to look at these things as a way to test the model. This guy (?) seems bright enough, so when I have the energy it's worth going a few rounds -- I've drunk pretty deep of the Open Kool Aid, so there are going to be things I won't see clearly, and stuff like this helps me keep perspective. That said, I haven't got the energy right now so I've marked this one "read later". :-)
- Bill Hooker
I'll summarize it for you, Bill. "I don't think OA will work because: 1. It's not working now. 2. All OA journals are low impact factor and will remain that way. 3. Publishing is expensive, so no publisher in his right mind would let people get articles for free."
- Mr. Gunn
Or more fundamentally "I don't think OA will work because: 1) I define _work_ as: must do exactly the stuff Toll Access does today which includes having an exorbitant, cross subsidized business model"
- Anders Norgaard
I agree with the broad thrust of his argument, but I would suggest that the conclusion should be that OA publishers have to find a model that can support high-impact journals. PLoS seems to be going in the right direction.
- Bob O'Hara
I think post-publication sorting is the better model for OA high-impact classification, like "Frontiers" http://www.frontiersin.org/aboutfr... - there is little reason to incur heavy editing costs up front.
- Anders Norgaard
+1 Anders; I like the PLoS ONE/BMC Res Notes model myself, where upfront editorial decisions are limited to scientific and ethical soundness, without reference to impact/sexiness/trying to predict the future. What OA adds to that is the possibility of rich and varied post-publication metrics and sorting, which TA makes impossible (except under the Thomson Reuters all-your-metrics-are-belong-to-u$ model).
- Bill Hooker
Post-publication sorting would be a pain for me - I want to read good papers when they are published. Plus I want to read papers that are good rather than sexy. In practice "good" includes assessments such as importance of the results to the rest of science.
- Bob O'Hara
For me, his position was summed up by one of his comments on the first post "We have to have high impact journals, and people have to be paid to edit them". For me the question is "Can we afford the cost of producing high impact journals" and indeed pace Bob "can we afford not to"? Must...write.....blog...post....
- Cameron Neylon
Don't understand "pace Bob" in this context -- Bob would seem to be arguing FOR the idea that we can't afford not to...?
- Bill Hooker
err yes, that's kind of what I mean in as much as I ought to phrase the question both ways to keep him quiet...yes, kind of backwards usage I admit
- Cameron Neylon
@Bob If good papers could be published sooner with less up-front review, would it not be fair to let people who want to read them ASAP do so? And let others who want more filtering wait? And do professional editors have an especially impressive record for predicting importance? Could such a judgement not be applied post publication (F1000)?
- Anders Norgaard
Maybe we need to check who really benefits from Open Access policy. According to a recent study in Science, the influence of open access (percentage increase in citations) is very strong in the developing world (not to mention India, China, Iran and others, where open access publications are mushrooming very fast) compared to developed countries (more than twice as strong, actually).
- Abhishek Tiwari
@Anders - (1) I agree that pre-publication release (e.g. arXiv) is great. I get updates from arXiv for a small part of my research interests, but it's mostly not interesting: I can't see how I could filter the good stuff if I got alerts about pre-prints in every area I work on. And F1000 is a red herring - read my comment again.
- Bob O'Hara
@Bob Sorry, what is the problem with the F1000 suggestion more specifically? Does F1000 choose "sexy" over "important"?
- Anders Norgaard
No. Read this: "Post-publication sorting would be a pain for me". Do I have to spell it out?
- Bob O'Hara
@Bob, can't speak for Anders but you really would have to unpack that for me to get it. You seem to think that editorial guesses about which papers are better than which other papers, as approximated by the woefully inadequate Impact Factor or by your own personal gut-feeling Journal Rank, will do a better job of sorting your literature inputs than you could do yourself with well crafted searches and some skimming of abstracts. I have more faith in you than you do, it seems. :-)
- Bill Hooker
Bill - I don't think you're as clueless as your comment suggests. Editors don't use IFs to decide which papers are better. Can you clarify what you mean?
- Bob O'Hara
Bob: 1. iiuc, you want pre-publication sorting, so that you can use where something is published as a guide to its likely quality -- this means relying on editors' guesswork; 2. you have some kind of personal method for deciding which journals signify quality -- the IF, or your gut-feeling Journal Rank, or whatever -- this means relying on editors' guesswork at one further remove; 3. I don't see how approximations to someone else's guesses can be as effective as careful searching and reading on your part.
- Bill Hooker
Ah, you did mean that I use IFs etc., not editors (grammatically, editorial guesses were what you claimed were being approximated).
- Bob O'Hara
I don't use metrics to work out which journals publish good papers, so you are being incredibly presumptuous. Journals get reputations, and I use those combined with my experience in reading them to decide which are worth following. The fact that this works suggests that editors are doing pretty well at "guessing". Not perfect, but then nothing is.
- Bob O'Hara
@Bob What is the problem with "post publication" filtering? That there is an extra delay until the papers are filtered? Wouldn't that be offset if journals had a less laborious publication process?
- Anders Norgaard
I'm not presuming anything -- you are not reading very carefully. My point does not rest on HOW you decide which journals are good or bad, but THAT you do so and then go on to use that as a filter. You say "this works" and "editors are doing pretty well" -- I don't understand how you measure this. Perhaps you mean it works well enough for your purposes; OK, but it doesn't work at all for me. How many anecdotes do we need before we can call it data?
- Bill Hooker
Too late. I like to keep abreast of science. A quicker publication process wouldn't help too much, it would just mean I could read the papers earlier! I don't mind post publication filtering per se, but it doesn't work when I most want it in the publication process (i.e. when a new paper is published).
- Bob O'Hara
I cannot agree more with Bob. I really don't want to wade through piles of crap to get to the really interesting stuff. I don't care if it takes a year longer before the paper is eventually published - at least I am getting a pretty much finalized version rather than some half-assed first attempt. Plus, I can immediately filter out papers of little impact by just looking at the kind of journal they were published in.
- niewiap
@Bob I can see that currently there is sort of a "general availability" feeling to "when a paper is published". But would it not make more sense to think simply in pre filtering and post filtering? You seem to prefer to read papers after filtering (not when put on arXiv). How would a system with emphasis on post publication filtering change this?
- Anders Norgaard
Bora, if your question was addressed to me (I am assuming that it was about me being sure to judge impact based on the journal), no, I am not sure, but I am about 80% sure and that's enough for me. As for the remaining 20%, I can do the post-print assessment myself thank you very much, but I can still do 20% of the work rather than 100%. You read my post-publication review rant already, so I will not elaborate on other reasons why post-publication review alone is not going to do the trick.
- niewiap
I am not against post-publication assessment per se, but I am very skeptical about it being a replacement for the old school pre-pub peer review/tiered journals system. I think that the future is in a hybrid system, where each paper will have a few numbers attached to it, rather than just one. I commented on it already on Cameron Neylon's blog.
- niewiap
@niewiap: if you're confident that your cadre of trusted journals gets you 80% of the good stuff, that means the remaining 20% is spread over all the untrusted journals -- so to get it, you have to search/filter/wade through crap anyway. How is that "20% of the work"? -- since the same searching/filtering/wading would have presumably found the 80% as well.
- Bill Hooker
Actually, it's the 80% where we differ -- if you really can get 80% without wading, that is good enough for most purposes and gives you a flying head start even for those times when you need the remaining 20%. I just don't have anywhere near that level of confidence in pre-filters. Is this field specific? I'm biomed, Bob is stats, what are you?
- Bill Hooker
I am actually biomed as well. I am not saying that I can automatically detect bullshit science in high IF journals, though. I am just saying that if something gets published in a journal I have never even heard of, it's probably crap. The filter is leaky, I admit, but based on a combination of journal title, authors, and skimming through the abstract I can tell in 80% of the cases whether I will find interesting data inside and whether the article is fairly rigorous scientifically.
- niewiap
Besides, if you read Part 3 of my "crusade against OA" (referenced by Bora above), I have pointed out two main reasons why post-pub community-based peer review is unlikely to pan out as the only solution. First, it's prone to abuse. Second, nobody will have any incentive to actually do it. IMHO, pre-pub review (possibly supplemented with post-pub) is the only quasi-reliable solution, period.
- niewiap
Does it really take so much imagination to see that the beloved "I want the editor to do the sifting for me" will be enhanced in a new publication model compared to now? Now, we, the users have no choice of who does the filtering for us. There isn't even a track record! Of course there will still be people sifting through papers even if publishing reform actually takes place for real (and not the band-aid cosmetic changes around the edges now)...
- Björn Brembs
...They will be competing for our money to find what is best for us. Such services can then develop a track record: how often did they find papers which we liked and how often were they crap? In this way, they still can do the expert-sifting for us (which is valuable!), but they don't meddle with our careers any more. Plus, I get to decide who's best for my work and don't have to wade through tons of ToCs each month/week!
- Björn Brembs
In short: why should professional editors decide our careers, when all we want them to do is provide some expert opinion on our work? Why should the people we pay for a very limited and well-defined service also decide our hire-and-fire policy, for which they are neither qualified nor paid nor invited?
- Björn Brembs
Bjorn. I really don't know what kind of a reformed system you are talking about, but if you show me one that is more objective, no more prone to abuse than the current one, and which will actually give me an easy way to tell at first glance whether I want to read a paper or not, I will bow down before the genius of Open Access and never speak badly of it again. All I am hearing right now is "Like, when we have the new system it is going to be sooooo kewl!"
- niewiap
"Cloud computing vendors such as Amazon and Google still aren't ready to meet corporate IT needs, according to Wakeman and other Premier 100 conference attendees. Security concerns topped the list of shortcomings, but they also cited reliability, availability and manageability issues."
- Anil Thomas
For systems biology it will be a great add-on if it lives up to a certain level: model simulation plus model annotation. The same goes for other simulation-intensive areas.
- Anil Thomas
The principles aren't too different. What I can't get my head around right now is what those basic compute units are going to look like
- Deepak Singh
"Big question is not whether data is open or not, but if data is well structured and annotated so that one can reproduce the results and re-use it without going out of context."
- Madhu Pandey
But without the data around, how do you decide how to structure and annotate it? We can't assume to know everything about the data
- Deepak Singh
Annotation using raw data is of little use; annotation requires extracted knowledge such as ontologies (Gene Ontology, SBO), domain dictionaries and knowledge models. Just attaching every numerical/string value we have does not make sense. We have every kind of data out there, enough to standardize the formats.
- Madhu Pandey
Standards (open or otherwise) that exist detached from actual data or implementations (or that are too attached to one specific implementation) often end up being not so useful (or worse).
- Eric Jain
Don't know the context of the original quote about open data being more important than open source. But most people here will probably agree that it's more important that GO remains open than that it is produced with open source tools, just to give an example.
- Eric Jain
BTW, does anyone know how I can make FF aware that my blog item, which will show up on FF too, is actually a comment on this item?
- Egon Willighagen
I have to +1 the importance of standards. A related post from Frank sums up my opinions on that one: http://peanutbutter.wordpress.com/2008... Of course you can't work in isolation from the data itself. That's where community standards come in: that is, standards that are developed by the community, for the community. It takes much longer that way, but you end up with happier people and greater uptake.
- Allyson Lister
Allyson, standards are very important, especially for data sharing, but they come about best when people have access to the same data and you can get a community around them. Private data usually ends up leading to many different standards (which is an oxymoron if there ever was one). Completely agree on the community standard bit.
- Deepak Singh
What Deepak said: before you can have community standards you need a community. And the best way to build a scientific community is around shared data.
- Cameron Neylon
Makes complete sense, Deepak. Agreed :)
- Allyson Lister
@Cameron - yikes, I think I see a chicken-and-egg situation. For how else to have good shared data except to all use the same format? :)
- Allyson Lister
Allyson, it is, but the whole concept of get it out there and don't try and make it perfect applies here as well
- Deepak Singh
You don't *need* standards to share or use other people's data. Standards can (but not all do...) make doing so a lot less painful. The pain in fact appears to be a major motivator for bothering with standards. That, and grant money (for academics) / the "no-vendor lock-in" marketing message (for companies)...
- Eric Jain
@Allyson start small? Seriously though, I can transfer data in a spreadsheet without it being in a standard format. This is good enough to communicate with interested people. Once we have more experience of transferring data between ourselves, and we have a community, it makes sense to talk about community standards. Like Eric says, as long as humans are involved, you don't need standards, but you do need someone to do lots of cutting and pasting in Excel (running away now...)
- Cameron Neylon
Let's say you have some interesting data that could be quite useful to a lot of people, but it's all un-standard and ugly. Should you let it languish in a closet until you have time to give it a makeover, or do you publish it anyway? If former, we'd still be waiting for the first release of Swiss-Prot :-)
- Eric Jain
I should point out that most of the data I generate is essentially two column but with no agreed standard format (or one in an early stage of adoption)
- Cameron Neylon
well I'm talking about standards like this http://www.smallangles.net/wgwiki... - which is a good start but at the moment only a few instruments write it and virtually no analysis software reads it
- Cameron Neylon
there's nothing at MIBBI which really helps me at the moment. Most of it is way too heavy weight and it doesn't really cover the kind of experiments we do. If we had tools that automatically pumped it out then I'd be much happier.
- Cameron Neylon
yeah, I guess that's the problem with any data standard. If you don't have devices that generate it, it's not much use. I'm trying to convince the engineers at my job about the need for something other than a wodge of .csv files, but I'm a ways off from even being able to have this kind of conversation with them, yet.
- Mr. Gunn
The key to getting attention is to observe how much time is spent loading CSV into Excel moving columns around and then graphing, as opposed to load data, graph. There is something about the visibility of columns of data in spreadsheets that makes people comfortable though. But the repetitive actions are what get to me and are pushing me towards more sophisticated tools (at least as soon as I can figure out why my #$&%! python variables are in the wrong scope)
- Cameron Neylon
LOL. They've written dedicated software for handling our data, but there's something about engineers writing software though. They're not exactly UI experts and I still end up doing a lot of messing about with columns of data, post "processing".
- Mr. Gunn
I guess we are concentrating very much on data standards vs. open data; how about data provenance? You collect some data and put it in Excel because you don't need any standard or format. Now the next person doesn't know how you generated the data or what the context was. That gives them the freedom to use it any way they want.
- Abhishek Tiwari
So many community efforts are ongoing; of course they are not doing it without any data.
- Abhishek Tiwari
Provenance is important but can be managed by a container if necessary (blog, repository, wiki, whatever). The data is just an object at the end of the day; the context doesn't have to be stored within it, it just needs to point at it. That takes us back to recording what was done versus what was generated.
- Cameron Neylon
Actually I think that is a central problem with many data standards. They try to record both the experiment and the results, without making a clear separation between them. This can lead to all sorts of nightmares such as: how do you represent the average of five independent experiments; I realised a parameter was set wrong when I processed the data, but now if I duplicate the file it looks like I've done the experiment again...
- Cameron Neylon
Well, I don't see either that you need to store metadata in data, but most standards try to document both what was done (metadata) and what was generated (data), and I guess that is not bad.
- Abhishek Tiwari
My problem is that they generally try to do this within one file - which in my experience breaks much more often and creates more dependencies. If I was to take the extreme position I would say that there should be separate standards for recording the process and the different types of results. But in practice, especially with modern instrumentation, it is helpful to pull some of it together.
- Cameron Neylon
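Cameron's point about keeping the record of what was done separate from what was generated can be sketched in a few lines. This is a minimal Python illustration under stated assumptions: the `Experiment` and `Result` record types, their fields, and the helper functions are hypothetical and not taken from any existing data standard. Each result holds only its numbers plus pointers to its inputs, so an average of five runs lists the five runs it came from, and reprocessing with a corrected parameter produces a new result that points back at the same experiment rather than looking like a repeat of it.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Experiment:
    """What was done (the process record). Hypothetical type; one per real experiment."""
    exp_id: str
    description: str


@dataclass
class Result:
    """What was generated: the numbers, plus pointers back to their context."""
    result_id: str
    values: list                                    # e.g. the y column of a two-column dataset
    inputs: list                                    # ids of Experiments or Results this came from
    processing: dict = field(default_factory=dict)  # parameters used to produce the values


def average_of(results, new_id):
    """Point-by-point mean of several results; the inputs are referenced, not copied."""
    columns = zip(*(r.values for r in results))
    return Result(new_id, [mean(c) for c in columns],
                  [r.result_id for r in results], {"op": "mean"})


def experiments_behind(result_id, results, experiments):
    """Follow input pointers back until only Experiment records remain."""
    found, stack = set(), [result_id]
    while stack:
        current = stack.pop()
        if current in experiments:
            found.add(current)
        else:
            stack.extend(results[current].inputs)
    return found


# Usage with toy numbers: five independent runs, each with one raw result.
experiments = {f"e{i}": Experiment(f"e{i}", f"run {i}") for i in range(1, 6)}
results = {f"r{i}": Result(f"r{i}", [float(i), float(2 * i)], [f"e{i}"], {"gain": 1.0})
           for i in range(1, 6)}
results["avg"] = average_of([results[f"r{i}"] for i in range(1, 6)], "avg")

# A wrong parameter spotted in r1: reprocess into a new Result; no new Experiment appears.
results["r1b"] = Result("r1b", [v * 2 for v in results["r1"].values], ["r1"], {"gain": 2.0})
```

Because the process record lives in one place and results only point at it, asking "which experiments is this number based on?" is a traversal rather than a guess based on duplicated files.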
Without metadata, it's just numbers. It's the context that makes it data. It definitely makes sense to break things out.
- Mr. Gunn
I recently faced this problem of metadata. I had to store a set of spreadsheets and I wanted to annotate each file (this is a tab-delimited-document, this is a statistical result...) and each column ( this is a marker, this is a genomic position, etc...). Franck Gibson redirected me to the information-ontology http://tinyurl.com/3n6lcu . (Not easy to use for me).
- Pierre Lindenbaum
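One possible way to do the file- and column-level annotation Pierre describes, without touching the spreadsheet itself, is a sidecar file. The sketch below assumes a JSON sidecar stored next to the tab-delimited file; the labels are free-text placeholders standing in for ontology terms (no real information-ontology identifiers are used), and the table is a made-up toy example. The check just flags any header column the sidecar does not describe.

```python
import csv
import io
import json

# Hypothetical sidecar annotation. The labels are free-text placeholders;
# a real setup would use ontology terms instead.
sidecar = {
    "file_type": "tab-delimited document",
    "content": "statistical result",
    "columns": {
        "marker": "genetic marker",
        "position": "genomic position",
        "p_value": "p-value",
    },
}

# Toy tab-delimited table (made-up values, for illustration only).
table = "marker\tposition\tp_value\nm1\t10500\t0.03\nm2\t22100\t0.41\n"


def unannotated_columns(tsv_text, annotation):
    """Return the header columns that the sidecar does not describe."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    return [name for name in header if name not in annotation["columns"]]


# The sidecar is serialized separately from the data file
# (e.g. under a hypothetical name like results.tsv.meta.json).
metadata_text = json.dumps(sidecar, indent=2)

print(unannotated_columns(table, sidecar))  # prints []
```

Keeping the annotation outside the data file means the table stays readable by any tool that opens tab-delimited text, while the context travels alongside it.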
"Given the recession — and the amount of time we spend on Facebook — a bunch of hungry, motivated young guns is the last thing we need around here."
- Anil Thomas
Anil - thanks! I would really appreciate some pointers on that side of things. I freely admit to being worried about getting copyright *wrong*, so I may have overdone it. My questions are: 1. I have 1 image that is GNU - am I violating copyright on it because my overall presentation is CC? 2. How do I best ensure that I fulfil copyright obligations, but not have it be too distracting from the presentation itself?
- Allyson Lister
It would be better to put all the copyright information at the start of the presentation rather than on each slide or at the end. This is normally considered better practice.
- Anil Thomas
This would help with "my" copyright, i.e. the one for the presentation - I can just put it on the title page. A separate issue is the copyright for the images: is there a better, more concise way of doing that too?
- Allyson Lister
@Anil Thomas, I am not sure that is assumed at all. Each image has a copyright associated with it. You should certainly reduce the text size drastically; when you are giving the talk people don't want to see the copyright text, but it should be there. The other option is to create an index at the end with thumbnails of each picture used and the appropriate credits. You could state the copyright at the start and then just use the logos on each page after that.
- Frank
@Frank - good idea for the thumbnails for each license type. But then where do you put the URLs of the original sources of the images? :)
- Allyson Lister
@Anil - thanks for the heads up. Seems from that URL that the one problematic image might shortly have a CC-BY-SA. I'll keep my eyes open, but if it doesn't change by the due date in the wiki, I'll replace the image. In the meantime, I'll re-post later today to move my license info from the footer.
- Allyson Lister