The idea that "it's not information overload, it's filter failure" combined with the traditional process of filtering scholarly communication by peer review prior to publication seems to be leading towards the idea that we need to build better filters by beefing up the curation of research output before it is published. Here I argue that this is backwards and that the 'filter failure' soundbite is maybe unfortunate in the context of scholarly communications. The web won't reduce the cost of curation, but it has reduced the cost of publication. This means that instead of building filters to prevent stuff getting on the web it is more productive to focus on enhancing discovery. A focus on enabling discovery can both deliver for researchers and provide business models that are more aligned with the way the web works.
- Cameron Neylon
Hmm, it never occurred to me that people would take 'filter failure' to mean pre-publication filtering, but the last spate of articles/posts does seem to indicate that. My impression is that we need some sort of powerful demonstration (YouTube video?) of how this discovery process can work. Just recently, a certain person in publishing referred to my description of it as 'magic'. Clarke's third law aside, if people don't understand the technical possibilities, how can we ever be convincing?
- Björn Brembs
Seems to me more like an abundance of alliteration :-)
- Michael Nielsen
The single easiest way for publishers to make scientific information more discoverable would be to actually integrate the material properly with the web. The more hyperlinked papers are to each other, to the blogosphere, to news sites, and so on, the better search engines will be able to help us find what's out there and worth reading. At the moment, almost no publisher does this. Even the arXiv is essentially isolated from the web as a whole, and so we're stuck wading through papers manually to find anything. PLoS is the only source I can think of offhand that makes much of an effort to integrate with the web, and it makes a huge difference already in Google. (PLoS articles could be far better hyperlinked, however.) But the integration ought to be much more widespread.
- Michael Nielsen
(cont) Until journals do this, I'll have a hard time believing that they have any real interest in making it easier for scientists to locate high-quality information. The notion of sorting by journal brand sort of made sense one hundred years ago; today, it's ludicrous.
- Michael Nielsen
Björn - as I was writing this I was reminded of the "extraordinary power of data" meme, which hadn't been where I started. I think the argument that "more data makes better decisions", and that therefore more publication will make better tools for finding things in all that published material, could be quite powerful. Would a screencast of a Google search make the point?
- Cameron Neylon
Michael, yes, agreed. Their whole business model is predicated on blocking access at the moment. I think making the paper "of the web" is the key to effective communication in both directions, which speaks to the discussion on the OKF open science list as well. The end game really is that the publishers need to sort this out or Google will do it for them. And I think things will work out better for the research niche if there are players that do serve our particular needs.
- Cameron Neylon
The whole being summed up in Weinberger's dictum "Filter on the way out, not on the way in", i.e. post-publication.
- AJCann
...and alliteration may be the only way I'll be able to compete with Shirky's snappy soundbites so I'll take anything I can get :-)
- Cameron Neylon
I was thinking more like a 'today vs. tomorrow' kind of thing. But maybe that's a bad idea? I thought of showing how we are forced to struggle with the literature today (eToCs, press releases, WoK, GS, PubMed, etc. all isolated and with overlapping functionality and coverage, etc.) and then what it could look like tomorrow with the technology we already have.
- Björn Brembs
Talking about discovery: I just read the title and abstract of this paper: http://www.plosone.org/article... I'm not going to read it, so I'm not going to comment on it, rate it or bookmark it. But I think a lot of people might also want to read the title and abstract. Why isn't there a flag for 'this might be interesting'? And I mean that in general, not just for P1.
- Björn Brembs
I like that, Björn. On the one hand, of course, judging a paper by its abstract alone is (as we all know) judging a book by its cover; on the other, a flag such as this at least leaves a recorded 'stamp'. Interesting suggestion!
- Graham Steel
Like Bjoern, I'm gobsmacked that anyone thinks "filter failure" refers to pre-publication filtering. I suspect disingenuousness and deliberate spreading of (F)UD on the part of publishers there. I also very much like the idea of demonstrating discovery. I have been told that I cannot possibly be reading the right papers if all I am relying on is my search strategies and not using journal impact factors to pick which papers to read!
- Bill Hooker
Maybe I'm not getting it right, but to me 'discovery deficit' hides two quite different situations: when I know I need something, and when I don't know there's something I need. The first is a rather simple issue (the technology is here). The second situation is often called 'ignorance', but I wouldn't mind having it solved as well.
- Pawel Szczesny
@Michael - I'm not sure that's true. Google and other major search engines have arrangements with most publishers to index the full text of paywalled sites and it's a poor journal platform that doesn't link out references or, nowadays, to related news stories, articles or videos etc. I'm pretty sure that PLoS doesn't gain any extra Google juice from being OA... though of course a larger audience will have the chance to actually read the content.
- Euan
.... in fact STM publishers probably do *more* than most to make their content discoverable on the web - you get far richer metadata on many journal webpages than on, say, the NY Times. I think it's more of a tools problem: search engines are only just starting to pay attention to stuff like RDFa and microformats. Definitely getting better though....
- Euan
@Pawel - agree those are two separate problems but if we had a real market people would be working on solving both of them with an eye to making money. @Bill Yep, I hear that one all the time. "If you're not focussing on journals with high IF you're not getting the good stuff"
- Cameron Neylon
@Euan I don't disagree with anything you say, but I'm still going to call bullshit. If STM publishers were serious about building discovery in, they would be putting at least rich snippets into every figure, demanding that the underlying data for every graph be made available _and_ exposed, and that's just for starters. No-one is doing that: not NPG (with a small number of honourable exceptions), not Wiley, not Springer or Elsevier, and neither PLoS nor BMC. When a publisher makes a serious effort to surface and expose underlying data I'll laud it to the moon and back. But I've not seen much from anyone yet. It's not easy and it's not cheap, but adding value never was. The point of markets is to squeeze margins. That's what they do. The market makers create new spaces for value creation. Let's see a bit more of that.
- Cameron Neylon
Euan: speaking in broad brushstrokes, the journals are essentially a walled garden. Very few encourage hyperlinks out, from the main text of articles. I've lost count of the number of times I've had the text of URLs deleted from my references (never mind the actual link). And I don't even bother trying to include hyperlinks in the main text, I know they'll be deleted. And since the journals show no interest in linking to the outside world, and are so restrictive of access, it's no surprise that no communities have built up offering commentaries (and links) to the journals from the outside. The journals may be on the web, but they're not of the web. (I'm speaking in broad brushstrokes here, and there's all kinds of details wrong when you get to the level of individual journals. But broadly speaking, I believe that what I'm saying is correct.)
- Michael Nielsen
Sorry, that last comment reads harsher than I meant it. My point is there is a big gap between the current best standards, which don't amount to much more than providing hyperlinks for references, and really optimising the ability of people to come to specific points in a paper for specific things. OA is an assumption here really because otherwise people can't develop businesses to develop search and discovery tools - but beyond that we need much richer markup and layering of different "surfaces" that can be found and presented by those tools.
- Cameron Neylon
@Graham: so are we going to do it? If so, how? Could we come up with a script, have some people here leave a few comments and then think of a way to realize it?
- Björn Brembs
I think there's a disincentive in our current system that prevents development of discovery tools: where the article is published would become even less relevant than it is now if it fits your discovery criteria. OTOH, journal name could be one discovery criterion, so I may be wrong.
- Björn Brembs
Cameron, it's not that publishers aren't moving to provide the kind of amplification of data that you are telling Euan you want; it is that it takes longer to build into pre-existing platforms than I think is immediately obvious. (At least that's what technology providers are telling me.) But I think you'll begin to see some significant changes from content providers over the next 12 months.
- Jill O'Neill
This discussion dovetails nicely with another discussion on FF a few days ago about embedding bibliographic data into references. Jill is right that the publishers aren't the leaders in this: the indexing and abstracting tools traditionally performed this function, and they're separate from the publishers (or were, before many of the mergers that have taken place). So we're seeing this integration happen more slowly and with disconnects. And data, as Cameron noted, is much less developed than the references.
- Elizabeth Brown
@Daniel - That is the best contribution to this discussion so far. As we know, the research process has at least four stages (probably more) http://dx.doi.org/10... . So, although we can discuss single stages, we have to acknowledge that the "research cycle" requires all stages to work properly, or we will face bottlenecks that we "could" call a deficit. Still, it is not clear to me why this is the case; it may well be that the deficit is caused by information overload or filter failure. At least I would assume some relationship between those concepts.
- joergkurtwegner
The current bottleneck is funding decisions - for everything else, open platforms exist, and I think it does not really matter into how many steps we decompose it (that table has 6, Cameron's cycle 8).
- Daniel Mietchen
Maybe there should then be more "open funding" discussions, especially focused on how to be fair in rewarding "open contributions" (and at which of the 4, 6, 8, whatever stages the reward will occur; uh, that is a tricky one)? Just a thought http://picasaweb.google.com/joergku... and "free riders" and "conditional collaborators" are a challenge http://picasaweb.google.com/joergku... since in the end people "might appear less altruistic than you think"!
- joergkurtwegner
@cameron yes, point taken about the lack of any publisher trailblazing - but being properly integrated with the web in 2010 *doesn't* involve semantic enrichment, open data or changing the way scientists worldwide think about papers (unless you use a very different www to me). You're talking raising the bar for anybody who deals with scholarly works (be it arXiv, Precedings, NPG, Elsevier, PLoS or PubMedCentral) - I agree, it should be raised, but it'd be wrong to say that publishers don't also take the issue seriously already. It's a hard problem and in addition to what Jill said not necessarily something you can or should do unilaterally.
- Euan
Also, to complete a cliche ;) - I'm not sure OA needs to be in the mix. You can have a scholarly search and discovery type business now (see DeepDyve, novoseek and most obviously Google / Google Scholar); you just need to sign more pain-in-the-ass license agreements. Better for projects doing this kind of thing not to bite off more than they can chew...
- Euan
What I read Michael Nielsen to be saying isn't so much to do with semantics, RDFa, or any of those wonderful things under development, but rather integrating with Web 1.0. Links in a published research paper almost always go somewhere else within that paper or on the publisher's site, and rarely to another paper or another website. Even citation links go to the bibliography entry at the bottom of the paper instead of the actual paper being cited. Publishers have had almost two decades to get that right, so any argument saying essentially "it's hard but we want to do it and we're making progress" needs to be backed up a little better.
- Mr. Gunn
I agree, Mr Gunn, that the publishers keep the publication links sequestered. I would explain this by history: traditionally, other outlets (like WoS, etc.) added this info. Even today I don't think they feel it's worth their time to do it, even if there are promises to do so. The discipline-based abstract sources added interlinked citations earlier, and I recall that even these vendors took several years to add them after going on the web.
- Elizabeth Brown
And now in chorus: "We don’t need [pre-]publication filters, we need enhanced discovery engines."
- Daniel Mietchen
@Euan, @Jill - yes agree things are changing and the next twelve months are going to be very interesting. But just to be a little more precise in response to Euan, I didn't mean necessarily that we needed semantic integration (although I believe that will be the ultimate route) but exposure of elements that support the state of the art in internet search - I used "rich snippets" advisedly, not because I like that approach particularly but because it implies working to enhance the ability of search engines to really dig into the content in a rich and faceted way. This is Search Engine Optimization in a true sense - working to optimize the ability of people to find what they are looking for via third party providers.
- Cameron Neylon
On the open data front I would disagree. Signing contracts isn't just a PITA; it simply doesn't scale effectively to the web unless the contracts are open and presumptive (actually, contracts just scare me in a federated world - I'm not allowed to sign contracts relating to work stuff because I'm not competent - and every time contracts people get involved, the time it takes to get things sorted is enormous. I can't see that working at scale). But then I would say that ;-)
- Cameron Neylon
@Cameron: Very much agreed! This PITA is a serious threat to civilization... I wonder how much money is currently lost to legal departments because of needless licensing, acquisition and defending of patents, ... instead of actual service provision ...
- Egon Willighagen
Disclosure: a link to this FF thread has been included in my blog post: "Finding influential OATP news items" (July 25, 2010), http://tillje.wordpress.com/2010...
- Jim Till
I'd like to see more of such backtracking to ff threads (ideally in some automated manner), but why label it "disclosure"?
- Daniel Mietchen
Because there are no automated trackbacks, I chose to add one manually. Perhaps I should have used "Trackback" instead of "Disclosure"? (Use of "Disclosure" was probably influenced by past experience with "Internet research ethics" - do all those who have contributed comments to this thread regard FF as a "public forum"? If so, no need to use the word "Disclosure").
- Jim Till
Trackback might have made some more sense to me - I did wonder. My personal view is that this is public but we've had discussions in the past about linking in and it does upset some people who feel this is "semi-private". I don't think it's an issue for this thread certainly. In any case, thanks for the link and the thoughtful blog post!
- Cameron Neylon