Bosco Ho
The bioinformatic-journal/software hybrid - http://boscoh.com/protein...
Yes, yes. Very much like. - Neil Saunders
Bosco, this is a superb idea. Along with starting up a new journal/software hybrid, it would be great if existing journals insisted that users submit the source code, executable, or VM of a bioinformatics software / database / server to a centralized repository like 'biohub.org'. - Khader Shameer
This is a good idea. - Michael Barton
While not linked to an actual repository (it instead provides a snapshot of the s/w and data for the article), the Journal of Statistical Software does pretty much this - Rajarshi Guha
I would take this further and keep the article text in the revision repo too. The reviewers are sent to the article, not the other way around, and it can be forked in just the same way the software can - Frank from iPhone
@Frank, this makes sense, since otherwise the paper would be static and refer to old versions. But then this assumes that as the s/w is updated, so is the paper - Rajarshi Guha
@Rajarshi not necessarily. The paper should state which version/revision it refers to. It does not have to keep up with the s/w. That is what documentation is for :) - Frank from iPhone
me likey too - Deepak Singh
The more I think about it, the more I think some big-wig bioinformaticians should do a deal with Google Code to edit a journal. That might even align with Google Scholar. - Bosco Ho
@Frank, in that case, why bother with a VCS? Why not just put a tarball with the source code for the version that goes with the paper? - Rajarshi Guha
Great idea, but I can't see it working for data sets. Yes, data sets evolve and should track provenance somehow, but having been in and around standards groups for some time now, I think this is an impossible task for a publishing group to take care of, especially considering the nature of big-data bioinformatics. Plus it goes against best practices for software source control (use factories, don't store your database...) - delagoya
There are some interesting and non-trivial questions around this kind of idea as to what peer review should look like. Should such a journal provide virtualisation environments so that the code can be run? Example data should be a requirement, presumably? Are peer reviewers expected to evaluate code "quality"? Any thoughts on this would be extremely useful...and help guide a project like this into reality. - Cameron Neylon
My answers to Cameron's points: (1) no, (2) yes, sample data would probably be used to run tests which should pass, (3) quality is somewhat subjective - minimum requirement should be that code runs and generates output as expected - but reviewers could certainly suggest code improvement where appropriate. - Neil Saunders
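As a rough illustration of the minimum bar described above (the code runs on sample data and produces the expected output), a reviewer-side check might look something like the sketch below. Everything in it is an assumption for illustration: the function name `runs_as_expected` is hypothetical, and the "submitted tool" is just a one-line Python program standing in for real software.

```python
import subprocess
import sys


def runs_as_expected(command, expected_stdout, timeout_seconds=60):
    """Return True if the command exits with status 0 and prints the expected text.

    Hypothetical helper: `command` is an argument list, `expected_stdout`
    is the output the authors say the sample data should produce.
    """
    result = subprocess.run(
        command,
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode output as str rather than bytes
        timeout=timeout_seconds,
    )
    return result.returncode == 0 and result.stdout.strip() == expected_stdout.strip()


# Stand-in for a submitted tool (an assumption, not real software):
# it counts the bases in a small sample sequence.
sample_tool = [sys.executable, "-c", "print(len('ACGTACGT'))"]

# Stand-in for a submission that crashes on the sample data.
broken_tool = [sys.executable, "-c", "import sys; sys.exit(1)"]

if __name__ == "__main__":
    print("sample tool passes:", runs_as_expected(sample_tool, "8"))
    print("broken tool passes:", runs_as_expected(broken_tool, "8"))
```

A check like this says nothing about code quality, which stays a matter of reviewer judgement. It only covers the "runs and generates output as expected" part.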
So if the answer to 1) is no, does that mean that you can't necessarily expect referees to actually run the code? Or compile it? Or just that you pick referees appropriately? Or conversely that "refereeing" becomes a process of building up enough positive comments or karma points in the repository...? It seems to me that you want to bring the best of versioning systems and best practice (tickets etc) into the "journal" while at the same time maintaining the sense of a formal gateway process. Getting the balance right will be important. - Cameron Neylon
Referees should certainly be able to run code - I'm just not sure that virtualisation through the web interface is the way to do it. Seems like an additional layer of complexity that might get in the way of making this idea work. - Neil Saunders
@Cameron & Neil: If it could be figured out how to handle the virtualization (or having remote access to machines), I think that'd be a highly valuable addition to peer review. Easy for me to say (not knowing how to implement it), but I think it's a great goal to strive for. It doesn't seem too crazy to have the journal have a bunch of machines on hand so the authors can remotely upload / install code and referees could then remotely log in to look at and try out code. - Steve Koch
I can't figure out where to jump into this thread. Personally, I think we just need a place to publish locations, i.e. the code is here, the data is there, this is the version we used, etc. That must be maintained, and being able to maintain it should become part of the funding process. Since funding agencies are the ones who are funding this research, they need to include the ability to pay for and sustain repositories. For code this is easy; for data, this is non-trivial. I do think distributed VCSs, cloud services, etc. are making this easy. People are just not using them. - Deepak Singh
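To make the "publish locations" idea concrete, here is a minimal sketch of what such a record could hold: where the code is, where the data is, and which version was used. The class name, field names, and the example.org URLs are all made up for illustration and do not describe any existing registry.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class PublicationRecord:
    """Hypothetical record of where a paper's code and data live."""
    title: str
    code_url: str       # location of the source repository
    code_revision: str  # exact version/revision used for the paper
    data_url: str       # location of the data set

    def missing_fields(self):
        """Return the names of fields that were left blank, in declaration order."""
        return [name for name, value in asdict(self).items() if not value.strip()]


# Placeholder values only; none of these locations are real.
record = PublicationRecord(
    title="Example tool",
    code_url="https://example.org/code/example-tool",
    code_revision="r42",
    data_url="https://example.org/data/example-set",
)

if __name__ == "__main__":
    # Serialise the record so it could be published alongside the paper.
    print(json.dumps(asdict(record), indent=2))
```

The record itself is tiny. The hard part raised in the thread is keeping the locations alive, which is a funding question rather than a coding one.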
My feeling is that being able to run the programs somewhere on a server without downloading them is important - but that is very much a user's perspective. I often look at useful things that are made available and just have no clue how to actually make them work. A good range of downloadable executables would probably do the job for me though. Additional question: what are the standards for web services? - Cameron Neylon
Which is why VMs and cloud services are such a big deal for demos and provenance now. You can package up a VM with the exact stack that you want and make it available, either as a service or a VM you can launch yourself. It's too easy not to do it - Deepak Singh
@Deepak : Cloud + VM is an interesting combination, but it needs pricing that is affordable to a larger research community - Khader Shameer
I think there should be strict guidelines for testing the resource when reviewing bioinformatics software / databases / servers. I had a recent experience: a reviewer wrote an extensive list of points to reject a server that we developed without trying out what exactly it does or finding out how it differs from other existing resources. I strongly support the hybrid journal model. It will be equally important if some (or all) of the existing bioinformatics journals could support a centralized archive as suggested by Bosco. - Khader Shameer
Let's talk specifics. VM images are great, but you are tying your release to a particular release of a particular platform. A better approach is to start from a base OS (like a Linux distro ISO) and have a set of build instructions for system setup and application building. My favorite of the moment would be Chef. - delagoya
Second, academics love to solve a problem with a novel algorithm and then move on. In fact it is in their best interest to move on after milking a project for all it's worth, publication-wise. Maintenance, or even robust testing (cough ... Tophat ... cough ... Bowtie ... cough), is not even on the radar. Frankly I am not so sure it should be. Maintenance requirements may slow the pace of innovation. I think Deepak is right that the call (and incentives) for robust maintenance must come from on high, but maybe there should be "orphan adoption" grants to pick up potentially useful projects from researchers that have moved on... - delagoya
@delagoya, good point. If I have made significant improvements, why update the old paper? Better to try for a new paper! - Rajarshi Guha
delagoya, Chef's fine too. Find a common medium/mechanism that works for the community. The resources are certainly there. It's a matter of trying things out. As someone I know says, start simple, and iterate - Deepak Singh
Khader, that's where the funding agencies come in. They need to provide mechanisms for sustainable funding here. - Deepak Singh
The nice thing about a hybrid journal is that it might be possible to have new DOIs/database entries for "significant" updates. Perhaps not just place-holding papers, as is sometimes the case in the NAR database issue, but when something has changed significantly you can get a new paper without needing a new algorithm or service. I like the idea of funding to support "orphan" code and services as well. Make it worth money and people will do it. - Cameron Neylon
Delagoya - as a naive user I disagree. I really don't want to have to build, I want to use in the lowest stress way possible and a hosted VM seems like a good way to enable that - as well as allow for longer term preservation. We may not be able to run Linux on future hardware but will probably be able to handle VMs for longer (actually, having written that, I'm not sure it's true - would be interested in more expert perspectives) - Cameron Neylon
I almost missed this discussion. I really like the idea but I wonder how discovery-type projects fit in. I mostly use code to look for trends. If anything I might make some predictor to enhance existing data. For these reasons most of what I do is one-off scripts around Perl and R. Maybe this sort of project does not belong in a bioinformatics journal at all. - Pedro Beltrao
Pedro, great question. Personally, I think that if we included all glue code, small scripts, etc., this would be unsustainable and defeat the purpose of peer review as well - Deepak Singh
@Pedro, I don't see a journal/software hybrid as replacing all bioinformatics journals. I think there's a place for journals that discuss pure algorithms and ideas. These would do exploratory-type programming. Normal journals service these papers quite well. For me, a hybrid model targets specifically those papers that describe a program that is meant to be used by other people. In that case, merging these papers with an open-source decentralized repository will make the project much easier for others to use, contribute to, and evolve. - Bosco Ho
Bosco, you're thinking along the lines of a communications journal, aren't you? And then people can go to work on the code if it is on github or something - Deepak Singh
@Deepak. Yep. The disconnect I see is that pragmatically, it's the open-source project that counts. The article in the bioinformatics journal is so that we can get a place-holder to collect citations that contribute to our academic CV. The journal/software hybrid provides the most efficient way to reach this goal. - Bosco Ho
Very nice summary of the problem. Really, the whole concept of a journal article about software is stupid. What does an academic article do? Alert people to a new finding/discovery. But in the case of software - well, the software is the finding. And people are "alerted" by finding it on the web, downloading it and using it. As Bosco says, the sole role of an article here is a CV tick - hence the hybrid approach. Non-academic programmers must find all of this very odd. - Neil Saunders