Bosco, this is a superb idea. Along with starting up a new journal/software hybrid, it would be great if existing journals insisted that users submit the source code, executable or VM of a bioinformatics software / database / server to a centralized repository like 'biohub.org'.
- Khader Shameer
While not linked to an actual repository (rather, it provides a snapshot of the s/w and data for the article), the Journal of Statistical Software does pretty much this
- Rajarshi Guha
I would take this further and keep the article text in the revision repo too. The reviewers are sent to the article, not the other way around, and it can be forked in just the same way the software can
- Frank
@Frank, this makes sense, since otherwise the paper would be static and refer to old versions. But then this assumes that as the s/w is updated, so is the paper
- Rajarshi Guha
@Rajarshi not necessarily. The paper should state which version/revision it refers to. It does not have to keep up with the s/w. That is what documentation is for :)
- Frank
The more I think about it, the more I think some big-wig bioinformaticians should do a deal with Google Code to edit a journal. That might even align with Google Scholar.
- Bosco Ho
@Frank, in that case, why bother with a VCS? Why not just put a tarball with the source code for the version that goes with the paper?
- Rajarshi Guha
Great idea, but I can't see it working for data sets. Yes, data sets evolve and should track provenance somehow, but having been in and around standards groups for some time now, this is an impossible task for a publishing group to take care of, especially considering the nature of big-data bioinformatics. Plus it goes against best practices for software source control (use factories, don't store your database...)
- delagoya
There are some interesting and non-trivial questions around this kind of idea as to what peer review should look like. Should such a journal provide virtualisation environments so that the code can be run? Example data should be a requirement presumably? Are peer reviewers expected to evaluate code "quality"? Any thoughts on this would be extremely useful... and help guide a project like this into reality.
- Cameron Neylon
My answers to Cameron's points: (1) no, (2) yes, sample data would probably be used to run tests which should pass, (3) quality is somewhat subjective - minimum requirement should be that code runs and generates output as expected - but reviewers could certainly suggest code improvement where appropriate.
- Neil Saunders
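Neil's minimum requirement (the code runs on sample data and generates the expected output) can be sketched as a small check. This is only an illustration, not anything a journal actually uses: the function name `runs_as_expected` and the toy one-line "submission" are made up for the example.

```python
import subprocess
import sys


def runs_as_expected(command, sample_input, expected_output, timeout=60):
    """Run `command` with `sample_input` on stdin and compare stdout to `expected_output`."""
    result = subprocess.run(
        command,
        input=sample_input,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Minimum bar: the program exits cleanly and its output matches
    # the expected output (ignoring leading/trailing whitespace).
    return result.returncode == 0 and result.stdout.strip() == expected_output.strip()


# Hypothetical submission: a one-liner that upper-cases a DNA sequence.
submission = [sys.executable, "-c", "import sys; print(sys.stdin.read().strip().upper())"]
print(runs_as_expected(submission, "acgt\n", "ACGT\n"))
```

A check like this says nothing about code quality; it only covers the "runs and generates output as expected" part.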
So if the answer to 1) is no, does that mean that you can't necessarily expect referees to actually run the code? Or compile it? Or just that you pick referees appropriately? Or conversely that "refereeing" becomes a process of building up enough positive comments or karma points in the repository...? It seems to me that you want to bring the best of versioning systems and best practice...
- Cameron Neylon
Referees should certainly be able to run code - I'm just not sure that virtualisation through the web interface is the way to do it. Seems like an additional layer of complexity that might get in the way of making this idea work.
- Neil Saunders
@Cameron & Neil: If it could be figured out how to handle the virtualization (or having remote access to machines), I think that'd be a highly valuable addition to peer review. Easy for me to say (not knowing how to implement it), but I think it's a great goal to strive for. It doesn't seem too crazy to have the journal have a bunch of machines on hand so the authors can remotely upload / install code and referees could then remotely log in to look at and try out code.
- Steve Koch
I can't figure out where to jump into this thread. Personally, I think we just need a place to publish locations, i.e. the code is here, data is there and this is the version we used, etc. That must be maintained and being able to maintain that should become part of the funding process. Since funding agencies are the ones who are funding this research they need to include the ability to...
- Deepak Singh
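One way to picture "the code is here, data is there and this is the version we used" is a small record that can be published and kept up to date. A minimal sketch, assuming a made-up `ResourceRecord` layout; the field names and the example.org URLs are placeholders, not an existing standard.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ResourceRecord:
    """Where the code and data for one paper live, and which version was used."""
    title: str
    code_url: str
    code_version: str
    data_url: str


def to_json(record):
    """Serialise a record so it can be published alongside the paper."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# All values below are hypothetical placeholders.
record = ResourceRecord(
    title="Example tool",
    code_url="https://example.org/code",
    code_version="r123",
    data_url="https://example.org/data",
)
print(to_json(record))
```

Maintaining such a record is the part that would need funding; writing it once is the easy bit.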
My feeling is that being able to run the programs somewhere on a server without downloading them is important - but that is very much a user's perspective. I often look at useful things that are made available and just have no clue how to actually make them work. A good range of downloadable executables would probably do the job for me though. Additional question: what are the standards for web services?
- Cameron Neylon
Which is why VMs and cloud services are such a big deal for demos and provenance now. You can package up a VM with the exact stack that you want and make it available, either as a service or a VM you can launch yourself. It's too easy not to do it
- Deepak Singh
@Deepak: Cloud + VM is an interesting combination, but it should have pricing that is affordable to a larger research community
- Khader Shameer
I think there should be strict guidelines for testing the resource when reviewing bioinformatics software / databases / servers. I had a recent experience: a reviewer wrote an extensive list of points to reject a server that we developed without trying out what exactly it does or finding out how it differs from other existing resources. I strongly support the hybrid journal model, also it...
- Khader Shameer
Let's talk specifics. VM images are great, but you are tying your release to a particular release of a particular platform. A better approach is to start from a base OS (like a Linux distro ISO) and have a set of build instructions for system setup and application building. My favorite of the moment would be Chef.
- delagoya
Second, academics love to solve a problem with a novel algorithm and then move on. In fact it is in their best interest to move on after milking a project for all it's worth, publication wise. Maintenance, or even robust testing (cough ... Tophat ... cough ... Bowtie ... cough) is not even on the radar. Frankly I am not so sure it should be. Maintenance requirements may slow the pace of...
- delagoya
@delagoya, good point. If I have made significant improvements, why update the old paper? Better to try for a new paper!
- Rajarshi Guha
delagoya, chef's fine too. Find a common medium/mechanism that works for the community. The resources are certainly there. It's a matter of trying things out. As someone I know says, start simple, and iterate
- Deepak Singh
Khader, that's where the funding agencies come in. They need to provide mechanisms for sustainable funding here.
- Deepak Singh
The nice thing about a hybrid journal is that it might be possible to have new dois/database entries for "significant" updates. Not perhaps just place holding papers as is the case sometimes in the NAR database issue but when something has changed significantly you can get a new paper without needing a new algorithm or service. I like the idea of funding to support "orphan" code and services as well. Make it worth money and people will do it.
- Cameron Neylon
Delagoya - as a naive user I disagree. I really don't want to have to build, I want to use in the lowest stress way possible and a hosted VM seems like a good way to enable that - as well as allow for longer term preservation. We may not be able to run Linux on future hardware but will probably be able to handle VMs for longer (actually having written that I'm not sure it's true - would be interested in more expert perspectives)
- Cameron Neylon
I almost missed this discussion. I really like the idea but I wonder how discovery type projects fit in. I mostly use code to look for trends. If anything I might make some predictor to enhance existing data. For these reasons most of what I do is one off scripts around perl and R. Maybe this sort of project does not belong in a bioinformatics journal at all.
- Pedro Beltrao
Pedro, great question. Personally, if we included all glue code, small scripts, etc this would be unsustainable and defeat the purpose of peer review as well
- Deepak Singh
@Pedro, I don't see a journal/software hybrid as replacing all bioinformatics journals. I think there's a place for journals that discuss pure algorithms and ideas. These would do exploratory type programming. Normal journals service these papers quite well. For me, a hybrid model targets specifically those papers that describe a program that is meant to be used by other people. In that...
- Bosco Ho
Bosco, you're thinking along the lines of a communications journal, aren't you? And then people can go to work on the code if it is on github or something
- Deepak Singh
@Deepak. Yep. The disconnect I see is that pragmatically, it's the open-source project that counts. The article in the bioinformatics journal is so that we can get a place-holder to collect citations that contribute to our academic CV. The journal/software hybrid provides the most efficient way to this goal.
- Bosco Ho
Very nice summary of the problem. Really, the whole concept of a journal article about software is stupid. What does an academic article do? Alert people to a new finding/discovery. But in the case of software - well, the software is the finding. And people are "alerted" by finding it on the web, downloading it and using it. As Bosco says, the sole role of an article here is a CV tick - hence the hybrid approach. Non-academic programmers must find all of this very odd.
- Neil Saunders
Me too, re: Google Wave gadget. Then add SBML support, use SBGN in the display, support MIRIAM annotations and I can retire penniless.
- Neil Swainston
@Neil, there are a few other tools which support SBML and SBGN (see http://sbgn.org/Communi...). Wikipathways seem to be inventing yet another pathway format and don't provide a conversion to any other existing "standard". Shame, as it would benefit everyone if they did.
- Frank
Thinking about it a little more, I'd really like to see the above refactored as a collaborative Google Wave Gadget. I've been involved in about five network reconstruction "jamborees" now, which involve flying loads of people around the World to sit in a room and discuss things that they could do with PathVisio (if it supported SBML...) or Payao. Anyways, this costs a fortune (see the...
- Neil Swainston
@Neil: WikiPathways is intended exactly for that type of collaborative pathway creation. WikiPathways pathway format is based on, and developed in cooperation with, http://www.genmapp.org. So admittedly it's not a widely supported standard, but at least it wasn't a complete new invention. SBML / SBGN support is on its way. Re Google Wave: unfortunately, all this work predated Google Wave by several years...
- Martijn van Iersel
Question for bioinformatics experts: given a gene name, is there a good way to identify papers talking about it? An obvious approach is to do a PubMed search and see how many hits there are. But that seems a little rough. Is there anything better?
www.novoseek.com/, www.nextbio.com, biosemantics.org/geneE/search.jsf, and www.ihop-net.org are some resources...
- Jeff Kiefer
Thanks for the pointers, useful to start with
- Rajarshi Guha
The trick is not to start with a PubMed search for the gene name, but to start with the NCBI Gene database. All of the Entrez databases are linked so you can go from a Gene record to publications. Use either EUtils as Pierre and Andrew said, or go from the Gene page (e.g. http://www.ncbi.nlm.nih.gov/gene...) and follow the links.
- Neil Saunders
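The Gene-to-publications route via EUtils can be sketched as follows. This is a rough illustration rather than a tested client: it only builds an ELink request URL and parses a reply, it assumes the reply nests `Link/Id` elements under a `LinkSetDb` whose `LinkName` is `gene_pubmed`, and the sample reply is hand-written with placeholder IDs. The Gene ID 672 is just an arbitrary example value.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def gene_to_pubmed_url(gene_id):
    """Build an ELink request from one Gene record to its linked PubMed records."""
    params = {"dbfrom": "gene", "db": "pubmed", "id": str(gene_id)}
    return ELINK + "?" + urlencode(params)


def parse_pubmed_ids(elink_xml, linkname="gene_pubmed"):
    """Collect the PubMed IDs listed under one link name in an ELink XML reply."""
    root = ET.fromstring(elink_xml)
    pmids = []
    for linkset in root.iter("LinkSetDb"):
        # A reply can carry several link sets; keep only the requested one.
        if linkset.findtext("LinkName") == linkname:
            pmids.extend(link.findtext("Id") for link in linkset.findall("Link"))
    return pmids


# Hand-written stand-in for a reply; the IDs are placeholders, not real records.
SAMPLE_REPLY = """<eLinkResult><LinkSet><DbFrom>gene</DbFrom>
<LinkSetDb><DbTo>pubmed</DbTo><LinkName>gene_pubmed</LinkName>
<Link><Id>11111111</Id></Link><Link><Id>22222222</Id></Link>
</LinkSetDb></LinkSet></eLinkResult>"""

print(gene_to_pubmed_url(672))
print(parse_pubmed_ids(SAMPLE_REPLY))
```

Fetching the URL (for example with `urllib.request.urlopen`) and feeding the body to `parse_pubmed_ids` would complete the loop; that step is left out here so the sketch runs offline.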
geneRIFs might not be complete, but how complete do you need to be?
- Mr. Gunn
@Mr Gunn, my application doesn't really need authoritative information. Basically, I'm trying to get a rough idea of which genes are more popular than others in terms of publications. Given that a pub may mention a number of genes in passing, it's not a very reliable measure. But it's one other feature that I can use in a summary/ranking etc
- Rajarshi Guha
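The rough "popularity" feature described here amounts to counting distinct publications per gene and sorting. A minimal sketch, assuming the gene-to-PMID lists have already been collected somehow; the gene names and IDs below are placeholders.

```python
def rank_by_publications(gene_to_pmids):
    """Rank genes by number of distinct linked publications, most first."""
    counts = {gene: len(set(pmids)) for gene, pmids in gene_to_pmids.items()}
    # Sort by descending count, then by name so ties come out in a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Placeholder input: gene names and PMIDs are invented for illustration.
example = {
    "geneA": ["1", "2", "2", "3"],
    "geneB": ["4"],
    "geneC": ["5", "6", "7"],
}
print(rank_by_publications(example))
```

As noted, a paper that mentions a gene in passing counts the same as one devoted to it, so this is only one feature among several for a summary or ranking.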
amazing stuff - it would have been nice to have a submission for our chemviz symposium at the ACS of some type of chemical application
- Jean-Claude Bradley
WOW Daily Fail, indeed. How can it be that they don't get it - scientific authority comes from having facts, not having a title or political support or degree.
- Mr. Gunn
"To heal wounds and improve communications between biostatisticians and the confused masses that rely on them, De Gruttola agreed to discuss the details of what p value means and does not mean with ScienceNOW. But, as you'll see, the probability that this will solve the problem is low."
- Noah Gray
but unlike Brian, this article did indeed result in my feeling stupid...then I reassured myself by recalling the gazillion times over the last 20 years when students have asked me "how do you know that?" when suggesting simple vocabulary changes with massive effects. sigh. Expertise is not a good blanket on a cold night.
- Mickey Schafer
Statisticians never seem to tire of explaining p-values to "the confused masses". As De Gruttola says "It's the difference between I own the house or the house owns me. It's two different concepts." Biologists: confused masses no more!
- Greg Tyrelle
Ha! Is this a real interview? If so, I'd like to see it on video. Then: I'd like to see it autotuned.
- Steve Koch
Yeah, I'd love to hear the audio for this! Thinking in statistical terms really is alien to many people, and let's not even get started on Bayes Rule!
- Mr. Gunn
BMC Bioinformatics, Vol. 8, No. 1. (2007) BACKGROUND: The web has seen an explosion of chemistry and biology related resources in the last 15 years: thousands of scientific journals, databases, wikis, blogs and resources are available with a wide variety of types of information. There is a huge need to aggregate and organise this information. However, the sheer number of resources makes it unrealistic to link them all in a centralised manner. Instead, search engines to find information in those resources flourish, and formal languages like Resource Description Framework and Web Ontology Language are increasingly used to allow linking of resources. A recent development is the use of userscripts to change the appearance of web pages, by on-the-fly modification of the web content. This opens possibilities to aggregate information and computational results from different web resources into the web page of one of those resources. RESULTS: Several userscripts are presented that enrich biology...
- Yann Abraham
Yeah - I'd love to see a community building greasemonkey scripts for life-science sites - pubmed, citeulike, etc. Hell, maybe I'll start working on some...
- Chris Miller
Mike, that was sort of the setup in SVN too, which is why there still is userscript/trunk (though we never made tags or branches, which is why there is no matching userscript/branches or userscript/tags) ...
- Egon Willighagen
Heya! Do any of you know where to find one of those graphs that show how much sequence data has been deposited in sequence databases (preferably recent ones)? Like this one: http://farm1.static.flickr.com/15... , but more recent.. I would be much obliged!
Come to supercomputing. My whole talk is on this subject ;). Trying to figure out if I should start writing about it before or after
- Deepak Singh
I understand the points in the comments about using flat files for better speed. However compared to when I used to use miscellaneous scripts and data files to do my research I find that using a 'Ruby on Rails' type of database-backed approach is much better for me because of the shorter development time and how much easier the code is to maintain.
- Michael Barton
@Deepak I did consider mentioning Hadoop/NoSQL (see last paragraph of earlier draft http://bit.ly/ztjS9) as it's obvious to discuss these types of approaches when dealing with very large datasets. However I think these tools do still require a fair amount of work to maintain and use compared with a more standard kind of MySQL approach. I say that because I tried using map/reduce across the university cluster and had quite a few teething problems.
- Michael Barton
More generally, "any DB + any ORM" is A Good Thing. I can see why people stick with (My)SQL. It's tried and tested. I find a lot of the newer developments interesting, exciting, fun - but often, "too agile" for real work. Libraries change too fast, documentation (if any) goes out of date, code moves to new repositories, in the space of 3 weeks.
- Neil Saunders
@Neil I originally tried using DataMapper instead of ActiveRecord but so many Rails-centric libraries assume ORM == ActiveRecord. This meant using DataMapper precluded the use of the factory_girl and shoulda libraries, which I have come to find very useful. I think Rails needs to be truly ORM agnostic and that the current changes in Rails 3.0 don't go far enough to address this.
- Michael Barton
I agree. I really like DataMapper (and other ORMs - sequel, mongomapper), but using them with Rails components = ugly, not fully-functional hacks, as things stand. Be interesting to see how the new ActiveSupport looks. I'm even considering abandoning Rails for now and just plugging together components myself as required (e.g. ramaze/sinatra if web frontend required).
- Neil Saunders
Michael, I am not talking just about Hadoop/NoSQL, but the fundamental challenges of operating at high scale. How you handle disk failures, node failures, approaches to managing that data, etc. The rules change once you are working in the multi TB range (and when I talk Big Data I mean several TBs).
- Deepak Singh
I've had a ton of trouble every time I tried to use that phylofacts site for anything. Maybe it's just not intended to do what I wanted...
- Donnie Berkholz
@Donnie, I'm sorry to hear that. What did you want to do?
- Ruchira S. Datta
I'll let you know next time something comes up, I don't remember exactly what things it was anymore. One example of something I've done recently is, given a residue number in a PDB file, find the sequence, find all homologous sequences (given some cutoff), their % identity to the original, and the equivalent residues (a per-residue mapping of input:result) in an automated fashion. By the way, say hi to John Davidson!
- Donnie Berkholz
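The per-residue part of that task (equivalent residues plus % identity) is straightforward once a pairwise alignment is in hand. A minimal sketch, assuming the two sequences arrive already aligned with `-` for gaps and that % identity is taken over columns where both sequences have a residue (other definitions exist). Finding the homologs and producing the alignment is not shown, and the function name is made up.

```python
def residue_map(aligned_query, aligned_hit, query_start=1, hit_start=1):
    """Map query residue numbers to hit residue numbers from a gapped alignment.

    Returns (mapping, percent_identity), where mapping is {query_pos: hit_pos}
    for every column in which both sequences have a residue.
    """
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    identical = 0
    q_pos, h_pos = query_start - 1, hit_start - 1
    for q, h in zip(aligned_query, aligned_hit):
        # Advance each sequence's residue counter only on non-gap characters.
        if q != "-":
            q_pos += 1
        if h != "-":
            h_pos += 1
        if q != "-" and h != "-":
            mapping[q_pos] = h_pos
            if q == h:
                identical += 1
    percent = 100.0 * identical / len(mapping) if mapping else 0.0
    return mapping, percent


# Toy alignment: one gap in each sequence and one mismatch (F vs Y).
mapping, percent = residue_map("ACD-EF", "AC-GEY")
print(mapping, percent)
```

The `query_start` offset is there because PDB residue numbering often does not start at 1; running this over 150 proteins and all their homologs is then just a loop.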
Yes, I can see how that would be hard to do in the current site. We've been thinking of making per-residue information more evident for the site redesign that's in progress, but I at least had been thinking more about the display rather than doing it "in an automated fashion"--I presume you want to script it? It's good to know what people are interested in, thanks. I'll tell John you said hi!
- Ruchira S. Datta
@Ruchira: Yeah. I had a set of 150 or so proteins I wanted to do that with, comparing each one with all of its homologs. Not reasonable to do it manually.
- Donnie Berkholz
Second the MGI resource. Mammalian Orthology: ftp://ftp.informatics.jax.org/pub/reports/index.html#orthology
- Walter Jessen
pdfetch is a small web app that automagically fetches the PDF reprint of a PubMed article given its PMID. If pdfetch cannot find a local copy of the reprint, then it downloads the reprint from the publisher's website to the local repository (of course only if the reprint is free or if you have authorized access to it, e.g., via your university library).
- Pierre Lindenbaum
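The behaviour described (use the local copy if there is one, otherwise download into the local repository) is a cache-first lookup. The following is a sketch of that pattern only, not pdfetch's actual code: `fetch_reprint` is a made-up name, and the `download` callable stands in for whatever resolves a PMID to a publisher PDF.

```python
import tempfile
from pathlib import Path


def fetch_reprint(pmid, repository, download):
    """Return the path to the PDF for `pmid`, downloading it only if not cached."""
    repository = Path(repository)
    repository.mkdir(parents=True, exist_ok=True)
    local_copy = repository / f"{pmid}.pdf"
    if not local_copy.exists():
        # Cache miss: ask the (hypothetical) downloader for the PDF bytes.
        local_copy.write_bytes(download(pmid))
    return local_copy


# Stand-in downloader that records its calls; "12345" is a placeholder PMID.
calls = []


def fake_download(pmid):
    calls.append(pmid)
    return b"%PDF-1.4 placeholder"


with tempfile.TemporaryDirectory() as repo:
    first = fetch_reprint("12345", repo, fake_download)
    second = fetch_reprint("12345", repo, fake_download)
    same_path = first == second

print(same_path, calls)
```

The second call finds the cached file, so the downloader runs once per PMID.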
Mark Pilgrim's excellent book "Dive Into Python" was republished on Amazon.com, under the terms of his GNU Free Documentation License. This is driving his publisher nuts. I'm surprised it doesn't happen more - why aren't books by Doctorow, Lessig et al immediately republished by other publishers?
- Michael Nielsen
I wonder if we'll see more prominent examples of this. A plausible story to follow: some publishers will refuse to publish under CC (or GFDL) licenses, prominent authors like Doctorow, Lessig, Benkler et al will move to self-publishing, and services like Lulu might be off to the races.
- Michael Nielsen
How much more prominent than Lessig and Doctorow were you thinking exactly?
- D0r0th34
"more" as in "other". I've never really understood why (e.g.) Doctorow's publisher goes with a CC license. Any other publisher could easily release Doctorow's work, and could underprice it, since they wouldn't be paying royalties or an advance. And legally, if the CC license stood up, they'd be completely within rights.
- Michael Nielsen
Ah, gotcha. Well, I wouldn't touch Doctorow's work with a ten-foot pole, if I were a conniving publisher. 1) I'm still competing with free. 2) Doctorow would rip me up one side and down the other on BoingBoing for harming his print publisher. 3) Doctorow's fans are very engaged with him, so such a rip would very likely be bad for business.
- D0r0th34
D0r0th34: Point (1) is equally true of his print publisher. As for (2) and (3), any money made here is pure gravy for such a publisher - so what if you annoy a lot of fans? Heck, you can even reduce your risk, by arranging print runs based on how well the book debuts. (I'm not advocating it, especially, I just think this is likely to happen increasingly often, because there's a lot of commercial upside, and virtually no downside that I can see, if the CC licenses hold up in court.)
- Michael Nielsen
We'll see what sales end up looking like. Put it this way: "pure gravy" isn't, quite, because somebody still has to go out there and find books that can be exploited in this fashion and then typeset them (badly, admittedly; but consider a book like Pilgrim's, where bad typesetting harms meaning so much that nobody with half an ounce of sense will buy the offprint). Will there be enough gravy to cover these acquisitions costs? Honestly, I doubt it.
- D0r0th34
My guess is that someone like Doctorow gets a 6 figure advance. You can buy a lot of acquisitions and typesetting for that amount of money.
- Michael Nielsen
... I seriously, SERIOUSLY doubt that. Doctorow's good, but he's still pretty much midlist from a mass-market publisher POV. I guess I can email and ask him.
- D0r0th34
Very interesting, but I fail to understand why anyone would want to buy a paper copy of the book rather than reading online as it is completely example driven and you need to be sitting in front of a computer to run the examples. I think a simple temporary solution to "third party" publishing would be to license the electronic version under CC-ND, whilst reserving all rights for the...
- Matt Leifer
Even if he's getting a $30k advance, the republisher is still saving a huge amount of money.
- Michael Nielsen
Michael: You may be missing something about Doctorow's use of CC. He uses BY-NC-SA. If that's legally enforceable, and it probably is, he could sue any publisher who republished his material *for profit*--that's the NC clause. So, a key point is (4) There's an enormous downside if Doctorow wants to make a point (possibly with CC's support): You'd have a VERY weak legal stance. CC isn't waiving all rights, not unless it's CC0.
- Walt Crawford
Walt: Thanks for pointing that out, I'd completely forgotten. I imagine Lessig et al are similar. I wonder if the NC part of CC has ever been tested? Something like this might be an interesting test case. (Especially if a not-for-profit started up that republished, but not for profit. )
- Michael Nielsen
Well, Creative Commons spent a lot of expert lawyer time making sure the CC licenses were bulletproof. I'd guess a case involving a commercial publisher republishing a BY-NC book would be a slam-dunk. A nonprofit *that was not making profits from the book*--that might be interesting. There's been a LOT of discussion, and a survey, as to what us CC users think "NC" really means.
- Walt Crawford
BTW, the reason I got into asking academic publishers about this is that I am interested to know how I should license content that I am primarily intending to make available online, allowing as much freedom as possible, but that I might want to publish in print at a later date. Of course, most publishers don't have an actual policy on this and are very wary of discussing hypotheticals,...
- Matt Leifer
Book or journal publishers, Matt? It makes a difference.
- D0r0th34
Book. With journal articles it is easy because I am in math/physics and they usually have a clear policy on arXiv preprints. Also, the availability of a free online version is not strongly correlated to journal sales at the moment, whereas it would be for a book.
- Matt Leifer
Okay. How important is the accumulated prestige of the publisher to you? Or is what really matters that the book be published and that you retain the rights you wish to? (I swear there is method to my madness here.) There are some full-OA uni-press-type outfits out there, but I don't know which of them do math/physics. Will research.
- D0r0th34
Hm - I can't find the "republished" version of Dive into Python on Amazon: http://www.amazon.com/s... - does anyone know if it was yanked or maybe published under a different title? Does the GFDL require attribution (if not, then perhaps it was published under another name)?
- Hilary
@Matt: I may have misunderstood your comment above, but I don't think CC-ND prevents people from just printing the original version w/o significant alterations. The license says "The above rights may be exercised in all media and formats whether now known or hereafter devised" but that "you have no rights to make Adaptations" - an Adaptation is defined as "a form in which the Work may...
- Hilary
Yes, you are right. I guess what we need is a license that gives the publisher exclusive rights for the print version, but applies CC-like provisions to electronic versions. I realize that this partly defeats the object of CC, but I figure that print versions will eventually become obsolete so it is only a temporary measure designed to allow academics to benefit from the prestige of an academic publisher, whilst still allowing freedom of information online.
- Matt Leifer
@Matt: I really like your idea. In some sense, every open content license is transitional, pending even more openness, so I agree that such compromises can help. That's part of the genius of CC in the first place, after all. Before CC, there were very few gradations in pre-written licenses; you could go all rights reserved, BSD-style or GFDL, and that was about it. Now, you can specify if attribution is needed, if derivative and commercial uses are OK, etc., and I think we are far more open for the options.
- Christopher Granade
Do you know of any database that provides a list of transcription factors and their targets in the human genome? I am looking for a resource that provides a list of transcription factors mapped to their target genes. I know about Transfac, anything else? Thanks in advance!
Depends a bit on how strictly you want to define "target gene". I have some huge lists of matches to all JASPAR motifs in the human genome, but of course those lists don't tell you if the TF really binds or if the binding is functional. I've only seen a couple of databases that try to collect TFs and their regulated genes: http://rulai.cshl.edu/cgi-bin... (Michael Zhang lab) and ITFP (http://itfp.biosino.org/itfp) which, unfortunately, seems to be down at the moment.
- Mikael Huss
Thanks Mikael, I will go through them. I am looking for literature-curated information about TFs and genes with TFBS in their upstream regions.
- Khader Shameer
So, I live blogged Cameron Neylon’s talk today at Newcastle University, and I did it in a Wave. There were a few pluses, and a number of minuses. Still, it’s early days yet and I’m willing to take a few hits and see if things get better (perhaps by trying to write my own robots, [...]
- Allyson Lister
nice Cameron! Indeed we'll only know the impact of Wave when everyone can participate
- Jean-Claude Bradley
are there any actual robots (that is, Wave-enabled machines) on Wave? it would be interesting (at some level) to see a Wave that was just software robots talking to hardware robots.
- Richard Akerman
Great question, Richard. I'd like to know, too. Cameron? Anyone?
- Michael Nielsen
@richard copied from http://www.slate.com/id... : The core feature of Wave is that it is a real time communication PROTOCOL. Right now that manifests itself as live time chatting, but Wave is not meant to be just a fancy new IM client like you seem to have been using it. It is extendable by developers and the real time nature of the protocol allows it to be potentially...
- Pierre Lindenbaum
A google wave robot that takes lists of numbers and converts them to sparklines. Interesting because it does image insertion and other things but also because of the sparkline webservice it uses and the way it gets around appengine limitations
- Cameron Neylon
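The core transformation (a list of numbers in, a sparkline out) can be shown without Wave or the webservice. Note the swap: the robot inserts sparkline images, whereas this sketch draws a text sparkline from Unicode block characters, purely to illustrate the scaling step.

```python
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Render numbers as a one-line text sparkline, scaled between min and max."""
    values = list(values)
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values equal: draw a flat line at the lowest bar.
        return BARS[0] * len(values)
    top = len(BARS) - 1
    # Multiply before dividing so evenly spaced integers land on exact levels.
    return "".join(BARS[int((v - lo) * top / span)] for v in values)


print(sparkline([0, 1, 2, 3, 4, 5, 6, 7]))
print(sparkline([3, 3, 3]))
```

A robot would apply the same idea to whatever list of numbers it finds in the wave and replace the text with the rendered result.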
Good stuff. After looking at Cameron's ChemSpidey, Igor, and now this, I'm really getting excited about the potential for bots and in-line modification of the info-stream.
- Todd Harris
"A dedicated bioinformatics workstation - install it or run it live. Bio-Linux provides more than 500 bioinformatics programs on an Ubuntu Linux base."
- Mike Chelen
Neat idea- but how much of the 4gb USB stick remains for holding data / analyses: need a bigger stick?
- Richard Badge
This was one of the first (and probably best) of these distributions (think there was a BioKnoppix at one time?) It's been around at least 6 years. But a software suite is only half the battle. The biologist needs to know how to use the packages, store and interpret the output. Which is why we have bioinformaticians and IT staff. I've never been convinced that "bioinformatics on a stick" is much use to biologists compared with expert advice/support, but I may be wrong.
- Neil Saunders
Are these targeted towards biologists, or informaticians? Don't see biologists getting much use from such a distro, but do see computational types making good use
- Deepak Singh
Target market is what has always confused me. I'd assume that bioinformaticians are happy to install their own software locally and for biologists with limited tech skills, a live CD doesn't help much. But I'm happy to be proven wrong by success stories.
- Neil Saunders
I gave it to one of my students - we'll see what he has to say.
- Björn Brembs
Neil: making the software easier can help decrease repetitive tasks, allowing more efficient use of expert advice and support, which is definitely the most valuable and scarcest resource. newcomers can often manage to boot the OS and start playing with some software, and experienced users can check if their software of choice is included, and save a little time when setting up new machines
- Mike Chelen
Deepak: looking at the package list http://nebc.nox.ac.uk/tools... some favorites of both fields stand out, for example a biologist may run a BLAST search regarding a DNA sequence they are studying, while a bioinformatician could develop applications with Bio-Java and Eclipse IDE
- Mike Chelen
Richard: data could be stored on a network drive, or a larger flash disk could be used, since there are 8, 16, and 32gb USB sticks available now pretty inexpensively. also, additional USB drives can be plugged in limited only by the number of USB ports on the machine
- Mike Chelen
Björn: cool, would love to hear how useful others find it. the software packages can also be installed in current Ubuntu systems by adding their repository http://nebc.nox.ac.uk/tools...
- Mike Chelen
Follow on question? Would a VM be equally useful? For example, I use VMs a lot to learn stuff and configure environments.
- Deepak Singh
Deepak: yes absolutely! for exactly the reasons you mention, experimentation and reliability. found a VirtualBox VDI image: http://friendfeed.com/bioinf... any more formats such as VMware or EC2 AMI would be great too :)
- Mike Chelen
there's a bunch of good EC2 AMI's that I will be highlighting either here or somewhere else soon (from familiar names), but more the merrier
- Deepak Singh
from IM
@bjorn please do get your student to feed back to NEBC. A long time ago Bio-Linux was my baby and my full-time job. It's come a long way since I left it and I'm very happy to see that it's still going. It has its rough edges, and things which could be done better, but out of the box it's a well set up system ready to go. It's already been used as the base for other more focused...
- Daniel Swan
@Neil with Bio-Linux I can happily say that we turned a few biologists into informaticians, and one into a programmer when I was with the team! Even last week a biologist walked into my office and asked me to help him get it up and running in a VM on his laptop so that he could do some work. The Live-CD version was really just a distribution method; we used to send out a bootable CD-ROM that would netinstall a Linux image from our servers. Inefficient at best :)
- Daniel Swan
Good to hear. I remember when NEBC were setting up many years ago, Dawn contacted me regarding compilation of Phred/Phrap under Cygwin after I mentioned it on Nodalpoint. The early days of the bioinformatics social network!
- Neil Saunders
Deepak: thinking about combining the Bio-Linux packages with some of the standard Ubuntu EC2 AMIs from http://alestic.com/ since they are optimized already, and contain other common tools
- Mike Chelen
Mike, that would be brilliant. Lots of our customers ask for starting points in this space and being able to point to something that they might be familiar with would be great. Let me know when you do that. I am thinking about writing up a post on all the available bioinformatics AMIs on AWS.
- Deepak Singh
Mike - you might want to talk to Tony Travis about this (ajt@rri.sari.ac.uk); he is interested in taking the Bio-Linux base in a more 'cloudy' direction, and I'm sure Dawn and Bela and co. at NEBC would be happy with any feedback along those lines.
- Daniel Swan
Daniel, almost all the packages install okay. are there any particular applications that would be important to test? here is how the desktop looks on EC2: http://ff.im/8fb8j
- Mike Chelen
Great idea. Anything to reduce the tedium of wget; ./configure; make; make install is welcome and helps lower the entry barrier for people into the field.
- Todd Harris
Todd: it would be nice to start an instance with the least manual input, especially when running a particular application. for example software set up to use a biology AWS dataset http://developer.amazonwebserv...
- Mike Chelen
<-- pure biologist, willing to install on top of current Ubuntu to give it a try. Agree with Todd on extra installs. I essentially only use a Java-based app to chomp on big files, also Bioconductor on rare occasion. Neil's point about need for expert advice is well-taken, but I find that a biologist willing to use a Linux platform right away gets more targeted feedback when asking for help.
- Heather
Heather: that's great, did you find it easy enough to add the repository? here's a script that can save a little time, it is only a few lines though: http://github.com/mchelen...
- Mike Chelen
The ideal way to install the repository would probably be a .deb containing the apt sources and signing key. This is used for example with Ubuntu One https://one.ubuntu.com/support... and PlayDeb http://www.playdeb.net/updates... (expand the instructions). Maybe someone could prepare this given the existing NERC repository info?
- Mike Chelen
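As a rough illustration of the "apt sources" half of that idea, here is a minimal Python sketch that builds a one-line APT source entry in the standard `deb URI SUITE COMPONENT...` form. The function name `apt_source_line`, the example URL, and the suite and component names are all hypothetical placeholders; the real repository details are not given in this thread, and the signing key is not handled here.

```python
def apt_source_line(uri, suite, components=("main",)):
    """Build a one-line APT source entry: deb URI SUITE COMPONENT..."""
    return "deb {} {} {}".format(uri, suite, " ".join(components))


if __name__ == "__main__":
    # Placeholder values only; substitute the real repository URL,
    # suite, and component names before using the resulting line.
    print(apt_source_line("http://example.org/bio-linux/", "unstable", ("bio-linux",)))
```

A package along the lines suggested above would ship a line like this as a file under /etc/apt/sources.list.d/ together with the repository's signing key.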
Deepak: the Bio-Linux image from JCVI http://www.jcvi.org/cms... really looks great, anything that helps this project can benefit other researchers interested in running bioinf software on EC2. maybe a repository mirror within EC2/AWS cloud, or a public AMI, would help too?
- Mike Chelen
Mike, that is a public AMI; the AMI ID is ami-6953b200
- Deepak Singh
from IM
does this image include a desktop environment? the screenshot sort of looks like remote X. in most cases that is probably best, but it could be nice to have a desktop version as well. thinking about slower internet connections, where some compression (such as FreeNX) is usually needed for remote desktop
- Mike Chelen
Not sure, you'll have to ask Bioinfo ... My guess is no, since I think it's built on a server image, but I could be wrong
- Deepak Singh
from IM
Deepak: that's good, been trying to find a way to get the Bio-Linux packages installed on a 64-bit instance. desktop could be handy for learning and testing
- Mike Chelen
I honestly can't fathom how people sit in a conference room and listen to people read off of a paper. Why would you bother? If all I'm getting is the text, I could read that at home.
- Chris Miller
The popular "solution" to this problem, Chris, is not to make the paper available for you to read at home...
- Daniel Mietchen
"PowerPoint Karaoke is the nasty habit of some speakers to type full sentences on slides, and then read them out loud while pointing to the words with their laser pointer as they go along." - Hilarious.
- Andrew Lang
Hah - had a bit of an experience of this when I inserted a set of D's slides into a talk - fully attributed of course - but hadn't had time to check the transitions (which I never use, so it didn't occur to me), which came as a bit of a shock as my high-speed patter crashed into D's artful organization of material entering the view...
- Cameron Neylon