Paul J. Davis › Comments

Michael Barton
GitWrite - blogging for nerds
Looks like a potentially great platform for a lab notebook... - Carl Boettiger
Sounds an awful lot like GitHub Pages, which is an awesome platform for hosting documentation. For example, we use it to host our docs from this GitHub branch: - Paul J. Davis
There's a great desktop/cloud notebook app to be written using git as a back end. Has anybody got into the guts of this and seen what, if anything, is different under the hood? - Cameron Neylon from twhirl
I know that the Mercurial vs. Git debate seems to have been won by Git in the Open Science community, but if you want a saner life you should look at (Also see for a list of Mercurial/Git-backed wiki engines.) True nerds should consider an Emacs Org-mode-backed blog or wiki, which can also be combined with revision control. - Matt Leifer
Just thought about an Emacs org-mode + git solution, too, especially as it can include executable code snippets via Babel: (at the bottom of this page you can find a description of how this can be used for reproducible research). - Konrad Förstner
The server software used for display must be available for the content to be truly portable. Would be happy to see even if only basic features were available so far. - Mike Chelen
Hatta looks good. I wonder how difficult it is to set up? - Mike Chelen
Ended up using GitHub's Git-backed wikis. Now if only there were a way to allow comments similar to blog posts... - Mike Chelen
Paul J. Davis
I haven't seen anything pop up on Convore so I thought I'd mention it. It's somewhat like FriendFeed but also different.
Very conversation-based, but doesn't seem to aggregate anything. Everything has to be a typed message, right? - Rob Syme
That said, it's ridiculously easy to automate posting via the API. - Rob Syme
It's definitely missing the built-in aggregation stuff. I haven't heard them talk about adding those types of features, but it's only been public for a couple of days, so I reckon anything could happen. - Paul J. Davis
Egon Willighagen
Mark Reinhold: IBM to join OpenJDK
This is really good news indeed for the Java Science community! - Egon Willighagen
But not so good for the Apache Harmony guys. - Paul J. Davis
True. But I personally do not think that project should have started anyway... they should have joined Classpath in the first place... - Egon Willighagen
Bora Zivkovic
If you are in bioinformatics (or anything, really) and are not associated with a university, what services and databases would you like to have and use that cost money (so you don't use them, or you pay while cussing and cursing because it's impossible to work without them)?
Although slightly rephrasing the question, I would like to be able to pay money to have a version of NCBI that didn't make me want to gouge my eyes out with a spoon. - Paul J. Davis
I work at a small biotech company and as Neil said, access to journals is the biggest issue. We are associated with a university so when I'm there I can access journals, however normally from the company building I can't. The delay is frustrating. In terms of commercial tools/databases, currently I would like access to Genelogic gene expression database and Biobase transcription factor database and analysis tools. - Greg Tyrelle
Another small biotech here, and yes, access to journals is a huge pain. - Bill Hooker
I'm currently at a university, but was considering being an independent scholar for a year or two (= a homeschooling mom with a research rather than a knitting hobby). Resource problems I anticipated: access to journals, a Scopus subscription, a Web of Science subscription. - Heather Piwowar
Thank you all - just as I expected: access to journals (and ways to find journals/papers) is the most expensive and most difficult thing to get if one is outside of a university. How about office (or lab) space, equipment (a poster printer?), software? If you were a researcher at home (a freelance scientist), what would you need that costs money? - Bora Zivkovic
Depends what kind of research. I'd need a biosafety hood, liquid nitrogen storage, glassware, balances, electrophoresis equipment, culture incubators, autoclave, chemical store... :-) - Bill Hooker
Software is the other bit. Biotechs tend to be a lot more budget-constrained than many academic labs. - Deepak Singh
Access to journals and availability of equipment are both big deals, but whereas some equipment can be obtained from eBay or a local lab equipment company, academic literature database subscriptions can't be obtained at a discount rate (AFAIK). - Mr. Gunn
For journal access, try getting yourself an unpaid appointment as affiliate faculty (aka courtesy appointment) at your local university. I've known a couple of people who have done that. - Donnie Berkholz
Lab space outside of a university or big company is very difficult to find. The so called "incubators" are expensive. It is still unclear to me what the rules are for running labs in commercially-zoned properties. - Jeremy Leipzig
Access to journals is a huge issue. My biggest expense for freelance science (computational biology) is the computing equipment; second would be software. - Walter Jessen
As I'm considering becoming an independent scientist (again!), some time ago I did back-of-the-napkin calculations and it turned out that I might need ca. $100 (reliable internet connection) + $500-1000 (per-article payments, no subscriptions) + $500-1000 (computing cloud, storage and calculations) a month to work comfortably outside of academic infrastructure (without spamming all my... - Pawel Szczesny
Access to journals is the biggest issue. Obtaining this access by being an adjunct faculty is much more valuable than the salary they pay for your services. Obtaining funds to attend scientific meetings and to cover publication costs isn't far behind. Not always trivial to convince business types there is value in publishing your basic research. - Jeff Habig
Donnie, good to know re: affiliate positions, thank you. - Heather Piwowar
I'd be in Pawel's boat. On the software end, I can probably make do with open source. - Deepak Singh
Pierre Lindenbaum
Awesome find. I wonder how this stacks up against something like a nested containment list. - Paul J. Davis
Pierre Lindenbaum
The poor state of the java web services for Bioinformatics
But the S stands for Simple! - Paul J. Davis
I'm a bit unclear as to whose "fault" it is for the high rate of WSDL parse failures -- the Java web services library or the service providers. Care to comment? - Andrew Su
@Andrew, I'm not a WSDL specialist, but I think that many WS use an old, deprecated WSDL specification that is no longer supported by the Java API. And I also guess that people only test their WS with their favorite tool/language. - Pierre Lindenbaum
EBI wins the web services competition. Jessica Kissinger showed some examples of remote web services working with Galaxy based on the work her group has done, and I believe the demos were with EBI tools. The high failure rate is surprising since I thought Java was at the leading edge of WSDL. I've had issues in the past with Python, but thought it was because the library support was lagging. - Brad Chapman
Agreed that EBI is the most progressive in this regard, but the lack of (successful) adoption by other organizations I think is problematic. Reminds me in a vague way of the DAS world, where last I checked the vast majority of DAS sources were provided by the inner community of DAS protocol developers. - Andrew Su
Andrew, definitely agreed. Someone needs to be leading the way, and EBI seems to be doing a good job of that, but it's a red flag when others don't follow. Web services suffer from the perception that they're not easily interoperable, which Pierre's test reinforces. I mostly use REST with JSON return structures; they are not discoverable the way WSDL style web services are, but are easier for people to get rolling with. - Brad Chapman
EBI was the 'winner' of my test because their web services are generated using the Java WS API (e.g. see the top comment of ). I do love SOAP/WSDL web services because you don't have to write any code to parse JSON/XML/whatever; all the code should be generated from the WSDL (see ). - Pierre Lindenbaum
Don't mean to hijack the conversation, but Brad, I wonder how you (or anyone else) thinks Galaxy fits into this landscape of bioinformatics tools (complexity vs adoption). That "Tool definition file" feels vaguely WSDL-ish... - Andrew Su
Andrew, Galaxy is awesome. The development philosophy is oriented towards very practical solutions. For some inspiring examples of what people outside the Galaxy development team have done with the infrastructure, take a look at the Cistrome project: and Gunnar Rätsch's machine learning work: - Brad Chapman
Bye-bye, linked service web? - joergkurtwegner
@joergkurtwegner not necessarily: people are happily using those web services and taverna. It's more disappointing if you want to build your own program using a standard tool like wsimport. - Pierre Lindenbaum
I am a *big* friend of open standards... we do not have time to create customized workarounds for each single outlier... do we? - joergkurtwegner
Yeah, then you love SOAP... it's *several* Open Standards ... :) - Egon Willighagen
Andrew Su
BioGPS powered by CouchDB now, time to relax.
Great work. I stumbled upon BioGPS the other day when Mr. Gunn pointed me in that direction. I really like what you have been doing. - Ricardo Vidal
Thank you Ricardo (and Mr. Gunn!)... - Andrew Su
Yay CouchDB in bioinformatics! - Paul J. Davis
Andrew Su
Examples of "crowdsourcing science"?
A reporter for a science mag wants some examples (aside from the Gene Wiki, of course). I already pointed him to Jean-Claude's ONS efforts. Others? - Andrew Su
Would you count the ornithology efforts at Cornell (eBird)? - Deepak Singh
Undergrad-sourcing: - Eric Jain
Don't know if anyone is using Amazon's "Mechanical Turk"? This is a more controlled approach to "crowdsourcing". Have considered using this for outsourcing base calling... - Eric Jain
Eric... Mech Turk works great for any number of problems, but do you think you'll get the appropriate people for something like base calling? It would be an interesting experiment. - Deepak Singh
Don't need a PhD in order to figure out whether the red peak or the green peak is higher :-) On the other hand only non-trivial cases need to be done manually and there a bit of practice is beneficial. Don't know if the Amazon setup allows you to give people a few training tasks first? In any case, could be a fun thing to try! - Eric Jain
Should look into it. I sit right next to someone from the team. - Deepak Singh
@Deepak: Guess I'll have to invite you over now :-) - Eric Jain
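Eric's point that most calls are trivial ("is the red peak or the green peak higher?") and only ambiguous positions need a human can be sketched as a simple triage rule. This is a hypothetical illustration, not how any real base caller or Mechanical Turk workflow works; the input format (one dict of peak heights per position) and the `min_ratio` threshold of 1.5 are assumptions.

```python
def call_base(intensities, min_ratio=1.5):
    """Return (called_base, needs_review) for one trace position.

    intensities: dict mapping 'A', 'C', 'G', 'T' to peak heights
    (hypothetical input format). The highest peak wins; the call is
    flagged for a human when the runner-up peak is within min_ratio
    of it, or when there is no signal at all.
    """
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (best_base, best), (_, second) = ranked[0], ranked[1]
    if best <= 0:
        return best_base, True  # no signal at all: needs a human
    needs_review = second > 0 and best / second < min_ratio
    return best_base, needs_review


def triage(trace, min_ratio=1.5):
    """Split a trace into automatic calls and positions to send out for review."""
    auto, manual = {}, []
    for pos, intensities in enumerate(trace):
        base, review = call_base(intensities, min_ratio)
        if review:
            manual.append(pos)
        else:
            auto[pos] = base
    return auto, manual
```

Only the `manual` positions would be worth paying people to look at, which is what keeps the crowdsourced part small.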
I recall seeing a project where somebody used MechTurk for large-scale image annotation, but these guys from MIT who made a game out of the process obviously do much better. I'm not sure if that counts as crowdsourcing, though. FoldIt should qualify, and probably Synaptic Leap and ShareScienceIdeas (Noel Harem's wiki) as well. - Pawel Szczesny
Eric ... lol .. if only I had known earlier - Deepak Singh
I would say all of science is crowdsourced. Taken as a whole, science progresses with the will of the crowd. 'Crowdsourcing', the buzzword, is merely a deliberate effort to get people to solve a specific problem, as opposed to letting nature take its course. - Paul J. Davis
I like the recursive nature of crowdsourcing an answer to a question about crowdsourcing... - Daniel Swan
Daniel, I *think* I understand what you mean by "the recursive nature of crowdsourcing" (what I refer to as a positive feedback loop between utility, users, and contributors), but care to clarify? Thanks all for the pointers... - Andrew Su
I thought Daniel meant that you were using a crowd to answer this particular crowdsourcing question, yes? :) - Allyson Lister
Andrew, yes, as Ally pointed out I was being more flippant than informative :D - Daniel Swan
;) got it... Then I should suggest that the reporter put his story on a wiki and let crowdsourcing write his article too.... - Andrew Su
@Eric ... just got some info ... we should talk. Or you should talk to my neighbor :) - Deepak Singh
WikiPathways and EOL are two that come to mind. Also GeoNet and NOAA's cooperative observer project. There are also a number of examples of distributed data gathering using things like cell phones (not sure if these qualify as crowdsourcing, but they are in the same category as SETI@Home): Noisetube, Sensing Atmosphere, QuakeCatcher. - Hilary
What about ChemSpider? You could contact Antony Williams (he's on FriendFeed, too!) - joergkurtwegner
Hello everyone (esp. Andrew) — do you know where and when the article on crowdsourced science was finally published? Can't get my hands on it… Thank you! - Enro
Paul J. Davis
CanvasMol - An HTML5 molecule viewer. It even managed to render Taq polymerase decently on Chrome.
The link would probably help: - Paul J. Davis
Walter Jessen
Now let's hear from the Mac side (PC apps here). For bioinformaticians/computational biologists who use a Mac running OS X: what is your most useful and productive app?
I'll kick this one off: Quicksilver (a must have), TextMate, Fluid (I run SSBs for Ingenuity, Remember the Milk and FriendFeed) and Evernote. iWork Pages for writing manuscripts, grants, etc, Keynote for presentations, Excel for spreadsheets, for email. - Walter Jessen
TextMate - Deepak Singh
Although there are others I use all the time: Papers, Terminal :), Fluid, Evernote, LaunchBar, Keynote, 1Password - Deepak Singh
Oh yes, I forgot about 1Password. I prefer iTerm to Terminal. - Walter Jessen
For science: Papers, Geneious, EnzymeX, ApE, Sequencher, OmniGraffle. For general use: iWork (all), TextMate, Chrome/Google Docs, Dropbox, Droplr, Foreversave, RescueTime, SpiritedAway, Spotlight, Things, SelfControl, TextExpander - Ben Ferguson
Dropbox ... it just works so you don't even think about it - Deepak Singh
X11, gnu screen, and vim. I very rarely use Terminal, except when I can't get over the fact that I can't paste into an xterm. - Ruchira S. Datta
Oh, and the Python interactive shell. - Ruchira S. Datta
One more to add, Colloquy, which is my preferred IRC client - Deepak Singh
Perl, Aquamacs, Chrome, Terminal, Colloquy, Virtualbox, OpenOffice, Skype, Adium, Spotify :) - Roger Pettett
iWork (Pages and Keynote), Evernote, Google Docs, TextEdit, ZumoDrive, FileMaker Pro, Skitch, 1Password, Qumana, Spotlight (a life saver), Adium, Adobe 9 for the ScanSnap scanner, MyFax. - Sally Church
Does anyone have a great file compressor for Mac equivalent to WinZip or better? The Apple version never seems to shrink large files much >.< - Sally Church
I've also been test-driving Daylite, Basecamp, Pipeline, BatchBlue etc., looking at sCRMs, project management tools and contact management software - Sally Church
Sally, good old Toast? :) - Deepak Singh
Deepak, ah, neat idea. I have Toast for windoze somewhere; I'll check it out on my old Dell and see if it will shrink massive files nicely. - Sally Church
Sally, bzip2 seems to be on my Mac (though I can't recall if I installed it specially). - Ruchira S. Datta
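The same bzip2 compression Ruchira mentions is also available from Python's standard-library `bz2` module, which can be handy for scripting batches of large files. A minimal sketch; the helper name `bzip2_file` is hypothetical, and how much a file shrinks depends entirely on its contents (already-compressed formats won't get much smaller).

```python
import bz2
import shutil


def bzip2_file(src_path, dst_path=None, level=9):
    """Compress src_path with bzip2, writing src_path + '.bz2' by default."""
    dst_path = dst_path or src_path + ".bz2"
    with open(src_path, "rb") as src, bz2.open(dst_path, "wb", compresslevel=level) as dst:
        # copyfileobj streams in chunks, so very large files never sit in memory at once
        shutil.copyfileobj(src, dst)
    return dst_path
```

Decompression is the mirror image: open the `.bz2` file with `bz2.open(path, "rb")` and read or copy from it.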
++Paul - Ruchira S. Datta
Adobe Creative Suite, most frequently Illustrator, then Acrobat, then Photoshop, occasionally InDesign. - Ruchira S. Datta
Lars Juhl Jensen
Looking at my already complex network of Web 2.0 services and wondering how best to fit in Google Buzz
Wouldn't it be nice if everything would just automagically synchronize .. standards for comments and likes/unlikes. Then the competition would be more on the user interface than on just user base. - Pedro Beltrao
I also use the different networks differently depending on my contacts... Facebook mostly for friends and family and FriendFeed more for work, but this could be solved with proper (easy-to-use) group controls: when posting, you would select whether it's something for friends or for work. - Pedro Beltrao
Google should make a simple checkbox list of contacts and what you are sharing with each (chat, Buzz, location, etc.). - Pedro Beltrao
The other odd thing is that it doesn't seem to add in notices from other locations. I.e., I'm not seeing tweets from any of the people I follow. - Paul J. Davis
@Chris Yes, that is exactly why I disliked FriendFeed to start with. I didn't like the idea that comments to my blog posts would end up "somewhere else" without people reading my blog being able to easily follow the discussion. The fragmentation of liking, sharing, and commenting only gets worse with each new resource being launched. - Lars Juhl Jensen
On a positive note, I think there is a fair chance that Buzz could dramatically increase the number of scientists who are involved in social networking. Most of my colleagues neither have a Twitter nor a FriendFeed account, and they don't use their Facebook account for science. But most of them use Gmail, which is why I smell an opportunity :-) - Lars Juhl Jensen
@Neil, I think the right question is not why you'd want to share with your Google contacts, but why they'd want to read what you share. Unless you set things as private (which I know you don't), you are effectively sharing with the whole world when you put something on FriendFeed or Twitter. It is the readers who choose whom they want to follow. The big difference is that Buzz is opt-out whereas the others are opt-in; in Buzz you follow your contacts by default. - Lars Juhl Jensen
Michael Barton
Is there a best practice for microbial genome annotation?
I think it's more like the Perl culture: "There is more than one way to do it!" Best practices in bioinformatics are currently ad hoc. Damian Conway's Perl Best Practices is one of the best guides to good coding practice in Perl; I hope we will also have a book on "Best Practices in Bioinformatics" soon, maybe by a group of authors from the LifeScientists room. What do you say? - Khader Shameer
@Khader, that's why we need flexible guidelines and not constrained best practices. Several minimal guidelines have already been worked out for different aspects of the life science domain. MIBBI can be a good starting point in this case. - Abhishek Tiwari
I completely agree with you Neil, but some effort towards developing well-defined, documented workflows/protocols (can we call these "Best Practices"?) to perform generic tasks (e.g. annotation) will be useful for the community. I think several 'standards' (e.g. MIRIAM/MIBBI) were developed to provide a common framework for routine tasks. I believe TLS is an ideal place to get a consensus about such practices and work on a wikibook of best practices in bioinformatics. - Khader Shameer
@Abhishek: Best practices are not always "constrained", and constrained practices are impossible due to the complexity of biological systems - flexibility should be there. But my point is that even though MIBBI and other standards have been available for a long time, I've never seen them in research papers. Is it due to poor visibility of such projects, or no interest in promoting such initiatives? - Khader Shameer
Khader, in my opinion the main purpose of guidelines is to avoid disagreement, while best practices try to bring agreement in the community. Also, people are using these guidelines; it's just a lack of awareness, otherwise more and more people would adopt them. Take any BioModels database model or CellML repository model: they are well annotated according to the MIRIAM guidelines. Allyson... - Abhishek Tiwari
Thanks Abhishek for the pointers to applications of different standards. My point is that the goal of both best practices and standards is the same: getting a consensus on how to do repetitive experiments/workflows. But as Neil is discussing, the choice in individual bioinformatics projects is mainly to get a good fix rather than an excellent code base. Still, I hope some degree of consensus can be obtained if people follow standards as a first step. - Khader Shameer
Science isn't set up to reward coding standards. Funding agencies reward quick biological results, not infrastructure and software development. I'd argue that for every 5 biological grants, the NIH should be funding one software/database/computational infrastructure grant. The amount of data is only getting bigger. - Chris Miller
@Michael / Neil: I agree that "Science isn't set up to reward coding standards", but as a subject at the interface of science and technology, it is high time that bioinformatics embraced standards. For Michael's question I was trying to make the point that if there were a standard/best practice/generic protocol for microbial genome annotation, he could have just followed... - Khader Shameer
too right Neil. is there a best practice for violin-making, vision quests, or coming-of-age experiences? ;) - Ian Holmes
srsly tho -- there are plenty of papers describing microbial genome annotation. it's still an open research area, but there are commonalities (repeats, transposons, genes, typical errors, ...) so I guess the rough union of those vague concepts would constitute the current best practice. not exactly a recipe... - Ian Holmes
:D best practice for violin-making, vision quests, or coming-of-age experiences :D - Neil, in the current era of bioinformatics with web services and workflows, having an SOP/BP will always help you kick-start the work in minimal time, rather than going through every genome project paper for annotation flowcharts. - Khader Shameer
@Ian: OK, finally that's something Michael, or anyone interested in annotation, can take from this thread. - Khader Shameer
@Neil - ^(chicken|egg)? - It could and should be that kind of procedure though. All the advice in the world isn't going to help the people that actually *use* your annotations. The current 'system' for annotating anything is so mindlessly broken I'm surprised it works at all. Now all it needs is a catchy name. Blight of Bioinformatics maybe? - Paul J. Davis
Thanks for the comments everyone. I'm going to read as many genome papers as possible and try and put what I read together. - Michael Barton
Just remembered this article: which is a good look at current annotation practices. I also finally found which describes the actual parameters that NCBI uses for gene prediction. - Paul J. Davis
Neil Saunders, I agree a lot of advice is available and it is definitely helpful. For example, I was not aware of something like MIARE (thanks to Abhishek), and am now implementing it in our RNAi screen. But I can't agree with you if you define bioinformatics projects as non-agile. Everything from a simple BLAST-based sequence analysis to large-scale data analysis follows an agile approach. Think of n... - Khader Shameer
Thanks Paul, for the links to the articles. - Khader Shameer
Here's a paper that describes how microbes are annotated in Swiss-Prot: - Eric Jain
Neil : Just loved the definition "hack together something that works" :) - Khader Shameer
Would be good to get a group together, ideally in combination with the Genomic Standards Consortium (MIGS, other stuff, and they include the INSDC people) and maybe Ensembl types, and do the Minimum Information About Microbial Genome Annotation (MIAMGA). Sounds like it should have an exclamation mark after it -- MIAMGA! Could also maybe seed a wider (and neatly engineered so as not to... - Chris
I could do that... Really though I think doing it within the GSC (itself within MIBBI) would be the best way to get (+/-)instant buy-in from lots of worthies. - Chris from twhirl
I used to be all over the agile approach - TDD, BDD, automated builds, design patterns. However now I don't think this approach is agile at all for science. It means extra time creating libraries and tests which may never even get used because the scientific project is a constantly moving target. You can make a library for what you need to do but then find out the result it produces doesn't fit the scientific story so it just gets chucked. - Michael Barton
I completely agree with Neil Saunders now about hacking something together with unix pipes and some scripts. If I end up using it more than a couple of times then I'll probably consider rewriting it as a library. - Michael Barton
I agree with the principle of writing clean and elegant code that is well designed and thought out. My angst comes from seeing how much of my time I spent writing this type of code during my PhD, and 95% was never used more than once, if at all. - Michael Barton
@Frank I've been reading sequence assembly papers recently. I should start reading annotation papers though as that's the next step for me. - Michael Barton
Pierre Lindenbaum
Elementary school for Bioinformatics: Suffix Arrays
Nice, Pierre! - Jan Aerts
Did you time that suffix array construction? There's a very nice approach to building them that should give you a pretty dramatic speedup against quicksort: There's an actual implementation at the bottom. - Paul J. Davis
@Paul, I just implemented a basic quicksort algorithm. Thank you for the paper, very interesting; I'll read it carefully. If I need to run this piece of code more often, I will most probably store the index in a file and read it into memory. - Pierre Lindenbaum
Indexing the human chr1 (length = 249250621) takes about 13 minutes on an idontknowwhatisthatlinuxmachinehowdoifindthisinformation-3.5. - Pierre Lindenbaum
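For readers following along, the basic idea Pierre describes (sort every suffix start position by the suffix it begins, then binary-search the sorted array) can be sketched in a few lines of Python. This is the naive comparison-sort construction, not the faster algorithm from the paper Paul linked, and the function names are illustrative only.

```python
def build_suffix_array(text):
    """Naive suffix array: suffix start positions sorted lexicographically.

    sorted() computes every key (a copy of each suffix) up front, so this
    needs O(n^2) memory: fine for toy inputs, hopeless for a chromosome.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of pattern, via two binary searches."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

For example, `build_suffix_array("banana")` gives `[5, 3, 1, 0, 4, 2]`, and searching it for `"ana"` finds positions `[1, 3]`. Each lookup costs O(m log n) comparisons once the array is built, which is why persisting the index to a file, as Pierre suggests, pays off.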
uname -a; cat /proc/cpuinfo; cat /proc/meminfo - Egon Willighagen
thanks Egon. I've never been able to know what is meaningful among all those parameters :-) iamabiologistnotanengineeryesiam :-) - Pierre Lindenbaum
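The fields most people care about for a "what machine was this?" note are the CPU model, the number of logical processors and total RAM. A small sketch that pulls just those out of the `/proc` files Egon lists; it is Linux-only, and the `model name` field is what x86 kernels print (other architectures may label it differently), so treat the field names as assumptions.

```python
import os


def summarize_cpuinfo(cpuinfo_text):
    """Return (cpu model name, number of logical processors) from /proc/cpuinfo text."""
    model, count = None, 0
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "processor":
            count += 1  # one 'processor' entry per logical CPU
        elif key == "model name" and model is None:
            model = value.strip()
    return model, count


def mem_total_kb(meminfo_text):
    """Return MemTotal in kB from /proc/meminfo text, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    return None


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as fh:
        print(summarize_cpuinfo(fh.read()))
    with open("/proc/meminfo") as fh:
        print(mem_total_kb(fh.read()), "kB RAM")
```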
Paul J. Davis
Keeping computers from ending science's reproducibility - via ArsTechnica:
While I'm entirely sympathetic to the general tone of this article, I can't help thinking that it rather optimistically exaggerates the extent to which traditional lab work can be reproduced just from a reading of the paper... - Andrew Clegg
@Andrew - Indeed. Though the contrast, I think, can be likened to algorithm papers that provide pseudocode. And they completely neglect the fact that all software depends on other software. I.e., does their fancy point-and-click tool record the kernel and BIOS version numbers? - Paul J. Davis
@Mark I think it's improving, I believe Bioinformatics and NAR both have some kind of 'release or at least make available online' policy -- although TBH I'm too lazy to go and verify that right now. Of course it's just as unenforceable, apart from by slow and inefficient social pressure means. - Andrew Clegg from twhirl
I'm working on a framework for setting up and running computational biology experiments (specifically for molecular dynamics work for now). Specifically it provides built in logging of what you run and should work with version control (git) and is based on a modular (application-based) system. Hopefully it will help us avoid the problems outlined in this article... - David Caplan
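Paul's question about recording kernel versions and David's "built-in logging of what you run" can be illustrated with a minimal, hypothetical provenance logger using only Python's standard library. This is not David's framework; the record fields, the example command, and the JSON-lines log format are assumptions. It captures OS/kernel details but not BIOS versions, which need platform-specific tools.

```python
import json
import platform
import sys
from datetime import datetime, timezone


def provenance_record(command, extra=None):
    """Collect basic facts about the computing environment for one run."""
    record = {
        "command": command,
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),      # OS name plus kernel release
        "kernel_release": platform.release(),
        "machine": platform.machine(),        # e.g. CPU architecture
    }
    if extra:
        record.update(extra)  # e.g. a git commit hash or input file checksums
    return record


def append_to_log(path, record):
    """Append one record per line (JSON lines) so the log is easy to diff and grep."""
    with open(path, "a") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```

Keeping the log as plain text also means it can be committed to git alongside the scripts it describes.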
Chris Miller
"we need a paradigm that makes it simple, even for scientists who do not themselves program, to perform and publish reproducible computational research." - Chris Miller from Bookmarklet
This has been one of my pet causes for a while. Being able to reproduce computational results is damn near impossible with the current state of affairs. - Chris Miller
Not for nothing, but the "Accessible Reproducible Research" link is hidden behind a login page. - Paul J. Davis
Sorry about that: (It's still Science subscribers only, though :( ) - Chris Miller
Another article on the topic was just posted in the life scientists group: - David Caplan
I posted this in the other group, but just in case people here are interested: I'm working on a framework for setting up and running computational biology experiments (specifically for molecular dynamics work for now). Specifically it provides built in logging of what you run and should work with version control (git) and is based on a modular (application-based) system. Hopefully it will help us take care of some of these issues. - David Caplan
Michael Barton
How do you know how much genome coverage you need for the organism you plan to sequence? I think the answer is as much as you can afford.
Very likely not - I don't recall the exact publication, but people claim that too much reads lowers your chance to assemble the genome. Another thing is technology - what read length you can obtain (and at which quality level). And finally, it depends on the organism type and whether a close species is available, so you can use its sequence as a reference. I think the answer is somewhere between 10x and 100x, but it depends on many different variables. - Pawel Szczesny
Too many reads probably won't be a problem for us since we're using 454. I'm having trouble deciding how best to split up a 454 plate. We want to get as many genomes as possible, but with significant enough coverage for a decent assembly of each one. - Michael Barton
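A back-of-the-envelope way to think about splitting a plate is the standard Lander-Waterman estimate: mean coverage is C = N × L / G (reads × mean read length / genome size), and under its idealized assumptions the expected fraction of bases with no reads at all is about e^(-C). A minimal sketch; the function names are hypothetical, and the estimate ignores uneven coverage, reads lost to quality filtering, and paired-end library design, so real requirements will be higher.

```python
import math


def expected_coverage(num_reads, mean_read_length, genome_size):
    """Mean fold coverage C = N * L / G."""
    return num_reads * mean_read_length / genome_size


def fraction_uncovered(coverage):
    """Lander-Waterman (Poisson) estimate of the fraction of bases with zero reads."""
    return math.exp(-coverage)


def coverage_per_genome(reads_per_plate, mean_read_length, genome_size, n_genomes):
    """Coverage each genome gets if the plate is split evenly between n_genomes."""
    return expected_coverage(reads_per_plate / n_genomes, mean_read_length, genome_size)


def max_genomes_per_plate(reads_per_plate, mean_read_length, genome_size, target_coverage):
    """Largest even split that still reaches target_coverage for every genome."""
    return math.floor(reads_per_plate * mean_read_length / (genome_size * target_coverage))
```

With placeholder numbers (1,000,000 reads of 400 bp per plate and a 6.5 Mb genome, not actual run specifications), a four-way split gives about 15x per genome; plug in the real yield and genome sizes for your run.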
Too much sequence is a problem I would gladly accept. - Jeremy Leipzig
"people claim that too much reads lowers your chance to assemble the genome" - what's the logic behind that? I can't think of a good reason why. - Chris Miller
Chris, I guess the assembly software gets stuck in a suboptimal solution when fed too much data (but I'm not sure what the reason is). I've seen an example where an assembler given the original data (E. coli, ca. 60x) assembled the reads into three contigs, but wrote out one single circular genome when ca. half of the reads were randomly removed. Michael, the SEQanswers forum will probably be more helpful - I... - Pawel Szczesny
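Pawel's experiment (randomly discarding about half of a ~60x read set before assembly) is easy to repeat on your own data. A minimal sketch, assuming reads are already loaded as a list of sequence strings; the function names are hypothetical and FASTQ/SFF parsing is left out. A fixed seed makes the subset reproducible, so the assembly can be rerun exactly.

```python
import random


def downsample_reads(reads, fraction, seed=0):
    """Return a random subset of round(fraction * len(reads)) reads, in input order."""
    rng = random.Random(seed)
    k = round(fraction * len(reads))
    keep = sorted(rng.sample(range(len(reads)), k))
    return [reads[i] for i in keep]


def downsample_to_coverage(reads, genome_size, target_coverage, seed=0):
    """Subsample reads so that total bases / genome_size is about target_coverage."""
    current = sum(len(r) for r in reads) / genome_size
    if current <= target_coverage:
        return list(reads)  # already at or below the target
    return downsample_reads(reads, target_coverage / current, seed)
```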
So if pre-454 8-10X coverage was good enough does that also apply to 454 data? I'm completely ignorant in this respect. There's a lot of literature on what to do with sequence data once you get it, but not much on planning a sequencing project. - Michael Barton
The other thing to consider is that you're also going to want to have paired-end reads as well. I haven't had too much experience with de novo assemblies from short reads, but everything I've read pretty much requires paired-end data. Completed genomes that are closely related could also help, but some species (I know Wolbachia specifically) are notoriously bad for synteny... - Paul J. Davis
I think the desired coverage also depends on what you want to discover with the data. If you have a bunch of related (sequenced) organisms, it isn't difficult to get a good idea of presence/absence/modifications of known genes with much lower coverage short-read sequencing. - Rob Syme
I agree with Paul on the benefits of paired-end sequencing. We have sequenced some 30-50 Mbase (fungal) genomes de novo using paired-end reads and resequenced new strains of another (fungal) organism using single-end reads. Chromosomal translocations/deletions/inversions mean that things would have been so much easier with paired-end data, even though we had a good template/scaffold to fit the single-end data to. - Rob Syme
@Neil. Thanks for the papers. I had read the last one you mentioned because it dealt specifically with very short sequencing. I should reread it though. As for hybrid approaches we're using only 454 because that's all we've got access to. - Michael Barton
@Paul and @Rob. I think 454 now comes with paired mate reads. Also, there are three *Pseudomonas fluorescens* genomes already available, but from what I've read *P. fluorescens* strains seem to share much less genome similarity in terms of orthologous genes (~60%) compared with other *Pseudomonas* species (~80%). Comparative genome assembly could, however, be a useful step since, as Neil pointed out, genomes are rarely completely assembled without additional finishing steps. - Michael Barton
Granted, you could pull a "we couldn't close the gap so we added some N's and circularized." No joke, I've seen it. - Paul J. Davis
Well, human chromosomes aren't single contigs. AFAIK, the crossy point and the ends are pretty much not sequenced. That's quite different than "throw in some N's and call it good" which the one genome I'm thinking of did. - Paul J. Davis
@Paul I just tried running BLAST with "NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN" but I couldn't find this genome of which you speak. - Michael Barton
Composition-based statistics, I reckon. At roughly base 346057 there's a stretch of exactly 100 N's. The paper is quite clear that they couldn't close it. My concern is that when dealing with these prokaryotic genomes no one reads all 1000 papers. Don't even get me started on annotation pipelines. XD - Paul J. Davis
Err, dust rather. Not sure why I said composition stats there. - Paul J. Davis
How about betting? I'd say: 30x - Max
Actually, neither dust nor comp stats. Nucleotide databases munge bases that aren't A, C, G, or T. - Paul J. Davis
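Since nucleotide databases munge characters other than A, C, G and T, the simplest way to spot a filler gap like the 100-N stretch mentioned above is to scan the sequence text directly rather than search for it. A minimal Python sketch, assuming the assembly is already loaded as a plain string (FASTA parsing omitted); `find_n_runs`, `count_non_acgt` and the `min_len` threshold are hypothetical names chosen for illustration:

```python
import re


def find_n_runs(seq, min_len=1):
    """Return (start, length) for each run of N/n at least min_len long.

    Start positions are 1-based, matching how genome coordinates are
    usually quoted (e.g. "roughly base 346057").
    """
    runs = []
    for match in re.finditer(r"[Nn]+", seq):
        length = match.end() - match.start()
        if length >= min_len:
            runs.append((match.start() + 1, length))
    return runs


def count_non_acgt(seq):
    """Count characters that are not A, C, G or T (case-insensitive)."""
    return sum(1 for base in seq.upper() if base not in "ACGT")
```

For a toy sequence `"ACGT" + "N" * 100 + "GGCC"`, `find_n_runs(seq, min_len=50)` returns `[(5, 100)]`, i.e. one 100-base run starting at position 5.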
"There are cases where removing reads generates a better assembly." Yeah, but that's post-processing. And as for assemblers not being able to handle the data, well, that's a good problem to have - just means we need to update our algorithms. I'll take deep coverage any day. Better for SNP finding, better for copy number calls, better for assembly. If money and sequencing power were no object, we'd sequence stuff to 100x or more routinely - Chris Miller
And I think you can afford a lot more soon... - Jason Stajich
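For rough planning of the coverage numbers being bet on here (30x, 100x), the idealized Lander-Waterman model gives mean coverage c = N·L/G for N reads of length L on a genome of size G, and under its Poisson assumption a given base is missed by every read with probability e^(-c). A minimal Python sketch; it assumes uniformly random read placement and ignores repeats, sequencing error and coverage bias, so treat its answers as optimistic. The function names are hypothetical:

```python
import math


def mean_coverage(num_reads, read_length, genome_size):
    """Lander-Waterman mean coverage: c = N * L / G."""
    return num_reads * read_length / genome_size


def expected_uncovered_fraction(coverage):
    """Poisson approximation: probability that a base is hit by zero reads."""
    return math.exp(-coverage)


def reads_needed(target_coverage, read_length, genome_size):
    """Reads of length L needed to reach a target mean coverage (rounded up)."""
    return math.ceil(target_coverage * genome_size / read_length)
```

As a purely hypothetical example, `reads_needed(30, 400, 6_000_000)` gives 450,000 reads for a 6 Mb genome and 400 bp reads. The expected uncovered fraction drops from about 13.5% at 2x (e^-2) to about 2 × 10^-9 at 20x (e^-20), though gaps caused by repeats are not captured by this model.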
@Neil. What sort of tools do you recommend for post-processing? I assumed that the newer assemblers being produced would be able to detect reads that prevent contigs from being joined - de Bruijn graphs et cetera. - Michael Barton
Ok, thanks for all your suggestions so far, and from everyone else too. - Michael Barton
Chris, I'm not sure about other technologies, but in our experience 454 generates 1-3 indels per read (in the consensus it's only 1 per 2-3 kb), so more reads generate more noise. I think it's not only about post-processing - the whole procedure seems like a bottleneck for really deep coverage. Michael, experience from the NGS workshop in Rome suggests that for 454 data Roche's own... - Pawel Szczesny
Wait - you've just said that using the consensus improves your ability to remove artifactual indels. That wouldn't be possible without deep sequencing, right? With 2x, you don't know which of the two is right. With 20X, now you're getting somewhere. - Chris Miller
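Chris's point can be made concrete with a simple majority vote over the reads covering one alignment column: at 2x a one-to-one disagreement cannot be resolved, while at 20x a sporadic indel is outvoted. A minimal Python sketch, assuming reads are already aligned and each read's contribution at the column is given as a string (`""` for a deletion, `"AA"` for an insertion); `call_consensus`, `min_depth` and `min_fraction` are hypothetical, and real consensus callers also weigh base qualities:

```python
from collections import Counter


def call_consensus(observations, min_depth=3, min_fraction=0.6):
    """Majority-vote call for a single alignment column.

    observations: what each covering read shows at this column, e.g.
    "A", "AA" (insertion) or "" (deletion).
    Returns the winning observation, or None when the column is too
    shallow or no observation reaches min_fraction of the reads.
    """
    depth = len(observations)
    if depth < min_depth:
        return None  # e.g. 2x: one read says "A", the other "AA"
    allele, count = Counter(observations).most_common(1)[0]
    if count / depth < min_fraction:
        return None  # no clear majority, leave the column ambiguous
    return allele
```

`call_consensus(["A", "AA"])` returns `None` (too shallow to decide), whereas 17 reads of `"A"`, two of `"AA"` and one deletion give `"A"`.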
Iddo Friedberg
Gene and protein annotation: it’s worse than you thought #bioinformatics -
Error level (near 0%) for SwissProt looks interesting. People working with protein-protein interaction data claim 2-9% error rates on manually curated sets, and I would expect the same level from SP. Some things are better than I thought ;). - Pawel Szczesny
Not too bad, but I think it's even worse than they do. And evidence codes won't fix this. Manual curation and standards for automatic annotation are in dire need of a revolution, and even if we get that it'll still take years to fix. - Paul J. Davis
Trust Pawel to see the half full (or 60% full) part of the glass. I agree with Frank though: it is not feasible to go through NR and fix the annotations manually. We just have to accept that NR, TrEMBL and KEGG are (mostly) over-annotating, and remember that when we rely on them when delving into the protein family level. - Iddo Friedberg
Thanks Iddo. I lost count of how many times annotation errors came up in my discussion with experimentalists who lack experience with such databases. (Not surprisingly, they usually think these errors are negligible, especially when it comes to THEIR proteins.) Now I'll just send them a link to your post... - Mickey Kosloff
Deepak Singh
When “good enough” just doesn’t cut it -
Is this a symptom of writing software for publication and then moving on? - Michael Barton
It's a symptom surely of what the measured endpoint is - getting the data out for the paper - not producing something that has real utility. It's the good enough _for what_ bit that is the problem here surely? - Cameron Neylon
That's fair - it's a constant surprise to me that I seem to know more about software development best practice than the academic researchers I talk to. I blame Greg Wilson of course...need to get Software Carpentry or similar course made compulsory for all science undergraduates :-) - Cameron Neylon
Or write it into the grant conditions. That spells it out pretty clearly... - Cameron Neylon
Neil, I agree with you. I think that's going to change, as more and more software developers enter the life sciences, folks who care about maintenance, quality, etc. But the PIs are still a problem. Of course, this is not just academic research though. I've seen it in companies, and perhaps that's the difference between someone who stays middle of the road and someone (someone could be an entity) who excels - Deepak Singh
couple of off topic things: I think your RSS is not working, or you might have changed it. It doesn't show up on my GReader. Also your Fork Me link to GitHub is not pointing to your account. - pn
Paulo, the feed seems to be OK at this end. and yep, do need to fix that Fork Me link. Thanks - Deepak Singh
Survival of the fittest will show how good things are. In the (free) open source world quality/time_to_invest will show, and for commercial world the quality/price will do the same. - joergkurtwegner
Joerg, I think that's beginning to happen, especially with open source alternatives pushing purchasing behavior. Plus expectations have changed. No one is going to use an internal search engine with a several millisecond response time, when you are used to Google - Deepak Singh
Perhaps I'm the pessimist, but if all scientific software were merely 'good enough' I'd be in heaven. Good enough would at least imply that it compiles/runs/etc. - Paul J. Davis
I think "good enough" in software is favored when an individual needs to get something done and faces limitations in terms of time or financial resources in accomplishing the task. Within those constraints, "good enough" is the best way of making progress rather than waiting 'til someone writes the best possible code. It shouldn't remain that way, but if it's cutting-edge research, a clear market demand may not have been established as an incentive for someone to create a particular piece of software. - Jill O'Neill
Jill, I've seen enough evidence where that's not the case. PIs tend to lose interest when they have papers published, or if a grad student or postdoc leaves. In the case of commercial entities, it's a cultural thing. Constraints can lead to phenomenal code. - Deepak Singh
Isn't it similar to the evolutionary selection, with academics having a set of "pressures" different from those needed to develop #1-type software? Once the paper gets published, there is no pressure for researcher to improve the code, and things remain "good enough". While in a commercial setting there is always strong pressure from the side of the customer/competition, which drives the development further. I.e. to solve the problem one needs to bring some kind of pressure element to the academic setting. - Yaroslav Nikolaev
Deepak Singh
Zachary Voase’s Blog — Bioinformatics and the Semantic Web -
Also check out his other post: Awesome awesome stuff. Also, I'm more than a little relieved to find out I'm not crazy. Or at least that I have company in crazy town depending on the interpretation. - Paul J. Davis
Nice post, but I don't think it would sell me on using RDF. It's not like you can't agree on syntax and meaning (or even use ontologies) without RDF. Better to show how RDF helps deal with the fact that not all databases are ever going to agree on a single set of non-overlapping concepts for describing their data, and -- more important -- with more fundamental disagreements (such as what an organism even is). - Eric Jain
Jan Aerts
If anyone has a #google #wave invite to spare: think of me... Thanks.
Perhaps we need to make a combined list? :-) - Cameron Neylon from twhirl
I know @dgmacarthur hopes to get lucky as well. - Jan Aerts from Android
Drat. Me too! Nominate me! And I'd nominate you too if I get one XD - RK
Setting the barrier low: firstname.lastname google email address :-) - Jan Aerts from Android
Me too! - Benjamin Tseng
I'd actually do something with it if I had an account :P I guess I do have a thesis defense coming up in 2 months though... - Brian Krueger - LabSpaces
I want one too!! - Alejandro Montenegro
Just a reminder there is a Google Group for aggregating ideas and code for Wave in Research: and a Doodle Poll for a date for a UK meetup: - Cameron Neylon
@Jan... I get a Mail Delivery reply on ... :) - Egon Willighagen
@egon I was afraid the address would already be taken :-) - Jan Aerts
Anybody here got an invite already? Still no invite for me :( - RK
Nope... Website says "Google account not yet activated for Google Wave" - Jan Aerts
As of 8am GMT not seeing any further information in my inbox - will update as soon as I know anything - Cameron Neylon
Not spotted anything here either... - Egon Willighagen
Looks like we'll just have to be patient... It'll come when it'll come. - Jan Aerts
How do I invite? - Björn Brembs
Just read somewhere that invites will be available at 4pm BST. Didn't check sources, though. - Jan Aerts
100,000 invites sounds like a lot but worldwide soon thins out :( - Anthony Underwood
@Jan if you get the invite and you happen to have some invites as well, consider me please - george
Word on the street is that only existing wave users will get invites. The new invitees won't get any. - Chris Miller
Chris: But they did post that about the three groups of people that'll receive invites. *sigh* - RK
Me, too! I'll offer a drawing in return! - Kamilah Reed (K. Gill)
Cameron: Where did you read that? If it's true then there's hope after all :D - RK
Various things going around on Twitter and in wave suggesting that the release will be at 9am PST which would be 4pm UK time I guess but I would give that the status of rumour rather than fact. Haven't seen any convincing info that anyone has a new invite yet. - Cameron Neylon from twhirl
Cameron: Yea. Saw it on Brizzly too. Anybody got an invite? Should be out now :) - RK
FWIW - sounds like invites won't be out until this evening, so that the Sydney-based googlers will be awake to troubleshoot. - Chris Miller
Yep - all information I have suggests tomorrow morning Sydney time. Even if we assume they are in at 6am that's still some hours away yet. I'm off to bed me-self. - Cameron Neylon from twhirl
Wave invites starting to roll out over the next few hours: - Chris Miller
yes please - Thomas Power
I'm told that once you even get an invite it can take a couple days for the activation email to arrive. I wonder what email would be like now if they had required invites back in the day... - Paul J. Davis
So you people have invites now? - RK
Got 4 more wave invites, pls send me your email. - Khader Shameer
That's all, my invites just ran out. - Khader Shameer
No invites yet either.:( - RK
RK, can I have your email ID please ? - Khader Shameer
I had 12 invites on Friday: - Björn Brembs
Paul J. Davis
Mozilla's Raindrop has been released - "Raindrop's mission: make it enjoyable to participate in conversations from people you care about, whether the conversations are in email, on twitter, a friend's blog or as part of a social networking site."
Zing! "We aren’t trying to invent new protocols or build new messaging systems, rather focusing on building a product that lets users get a handle on the systems we already use." - Paul J. Davis
Interesting... but the download page is blank? - Colby
Ah this message just showed up "Download Raindrop There is no official download yet. The Raindrop code is still under development but you can follow along via the code repository. Please see the Hacking page." - Colby
$ hg clone or use one of the links at the top of to get a tarball or zip. If you have issues installing CouchDB ping us in #couchdb on - Paul J. Davis
Sounds like a better friendfeed to me! - Mr. Gunn
Michael Habib
Fwd: Facebook for scientists gets millions in funding - (via Congratulations to Cornell/Florida/Vivo on their NCRR grant: "The University of Florida, Cornell University and a handful of other schools have been awarded $12.2...
Here's a link to UF's coverage of the event: -- I'm curious, though, about this: "The new program will draw information about scientists from official, verifiable sources and make it available using a type of technology called the Semantic Web. For example, information about researchers’ positions will come from their employers and a listing of... - Mickey Schafer
How is this different from Biomed Experts, SciLink, etc? After seeing the failure of a dozen of these sites, I'm skeptical of the premise that there's real demand for them. You can build all the semantic infrastructure you want, but if people aren't going to use it, then it's a waste. - Chris Miller
Kind of what I was thinking, too, Chris. But the UF blurb does not address these concerns, so hard to know at this point. Maybe I'll send a message to Sarah Gonzalez tomorrow (one of the UF ref librarians who jump-started the idea) and see if she'll fill me in. - Mickey Schafer
It really hurts to see money be wasted like this on a platform that doesn't really address the issues plaguing these types of sites that already exist. I think someone needs to be given 12 million to figure out how to get scientists to actually use the technology! (Or code tools we'd like to use ;) ) - Brian Krueger - LabSpaces
Brian: what are the differences between this system under development and tools that might be considered ideal? - Mike Chelen
Mickey: scientists may be more likely to get involved for those reasons if they result in an effective operation. it is exciting to hear import and export of standard formats being given a priority, yet it may be longer before anyone sees if the process is functional - Mike Chelen
Chris: anytime someone mentions "facebook for ____ " it seems a little vague and hard to understand what might differentiate the service :D - Mike Chelen
Reading the press release, it doesn't sound like this platform is going to be any different from biomedexperts. I'm not sure there is an "ideal" system. It's going to be hard to offer every discipline the proper tools and content that will drive users and spawn collaboration. Having worked on my own site for the last 3 years, I've heard many scientists say the last thing they want to do... - Brian Krueger - LabSpaces
The question that has to be answered is what is the compelling reason for scientists to trust the people they encounter on "facebook for scientists". Non science social networking is low risk... - Richard Badge from Nambu
"The goal of the program is national networking of all scientists," said Michael Conlon, interim director of biomedical informatics for the University of Florida, in a statement. "Scientists have problems finding each other. We often find that researchers have pretty good networks with students or with scientists at institutions where they received their degree or worked before. But... - Attila Csordas
Previous discussion here: - Mr. Gunn
I have the same response to hearing this that I imagine many of you would reading a grant proposal that proposes to do an experiment that others have already done and which didn't work, and the results of which aren't cited in the new proposal. They need to address how they're going to work in the face of all these past failures. If their branding strategy is any indication, I'm not sure they're aware of the past failures. - Mr. Gunn
Mr Gunn nailed it. Where is the strategy for succeeding where so many have failed? - Bill Hooker
Mr Gunn +3 saving roll against hype. - Paul J. Davis
I would like to point out that the Facebook for Science line is journalists trying to market this to the public rather than the investigators trying to address this group's concerns. I think that phrasing needs to be taken with a grain of salt. That doesn't mean the other criticisms aren't legitimate. I just think it is important to evaluate the project on its own merits rather than on the public mass-market branding of it. - Michael Habib
One point on how it is different from some other projects: first, it is NIH funded. I am not aware of any other solutions with such a mandate from the NIH. Second, it is a huge amount of money to devote to the problem. Neither of these differences directly addresses the concerns expressed, but they are both factors that give this project an edge in potentially addressing the issues. - Michael Habib
"The University of Florida, Cornell University and a handful of other schools" any people here from those schools funded or know the people funded and can invite them? Would love to hear their angle - Attila Csordas
I'll be doing a post doc at UF. I think I'll contact the head there and see if they need any help :) - Brian Krueger - LabSpaces
I am at UF. I have met Sarah G. (one of the initiating reference libs) while doing a guest lecture in her class. But I don't know the other people. We could just forward this discussion to one of the contacts usually listed. - Mickey Schafer
Michael Habib, I agree with your observation that "facebook for scientists" is journalist-speak. And in terms of explaining things to the UF community, it is a good analogy as my students constantly and consistently categorize social networks as either twitter or facebook. - Mickey Schafer
I forwarded it to Mike Conlon at UF. He said he'd take a look at this discussion, and for more information said we should read the RFA. The RFA says that it wants the platform to be a federated network distributed by partner institutions, which is novel in the SNfS field. It'll be interesting to see what they come up with. - Brian Krueger - LabSpaces
Thanks, Brian (or really, should I use some southern-ism, like "Thaaank you, sweetie" which is actually what happens here, especially at places like Waffle House?). - Mickey Schafer
The research objectives section makes for a quick and interesting read -- love the "background" info! - Mickey Schafer
I think that background just shows how little actual background research was done before proposing this RFA :P - Brian Krueger - LabSpaces
I wonder if they'll talk to OWW, Epernicus, SciLink, Laboratree, and the dozen other SNfS services out there to import or otherwise leverage all the data that's already been contributed by scientists. I can see it being useful as an aggregator and motivating standardization and data exchange, but would hate to just see it reinvent the square wheel - Shirley Wu from twhirl
We have a few author profiles in Scopus as well :) - Michael Habib
Richard Akerman
ok copyright gurus, help me out - are scanned page images of a 400-year-old book actually under copyright? "©2008 Linda Hall Library - All rights reserved" - Richard Akerman "© 2004 Octavo. For research use only. All rights reserved." - Richard Akerman
The scanned images are the property of the entity that created the scans. If you can get your own hands on a physical copy of Sidereus Nuncius in the original Latin and scan those pages in, then you will own the rights to *that* set of scanned images. Not to the intellectual property (text) written by Sidereus, but to the scanned images of his words. Yes, that is truly how the law is interpreted. - Jill O'Neill
Excuse my language, but that is f-ing ridiculous. No wonder this Google Books thing is a debacle. - Richard Akerman
Wait a minute. Are you suggesting that, despite any costs incurred in scanning books, scanned images should somehow be free for the taking? As if there was no labor involved? When Dover Books reprints copies of old titles, no one suggests that Dover should be giving those printed copies away for free. - Jill O'Neill
Bear in mind that Dover frequently just used old printed versions of texts themselves, not resetting type or anything like that. - Jill O'Neill
Jill, you're conflating atoms with bits. I *bought* Starry Messenger, I had no expectation that the print version should be free, even 400 years later. However, I do have an expectation that digital page images from a centuries-old book should be placed in the Commons. But anyway, I don't want to rehash what I'm sure has been thoroughly covered in the Google Books battle. - Richard Akerman
"Not to the intellectual property (text) written by Sidereus, but to the scanned images of his words" - true, but the text is out of copyright now. So someone could scan it and place the entire work in the public domain. - Nick Lothian
@Richard - I'm not sure this is all that closely related to the Google Books battle. Most of the argument there is about books that are still in copyright. This work isn't, and the text is available copyright free online. I agree it would be great if someone created public domain or CC-licenced photos of the actual book, but that's a different argument to the Google Books battle. - Nick Lothian
There is a free photo of the cover available on wikipedia, with wikipedia-compatible licence: - Nick Lothian
So perhaps the objection needs to be framed differently. I agree that some copy somewhere of public domain material should be made available at little or no cost to the public (ie as in the instance of Project Gutenberg). I just don't think we should yell at those who do demand some form of financial compensation for their effort. I don't remember the library community yelling at Dover... - Jill O'Neill
@Jill - Octavo has a right to claim revenue. I'd prefer if one of the custodians of the physical books took a scan of it and placed it in the public domain, though. (Out of interest, though - how does Octavo make revenue from that scan? Do they sell it or something?) - Nick Lothian
Octavo was a service provider; they were hired by museums and archives (and upon occasion monastery libraries) to provide them with archival quality scanned images of texts that needed to be both preserved and made accessible to scholars. They provided that service to those who protected the artifacts and that was/is their primary source of revenue; in turn, where permitted, Octavo... - Jill O'Neill
I understand the frustration of not being able to get to a "free" version of a text or document; I run up against it all the time. Richard was venting a little bit and expressing that frustration and I ought not to have turned this into a confrontation (and for that I apologize, Richard.) But I also get frustrated when I see the wrong people (translation vendors or publishers) blamed... - Jill O'Neill
Thanks, Jill, for adding the thoughtful comments to the discussion. - Peter Murray
Sorry for my tone - I'm just frustrated because I wanted to be able to show some students "look at this page from 400 years ago" and I can't. Do I have a "right" to be able to show them something that I couldn't even find before Internet search and digitisation? No. *Should* that be a reasonable right and expectation... well, I think it should be. Just for some added bizarreness, you... - Richard Akerman
Incidentally, the full Latin text is available at - so then my question becomes, what are my rights to use a screenshot of that site? Would the presentation of the text and accompanying images on the web page be some sort of protected object under Italian copyright law? (There is no copyright notice on the site that I can find.) - Richard Akerman
Incidentally, in reading Jill's comments, I should have made the context clear - I don't want the text, I already bought it in translation and showing the students the original Latin won't go very far - it was actually the page image itself that I was interested in showing. But naively, isn't this just "faithful reproduction" of a public domain work? Which I thought for e.g. art that's... - Richard Akerman
If you ran the scanned pages through OCR, the resulting text wouldn't be under copyright. Similarly, if I take a photo of the Gutenberg Bible, that photo is mine and I have copy rights to it, but I can't stop anyone else from taking such a photo, nor can someone else stop me from publishing my photo. (I'm not sure on that last point, as the Gutenberg Bible may be considered art, and I... - Kevin Fox
I think this will pretty much explain it all, using US & Canada law. The answer might surprise most people here, since it doesn't agree with what most of you have said. - April Russo (FForever!)
@Richard - you can show them the page from wikipedia (linked above). Also, they explicitly say "For research use", and your usage may apply. - Nick Lothian
Actually, Glen, it's not accurate to say that you're admitting infringement. From section 107 "fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, *is not an infringement of copyright.* " (emphasis added) - lris
Even so, Steve's point is that this new work may not be copyrightable because it is not sufficiently original. - lris
"Slavish reproductions" of public domain works are not eligible for copyright protection. No matter how much work went into making a scan, or how much skill it may have taken to do it, it's a "slavish reproduction". Now, if they added something original to it, not found in the original work, then it would be covered by copyright. But just digitizing by scanning doesn't produce a derivative work that can be copyrighted. - April Russo (FForever!)
If you exceed fair use, then yes, it's infringement (and yes, the definitions aren't set in stone). But that doesn't mean that claiming fair use is admitting to infringement. - lris
For anyone that doesn't understand what constitutes a slavish reproduction, this is an example, that can't be copyrighted: This is an example of the same works, reproduced in a way that IS covered by copyright, because it is a new work of art: - April Russo (FForever!)
I _think_ Richard is in Canada, so fair use might not apply (I'm in Australia, and we don't have fair use. We do have exemptions for educational use, though) - Nick Lothian
I think Canada may have a copyright collection agency, which automatically collects money from educational agencies for photocopying (and - web browsing). In Australia, we have one, and I think it would cover this situation. (IANAL etc) - Nick Lothian
Has anyone asked an art professor about this? Lots of art history classes taught from slides of 'non-copyrightable works'. I would imagine there's a hefty precedent for what's OK in terms of lectures. Multi-discipline FTW :) - Paul J. Davis
I am very surprised, I have to say. I usually work with databases, and what I am always told is that it is irrelevant how much hard work goes into making it - unless the result is a "creative work", it is not protected by copyright law. I find it very hard to comprehend how scanning a book can be considered creative and give rise to copyright. If I take screen shots of things from Project Gutenberg, do I get copyright on my bitmaps? What if I print out the texts and scan them in again? - Lars Juhl Jensen
@Lars - the tendency to give monopoly privileges to non-creative things like database compilation is very different around the world. The EU is particularly bad. More here - Anders Norgaard
I am in Canada, and I think our equivalent is "fair dealing", plus we have law that covers photocopies but I think is not so clear in the digital age. There is an underlying point that one shouldn't have to be an international copyright lawyer in order to work with digital objects. If I have time I will try to track down more Canadian info on the topic. - Richard Akerman from BuddyFeed
Greg Tyrelle
If we're to get continued funding, and support JBrowse to meet all the requests that you have justifiably demanded, it's absolutely critical that we demonstrate an overwhelming demand for JBrowse from the user community -- that means you.
"Also, in another subproject of the same grant, we plan to develop a JBrowse version of the GBrowse_syn synteny viewer developed by Sheldon McKay, so you can browse syntenic regions of homologous genomes side-by-side." definitely convinced me that a letter is in order. - Paul J. Davis
Thanks for posting! - Mitch Skinner
Chris Lasher
What are you using to visualize biological networks? -
What are you using to visualize biological networks?
[Image: "network" by Simon Cockell, link to original source for attribution] In particular, what are you using to view networks in a dynamic way? Cytoscape seems the behemoth for visualizing biological networks, but have you used other solutions? A group mate and I have been discussing how we could *really* use a JavaScript library that rendered as SVG in a web browser. This would allow us to call for more data from our databases and present data on the fly and only-as-needed. He's convinced we could start writing one; I'm concerned it's a lot more difficult than he thinks, but we're both fishing for other ideas. - Chris Lasher from Bookmarklet
for working with SVG and Javascript, maybe try svgweb or Raphaël - Mike Chelen
We're using Raphael in the Protein Geometry Database <> for rendering our graphs. It's been a huge boon to cross-browser compatibility since IE *still* can't do SVG. - Donnie Berkholz
You might try a particle demo on Chromium's canvas element. Other than that, I think the optimizations for performance could end up being the bigger project. Not undoable, just maybe not worth the time. Also, specifically Chromium, as their canvas element is currently the best, though hopefully FF and Safari aren't too far behind. For IE, just give them a link to Chrome Frame ;) - Paul J. Davis
Is this meant for big graphs? Is the focus on layouts etc? Prefuse is a pretty nice toolkit for graph layout/calculations. igraph also is an excellent library with a bunch of layout algos - Rajarshi Guha
Many thanks for your input, guys. @Rajarshi We have graphs of 5k-12k nodes and 20k-80k edges. I don't know if that counts as large but they're not small. @Paul I'm worried about performance (responsiveness) given that Firefox can take a pounding in some of the demos at That's why I wonder if Cytoscape is really the only option (right now). @Mike and... - Chris Lasher
Thanks, Chris! I certainly hope it enables some folks to do interesting science instead of just looking pretty, though. =) - Donnie Berkholz
Mr. Gunn
The Dataverse Network Project (via @communicating) -
"Via web application software, data citation standards, and statistical methods, the Dataverse Network project increases scholarly recognition and distributed control for authors, journals, archives, teachers, and others who produce or organize data; facilitates data access and analysis for researchers and students; and ensures long-term preservation whether or not the data are in the public domain." - Mr. Gunn from Bookmarklet
do uploads require java? - Mike Chelen
Don't know much about it, just kinda thought it was interesting and very ambitious sounding. - Mr. Gunn
yes, looks cool, and the sort of thing people can get running to host their own collections - Mike Chelen
Impressive - they do proper citations w/ persistent identifiers and everything: . Will have a closer look at this. They use Handles for this - the DOI system is based on Handles as well, so one can resolve these IDs via the CrossRef service. => Gary King; Langche Zeng, 2006, "Replication Data Set for 'When Can History be Our Guide? The Pitfalls of Counterfactual Inference'" hdl:1902.1/DXRXCFAWPK - 'Mummi' Thorisson
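Handles have the form `prefix/suffix`, where the prefix identifies the naming authority; DOIs are handles whose prefix starts with `10.`. Any handle can be resolved through the global HTTP proxy at hdl.handle.net. A minimal Python sketch that turns an `hdl:` identifier like the one above into a proxy URL; `parse_handle` and `proxy_url` are hypothetical helpers, and the sketch does no validation beyond checking for a prefix, a slash and a suffix:

```python
from urllib.parse import quote

HANDLE_PROXY = "https://hdl.handle.net/"


def parse_handle(identifier):
    """Split 'hdl:PREFIX/SUFFIX' (or a bare 'PREFIX/SUFFIX') into (prefix, suffix)."""
    if identifier.lower().startswith("hdl:"):
        identifier = identifier[len("hdl:"):]
    prefix, sep, suffix = identifier.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle: {identifier!r}")
    return prefix, suffix


def proxy_url(identifier):
    """Build a resolvable URL via the handle proxy (suffix percent-encoded, '/' kept)."""
    prefix, suffix = parse_handle(identifier)
    return HANDLE_PROXY + prefix + "/" + quote(suffix)
```

With this, `proxy_url("hdl:1902.1/DXRXCFAWPK")` yields `https://hdl.handle.net/1902.1/DXRXCFAWPK`.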
I'm willing to bet that future archeologists will have less trouble making sense of URLs than any of the other schemes that are supposed to outlive the Web... - Eric Jain
The "UNF" (checksum) is supposed to be based on format-independent, canonical representation of the data. Good luck with that! - Eric Jain
The underlying idea is spot on. But no Unicode normalization? And rounding instead of BigNum? - Paul J. Davis
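The canonicalize-then-hash idea, and the two gaps raised here, can be illustrated with a toy fingerprint: NFC-normalize text so composed and decomposed accents compare equal, and round numbers to a fixed number of significant digits before hashing. This is explicitly not the UNF specification; the 7-digit default, the `s:`/`n:` type tags and SHA-256 are assumptions made for illustration:

```python
import hashlib
import unicodedata
from decimal import Decimal


def canonical_number(value, digits=7):
    """Render a number in scientific notation with `digits` significant digits.

    Decimal(str(value)) starts from the shortest float repr rather than the
    exact binary expansion.
    """
    return format(Decimal(str(value)), f".{digits - 1}e")


def canonical_text(value):
    """NFC-normalize so 'e' + combining accent equals the precomposed character."""
    return unicodedata.normalize("NFC", value)


def toy_fingerprint(values, digits=7):
    """Hash a sequence of strings/numbers after canonicalizing each value."""
    parts = []
    for v in values:
        if isinstance(v, str):
            parts.append("s:" + canonical_text(v))
        else:
            parts.append("n:" + canonical_number(v, digits))
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
```

Rounding is what makes `0.1 + 0.2` and `0.3` fingerprint identically, but it is also the trade-off behind the BigNum objection: two values that differ only beyond the retained digits (say `1.00000001` and `1.0`) get the same fingerprint.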
The "canonical" representation ends up being just another data format that needs to be supported. - Eric Jain
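To make the canonicalization debate above concrete, here is a minimal sketch of fingerprinting data through a canonical representation. It is not the actual UNF algorithm: the `SIG_DIGITS` precision, the `s:`/`n:` type prefixes and the choice of SHA-256 are assumptions made for illustration only. Strings get Unicode NFC normalization and numbers are rounded to a fixed number of significant digits before hashing.

```python
import hashlib
import unicodedata

# ASSUMPTION: illustrative precision only; not taken from the UNF spec.
SIG_DIGITS = 7


def canonical_value(value):
    """Map one value to a canonical string (illustrative, not UNF)."""
    if isinstance(value, str):
        # NFC makes precomposed 'é' and 'e' + combining accent identical.
        return "s:" + unicodedata.normalize("NFC", value)
    if isinstance(value, (int, float)):
        # Round to SIG_DIGITS significant digits in exponent notation,
        # so 0.1 + 0.2 and 0.3 map to the same string.
        return "n:" + format(float(value), f".{SIG_DIGITS - 1}e")
    raise TypeError(f"unsupported type: {type(value).__name__}")


def fingerprint(values):
    """SHA-256 over the canonical forms, one per line."""
    h = hashlib.sha256()
    for v in values:
        h.update(canonical_value(v).encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


if __name__ == "__main__":
    same = fingerprint(["caf\u00e9", 0.1 + 0.2]) == fingerprint(["cafe\u0301", 0.3])
    print(same)  # True
```

Without the NFC step, the precomposed and decomposed spellings of "café" would hash differently even though they render identically. Conversely, rounding means `12345678` and `12345679` get the same fingerprint, which is exactly the loss an arbitrary-precision (BigNum) representation would avoid.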
Eric Jain
Anyone here used HDF5 for storing data?
Deepak's former colleagues worked with HDF5 - Pierre Lindenbaum
I haven't used it yet, but glu-genetics, a Python toolkit for gene association scans, uses HDF5 through PyTables. The relevant code is here: - Brad Chapman
I saw it a while ago and thought that I should write Python wrappers. PyTables looks nice, but I think I'd try using first as it seems simpler from scanning the docs. - Paul J. Davis
Eric Jain
Yet another high-throughput sequencing start-up:
Examples of others? - Hope Leman
@Hope - Pacific Biosciences and Helicos maybe? Not sure on 454 and Illumina's origins. - Paul J. Davis
Complete Genomics - Eric Jain
NABsys - Eric Jain
Hi, guys--thanks for the info! - Hope Leman
Illumina has been around for a while and acquired Solexa, which was a next-gen startup. 454 has also been around a while and is now part of Roche Applied Science. Pac Bio, Complete, and there's one with "nano" in the name that I can't remember. Does Helicos still qualify as a startup? (They are a public company.) - Deepak Singh
@Deepak, perhaps you're thinking of Oxford Nanopore Technologies? - Rob Syme
Rob, I think that's it. Thanks - Deepak Singh
On the recent embargo breach involving GWAS data and a PNAS publication (which was recently retracted). - Hilary
Good to see people taking the ethical side of this seriously. I'm less convinced about the value of specific rules and more by the idea that this should just be seen as bad behaviour but very glad to see people coming down on it like a ton of bricks. That's what will make people feel safe - not rules, not regulations, and not compulsions either, but very strong and public responses to breaches. - Cameron Neylon
@Cameron +1. But ideally some kind of consequences/punishment would surely be in order as well, e.g. the authors responsible would not be kindly received next time they ask for ethical approval to access controlled-access data from NIH (or other) repositories. Some sort of blacklisting for 'repeat offenders'? - 'Mummi' Thorisson
Not greatly in favour of blacklisting per se. I would say, though, that it was a disciplinary offence for which dismissal from post ought to be considered. Which really amounts to the same thing. - Cameron Neylon
Please correct me if I'm wrong, but I thought the consequence ("punishment") was that their paper was retracted. - Hilary
The paper was published Aug 31 and retracted Sep 9, when all the authors had to do was ask PNAS to publish it no earlier than Sep 23 to comply with the GENEVA data embargo policy. The closeness of all the dates suggests to me that it was more of a serious mess-up than a malicious breach of policy. - Iddo Friedberg
Hilary, I would say that the retraction is just the reversal of the act rather than punishment. Paper shouldn't have been published, therefore it was "unpublished". If (and there should definitely be a proper investigation) someone thought they could get away with playing outside the rules there should be punishment above and beyond simple reversal in my view. This is "conduct unbecoming..." etc. But as Iddo says, not clear from the dates whether it might just have been a screwup. - Cameron Neylon
Cameron: a retraction is a very bad thing to have on your record. It is for all intents and purposes synonymous with "fraud". - Iddo Friedberg from Android
Without *knowing* the intent was malicious, forcing a retraction seems a bit harsh. If data is online, it should be intended for use by the public. IMO this is just another argument for mandatory DOIs and better dataset citations. On the other hand, calling out a group for not having the courtesy or awareness to contact the originating lab is a good thing. Like Cameron said, the social norms are probably the best way to play this. - Paul J. Davis
Also, don't physicists have a pretty good system for the whole idea of citing datasets? NCBI's ability to provide transparency in terms of what data came from where and when is pretty atrocious, so it's a bit weird to consider for biology. But I thought I read that the LHC data was pretty much available for citation. - Paul J. Davis
Iddo - I disagree on two counts with that. There are plenty of retractions out there that are honest mistakes or re-assessments. Embarrassing yes, emblematic of sloppy work yes, synonymous with fraud, nah. But more importantly, if we take that kind of attitude then people will be too scared to correct things in the future - when we will (hopefully) have much more fine-grained approaches... - Cameron Neylon
Paul - I think citing datasets at NCBI isn't so hard. I'm not sure that's really the problem in this case (if it is then it's a definite mark against the authors). The problem is the culture in biology that collecting the data isn't worth anything so having a highly cited dataset isn't useful on your CV - no matter how good or useful it is. Only the paper matters. I have to say I haven't actually had the time to look over this case in detail though. - Cameron Neylon
This also raises issues of the roles of journals, institutions employing authors (often several, in different countries/legal systems, as papers are now almost all multi-author), and funders in "policing" sci ethics. Lots of talk everywhere about this. Journals can publish policies and retract/correct (ensuring linking in A&I dbase searches etc) - but how can the sci community deal with wider issues beyond the paper? (quite apart from the technical problems with enforcing eg "blacklisting") - Maxine
Neil - in response to your Q above - v hard in practice to be perfect but from journal's perspective: (1) consult with relevant community and state policies for standards all agree and (2) the peer-review process (advice from reviewers on repeatability). Also, of course, journals can in general encourage authors to disclose more rather than less. - Maxine
+1 Neil and Maxine. There is too much of an expectation for "the journals" to sort this out. Publishers have an important role to play but we need to clean our own house. Or someone will do it for us. Probably the public. And probably by saying that they're not so interested in funding science any more. - Cameron Neylon
Thanks, Cameron. I agree, journals can and should help but as part of a wider process that scientists themselves (as a profession) decide is "best practice". Neil - have had this "code" discussion with eds here before - one view is that the documentation is better/more meaningful to scientists (who aren't programmers in the main) - also many programmes are not open-source. Probably other points which I don't immediately recall. Nature Biotech is running a community consultation at the moment on this, I think. - Maxine
Andrew Su
There are 591 tRNA genes in Entrez Gene. For 64 codons, why so much redundancy?
most tRNAs have some bases modified - Pierre Lindenbaum
46 for ala, 44 for leu, 36 for lys, 35 for arg, ... - Andrew Su
Just curious, what is your query to find those numbers? - Pierre Lindenbaum
Among the 46 coding for ala, 30 use the AGC anticodon, 10 use UGC, and 6 use CGC. Presumably pseudo-tRNAs are easily removed (when secondary structure is disrupted), so why so many real and redundant genes? - Andrew Su
admittedly a potentially imprecise hack, but I'm parsing gene_info from NCBI: <shell>gzip -cd gene_info.gz | gawk -F"\t" '$1=="9606"&&$10=="tRNA"{print $9}'</shell> and then a bunch of sed, sort, and uniq piped after that... - Andrew Su
ah, ok, I thought it was a request in Entrez - Pierre Lindenbaum
Hmm, the number of tRNA genes seems to correlate well with the amino acid usage frequency in vertebrates. Perhaps this is the answer? - Andrew Su
tRNA is likely more redundant than you think. - Ami Iida
I'm also pretty sure that transcription throughput is affected by copy count. As in, you need lots of tRNAs floating around to keep the ribosomes busy. An easy way to make that happen is multiple genomic copies. - Paul J. Davis
Paul, point well taken. Just would have thought that we higher organisms would have developed more elegant regulatory solutions to take care of that. Copy number is just so ... primitive... (though I suppose so are tRNAs...) - Andrew Su
Also, there is apparently a rich literature around correlations between tRNA copy number and codon usage (e.g., ) - Andrew Su
I'm not an authority on this by any means, but think of it in terms of computers. A polymerase during transcription has effectively locked that gene copy. Thus you're rate limited by the time it takes to transcribe (roughly). Granted, I have no idea of the time scale here. BioNumbers might shed a bit of light on that part. - Paul J. Davis
Wow, BioNumbers looks cool. Back when I thought I was interested in quantitative biology, I would have thought it was extra-awesome! - Richard Klancer
@Richard It's interesting stuff. I saw a seminar that biologists hated, but I maintain it was because the presenter sold it wrong to biologists. They need to position themselves like this: the hugely useful reference for numbers. Or the Farmer's Almanac for biology. - Paul J. Davis
You think you've got problems. Last I looked we had 500,000 of them! - Paul Gardner
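The gawk one-liner earlier in this thread, plus its trailing sed/sort/uniq steps, could be sketched in Python roughly as below. It filters on the same columns as the one-liner (1 = tax_id, 9 = description, 10 = type_of_gene). The description format it parses, e.g. `tRNA-Ala (anticodon AGC) 1-1`, is an assumption, so check it against your copy of gene_info; the function name `count_trnas` and the script name `count_trna.py` are hypothetical. Anticodons are normalized from DNA (T) to RNA (U) letters.

```python
import gzip
import re
import sys
from collections import Counter

# ASSUMPTION: tRNA descriptions look like "tRNA-Ala (anticodon AGC) 1-1".
# Verify against the gene_info release you actually have.
DESC_RE = re.compile(r"tRNA-(\w+)\s*\(anticodon\s+([ACGTU]{3})\)")


def count_trnas(lines, tax_id="9606"):
    """Count tRNA genes per amino acid and per (amino acid, anticodon).

    `lines` yields tab-separated gene_info rows. Columns used (1-based,
    as in the gawk one-liner): 1 = tax_id, 9 = description, 10 = type_of_gene.
    """
    by_aa = Counter()
    by_anticodon = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # header line
        row = line.rstrip("\n").split("\t")
        if len(row) < 10 or row[0] != tax_id or row[9] != "tRNA":
            continue  # wrong organism, wrong gene type, or truncated row
        m = DESC_RE.search(row[8])
        if m is None:
            by_aa["unparsed"] += 1  # description didn't match the assumed format
            continue
        aa = m.group(1)
        anticodon = m.group(2).replace("T", "U")  # DNA letters -> RNA letters
        by_aa[aa] += 1
        by_anticodon[(aa, anticodon)] += 1
    return by_aa, by_anticodon


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_trna.py gene_info.gz
    with gzip.open(sys.argv[1], "rt", encoding="utf-8") as fh:
        aa_counts, anticodon_counts = count_trnas(fh)
    for aa, n in aa_counts.most_common():
        print(f"{aa}\t{n}")
    for (aa, ac), n in sorted(anticodon_counts.items()):
        print(f"{aa}\t{ac}\t{n}")
```

Descriptions that don't match the assumed pattern are tallied under `unparsed` rather than silently dropped, which gives a rough check on how imprecise the hack is.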