
Mitsuteru N › Likes

arek
tony yuan
SDF Viewer and Publisher - Making a sortable, searchable structure view: http://www.olncloud.com/oln...
Very nice. How does it scale to large SD files? - Rajarshi Guha
Egon Willighagen
Importing Nanotoxicity Data with SPARQL into R for analysis - http://chem-bla-ics.blogspot.com/2011...
Pierre Lindenbaum
Pubmed: sorting the articles on the number of times they've been cited - http://plindenbaum.blogspot.com/2011...
ReaderMeter
Open data folks, I'd love to hear about your experience and pros/cons of using specific platforms for research data sharing
e.g. what platform do you use in your community? Are you happy with that? How steep was the learning curve for you and your collaborators? - ReaderMeter
What kind of platforms are you thinking about here? - Egon Willighagen
CKAN, FigShare, Dryad (via EvoMRI) and the like. It's for a social computing community. - ReaderMeter
Anyone aware of a feature comparison of data repositories? #opendata - Daniel Mietchen
"Dryad accepts data in any format as long as it is associated with a primary publication" I guess this is too restrictive as it makes the platform unsuitable for hosting research datasets the Wikimedia Foundation generate but that do not make the object (at least not immediately) of scholarly publications. - ReaderMeter
Not an actual feature comparison but at least a useful directory to start from: http://wiki.civiccommons.org/Data_Pl... - ReaderMeter
I started a basic feature comparison spreadsheet at: http://bit.ly/OpenDat... - ReaderMeter
Noel O'Boyle
MIOSS - Open Source in Chemistry workshop - http://baoilleach.blogspot.com/2011...
"An interesting aspect of Taverna is that workflows can be stored at http://myexperiment.org, and once set up can even be run directly on that website without installing Taverna." Noel do you know how to do that? - we have some Taverna workflows uploaded to MyExperiment - it would be useful to run them on MyExperiment as web services - Jean-Claude Bradley
I can't figure out if that's true or not now. Maybe I'll amend the text, at least until the slides are sent around. - Noel O'Boyle
Jan Jensen
Interactive chemistry ebooks: highlight and annotate - http://molecularmodelingbasics...
looks like an awesome blog for anyone teaching chemistry - I'll look into doing some experimentation with my organic class with some of these tools and ideas - Jean-Claude Bradley
Pawel Szczesny
How many times you can publish the same service? http://www.ncbi.nlm.nih.gov/pubmed...
All three articles are describing slightly different aspects of the service, but it's still _the same_ service (!). - Pawel Szczesny
Well papers are academic currency. Any way to increase your wealth (utility be damned) :) - Rajarshi Guha
But three? Within six months? 200% inflation rate is going to kill this system... :) - Pawel Szczesny
@Pawel... agreed. This is the same paper. It is not uncommon for biology groups to publish the 'tool' separately from the 'science', but this sounds ridiculous... plagiarism it is... actually, all journals I have been reviewing for in chemistry, do not allow results to be published before... I can't believe there are so many angles to this tool that those journals would have allowed it... within 6 months... that means they must have been submitted simultaneously :) - Egon Willighagen
Actually they're doing themselves a major disservice. By publishing the same thing 3 times they effectively divide their citations by 3, which harms their H-index. - Paul Gardner
And none of these 3 papers cite Jmol! Or even mention it... - Egon Willighagen
NAR often includes previously published databases and software. - Matt Hodgkinson
I've also heard of a rejection to the NAR webserver issue b/c of a Bioinformatics Application Note. But I'm not sure if this is a general policy. Once you're in the NAR db / webserver issue, you can re-submit after 2 years. - Michael Kuhn
I'll play devil's advocate. Apart from the reaction against CV stuffing is there any good reason not to do multiple publications for a service? If the argument were, for instance, to reach a series of different audiences? - Cameron Neylon
multiple pubs in multiple venues are fine. But pubs are currently a currency and basis of competition (amongst other things); from this POV, spamming journals with multiple articles devalues the individual articles - Rajarshi Guha
Agreed but surely it's the author's choice to balance that devaluation against potential value gain of reaching new people? I guess what I find interesting is that people feel that protecting against publication inflation is a bigger concern than getting information out efficiently. Similar case where a piece in PLoS Currents was subsequently published elsewhere and everyone got their... more... - Cameron Neylon
I checked the website, and they have an attribution clause... I could not find the attribution requirements, but nothing stops them from asking people to cite all *3* papers... - Egon Willighagen
@Cameron... I think it's a problem of inflation, and devaluation. 3 papers is simply more rewarding, and everyone not publishing more or less the same thing thrice is effectively punished. - Egon Willighagen
Perhaps, but is that not a symptom of measuring the wrong thing? If we actually measured re-use (e.g. citations) and three papers meant the number of citations were cut in three for each paper and the total number was the same then we'd be ok right? No devaluation? The problem here is not that it's being published three times but that we value the wrong things (number of papers) in a system that enables (or even encourages) cheating. - Cameron Neylon
What @egon said. My basis for this argument is that, in principle, multiple pubs in different venues are fine (I'm not sure how different the venues were for this case). And in a world where the nuances (or lack thereof) of these multiple pubs are taken into account, this would be fine. But in the real world, where jobs/grants/promotions are (unfortunately, frustratingly) based on a... more... - Rajarshi Guha
@cameron - absolutely! We are measuring the wrong thing. But, that's what we're measuring. So to stay in the race, we (well, not me, it doesn't matter to me much anymore) play the game, whose apparently best strategy is to publish as much as we can. I'm sure that with your and others' efforts this will change one day - but people still want to get their jobs/grants/promotions ... - Rajarshi Guha
Agreed - and this isn't a case where I'd argue much in their favour. But the thing with PLoS Currents was a bit different but got a very similar response. Interested whether people feel that's as egregious a case. - Cameron Neylon
is there a link to the PLoS Currents discussion? - Rajarshi Guha
Not sure if these particular 3 papers are what I usually think of as duplicate papers. The "Acta Crystallographica Section F" one is part of a special issue about the JCSG pipeline, so I think it's reasonable there even if it's duplicating things. And my opinion is that the NAR database/server issues are also a special case - as they provide a resource to the community and often describe websites that have been published elsewhere. In short, not the most straightforward example of duplicate publications. - Mickey Kosloff
Cameron, if you're playing devil's advocate, don't forget to send an invoice to NPG, because they will profit the most from perceived inflation of papers outside of Nature* ecosystem. :) But let me play the game as well - if we allow for such marketing strategy, it gives yet another advantage to people who use English natively and have no problems to write five different stories on the same discovery. Yet another penalty for not being British? Thank you so much, Cameron ;). - Pawel Szczesny
Pawel++ - Egon Willighagen
Mickey, while I agree these are "special cases", not clear duplications, I still don't really get why it's allowed in the first place. When I was reviewing a manuscript for an NAR special issue I asked the authors to improve the service in comparison to the original (published a few months earlier) despite the clear policy on allowing duplicates. Today, I would probably refuse to review for an NAR special issue at all... - Pawel Szczesny
Incredible how people behave like you expect them to behave in these comments. Very revealing and eye opening. - pn
Pierre Lindenbaum
XSLT+XML+geolocation=KML "BioStar users on a world map" - http://plindenbaum.blogspot.com/2011...
Attila Csordas
The what, where, how and why of gene ontology—a primer for bioinformaticians — Brief Bioinform - http://bib.oxfordjournals.org/content...
"In principle, researchers directly annotating genes they themselves characterized would be more efficient, but this practice has not yet caught on because annotation is time consuming and annotation guidelines are complicated" - Attila Csordas from Bookmarklet
Rajarshi Guha
Man places his genome in public domain, on Github - http://manu.sporny.org/2011...
*cough* his SNPs are public domain *cough* - pn
Good point about the need for better search and analysis tools - Mr. Gunn from YouFeed
He's braver than me....not sure I want to go there. I did share my Paternal Haplogroup and projections of what I might look like fat, bald and old though: http://tinyurl.com/5upk6p9 - Antony Williams
Just wondering how the type of data placed into the public domain here differs from that in Genomes Unzipped? http://www.genomesunzipped.org/project In the latter case it also appears that a lot of care was put into the ethical considerations of the people concerned plus there's strength in numbers. - Dan Hagon from Android
Pierre Lindenbaum
Champagne ! Our paper is accepted in "Nature Genetics" :-)
Congrats! What's it about? Or doesn't that matter? :) - Egon Willighagen
@Egon, I'm going to wait for the publication of this article before talking about it :-P - Pierre Lindenbaum
Congrats!! - Björn Brembs
I'm glad you got a paper out, but I'm sorry you couldn't get it published in a real journal. - Bill Hooker
@Pierre... ah, embargoed... obviously increases dissemination of scientific results :) - Egon Willighagen
Congrats !!! - Mitsuteru N
Lars Juhl Jensen
pn
pn
Sharing proteomics data, trickier than it seems - http://blog.dannynavarro.net/2010...
Think the rub is both a) standards are still lagging behind users (formats for marking up identifications and quantitations are still in development), and b) mass spec manufacturers and key software tools are still not fully supporting the standards that already exist. - Neil Swainston from iPhone
Egon Willighagen
Converting JSON to RDF/XML with Groovy - http://chem-bla-ics.blogspot.com/2010...
Is there a link to the rdf file? - Andrew Lang
No, not yet. But you can use the SPARQL end point to 'download' all the data... - Egon Willighagen
I was trying to see how to format the solubility data. - Andrew Lang
That's next up... Bioclipse has a manager that creates RDF... see: http://chem-bla-ics.blogspot.com/search... - Egon Willighagen
I will use CHEMINF, which is a bit cryptic (inherited from OBO), and elaborate (deliberate, to allow proper provenance of data)... paper should be submitted any day now... - Egon Willighagen
CHEMINF and related ontologies also allow marking something up as solvent... for that, browse the HTML+RDFa linked to here: http://chem-bla-ics.blogspot.com/2010... - Egon Willighagen
(I think I added it there... need to check, actually...) - Egon Willighagen
Andrew, the solvent annotation is part of this page: https://hudson.ch.cam.ac.uk/job... - Egon Willighagen
Thanks Egon. - Andrew Lang
Does this work? <?xml version="1.0"?> <rdf:RDF xmlns:rdf="http://www.w3.org/1999..." xmlns:chem="http://www.blueobelisk.org/chemist..."> <rdf:Description rdf:about="http://old.oru.edu/cccda..."> <chem:Solute>benzoic acid</chem:Solute> <chem:Solvent>1,4-dioxane</chem:Solvent> <chem:concentration>3.088</chem:concentration> </rdf:Description> </rdf:RDF> - Andrew Lang
Unfortunately the RDF file that Pierre created for our solubility data is now a broken link http://usefulchem.blogspot.com/2008... - Jean-Claude Bradley
@Andrew: sure. There are no rules, in that respect. The use of common ontologies is merely a matter of convenience... - Egon Willighagen
How's this? Is this useful? Would you add/change anything? http://showme.physics.drexel.edu/onsc... - Andrew Lang
That's very XML-like... have a look at this: https://gist.github.com/766302 - Egon Willighagen
Bioclipse can generate RDF directly from the ONS Google Spreadsheet, but I have not had time yet to look at that... will try that soon. - Egon Willighagen
Thanks Egon. I don't see a closing to the <chem:Measurement> or an opening for </rdf:Description>. Should the </rdf:Description> be </chem:Measurement>. Also the units are M. Would it be ok just to stick that at the end of the number, e.g. 2.380M, or do I need to put units separately from the value? - Andrew Lang
Yes, the rdf:Description should be a closing chem:Measurement. You can better change the predicate to molarConcentration, I think. That way, the field stays a float. - Egon Willighagen
I'm trying to convert the SMILES to InChIs. I've tried http://cactus.nci.nih.gov/ and ChemSpider and both fail and are slow. Also some urls generated don't resolve, e.g.: http://rdf.openmolecules.net/InChI=1...) - Andrew Lang
Andrew, there should be a ? just before the InChI= ... like: rdf.openmolecules.net/?InChI=1/CH4/h1H4 ... - Egon Willighagen
Thanks Egon, that makes sense. - Andrew Lang
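A minimal Python sketch of the two fixes Egon suggests in this thread: storing the value under a molarConcentration predicate so the field stays a plain float (the unit M is implied by the predicate name rather than appended to the number), and putting a ? before InChI= in the resolver URL. The chem namespace URI, the measurement URI and both function names are hypothetical placeholders, since the real ones are truncated above; only the rdf namespace is the standard W3C one.

```python
import xml.etree.ElementTree as ET

# Standard W3C RDF namespace.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical placeholder: the namespace used in the thread is truncated.
CHEM_NS = "http://example.org/chem#"


def measurement_rdf(about, solute, solvent, molar_concentration):
    """Serialize one solubility measurement as RDF/XML (illustrative only)."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("chem", CHEM_NS)

    root = ET.Element(f"{{{RDF_NS}}}RDF")
    # A typed node: opened and closed as chem:Measurement, as agreed above.
    node = ET.SubElement(
        root, f"{{{CHEM_NS}}}Measurement", {f"{{{RDF_NS}}}about": about}
    )
    ET.SubElement(node, f"{{{CHEM_NS}}}Solute").text = solute
    ET.SubElement(node, f"{{{CHEM_NS}}}Solvent").text = solvent
    # No "M" suffix: the literal must parse as a float.
    ET.SubElement(node, f"{{{CHEM_NS}}}molarConcentration").text = repr(
        float(molar_concentration)
    )
    return ET.tostring(root, encoding="unicode")


def resolver_url(inchi):
    """Build the resolver URL with the '?' placed just before 'InChI='."""
    return "http://rdf.openmolecules.net/?" + inchi


if __name__ == "__main__":
    # The measurement URI below is a hypothetical placeholder.
    print(measurement_rdf(
        "http://example.org/measurement/1", "benzoic acid", "1,4-dioxane", 3.088
    ))
    print(resolver_url("InChI=1/CH4/h1H4"))
```

Keeping the number and the unit apart means a consumer can call float() on the literal directly instead of parsing strings such as 2.380M.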
Pierre Lindenbaum
SVG image in wikipedia + Zoom.it =a Zoomable "Tree of life" - http://plindenbaum.blogspot.com/2011...
Lars Juhl Jensen
Daniel Lemire
Most browsers may soon have object-oriented key-value stores: The W3C Indexed Database API http://www.w3.org/TR... (via @oldaily)
Hope Leman
Softwares for drawing graphical abstracts http://blogs.nature.com/andrews... via @AddThis
SVG should be the norm for these types of diagrams. Particularly since it supports RDFa annotations. Inkscape is excellent. I'm also looking into Apache Batik for programmatic generation of diagrams. - Dan Hagon from Android
I love SVG, but it cannot be used if there are too many points/objects. - Pierre Lindenbaum
Rajarshi Guha
Noel O'Boyle
Really useful. Thanks Andrew! - Noel O'Boyle
This could be really useful for us too - we need the SMILES to be in a particular order for the starting materials to generate libraries of Ugi products http://usefulchem.blogspot.com/2010... - Jean-Claude Bradley
Exactly. I'll be looking into Andrew's code after Christmas, and help you out if need be. - Noel O'Boyle
thanks Noel! - Jean-Claude Bradley
Roderic Page
OMG! The Plant List http://www.theplantlist.org is licensed under CC BY-NC-ND "You may not alter, transform, or build upon this work" #epicfail
Like as in, wtf were they thinking? - Bill Hooker
I wonder how that is supposed to mesh with the section on "Enhancing The Plant List": http://www.theplantlist.org/about... - Rutger Vos
Lars Juhl Jensen
MIPS: curated databases and comprehensive secondary data resources in 2010. - http://www.ncbi.nlm.nih.gov/entrez...
Jonathan Eisen
Shirley Wu
Tim Yates
file formats in bioinformatics are a complete mess. Far too much free text for important data makes it impossible to write generic parsers
Why do you think #OpenBabel was so popular? (there are more reasons nowadays) - Egon Willighagen
Tooting my own horn here, but our lab has come out with a lightweight sequence format specifically to address the metadata-as-free-text problem. 61 proteomes from the Reference Genome Annotation Project are already available. See http://seqxml.org and http://www.ebi.ac.uk/referen... - Dave Messina
Pierre Lindenbaum
Answer by Pierre Lindenbaum for Determining which new SNPs in 1000G data result in coding changes - http://biostar.stackexchange.com/questio...
Pawel Szczesny
Finding myself doing analysis in #cytoscape and using #gephi for visualization. The latter is so much faster.
I do some annotations, clustering and enrichment analysis in Cytoscape via its plugins. I haven't used igraph yet, but I guess it's just a matter of time as soon as I move my analysis entirely to R. - Pawel Szczesny
@Neil - Cool, igraph is new for me, I learned something! - joergkurtwegner
Got to check out Gephi - Cytoscape is really slow for graph vis - Rajarshi Guha
wow .. where did that come from, sexy networks :) thanks - Pedro Beltrao
umm, not as amazing as the videos made it look. the layouts in cytoscape look better. it is fast - Pedro Beltrao
Benjamin Good
How many articles in PubMed contain information about genes? - http://i9606.blogspot.com/2010...
Gene2pubmed provides a nice lower bound but what about the upper bound? This article http://psb.stanford.edu/psb-onl... suggests that we may see a recall of about 0.55 for gene2pubmed in identifying genes in articles. That would suggest that the number over all of pubmed may be closer to 6%. - Benjamin Good
Joachim (http://joachimbaran.wordpress.com/) posted some additional data as a comment on my blog post. He suggested: "...I would now say that MEDLINE's baseline 2010 has 1.7M * 90% / 73% * 90% / 98% = 1.9M gene mentions in its titles + abstracts. That would mean that 19% of titles/abstracts -- for which there is an abstract -- have a gene mention..." - Benjamin Good
That sounds like a good upper bound - so we've got somewhere between 611,108 and 1,900,000 (*2 if Joachim's data extends to papers without abstracts) articles with information about genes. (With 'information' defined loosely enough that a mention indicates its presence.) - Benjamin Good
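The arithmetic behind these bounds can be re-run in a few lines of Python. This is only a sketch: the numbers are the ones quoted in the thread, the function names are made up for illustration, and dividing the gene2pubmed count by the 0.55 recall is an assumption about how the recall-adjusted figure would be obtained.

```python
# Figures quoted in the thread above.
GENE2PUBMED_ARTICLES = 611_108  # lower bound from gene2pubmed
GENE2PUBMED_RECALL = 0.55       # recall suggested by the cited PSB article


def joachim_estimate():
    """Joachim's chain of factors: 1.7M * 90% / 73% * 90% / 98%."""
    return 1.7e6 * 0.90 / 0.73 * 0.90 / 0.98


def recall_adjusted(count, recall):
    """Scale an observed count up by the recall (assumed adjustment)."""
    return count / recall


if __name__ == "__main__":
    print(f"Joachim's estimate: {joachim_estimate():,.0f}")
    print(
        "Recall-adjusted gene2pubmed count: "
        f"{recall_adjusted(GENE2PUBMED_ARTICLES, GENE2PUBMED_RECALL):,.0f}"
    )
```

Joachim's chain comes out at roughly 1.92 million, which matches the rounded 1.9M quoted, and the recall-adjusted count is roughly 1.11 million, which sits between the two bounds.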
Abhishek Tiwari