OpenSci Info

Following the open access, data, standards, notebooks, and other information sharing to support science research and education.
Mike Chelen
"Links any Document Object Identifiers for resolution with http://dx.doi.org" - Mike Chelen from Bookmarklet
currently works with plaintext DOIs like http://www.plosone.org/article... and with DOI class spans like http://www.nature.com/nature... - Mike Chelen
RegExp missing some chars, e.g., % if URL-encoded. We tend to use 10\.(?:\d{4})/(?:[^ "'<&]+) but that can break too - it's annoying that the DOI spec doesn't limit the chars allowed! This is for searching the whole text (incl. within href), not element-wise so might not suit your code exactly. - Fergus Gallagher
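A minimal sketch of the whole-text search Fergus describes, using the pattern quoted in his comment. The function names and the sample DOIs are made up for illustration; only the pattern and the http://dx.doi.org resolver come from the thread.

```python
import re

# Pattern quoted in the comment above: "10.", four digits, "/", then any run
# of characters that are not a space, a quote, "<" or "&".
DOI_PATTERN = re.compile(r"""10\.(?:\d{4})/(?:[^ "'<&]+)""")

# Resolver named in the bookmarklet description.
DOI_RESOLVER = "http://dx.doi.org/"


def find_dois(text):
    """Return every substring of text that matches the DOI pattern."""
    # All groups are non-capturing, so findall returns the full matches.
    return DOI_PATTERN.findall(text)


def doi_links(text):
    """Return a resolver URL for each DOI found in text."""
    return [DOI_RESOLVER + doi for doi in find_dois(text)]
```

As the comment warns, this can break: a DOI followed directly by sentence punctuation will pick up the trailing character, and prefixes with more than four digits are only partly matched.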
Mike Chelen
"OSCAR3 (Open Source Chemistry Analysis Routines) is software for the semantic annotation of chemistry papers. The modules OPSIN (a name to structure converter) and ChemTok (a tokeniser for chemical text) are also available as standalone libraries." - Mike Chelen from Bookmarklet
Mike Chelen
access PLoS article level metrics by DOI for use in Javascript and more - Mike Chelen
Mike Chelen
PathVisio / WikiPathways tool for creating and analysing biological pathway diagrams - http://www.pathvisio.org/
I'd really like to see that refactored as a collaborative Google Wave gadget. - Dan Hagon
Me too, re: Google Wave gadget. Then add SBML support, use SBGN in the display, support MIRIAM annotations and I can retire penniless. - Neil Swainston
@Neil, there are a few other tools which support SBML and SBGN (see http://sbgn.org/Communi...). WikiPathways seem to be inventing yet another pathway format and don't provide a conversion to any other existing "standard". Shame as it would benefit everyone if they did. - Frank
Thinking about it a little more, I'd really like to see the above refactored as a collaborative Google Wave Gadget. I've been involved in about five network reconstruction "jamborees" now, which involve flying loads of people around the World to sit in a room and discuss things that they could do with PathVisio (if it supported SBML...) or Payao. Anyways, this costs a fortune (see the... more... - Neil Swainston
@Neil: WikiPathways is intended exactly for that type of collaborative pathway creation. WikiPathways pathway format is based on, and developed in cooperation with, http://www.genmapp.org. So admittedly it's not a widely supported standard, but at least it wasn't a complete new invention. SBML / SBGN support is on its way. Re Google Wave: unfortunately, all this work predated Google Wave by several years... - Martijn van Iersel
Mike Chelen
"Discussion of issues relating to the use of Debian for science research, including useful packages, particular problems faced by scientists using Debian, how to make Debian more useful to scientists, etc." - Mike Chelen from Bookmarklet
there are a number of useful software packages that have been integrated already, and debian makes a great foundation for science projects since it can run well on servers and desktops. it is also used as the basis for other popular derivatives like ubuntu - Mike Chelen
here are some of the packages currently available: http://blends.alioth.debian.org/science... - Mike Chelen
Mike Chelen
Symposium on the Data Sharing Plans and on the Scientific Benefits of Data Sharing in GEOSS - 16 Nov 2009 - http://sites.nationalacademies.org/PGA...
"The Global Earth Observation System of Systems (GEOSS) 10-Year Implementation Plan explicitly acknowledges the importance of data sharing in achieving the GEOSS vision and anticipated societal benefits. The Plan, endorsed by nearly 60 governments and the European Commission at the Third Earth Observation Summit in Brussels in 2004, highlights the following GEOSS Data Sharing Principles: 1. There will be full and open exchange of data, metadata, and products shared within GEOSS, recognizing relevant international instruments and national policies and legislation. 2. All shared data, metadata, and products will be made available with minimum time delay and at minimum cost. 3. All shared data, metadata, and products being free of charge or no more than cost of reproduction will be encouraged for research and education." - Mike Chelen from Bookmarklet
good to see a specific focus on "open exchange of data" and it will be interesting to hear the plans to achieve this - Mike Chelen
Mike Chelen
Student coalition for open access now represents over 5 million internationally - http://blogs.unimelb.edu.au/library...
"The student Right to Research Coalition, a group of national, international, and local student associations that advocate for governments, universities, and researchers to adopt Open Access practices, has now grown to include some of the most prominent student organizations from the United States and across the world. The recent addition of 8 new organizations brings the number of students represented by the coalition to over 5 million, demonstrating the broad, passionate support Open Access enjoys from the student community." - Mike Chelen from Bookmarklet
helps to appreciate the global reach and scale of these scientific concepts - Mike Chelen
Mike Chelen
NIH Notice on Development of Data Sharing Policy for Sequence and Related Genomic Data - http://grants.nih.gov/grants...
wonder which existing published statements from CC science commons and other groups might discuss some of the topics raised? - Mike Chelen
OpenSci Info
@mlangill @opensci is happy to help seed these torrents =) beginning with all_plos_pdf #opensci - http://twitter.com/mikeche...
OpenSci Info
all plos pdf 14 Oct 2009 - http://www.mininova.org/tor...
from biotorrents, there should be several fast seeds: http://www.biotorrents.net/details... - Mike Chelen
Mike Chelen
Re: tasks overview wishlist: Canonical citing reference [Debian Science] - http://lists.debian.org/debian-...
"Dear all, last year, Michael opened a discussion to have bibliographic information displayed in package summaries: http://lists.debian.org/msgid-s... In the discussion that followed, we talked about where to store this information, and in which format, since adding more content to the debian/control file is not an easy thing (it ‘costs’ a lot because it goes to pivotal files like the Packages.gz files on our mirrors). A four line summary is available here: http://wiki.debian.org/DebianS... This year, some progresses are being made. For the display, Andreas has modified the ‘Web sentinels’ so that they can display bibliographic informations. See http://debian-med.alioth.debian.org/tasks... for instance. But currently the limitation of the system is that the bibliographic information is in a quite remote location, in the Blends ‘tasks’ files. I am currently working on a new workflow which would help the... more... - Mike Chelen from Bookmarklet
Mike Chelen
"Any object in Amazon S3 that can be read anonymously can also be downloaded via BitTorrent. Simply add a "?torrent" query string parameter at the end of the REST GET request for the object." - Mike Chelen
Is that a security risk or a feature? :) - Owen Greaves
Since the files are already public, it should be expected that people would download them. S3 bandwidth isn't free, so letting others help with distribution seems in the interest of the content author :) - Mike Chelen
with a little more info about how the .torrent information is generated, could make a fantastic basis for file distribution - Mike Chelen
This is a great feature of S3. Perl gurus might also want to see: http://search.cpan.org/~qantin... or http://code.google.com/p... - Todd Harris
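A small sketch of the URL change described in the quote at the top of this thread: adding a "torrent" query parameter to the GET URL of an anonymously readable object. The function name and the bucket and key in the usage line are hypothetical, and this only builds the URL; whether S3 still answers such requests is not checked here.

```python
from urllib.parse import urlsplit, urlunsplit


def torrent_url(object_url):
    """Return object_url with the "torrent" query parameter added."""
    parts = urlsplit(object_url)
    # Keep any query string already present and append the new parameter.
    query = parts.query + "&torrent" if parts.query else "torrent"
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment)
    )


# Hypothetical bucket and key, for illustration only.
print(torrent_url("http://example-bucket.s3.amazonaws.com/data/example.zip"))
```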
Mike Chelen
PLoS articles by citation type, journal and publication year: 2009, 2008, and earlier sharp contrast - http://manyeyes.alphaworks.ibm.com/manyeye...
see which journals received each type of citation (scopus, crossref, and pubmed) and compare among the most recent years - Mike Chelen
Mike Chelen
Shows that PLoS One has an overall equal rate of citation to PLoS Biology, and that more of One's articles have been published in a recent year. - Mike Chelen
Has anyone generated a slightly nicer data object out of this data yet? Been thinking of graphing the correlations of downloads versus citations versus whatever and similar for different journals which really requires a bit of cleaning up the data to be effective but if someone has already done it? - Cameron Neylon
Cameron: what else needs to be done to make the data more usable? the source data here is available as TSV http://manyeyes.alphaworks.ibm.com/manyeye... and CSV or XLS too, is there any other format that would be better? - Mike Chelen
I was wanting to do some analysis that included comparing papers based on time of publication i.e. "what is the average trajectory of downloads?" as well as comparing these across journals so I was hoping someone might have converted to either SQL and/or a set of python objects containing lists of downloads/citations/pageviews by month. Not difficult to do myself but just wondered whether someone else had already. - Cameron Neylon
is this other ManyEyes dataset helpful? http://manyeyes.alphaworks.ibm.com/manyeye... it contains the "per day" figures for most of the metrics, including PDF and XML downloads. some spreadsheet and CSV files are also in a github repository http://github.com/mchelen... which might be convenient for import to SQL - Mike Chelen
That would certainly do one of the things I had in mind but the big problem I was having was with wanting to come up with average initial rates and saturation points to see if there are any characteristics of "hot" vs "slow-burn" papers. I saw some evidence of this in the very crude graph analysis I did when the stats first came out. - Cameron Neylon from twhirl
how could the change in rates for each article be determined given only the totals? while the plos website includes a chart of an article's recent history, the data released so far can show how older and newer articles compare in terms of downloads per day like PDF files http://friendfeed.com/mikeche... - Mike Chelen
Cameron: here can be seen which years and journals have articles with the most downloads per day http://friendfeed.com/mikeche... is this close to what you have in mind? - Mike Chelen
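A rough sketch of the "downloads per day" comparison discussed above, computed from totals only: each article's cumulative download count divided by the number of days since publication. The column names (doi, published, pdf_downloads) are assumptions for illustration and are not taken from the released TSV, whose actual headers may differ.

```python
import csv
import io
from datetime import date


def downloads_per_day(tsv_text, as_of):
    """Map each article to total downloads divided by days since publication.

    Assumes hypothetical columns "doi", "published" (YYYY-MM-DD) and
    "pdf_downloads"; adjust these to the real header names.
    """
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rates = {}
    for row in rows:
        published = date.fromisoformat(row["published"])
        days_online = (as_of - published).days
        if days_online <= 0:
            # Skip articles published on or after the reference date.
            continue
        rates[row["doi"]] = int(row["pdf_downloads"]) / days_online
    return rates
```

This gives an average rate over an article's whole lifetime, so it cannot separate "hot" from "slow-burn" papers; that would need the per-month history mentioned later in the thread.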
Looks very nice Mike. We should have all the 'missing' usage data (pre Aug 2005 and first 200 PLoS ONE articles) added sometime this week. - Peter Binfield
Peter: great thanks, looking forward to it! any preferences or suggestions about where people might want to look for or share data analysis results? - Mike Chelen
Mike, just realised that I've got a somewhat different dataset that I think hasn't been publicly released yet which includes all these parameters by month - but as Pete points out there are some dates missing. - Cameron Neylon
Actually Cameron, you don't. We have released the usage data down to the month level (and you may be referring to that), but not the citation/bookmarks/blogs/comment/notes etc data (although we track cumulative data on these items, by the month, we only started tracking it in March, so don't really have enough monthly data to release - though we could if people felt it was valuable) - Peter Binfield
Cameron: how about this to see differences in citation source (scopus, pubmed, crossref) grouped by journal: http://friendfeed.com/mikeche... or similarly to look at the combined citation types of each journal: http://friendfeed.com/mikeche... - Mike Chelen
Peter & Cameron: ah yes, there is history data in the other sheets, hadn't even started looking past the first one =) - Mike Chelen
Mike Chelen
"fpocket is a very fast open source protein pocket (cavity) detection algorithm based on Voronoi tessellation. It was developed in the C programming language and is currently only available as command line driven program. A GUI is in development. fpocket includes two other programs (dpocket & tpocket) that allow you to extract pocket descriptors and test own scoring functions respectively. As the algorithm is very fast it can be used on a large scale level (PDB size for instance)." - Mike Chelen from Bookmarklet
Mike Chelen
Internet Archive: Free Download: CTPUG Meeting 13 - 2008 - Mayavi - http://www.archive.org/details...
"Speaker: Stefan van der Walt" - Mike Chelen from Bookmarklet
Mike Chelen
New Tools for Old Traumas: Using 21st Century Technology to Combat Human Rights Atrocities - http://www.americanprogress.org/events...
"One of the major developments in the human rights field over the past decade has been the increased application of new technologies, such as satellite imaging, database and data analysis tools, medical forensics, mobile phones, and social networking software to situations in which human rights are under threat. The convergence of scientific innovation and human rights advocacy may well represent a major breakthrough in the struggle for human dignity. Full realization of that promise will require far greater collaboration between government, business, the scientific community, and human rights NGOs than we have seen to this point. Our panel will describe ways in which new technologies are revolutionizing human rights work and make recommendations for how the U. S. government can play a leadership role in promoting the nexus between technology and human rights." - Mike Chelen from Bookmarklet
panelists include members of PHR and AAAS - Mike Chelen
Mike Chelen
Experimental PLoS ALM import to Drupal with Node Import, CCK and Faceted Search - http://plos-alm.opensci.info/
CCK node types available at http://github.com/mchelen... and CSV format data at http://github.com/mchelen... . - Mike Chelen from Bookmarklet
The goal is to help people find answers to questions about the data through a web interface. For example, to see which PLoS Biology non-research article had the most Pubmed citations in 2008, http://plos-alm.opensci.info/faceted... . Then hopefully to make the results available as XML, JSON and more. - Mike Chelen
XML or JSON would be fun since the results could be pulled into graphic APIs like Google Charts - Mike Chelen
Mike - this is exactly the type of thing we are hoping people will start doing with this data. Let me know how we can help! - Peter Binfield
Peter: with so much useful data, seeing everything done with it is exciting. a filtered search has been implemented http://plos-alm.opensci.info/article... (uses CCK and Views) with the results now available as RSS. more frequently updated source data would definitely be helpful for everyone interested in the latest information, thanks! - Mike Chelen
raw XML now available, change the url from rss.xml to raw.xml with any RSS feed - Mike Chelen
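The URL change above can be sketched as a one-line rewrite. The function name and the path in the usage line are hypothetical; only the rss.xml to raw.xml substitution comes from the comment.

```python
def raw_xml_url(feed_url):
    """Turn a feed URL ending in rss.xml into the matching raw.xml URL."""
    suffix = "rss.xml"
    if not feed_url.endswith(suffix):
        raise ValueError("expected a feed URL ending in rss.xml")
    # Replace only the final path component, leaving the rest untouched.
    return feed_url[: -len(suffix)] + "raw.xml"


# Hypothetical feed path, for illustration only.
print(raw_xml_url("http://plos-alm.opensci.info/example/rss.xml"))
```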
Mike - in the next couple of weeks we will add all the 'missing' usage data (pre-July 05 and 1st 200 PLoS ONE articles). I will ping you when we have it in an excel sheet. - Peter Binfield
OpenSci Info
Scientific Linux, recompiled Enterprise Linux - https://www.scientificlinux.org/
"SL is a Linux release put together by Fermilab, CERN, and various other labs and universities around the world. Its primary purpose is to reduce duplicated effort of the labs, and to have a common install base for the various experimenters. The base SL distribution is basically Enterprise Linux, recompiled from source. Our main goal for the base distribution is to have everything compatible with Enterprise, with only a few minor additions or changes. An example of items that were added are Pine, and OpenAFS. Our secondary goal is to allow easy customization for a site, without disturbing the Scientific Linux base. The various labs are able to add their own modifications to their own site areas. By the magic of scripts, and the anaconda installer, each site is to be able to create their own distributions with minimal effort. Or, if a user wishes, they can simply install the base SL release." - OpenSci Info from Bookmarklet
Mike Chelen
Bio-Linux 5.0 — NERC Environmental Bioinformatics Centre - http://nebc.nox.ac.uk/tools...
"A dedicated bioinformatics workstation - install it or run it live. Bio-Linux provides more than 500 bioinformatics programs on an Ubuntu Linux base." - Mike Chelen from Bookmarklet
Neat idea- but how much of the 4gb USB stick remains for holding data / analyses: need a bigger stick? - Richard Badge from Nambu
This was one of the first (and probably best) of these distributions (think there was a BioKnoppix at one time?) It's been around at least 6 years. But a software suite is only half the battle. The biologist needs to know how to use the packages, store and interpret the output. Which is why we have bioinformaticians and IT staff. I've never been convinced that "bioinformatics on a stick" is much use to biologists compared with expert advice/support, but I may be wrong. - Neil Saunders
Are these targeted towards biologists, or informaticians? Don't see biologists getting much use from such a distro, but do see computational types making good use - Deepak Singh
Target market is what has always confused me. I'd assume that bioinformaticians are happy to install their own software locally and for biologists with limited tech skills, a live CD doesn't help much. But I'm happy to be proven wrong by success stories. - Neil Saunders
I gave it to one of my students - we'll see what he has to say. - Björn Brembs
Neil: making the software easier can help decrease repetitive tasks, allowing more efficient use of expert advice and support, which is definitely the most valuable and scarcest resource. newcomers can often manage to boot the OS and start playing with some software, and experienced users can check if their software of choice is included, and save a little time when setting up new machines - Mike Chelen
Deepak: looking at the package list http://nebc.nox.ac.uk/tools... some favorites of both fields stand out, for example a biologist may run a BLAST search regarding a DNA sequence they are studying, while a bioinformatician could develop applications with Bio-Java and Eclipse IDE - Mike Chelen
Richard: data could be stored on a network drive, or a larger flash disk could be used, since there are 8, 16, and 32gb USB sticks available now pretty inexpensively. also, additional USB drives can be plugged in limited only by the number of USB ports on the machine - Mike Chelen
Björn: cool, would love to hear how useful others find it. the software packages can also be installed in current Ubuntu systems by adding their repository http://nebc.nox.ac.uk/tools... - Mike Chelen
Follow on question? Would a VM be equally useful? For example, I use VMs a lot to learn stuff and configure environments. - Deepak Singh
Deepak: yes absolutely! for exactly the reasons you mention, experimentation and reliability. found a VirtualBox VDI image: http://friendfeed.com/bioinf... any more formats such as VMware or EC2 AMI would be great too :) - Mike Chelen
there's a bunch of good EC2 AMI's that I will be highlighting either here or somewhere else soon (from familiar names), but more the merrier - Deepak Singh from IM
@bjorn please do get your student to feed back to NEBC, a long time ago Bio-Linux was my baby and my full time job. It's come a long way since I left it and I'm very happy to see that it's still going. It has its rough edges, and things which could be done better, but out of the box it's a well set up system ready to go. It's already been used as the base for other more focused... more... - Daniel Swan
@Neil with Bio-Linux I can happily say that we turned a few biologists into informaticians, and one into a programmer when I was with the team! Even last week a biologist walked into my office, asked me to help get it up and running in a VM on his laptop so that he could do some work. The Live-CD version was really just a distribution method, we used to send out a bootable cd-rom that would netinstall a Linux image from our servers. Inefficient at best :) - Daniel Swan
Good to hear. I remember when NEBC were setting up many years ago, Dawn contacted me regarding compilation of Phred/Phrap under Cygwin after I mentioned it on Nodalpoint. The early days of the bioinformatics social network! - Neil Saunders
Deepak: thinking about combining the Bio-Linux packages with some of the standard Ubuntu EC2 AMIs from http://alestic.com/ since they are optimized already, and contain other common tools - Mike Chelen
Mike, that would be brilliant. Lots of our customers ask for starting points in this space and being able to point to something that they might be familiar with would be great. Let me know when you do that. I am thinking about writing up a post on all the available bioinformatics AMI's on AWS - Deepak Singh
Mike - you might want to talk to Tony Travis about this (ajt@rri.sari.ac.uk) he has interests in taking the Bio-Linux base in a more 'cloudy' direction, and I'm sure Dawn and Bela and co. at NEBC would be happy with any feedback along those lines. - Daniel Swan
Deepak, getting the bio-linux packages installed can be done with a bash script http://github.com/mchelen... and maybe used with runurl http://alestic.com/2009... or to generate an AMI - Mike Chelen
Daniel, almost all the packages install okay, are there any particular applications that would be important to test? here is how the desktop looks on ec2: http://ff.im/8fb8j - Mike Chelen
Great idea. Anything to reduce the tedium of wget; ./configure; make; make install is welcome and helps lower the entry barrier for people into the field. - Todd Harris
Todd: it would be nice to start an instance with the least manual input, especially when running a particular application. for example software set up to use a biology AWS dataset http://developer.amazonwebserv... - Mike Chelen
<-- pure biologist, willing to install on top of current Ubuntu to give it a try. Agree with Todd on extra installs. I essentially only use a java-based app to chomp on big files, also Bioconductor on rare occasion. Neil's point about need for expert advice is well-taken, but I find that a biologist, willing to use a linux platform, right away gets more targeted feedback when asking for help. - Heather
Heather: that's great, did you find it easy enough to add the repository? here's a script that can save a little time, it is only a few lines though: http://github.com/mchelen... - Mike Chelen
The ideal way to install the repository would probably be a .deb containing the apt sources and signing key. This is used for example with Ubuntu One https://one.ubuntu.com/support... and PlayDeb http://www.playdeb.net/updates... (expand the instructions). Maybe someone could prepare this given the existing NERC repository info? - Mike Chelen
Deepak: the Bio-Linux image from JCVI http://www.jcvi.org/cms... really looks great, anything that helps this project can benefit other researchers interested in running bioinf software on EC2. maybe a repository mirror within EC2/AWS cloud, or a public AMI, would help too? - Mike Chelen
Mike that is a public AMI, ami id is ami-6953b200 - Deepak Singh from IM
Deepak: found the wiki page now http://sourceforge.net/apps... thanks! - Mike Chelen
anytime - Deepak Singh from IM
does this image include a desktop environment? the screenshot sort of looks like remote x. in most cases that is probably best, it could be nice to have a desktop version as well. thinking about slower internet connections, where some compression (such as freenx) is usually needed for remote desktop - Mike Chelen
Not sure, you'll have to ask Bioinfo ... My guess is no, since I think it's built on a server image, but I could be wrong - Deepak Singh from IM
Deepak: that's good, been trying to find a way to get the bio-linux packages installed on 64bit instance. desktop could be handy for learning and testing - Mike Chelen
OpenSci Info
Re-Engineering.the.Scientific.Journal.Mark.Patterson.OASPA.2009-OpenSci - http://www.mininova.org/tor...
Mike Chelen
Fwd: "Wikipedia for academic research". Post a summary of your research to increase its impact. http://acawiki.org/Home (via http://friendfeed.com/plosone...)
it looks like semantic mediawiki? - Mike Chelen
ah thanks, interesting to see which extensions they are using - Mike Chelen
Noteworthy: so far, 18 summaries of articles in PLoS Biol., 12 in PLoS Med. (None in PLoS ONE). - Jim Till
Perhaps noteworthy: the vast majority of edits in the last thirty days are by two people. It's hard to build critical mass... http://acawiki.org/index... - Andrew Su
the site looks pretty new, sometimes starting fresh produces the best results, although it can be useful to find some existing data to import initially - Mike Chelen
The site launched this week. The PLoS articles were seeded using the Editor Summaries that PLoS Bio and PLoS Med routinely publish - Peter Binfield
Peter: wondering if the data was converted from another format, or does PLoS supply RDF directly? is there any description of the process used? - Mike Chelen
We didn't work with them on this (other than to have some early meetings). I suspect they just copied and pasted... However, it can be extracted from our XML file of course. - Peter Binfield
OpenSci Info
Liked "#OpenSci BitTorrent release collection available through RSS - http://bit.ly/iXWBL" http://ff.im/-8dPSS - http://twitter.com/SladePu...
Mike Chelen
Open Notebook Science – Falcipain-2 Preliminary Results : Nature Precedings - http://precedings.nature.com/documen...
"This talk was presented by Jean-Claude Bradley at the American Chemical Society meeting in Philadelphia on August 20, 2008. An introduction to Open Notebook Science is presented followed by an illustration of how ONS can be used in drug discovery. New data relating to the anti-malarial activity of Ugi products on 2 falcipain-2 docking sites is detailed. The docking calculations were provided by Rajarshi Guha and the enzyme and in vitro assays on Plasmodium falciparum were provided by Phil Rosenthal and Jiri Gut. Most of the syntheses were carried out by Khalid Mirza in the Bradley group." - Mike Chelen from Bookmarklet