Ricardo Vidal › Likes

Michael Nielsen
"Scientist" -> "Senior Scientist" and an accepted journal article, what a fantastic Monday ;-) #promotion #thank #you
congratulations! - Rajarshi Guha
Congrats indeed! What does it practically mean? More people working for you, or just a raise? - Egon Willighagen
Thanks, and getting more people working for me ... hahaha ... that's a good one *sniff*. No, I fear I just keep working as much as usual (working more seems not really feasible without giving up sleep). Getting more people in Pharma does not sound very realistic these days ;-) - joergkurtwegner
Congratulations! - Bill Hooker
Great news! Congratulations. - Pawel Szczesny
Thanks, Pawel! - joergkurtwegner
Thanks, Bill! - joergkurtwegner
Congrats, Señor Scientist. - Noel O'Boyle
Congratulations! :) - Ricardo Vidal
Congratulations! - Björn Brembs
Sí, thanks, Noel! - joergkurtwegner
Thanks, Ricardo! - joergkurtwegner
Thanks, Björn! - joergkurtwegner
Congratulations on all fronts! - Kubke
Thanks, Fabiana! - joergkurtwegner
Cameron Neylon
I remain unconvinced that these big data sources are the real challenge. It is the mass of heterogeneous small data that is the issue #jiscmrd
Agreed. Another thing making RDF(a) so interesting: putting machine readable data on the web is becoming very simple (...) - Egon Willighagen
Steve Koch
#scio11 Jason Hoyt (paraphrase): "I think impact factor is the biggest problem now. So whatever alt-metric brings that down quickest should be the focus."
Heather Piwowar
Mr. Gunn
"This site is designed to aggregate news and research on disease biomarkers, including predictive, early detection, diagnostic, prognostic, toxicity and target biomarkers." - Mr. Gunn from Bookmarklet
Khalid Mirza
Glad I am using Mendeley... took a little bit to appreciate it though.
Deepak Singh
Rajarshi Guha
#flot makes sweet graphs
yep. my favourite - Andrew Lang
so, what is #flot? - Egon Willighagen
can it take JSON input? - Egon Willighagen
flot looks cool. Also been meaning to look at http://www.rgraph.net/ - Andrew Su
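To Egon's question: flot does take JSON, in effect — its `$.plot()` call consumes plain arrays of `[x, y]` pairs, so any backend that can serialize JSON can feed it. A minimal Python sketch of the server side (the series layout mirrors flot's data format; the sin curve is just filler data):

```python
import json
import math

# flot's $.plot() takes a list of series objects, each with an optional
# "label" and a "data" array of [x, y] pairs -- plain JSON all the way.
series = [
    {
        "label": "sin(x)",
        "data": [[x / 10.0, math.sin(x / 10.0)] for x in range(63)],
    }
]

# Ship this to the browser (e.g. as an AJAX response) and hand the
# parsed result straight to $.plot("#placeholder", series).
payload = json.dumps(series)
```

On the browser side the decoded object goes straight into `$.plot()`, which keeps the plotting code and the data generation fully decoupled.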
Cameron Neylon
What would scholarly communications look like if we invented it today? - http://cameronneylon.net/blog...
If we imagine what the specification for building a scholarly communications system would look like, there are some fairly obvious things we would want it to enable: registration of priority, archival, re-use and replication, and filtering. Some of these the current system can do well, some of them not so well. Can thinking about how we would design a system from the ground up help us to think about what we can do today to build a better and more effective record? - Cameron Neylon
So (she wonders clambering back onto her soapbox), should this system include an educational component? I imagine for practicing scientists, the answer is likely "no" -- the system should first serve the needs of scientists, and related systems serve the needs of educators. But I also imagine that in use, they will connect, much the way pubmed is used by scientists who never see patients and PAs looking up how to treat some pathology. - Mickey Schafer
I would very much hope so (to me engagement and education are closely linked). It's clearly an area where we could do a lot better. - Cameron Neylon from twhirl
I am currently setting up a demo for a platform of the kind you described: http://www.science3point0.com/coasped... is a wiki that primarily addresses step 1 and provides a basis for step 2 (just remains to be configured). Not so sure what you had in mind exactly with step 3 but I think the watchlist functionality of wikis comes close. There are some important technical hurdles for some... more... - Daniel Mietchen
Daniel, just a brief look, but seeing one of your bugs at the bottom (the title issue) is this something that Samuel L solved in his GSOC project with Egon? I'm wondering whether Semantic Media Wiki (or some of the Simile tools) might also be an interesting lens. Hmmm... - Cameron Neylon
Sally Church
Congratulations Sally! - Martin Fenner
Congrats! Well done! - Anne Bouey
Thanks, folks! Pooped after that effort. Back in the saddle again tonight. - Sally Church
Awesome! 25 miles!! Congratulations :) - Ricardo Vidal
Kaitlin Thaney
this is absolutely brilliant, full of WIN. "four levels of social entrapment" by hyperbole and a half ( http://hyperboleandahalf.blogspot.com/2010...) HT @louwoodley
Yeah, I've found the social expectations about monitoring and responding to be very different among ages, but particularly between American and UK groups. - Mr. Gunn
Shorter Neil: kthxbi: it's not just for text anymore. - Bill Hooker
Wow, Level 1 and 2 describe 90% of the conversations I've had in my life. - Benjamin Tseng
Carl Boettiger
I'm looking for a good solution to upload figures into my open lab notebook. Need fast, many photo, automated / scriptable uploading, permanent hosting, with searching/tags/comments. Trying photobucket: http://openwetware.org/wiki... Thoughts?
I think ideally I would run a script which would run some code which generates a png figure, pushes the photo to the host site under a given album/tags and a link to the version of the code that created the figure on the project's github site. Since the codes take a long time to run, simply having the code version is no longer sufficient for me. - Carl Boettiger
Have you considered the possibility of using Flickr to host the photos? They have an API, are quite inexpensive and I know that there is/was a plugin to add Flickr images easily into OWW (mediawiki). - Ricardo Vidal
Yup, Flickr seems very promising since there's lots of development around it. Do I need to buy a subscription to make sure the uploads are permanent? Think I'll give it a try... - Carl Boettiger
No, all uploads are permanent and I believe you can "display" up to 200 images for free. The last 200 are always visible and nothing is lost. If you pay the $24 USD/year, all images become public/visible again. - Ricardo Vidal
Flickr seems to be a nice solution. With one command-line call I can have a full slideshow of results embedded into the notebook! http://openwetware.org/wiki... Each image can collect comments and other tags and be organized into groups. the command-line upload also doesn't seem to spam my FF/twitter feed even though I added flickr to FF. guess that's a good thing. - Carl Boettiger
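Carl's pipeline (generate a PNG, push it to the host, and tag it with the exact code version that produced it) can be scripted against Flickr. A rough sketch, assuming the third-party `flickrapi` package; the `figure_metadata` helper, repo URL, and credentials are all hypothetical placeholders, and the upload keywords follow flickrapi's documented parameters:

```python
# Sketch of the "figure + code version" workflow discussed above,
# assuming the third-party `flickrapi` package (pip install flickrapi).

def figure_metadata(repo_url, commit, album, extra_tags=()):
    """Build a (title, description, tags) triple linking a figure to
    the exact code version on GitHub that generated it."""
    title = f"{album} figure @ {commit[:7]}"
    description = f"Generated by {repo_url}/tree/{commit}"
    tags = " ".join((album, f"commit:{commit[:7]}") + tuple(extra_tags))
    return title, description, tags

def upload_figure(api_key, api_secret, png_path, repo_url, commit, album):
    """Upload one figure to Flickr with metadata pointing at the code."""
    import flickrapi  # assumed API: FlickrAPI(key, secret) and .upload()
    title, description, tags = figure_metadata(repo_url, commit, album)
    flickr = flickrapi.FlickrAPI(api_key, api_secret)
    flickr.upload(filename=png_path, title=title,
                  description=description, tags=tags)
```

Calling `upload_figure` at the end of the simulation script would give each figure a permanent link back to the commit that produced it, which addresses Carl's "code version is no longer sufficient" problem.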
Mr. Gunn
Google map of institutional repositories, including the type of repo package they run and a cool deposit growth map. - http://maps.repository66.org/
sounds cool - Egon Willighagen
Rajarshi Guha
ok, time for my talk. "Data driven life sciences: The Pyramids meet the Tower of Babel" #acs_boston
slides on slideshare I presume? - Deepak Singh
Thanks - Deepak Singh
Great talk. Would have loved to be there in person. Two things jump out. One is the n-gram fragment question and the value of sufficiently large data. Second, and correct me if I am wrong, but there seems to be a place for computation over large graphs, or complex key-value stores, where you are querying over common keys and looking for all the values and then aggregating? - Deepak Singh
Hej, you got Swedish in your presentation :) #acs_boston - Egon Willighagen
@egon, how so? - Rajarshi Guha
@deepak, it depends on how those graphs are constructed. If we're just considering small mols, I'm not sure how constructive that would be (individual mol graphs are small) and even if one did go to similarity graphs (ala Bajorath), they are still not very big. As for aggregation - we already do that in various ways, but it doesn't necessarily provide insight, except at a high level.... more... - Rajarshi Guha
@egon, ahh got it. Smorgasbord :) - Rajarshi Guha
Rajarshi, the angle I was coming from was that if you have keys (compounds) with tons of properties (values), and lots of complex relationships (lots of nodes and edges), you could slice and dice them in interesting ways. But based on what you say, that might not really be that insightful - Deepak Singh
Right, just aggregating properties is of limited value - you only get descriptive stats. Predictive modeling goes beyond that. The open challenge is making a multi-partite graph (aka integration) of mols, targets, pathways, ... and getting a coherent story out of it all - Rajarshi Guha
I think we're on the same page then. Should I put my money on you? - Deepak Singh
Andrew Su
Just received notification that the Gene Wiki project (http://en.wikipedia.org/wiki...) received four years of funding from the NIH. Blog post with more details forthcoming...
Congratulations, Andrew! - Daniel Mietchen
That's great news, Andrew. Congratulations! Open science FTW!! - Khader Shameer
Wow, wonderful news! :) - Ricardo Vidal
That's great news! - Mr. Gunn
Great news, congrats! - Walter Jessen from FreshFeed
Great news indeed - congrats! - Lars Juhl Jensen
Congratulations! - Bill Hooker
Wonderful! Well done. Can I start exhorting more genome annotators to head your way? I think I convinced an archaeal person a few weeks ago. Gene Wiki is certainly gaining more attention from the Sanger/EBI crowd. - Paul Gardner
Congratulations to them! - Tyson Key
Second R01, you're on your way... - John Hogenesch
Woohoo! Looks like I get to stay at GNF for a while... - Benjamin Good
Walter Jessen
Writing up a review of a dozen iPhone apps I've found useful for the life sciences. #iPhone #life-sciences
I did that a while back and they've been some of my most popular posts to date. :) - Ricardo Vidal
Yes, I found them yesterday afternoon. I'm linking back to the one you wrote earlier this year. - Walter Jessen
Dick Moore
For a glimpse of what is to come with HTML5, JavaScript and possibly Chrome, take a look at this site: http://www.chromeexperiments.com/ While just because you can does not mean you should, this site does showcase some of what we can expect very soon as a browsing experience. (Not your mum's JavaScript!)
Jonathan Eisen
PZ Myers will reveal his decision on blog free agency on live TV - http://phylogenomics.blogspot.com/2010...
if it weren't so sad it would be funny - Christina Pikas
Clever :) - Jeff Habig
Mr. Gunn
Bora Zivkovic
If you are in bioinformatics (or anything really) and are not associated with a university, what services and databases would you like to have and use, but they cost money (so you don't use them, or you pay while cussing and cursing because it's impossible to work without them)?
Although slightly rephrasing the question, I would like to be able to pay money to have a version of NCBI that didn't make me want to gouge my eyes out with a spoon. - Paul J. Davis
I work at a small biotech company and as Neil said, access to journals is the biggest issue. We are associated with a university so when I'm there I can access journals, however normally from the company building I can't. The delay is frustrating. In terms of commercial tools/databases, currently I would like access to Genelogic gene expression database and Biobase transcription factor database and analysis tools. - Greg Tyrelle
Another small biotech here, and yes, access to journals is a huge pain. - Bill Hooker
I'm currently at a university, but was considering being an independent scholar for a year or two (= a homeschooling mom with a research rather than a knitting hobby). Resource problems I anticipated: access to journals, Scopus subscription, Web of Science subscription. - Heather Piwowar
Thank you all - just as I expected: access to journals (and ways to find journals/papers) is the most expensive and most difficult thing to get if one is outside of a university. How about space for an office (lab?), equipment (poster printer?), software - if you were a researcher at home (freelance scientist), what would you need that costs money? - Bora Zivkovic
Depends what kind of research. I'd need a biosafety hood, liquid nitrogen storage, glassware, balances, electrophoresis equipment, culture incubators, autoclave, chemical store... :-) - Bill Hooker
Software is the other bit. Biotechs tend to be a lot more budget-constrained than many academic labs. - Deepak Singh
Access to journals and availability of equipment are both big deals, but whereas some equipment can be obtained from ebay or a local lab equipment company, academic literature database subscriptions can't be obtained at a discount rate. (AFAIK) - Mr. Gunn
For journal access, try getting yourself an unpaid appointment as affiliate faculty (aka courtesy appointment) at your local university. I've known a couple of people who have done that. - Donnie Berkholz
Lab space outside of a university or big company is very difficult to find. The so-called "incubators" are expensive. It is still unclear to me what the rules are for running labs in commercially-zoned properties. - Jeremy Leipzig
Access to journals is a huge issue. My biggest expense for freelance science (computational biology) is the computing equipment; second would be software. - Walter Jessen
As I'm considering becoming an independent scientist (again!), some time ago I did back-of-the-napkin calculations and it turned out that I might need ca. $100 (reliable internet connection) + $500-1000 (per article payments, no subscriptions) + $500-1000 (computing cloud, storage and calculations) a month to work comfortably outside of academic infrastructure (without spamming all my... more... - Pawel Szczesny
Access to journals is the biggest issue. Obtaining this access by being an adjunct faculty is much more valuable than the salary they pay for your services. Obtaining funds to attend scientific meetings and to cover publication costs isn't far behind. Not always trivial to convince business types there is value in publishing your basic research. - Jeff Habig
Donnie, good to know re: affiliate positions, thank you. - Heather Piwowar
I'd be in Pawel's boat. On the software end, I can probably make do with open source. - Deepak Singh
Lars Juhl Jensen
Downloading just over half of PubMed Central has taken almost a week! Would it be legal to set up a public mirror?
Likely not. From all I have heard, PubMed is not public domain, let alone the fact that 'public domain' does not work here in Europe quite the way it does in the US of A. Lars, but why not try? - Egon Willighagen
PubMed != PubMed Central ;-) But from what I read, I would only be able to make a mirror of the OA part of PMC, which is fairly pointless since it is anyway a small subset and therefore not hard to download from the main site. - Lars Juhl Jensen
I know that I'm not allowed to redistribute PubMed since that is clearly stated in the license I signed with NCBI to be able to download the complete dump. - Lars Juhl Jensen
PMC is legally a much less clear situation. On one hand most of the articles are under publisher licenses that probably even forbid me to download them and do text mining on them. On the other hand they are legally available from a public FTP server that anyone has access to without signing license agreement. I am not a lawyer, but from where I stand it would seem that I have thus not agreed to the license terms of the publishers, and that the documents are only protected by copyright. - Lars Juhl Jensen
In any case, normal copyright would forbid redistribution, so the answer to my question is: "No". - Lars Juhl Jensen
@Lars... it isn't...? oops... so, it is the OA subset where, e.g., BMC papers are deposited? - Egon Willighagen
@Egon PMC is the repository where many publishers now place their full-text papers. It is not to be confused with PubMed, the database of abstracts. However, it is important to note that the majority of the papers deposited in PMC are nonetheless not OA. - Lars Juhl Jensen
OK, thanx. Sorry for my confusion! - Egon Willighagen
My understanding is, alas, this: "On one hand most of the articles are under publisher licenses that probably even forbid me to download them and do text mining on them." Or rather, it is the PMC license that doesn't allow you to do a bulk download of them (http://www.ncbi.nlm.nih.gov/pmc...), and publishers that may or may not allow text mining as fair use. OA subset though? have at it :) - Heather Piwowar
Heather, I'm getting increasingly confused. What I'm downloading is what is in ftp://ftp.ncbi.nlm.nih.gov/pub/pmc - the readme.txt file in that directory says nothing about not being allowed to download the content from this anonymous FTP service. In fact that file claims that the directory contains only the OA articles from PMC. Maybe I am only downloading the OA part? How many articles do you have in the OA subset that you downloaded? - Lars Juhl Jensen
Ok, I just did a random sample of journal names from what I've downloaded and all 10 were in DOAJ, so I guess it is only the OA part I'm downloading. Which means that I could make a mirror if I like to. It also means that the OA part is a whole lot larger than I thought it was - which I consider excellent news :-) - Lars Juhl Jensen
@Lars, yes, sorry if you are getting them from the OA FTP server then you are fine on downloading. http://www.ncbi.nlm.nih.gov/pmc... It is the rest of PMC that can't be bulk downloaded. btw, if you just need the text, not images or PDFs, do you know about the much-smaller XML content? - Heather Piwowar
@Lars I'd assume the "OA" subset can be redistributed, since that is supposed to be part of the definition of OA. PMC's official take, though, is to read the agreements of each of the publishers... http://www.ncbi.nlm.nih.gov/pmc... - Heather Piwowar
@Heather, I saw that the web page for the PMC OA part claims that there is a file called articles.tar.gz, but if one looks at the FTP site there is no such file. There are some files called articles.A-B.tar.gz etc., which I suspect are a new split version of it. However, these files are not very good for my purpose: I want to keep my copy up-to-date on a daily basis ideally, and redownloading ~2GB of tarballs each day seems silly. - Lars Juhl Jensen
What I would really wish for is that PMC implements a system like PubMed: one or more baseline files and then a daily update file that contains only the new OA papers that have been added. The present setup is not very well suited for text miners IMHO. - Lars Juhl Jensen
Can't the FTP client skip the files that have already been downloaded? - Mike Chelen
@Mike Yes it can - that is why I'm downloading the version with each article as a separate tarball. However, that version contains all the figures, which I don't need. The version that contains only the text is bundled up in four tarballs that all change every day. So I have the choice between initially downloading hundreds of GB (most of which I don't need) and then have easy subsequent updates, or downloading approximately 2GB every day. - Lars Juhl Jensen
could put together an update script that uses the api? http://eutils.ncbi.nlm.nih.gov/ - Joe Dunckley
Lars: If text is the important thing then it is probably easier to manage 2GB than hundreds of GB of files, even if the XML archives change every day. Maybe it would help to script the download and extraction? The file list ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/file_list.txt also might have newest entries last, but it still points to the articles larger file size version. - Mike Chelen
Joe: It looks like Esearch http://eutils.ncbi.nlm.nih.gov/corehtm... can show recent entries restricted to PMC, wonder if there is any way to limit results to open access subset? - Mike Chelen
Too bad they didn't offer BitTorrent! I have tried to download numerous things over their FTP server without any great success. If you ever get it downloaded you should upload it to BioTorrents. - Morgan Langille
The PMC FTP works ok for me, had the most luck with the ncftpget client because it can resume interrupted downloads fairly quickly. The full articles including images and supplementary materials are literally hundreds of GB and I have never had the patience to download them completely :D Here is an old release of the XML archive though, would be happy to provide an updated copy: http://www.mininova.org/tor... - Mike Chelen
Lars, there must be some way to do this. Check out Hubmed, for example. http://hubmed.hublog.org Then there's GoPubmed and others. If what you're interested in is the abstracts, then certainly what you suggest is possible. - Mr. Gunn
@Mike, the FTP site works fine indeed. It is just slightly impractical to download literally hundreds of GBs, although that is what I'm doing now. I guess I just have to be patient for another week or so. Thanks for the offer of a dump, but what I'm trying to set up is a local copy that is automatically kept up-to-date, so it doesn't really solve my problem. - Lars Juhl Jensen
@Mr. Gunn, the problem is not the abstracts. I already have a PubMed license and I have downloaded all abstracts. The problem is PubMed Central: the full-text version of OA papers, which neither Hubmed nor GoPubMed makes use of. - Lars Juhl Jensen
Gotcha, that makes sense. Let us know how it ends up working for you? - Heather Piwowar
Lars: Why download the full text+image+supp archives if only the XML text is needed? One of these utilities might help increase transfer speeds, been getting 5MB/s with aria (not mentioned) http://fasterdata.es.net/tools... Also could try CurlFTPFS to mount the remote FTP to the local filesystem if the analysis doesn't require reading every complete file. - Mike Chelen
@Mike: The problem is the structure of the PMC FTP site. You have two options: 1) Download four tarballs that in total contain all the XML text (2GB in total), or 2) download thousands of tarballs that each contain everything related to one paper. If I choose option 1, I have to download these four big tarballs daily although I only want the new text. If I choose option 2, I have the advantage that I can download only the new tarballs each day, but I must download everything (not just the XML). - Lars Juhl Jensen
@Mike: regarding mounting the FTP site as a file system, that would not really help me at all since I would in any case have to untar all the tarballs to get the XML text that I need. The end result would still be that I fetch everything by FTP. - Lars Juhl Jensen
@Heather: I'm ~70% through downloading the OA subset of PMC now, so I should have it all in a few days. My plan is to either weekly or daily download all new document tarballs, extract the .nxml files, and parse the XML into a simple (document ID, paragraph #, sentence #, UTF-8 text) data structure that IMHO is a much better starting point for text miners. I intend to make this available for download, but I have to think through how best to make the daily/weekly updates available (some kind of patch files). - Lars Juhl Jensen
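The (document ID, paragraph #, sentence #, text) structure Lars describes could be sketched roughly like this — a toy parser where a naive regex splitter stands in for a real sentence boundary detector, and only the `<p>` elements of the .nxml are handled:

```python
import re
from xml.etree import ElementTree

# Naive stand-in for a real sentence splitter: break after ./!/? plus
# whitespace. Real .nxml (JATS) is much richer than this toy handles.
_SENT_SPLIT = re.compile(r"(?<=[.!?])\s+")

def nxml_to_rows(doc_id, nxml_text):
    """Flatten an .nxml document into
    (document ID, paragraph #, sentence #, text) tuples."""
    root = ElementTree.fromstring(nxml_text)
    rows = []
    for p_no, para in enumerate(root.iter("p"), start=1):
        text = "".join(para.itertext()).strip()
        for s_no, sent in enumerate(_SENT_SPLIT.split(text), start=1):
            if sent:
                rows.append((doc_id, p_no, s_no, sent))
    return rows
```

The flat tuple layout makes it trivial to bulk-load into a database table or stream line-by-line into downstream text-mining tools.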
Lars: What about extracting the .tar.gz with the -k flag to skip overwriting files? Then only new .nxml files would be read from the archive and transferred over the network. - Mike Chelen
Here is a bash script to mount the FTP and then extract all of the articles.XX.tar.gz archives while skipping existing files http://gist.github.com/439364 - Mike Chelen
Thanks Mike, but I'm sorry to say that there are two problems with this: 1) since you are running the tar command at your end, you'd still have to download the entire 2GB of articles.XX.tar.gz every day just to get the few new .nxml files; 2) if the .nxml file is later updated on the PMC site you'll be stuck with the old one on your disk if you use -k (can be solved using --keep-newer-files instead of -k). - Lars Juhl Jensen
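The `--keep-newer-files` behaviour Lars mentions can also be reproduced when unpacking the per-article tarballs locally. A sketch (function name and layout are illustrative) that extracts only the .nxml members that are new, or newer than the copy already on disk:

```python
import os
import tarfile

def extract_new_nxml(tarball_path, dest_dir):
    """Extract only .nxml members from a per-article tarball, skipping
    any file already on disk that is at least as new as the archived
    copy (mimicking GNU tar's --keep-newer-files)."""
    extracted = []
    with tarfile.open(tarball_path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.name.endswith(".nxml"):
                continue
            target = os.path.join(dest_dir, member.name)
            if os.path.exists(target) and os.path.getmtime(target) >= member.mtime:
                continue  # local copy is as new as the archived one
            tar.extract(member, dest_dir)
            extracted.append(member.name)
    return extracted
```

Run over only the tarballs that are new on the FTP site each day, this keeps a local .nxml mirror current without ever unpacking the figures and supplementary files.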
Just for the record: the initial giant download is still going fine. I have about 75% of the OA subset of PMC on my disk now, which is about 230GB distributed over 155,000 article tarballs. The script for extracting just the XML text files from the tarballs is in place, and we are making good progress on the code for extracting the text from each XML file and splitting it into paragraphs and sentences. - Lars Juhl Jensen
Interesting, wonder why the whole .tar.gz file has to be read if only a few of the files are actually being extracted? Good point about edited files, thought each would only be deposited once. What kind of analysis will be done with the XML text? That is a lot of data to deal with, glad to hear another method is working out! - Mike Chelen
Guessing you could have just read the large articles.XX.tar.gz to start with and then done incremental updates via the FTP site (to decrease initial download time), except that as I recall the tarball and the individual FTP files have different naming conventions so it would be messy. If not, that would be an idea. Though too late for you :) - Heather Piwowar
Heather, you are right - it would save a big initial download, but the differences in naming conventions would make it hard to implement. Also, I cannot rule out that I will use more than the XML text files later on, so I might as well get it all from the start :-) - Lars Juhl Jensen
Mike, the first step (after parsing the XML text files into paragraphs and sentences) will be to run all of it through Reflect to do named entity recognition. This will enable us 1) to easily search the literature for abstracts or papers that mention a given gene, protein, or small molecule, 2) to make literature-based gene set enrichment analyses, and 3) to construct improved co-occurrence networks (which we plan to use in the STRING and STITCH databases). - Lars Juhl Jensen
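That last co-occurrence step can be sketched as a simple per-sentence pair count — a toy stand-in for what the real STRING/STITCH pipelines do with the tagged sentences:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(sentence_entities):
    """Count how often each pair of entities is mentioned in the same
    sentence. `sentence_entities` is an iterable of sets, one set of
    recognized entity names per sentence."""
    counts = Counter()
    for entities in sentence_entities:
        # sort so each pair gets one canonical (a, b) key
        for a, b in combinations(sorted(entities), 2):
            counts[(a, b)] += 1
    return counts
```

The resulting pair counts are exactly the edge weights of a naive literature co-occurrence network; a real system would normalize them against each entity's overall mention frequency.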
Download completed. Now I just need to extract all the .nxml text files and the fun begins :) - Lars Juhl Jensen
Lars, do you have the full text papers as plain text, or something that comes close? - Egon Willighagen
Abhishek Tiwari
txt2re: headache relief for programmers :: regular expression generator via @CameronNeylon - http://txt2re.com/
very useful tool! - lanfeng
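For flavour, a hand-written Python pattern of the sort txt2re generates from an example string — named groups pulling structured fields out of a sample line (the PMC-flavoured input is just an illustration, not txt2re's actual output):

```python
import re

# Given an example line like "PMC1234567 deposited 2010-06-15",
# a generated-style pattern with named capture groups might be:
pattern = re.compile(
    r"(?P<pmcid>PMC\d+)\s+deposited\s+(?P<date>\d{4}-\d{2}-\d{2})"
)

match = pattern.search("PMC1234567 deposited 2010-06-15")
```

The named groups (`match.group("pmcid")`, `match.group("date")`) are what make generated patterns readable enough to paste into real code.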
Victor / Mendeley Team
Dr. Arthur Lesk has agreed to serve on my committee http://www.bmb.psu.edu/faculty...
That's great! - Ricardo Vidal from iPhone
Lars Juhl Jensen
Downloading Medline, OMIM, PubMed Central, Wikipedia, and more to prepare for some serious text mining
That sounds like some super serious text mining! :D Let us in on the juicy bits once you're done. - Ricardo Vidal
Chris Miller
A Programming Langauge for Genetic Engineering of Living Cells - Microsoft Research - http://research.microsoft.com/en-us...
We introduce such a programming language, which allows logical interactions between potentially undetermined proteins and genes to be expressed in a modular manner. Programs can be translated by a compiler into sequences of standard biological parts, a process which relies on logic programming and prototype databases that contain known biological parts and protein interactions. Programs can also be translated to reactions, allowing simulations to be carried out. - Chris Miller from Bookmarklet
Love how there's a typo in the title. Let's call it a "snip"-up :) - Ricardo Vidal from iPhone
Andrew Su
Of course you can count me in! - Mr. Gunn
Count me in too. - Ricardo Vidal
+1 here. - Khader Shameer
Andrew Su
BioGPS powered by CouchDB now, time to relax. - http://biogps.blogspot.com/2010...
Great work. I stumbled upon BioGPS the other day when Mr.Gunn pointed me in that direction. I really like what you have been doing. - Ricardo Vidal
Thank you Ricardo (and Mr.Gunn!)... - Andrew Su
Yay CouchDB in bioinformatics! - Paul J. Davis
Steve Koch
Just sent my saliva to 23andme ... is going to be so cool browsing my genotype!
saliva for 23 and me.JPG
Congrats! So it's now just a matter of time until you have your SNPs loaded into R. - Steve Koch
yay! we just sent ours in 3 days ago. - Deniz Eda Goze
my mom wants to do this for ancestry's sake -- we have a rather tangled family history! - Mickey Schafer
As far as I know, German law requires everyone who have had their genome analyzed to forward all that information to their health insurance. What is the situation elsewhere? - Daniel Mietchen