
Matt Wood › Comments

Matt Wood
All-new http://www.sanger.ac.uk, including links to databases (http://www.sanger.ac.uk/resourc...), and a short welcome video (http://www.sanger.ac.uk/about...).
Looks better, but unfortunately they've broken loads of links to old press releases... - Duncan Hull
True true, that's a shame: I'll forward this on to the web team. Drop a line to webmaster@sanger.ac.uk if you spot anything else. - Matt Wood
Press release and other links should now be up and running - great fast response from the awesome Sanger web team. - Matt Wood
Matt Wood
I received a similar take down notice when working on a set of microformats for biology: http://dret.typepad.com/dretblo.... The disservice is all theirs.
So who sends out these take downs? - Deepak Singh
I received a bunch about my old bioformats project, asking why they were designed outside of the sanctioned 'process', that they weren't really microformats, who did I think I was etc etc. Pretty blunt. I understand the need for some of it, but the bureaucracy around some semantic web projects is stifling. - Matt Wood
Neil Saunders
Question for bioinformaticians and computational biologists
How many of you have a professional programming qualification? Either from an educational institution (e.g. a CS degree) or something like Sun Java certification? Is it even possible, useful or desirable to get "recognised" qualifications in other programming languages? Or are most of us self-taught? - Neil Saunders
Useful for what? For landing a job with an established software company, a CS degree seems like a good idea. For a job at an outsourcing company in India, certification is a must. But if it's actual programming skills you're after, nothing beats practice (even if it's just with an open source hobby project). Ideally you get to work with people who have more experience than you do, so you're not just "self-taught", but also "group-taught". - Eric Jain
Useful for bioinformaticians and computational biologists. Just interested to know if this has affected career development for anyone; either within those subject areas, or if they've moved out into other jobs; e.g. people who've left academic life science research to become software developers. - Neil Saunders
I would think in most cases, commercial and academic, a good track record with some awesome projects would trump certification. FWIW, we don't have many people with certs, but Masters and PhDs are still common. - Matt Wood
Let me put it this way: You don't want to work for a place that filters candidates for a software development position based on their formal qualifications, rather than their experience. For what it's worth: I have some formal qualifications for being allowed near computers, but as far as I can tell that was never a factor (both in and outside of academia) for being invited to an interview or hired. Arguably that's a rather small and biased sample set, but there you go. - Eric Jain
I got RHEL certified at one point, but it was only because work paid for it to happen. But I'm a 'biologist turned informatician' and therefore have no qualifications in any CS related field - just experience! - Daniel Swan
I am self taught, no professional qualification. I thought of getting certifications but in the end decided not to, as I thought it wouldn't be useful getting them. - Paulo Nuin
I have a double bachelor's degree in CS and Bio. I felt that it helped me get into a comp bio graduate school program, at any rate. As for certifications, I personally see them as a waste of time, as most biologists in academia couldn't care less how you munge their data, as long as you do it quickly and efficiently. I'd be interested to hear if things are different in industry. - Chris Miller
The only reason I know some people got certified was because it helped them focus down and learn something they wanted to. I haven't been in any situation during a hiring decision where certification comes into play. I'd rather be pointed to a website someone has developed or some code that's on sourceforge - Deepak Singh
I've never heard of anyone asking for professional bioinformatics certification. Publish, show your previous work, yes. But certification? Never. - Andreas Matern from Alert Thingy
I have a bioinformatics masters degree, which was taken post-PhD. As part of that I was taught Java, but very much in a 'Java for Bioinformatics' style. Maybe my approach to projects earlier in my career would have benefited from some software engineering training, but you pick that stuff up as you go along :) - Simon Cockell
I have no certification, just a master's in Bioinformatics where I was taught Java, Linux and R. I taught myself Ruby. I've got no interest in certification and I think Chris Wansworth's short essay is a good guide to follow - https://gist.github.com/0a2655a... . I think this echoes the same sentiments expressed above. - Michael Barton
Matt Wood
Towards a science data platform #1: easy, flexible retrieval and reuse above all else. #scidata
Why do I see a website in the near future :) - Deepak Singh
Surely easy deposition has to come first though? - Cameron Neylon
Perhaps. Ideally all reads and writes would be created equal, but in reality a reasonable amount of heavy lifting is required at one end or the other. Given that data is usually written once and retrieved many times, I wonder if it's easier for those already generating and working with the information to jump through some deposition hoops once, rather than everyone being forced to do it at retrieval, time after time. - Matt Wood
Agreed that access needs to be easy for users but put barriers in front of depositors and you will only get specific types of data (mostly big and well funded). But I don't know how to square the circle from easy to deposit blob to usefully described blob on a service. - Cameron Neylon
Cameron, the barriers in front of depositors are cultural or systemic. The barriers in front of retrievers are often technical. Yes, we need to address both problems, but the technical challenges of retrieval are real, since we aren't retrieving small data sets any more - Deepak Singh
The most enthusiastic depositors are those whose peer-reviewed publication is tied to deposition. Get key journals to require deposition of data + metadata prior to publication, under a Public Domain or Attribution-required license, and the rest will follow. - Andrew Perry
Deepak, absolutely agree - just was uncomfortable with "above all else". Tell you what, I'll let you and @mza get on with the technical challenges while I worry about the social ones. Division of labour and all that. @Andrew - this is absolutely true, but journals will not do this (and I agree with the logic on this) until they get a very strong steer from the community that this is... more... - Cameron Neylon
@cameronneylon it's ridiculous to think you can separate the technical challenges from the social ones; one cannot be understood (and solved!) without the other. Trying to tackle the technical challenges without solving the social ones is like building a car without having the blueprint. Trying to tackle the social challenges without the technical ones is like building a rocket ship for the year 2150 and hoping someone will somehow magically solve the technical issues tomorrow. - Alexander Griekspoor
Alexander, you won't get any argument from me on that, but Matt did start this manifesto with "there are no technical reasons...why an open data platform for science couldn't excel". In my view the technical problems are largely soluble with clear pathways for development, and I don't have the detailed knowledge to make a big contribution at the coalface. The social problems are much larger and require a more multipronged attack - which is directed by but not defined by the current technical capability. - Cameron Neylon
What are the "social" issues? I'm confused by the use of the word in this context. Are we talking about cultural changes, such as acceptance of a more "open data" world, coercing people into using public repositories and so on? Or is this social as in social network? If the latter, I don't see the relevance to what Matt is discussing. - Neil Saunders
Maybe "cultural issues" is better. Basically, the issue is that we have an entire social edifice built around control and secrecy, driven by the need to publish. Fundamentally the problem is that we need to rebuild the reward systems so that people actively take advantage of the potential of available technology. So yes, cultural rather than social perhaps, but I do think social networks or... more... - Cameron Neylon
Jan Aerts
Could we have a list of google wave names somewhere so that I can figure out what Life-Scientists are on there?
toddwharris@googlewave.com - Todd Harris
just added you to Life Scientists wave - Cameron Neylon from twhirl
A Google Wave might be appropriate for tracking such a group. - Shiran Pasternak
shiranpasternak at the same domain. - Shiran Pasternak
Search "with:public research" and you will find the "Research collaborations in Wave" wave. endre.sebestyen@googlewave.com - Endre Sebestyen
yann.abraham@googlewave.com - Yann Abraham
Searching Google Wave with "tag:the-life-scientists" will get you to "Research collaborations in Wave", a good starting point for life scientists. - Martin Fenner
b.brembs@googlewave.com - Björn Brembs
mndoci - Deepak Singh
I also missed out on the invites .. if anyone knows someone .. :) - Pedro Beltrao
I'm firstname.lastname - Ruchira S. Datta
firstname.lastname here too - Andrew Clegg
I don't get how you search in public waves. I've tried searching for tag:the-life-scientists and it gets no hits -- I think it's just searching my own waves - Andrew Clegg
There was a thread by Kol about Wave usernames, but I couldn't find the link - ffcode
Aha -- with:public . They really should include a button for that - Andrew Clegg
anna.k.croft - Anna Croft
thanks Kol - ffcode
attilacsordas - Attila Csordas
I never got invited to the party :( - Lars Juhl Jensen
AndrewJamesPerry - Andrew Perry
avijitguharoy@googlewave.com - A Roy
allyson.lurena@googlewave.com - Allyson Lister
@Endre: You can link to waves, eg: https://wave.google.com/wave... - Nick Lothian
mbembee@googlewave.com - embee
000.cacarr@googlewave.com - Christopher A Carr
cassjohnston - Cass Johnston
plindenbaum - Pierre Lindenbaum
firstname.lastname too - Egon Willighagen
diegomorelli76@googlewave.com - diego morelli
abhishek.twr@googlewave.com - Abhishek Tiwari
Would like to be added to life scientists wave, please! david.rothman@googlewave.com - David Rothman (☤)
@David: done - Pierre Lindenbaum
Many thanks, Pierre. :) - David Rothman (☤)
churchsg - Sally Church
life scientists wave: me too, thx - Attila Csordas
Don't know if I actually put myself here :-) jan.aerts@googlewave.com - Jan Aerts from email
Count me in: matt.j.wood - Matt Wood
jeanclaude.bradley at googlewave dot com - Jean-Claude Bradley
chrisamiller@googlewave.com - Chris Miller
Matt: added. - Jan Aerts from email
inspiring2designllc@googlewave.com - Justin H. Johnson
somebody please add me too attilacsordasat... - Attila Csordas
Attila: can't seem to find attilacsordas@googlewave.com. - Shiran Pasternak
it does exist I can tell ya :) - Attila Csordas
@Attila: tried a different way... yer in. - Shiran Pasternak
Done. - Jan Aerts from email
An undergraduate student in our lab, Caleb, just got his wave invite. I told him to look at this thread for possible people to connect with. - Steve Koch
+1 skhadar@googlewave.com - Khader Shameer
got it, thanks - Attila Csordas
murvine - thanks! - Christopher Murvine
Afternoon all. I've written my first robot, which hopefully will embed an interactive mass spectrum into a blip whenever a UniProt name is encountered in the text, and corresponding mass spec data is found for this protein. I say "hopefully", as I've not been able to test it for real, as, alas, I have no account. When are the next batches released? If it's not for ages, does anyone fancy testing it anyway? - Neil Swainston
Am now Waving as ben.blackburne. - Ben Blackburne
Now waving as lars.juhl.jensen - Lars Juhl Jensen
waving too: sciphu@googlewave.com - Nils Reinton
waving as fgibson - Frank
aemonten@googlewave.com. How can I get started with the life scientists wave? - Alejandro Montenegro
my wave ID: macmanes@googlewave.com - Matthew MacManes
dan.swan@googlewave.com - searching is sloooooow; trying to find the life scientists wave right now! - Daniel Swan
chris.lasher@googlewave.com - Chris Lasher
sjcockell@googlewave.com - Simon Cockell
jwhabig@googlewave.com - Jeff Habig
comprendia[at]googlewave[dot]com Mary Canady - Mary Canady
rebeldad@googlewave.com Brian Reid (PR guy for life science types ... I promise to behave) - Brian Reid
bronxman - Jack H. Pincus
I'm trying to get a Solexa/Illumina Sequencing Wave going. Search "with:public Illumina" or add me macmanes@googlewave.com - Matthew MacManes
waving as georgkam - george
For some reason, I'm stevekoch3 but glad to finally have a preview account! - Steve Koch
And I found the research collaborations wave by the following search: "with:public tag:the-life-scientists" - Steve Koch
mine is pedrobeltrao@googlewave.com (thanks to Mr Gunn) - Pedro Beltrao
In case there is still someone with spare invitations: piotr.byzia at gmail.com - Piotr Byzia
tom.sante@googlewave.com - Tom
Im steelgraham7@googlewave.com - Graham Steel
sametstalker@gmail.com - Samet Güngören
michael.kuhn - Michael Kuhn
Requesting the life scientists wave: abhishek.twr@googlewave.com, thanks in advance - Abhishek Tiwari
The usual, neilfws. - Neil Saunders
now that I am finally on board: danjurczak@googlewave.com - Daniel Jurczak
I'm mightyfib(at)googlewave(dot)com. You're most welcome to add me...:O) - Jeannette Høvring
mine is dave.lunt - Dave Lunt
michael.chelen@googlewave.com - seems to use the google contacts system like gmail - Mike Chelen
pengwen.not.penguin@googlewave.com - Parvez Halim
mstalnos@googlewave.com - TRsdr
I'm on (thanks to Steve Koch!): tom.tullius@googlewave.com - Tom Tullius
Matt Wood
Coat.new(:color => [ :red, :yellow, :green, :brown, :scarlet, :black, :peach, :ruby, :olive, :violet, :fawn, :lilac, :blue]) #songsincode
Not sure you got all the different colors! I guess I can forgive you, though, as it would probably take more than 140 characters :) - Allyson Lister
You're absolutely right. I missed out gold and chocolate and mauve and cream and crimson and silver and rose and azure and lemon and russet and grey and purple and white and pink and orange. They all inherit from super. ; ) - Matt Wood
They all inherit from super? Ah, Fridays :D - Allyson Lister
Michael Barton
I'd like images of the twenty standard amino acids in encapsulated postscript, but I can't find them on the web. Is it feasible for me to try and draw them in some chemistry program? Can anyone recommend where I might find these images?
If you can get an SD file containing the amino acids, I can give you the collection of EPS files using software my company has created for doing this kind of batch structure image processing. Also produces SVG and SWF. - Rich Apodaca
This is a coincidence, I was just reading your blog post about this and I just emailed you. - Michael Barton
Jerome Pansanel has 3D versions... with the CDK you could save those as 2D SDF... - Egon Willighagen
Pardon my ignorance but what is the SD file type? I can get mol formats from chemspider if this is good enough? - Michael Barton
I have them from my own thesis, but only in PNG. If they would be useful, just let me know. - Matt Wood
another solution: get the amino acids from PubChem, dump the SDFs into mol2ps. http://merian.pch.univie.ac.at/~nhaide... - Michael Kuhn
Thanks for the tip, but I tried mol2ps and couldn't get it to compile on OS X. - Michael Barton
So far I've gone down the route of PNG -> JPG -> PS but the resolution is really poor. - Michael Barton
An SD file is, in one form, just a concatenation of molfiles joined by "$$$$" lines. - Rich Apodaca
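Rich's description of the SD format can be illustrated with a minimal Ruby sketch. It assumes the common convention that each molfile record is followed by a line containing only `$$$$`; the method names `split_sdf` and `join_sdf` and the two truncated sample records are hypothetical, and real molfiles carry full header, counts, atom and bond blocks. For actual chemistry work, a toolkit such as the CDK mentioned above is the safer route.

```ruby
# Split SD-file text into individual molfile records.
# Assumes each record is terminated by a line containing only "$$$$".
def split_sdf(text)
  records = []
  current = []
  text.each_line do |line|
    if line.chomp == "$$$$"
      records << current.join
      current = []
    else
      current << line
    end
  end
  # Keep a trailing record that has no closing "$$$$" line.
  leftover = current.join
  records << leftover unless leftover.strip.empty?
  records
end

# Rebuild an SD file by appending a "$$$$" line after each molfile.
def join_sdf(molfiles)
  molfiles.map { |m| (m.end_with?("\n") ? m : "#{m}\n") + "$$$$\n" }.join
end

# Hypothetical, heavily truncated records (real ones have full atom/bond blocks).
sample = "glycine\n  ...\nM  END\n$$$$\nalanine\n  ...\nM  END\n$$$$\n"
mols = split_sdf(sample)
p mols.map { |m| m.lines.first.chomp }  # prints ["glycine", "alanine"]
```

Splitting and re-joining round-trips the sample exactly, which is a quick sanity check before handing a combined file to a batch image converter.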
I have mol2ps running on STITCH. I had wrapped it in mol2png, but I quickly exposed the ps file as well: if you go to http://stitch.embl.de/images... (replace 1 by your favorite PubChem compound id) then you'll get a postscript image. Please ask for only one structure at a time, since I get the SDF from PubChem - Michael Kuhn
@Rich I put the combined mol file as well as a tgz of individual mol files at http://drop.io/amino_acid . If you have the time to produce the resulting images file that would be fantastic. - Michael Barton
@Michael I tried using the link but I couldn't view the resulting ps output. :S - Michael Barton
What do you get? If it's a white page, look in the lower left for the image. Works fine for me with Firefox and the Firefox PDF plugin - Michael Kuhn
@Michael, just sent you the set of EPS files. I had to make some minor changes to the SD file, but was able to download it OK in FF 3/Linux. Feel free to redistribute the result. It's also very easy to change the look of the output (colors, line widths, atom label sizes, etc.). - Rich Apodaca
Neil Saunders
Lucky people with Google Wave previews - do you see it in any way as a potential FriendFeed replacement?
Not in time obviously - but a lot of the functionality could be similar. The trouble is there is a fundamental break between the "participant list" functionality of both Wave and Facebook (i.e. people have to add other people to the wave) and the public participation in Friendfeed which I think is at the core of what has worked well for the research community here. - Cameron Neylon
There are groups in Wave which allow larger collections of people to follow updates, and you can certainly import Twitter/RSS into a Wave. However, the Google Wave client app wouldn't make a good replacement as it stands: the activation energy for getting started and staying up to date is much higher than with Friendfeed. - Matt Wood
No. It's unbelievably complex and non-user-friendly. See http://is.gd/283ph - Tom Morris
I agree the client isn't easy to use, but that is not really the point in my view. It might be possible to build something that looked a bit like FriendFeed using the embed API and a wave server with OpenID authentication, but I just think it wouldn't behave the same way in important ways. But the client isn't the protocol or the framework. - Cameron Neylon
I agree with Cameron. It might well be possible to build something on top of the Wave APIs, however, from what I can see, there is a fair amount of 'magic' in Friendfeed from both the smooth user experience perspective, and the technical implementation. Either way, the Friendfeed public timeline would be very useful as a seed to provide context. - Matt Wood
No, unlikely. More general adoption of PubSubHubbub might lead to decentralization of commenting on items of interest, which would render the need for a hub like FF obsolete. - Ian Mulvany
I think Wave is more about collaboration than aggregation. - Euan
Agree with Ian, PubSubHubbub may provide a solution. And definitely agree with Euan - Wave is about (controlled) collaboration, FF is more about uncontrolled commenting on aggregation. - Cameron Neylon
Just saw something about iGoogle making an FF-like interface. http://www.readwriteweb.com/archive... - Joe
It could be - but it's not ready yet. Give us devs time to write robots, etc! - Ahsan Ali aka. Slick
Matt Wood
Beautiful Data just arrived in paperback: http://twitpic.com/dlr9g. All proceeds to Creative Commons and the Sunlight Foundation.
Congrats to the other authors too. Great project to be involved in. - Matt Wood
Alexander Griekspoor
NEWSFLASH - Very happy to announce that Charles Parnot has joined Mekentosj to work on a super secret new project. Say hi to @cparnot!
All those supersecret projects :) way cool - Deepak Singh
Can't wait to start showing folks what we're up to. Very exciting. - Matt Wood
Matt Wood
Slides from yesterday's ActiveResearch keynote: @arfon on community science, @galaxyzoo and Rails: http://www.slideshare.net/arfon...
Very nice. I should point out though that a "most emailed" story at BBC News is not a good measure of impact. I read a post from someone who pushed a story into that list by emailing himself only 5 times :-) - Neil Saunders
Good point, Neil - although I think the 220k users probably make up for the inaccuracy. ; ) - Matt Wood
Unless it's a robot. - Maxine
Is there a website for papyrus zoo? - Andrew Lang
Allyson Lister
A great few days: have met FF / twitter peeps Matt Wood, attilacsordas at BioSysBio and at the Eagle; and have met new ones too! :)
Great to meet you, Allyson. Hopefully catch up with you and the SBML hackathon on Friday. - Matt Wood
@Matt - sounds great! Great to meet you too :) - Allyson Lister
Matt Wood
News! I've joined Mekentosj! http://greenisgood.co.uk/green...
So this means you're going to be BLASTing papers against one another? Pioneering the field of literanomics? - Mr. Gunn
Congrats. Not what I expected at all! I hope rails will be playing a large role. - Neil Saunders
Or perhaps ranking papers by their citations to/from the literature to suggest further reading? Consider that a wishlist item, btw :-) - Chris Cotsapas
@Chris, that's a major feature I'm pushing for in @Mendeley, as well. Last.fm-style recommendations are coming first, since that's their background, but "times cited" will definitely be an input into that. - Mr. Gunn
wooah! Very very interesting... - Cameron Neylon
No, no, no! We don't need any more pointless recommendation engines, iphone apps, or other trivia. Just get the core software working properly, e.g. please do something about the fact that not every document on my hard disk is appropriately classified as a "paper". - Matt Leifer
All great feedback - thanks, folks! We'll definitely have more to talk about in a few months. ; ) - Matt Wood
I would have thought that not recognizing PDFs has more to do with poor/absent metadata than with recognition. I suppose you could assume large-font text is the title and search on it (if it's a text PDF), but the rest is largely publisher-side, isn't it? - Chris Cotsapas from Nambu
@Chris Parsing PDFs is a fraught and error prone process: it's really a shame that they were adopted so widely. But they're here now, and so there's plenty of opportunity to add value! - Matt Wood
That was my understanding. There's not much you can do with an unannotated image-based file... - Chris Cotsapas from Nambu
I don't know much about parsing PDFs myself, but there's definitely a difference between the old ones, which look like scanned copies of pages, and the new ones, which were converted to PDF from whatever other format they were in. - Mr. Gunn
Some publishers are starting to add better metadata to their PDFs, but it's still an uphill struggle across the board. - Matt Wood
Fortunately XMP labelling of PDF articles is becoming more common: http://blogs.nature.com/wp... - Martin Fenner
Martin Fenner
Green is Good : software, science, etc - http://www.greenisgood.co.uk/green...
"ActiveResearch is a great opportunity to meet and greet others working with Ruby and Rails in a scientific or technology discipline." - Martin Fenner from Bookmarklet
Stuff like this makes me wish I was getting into bioinformatics about now, rather than 10 years ago. You lucky young people :-) - Neil Saunders
Completely agree. Darn it - Deepak Singh
There's plenty of room at the back for you old timers. ; ) - Matt Wood
Matt Wood
Heading to RailsConf? Join me for ActiveResearch, showcasing science rolling on Ruby and Rails! http://activeresearch.org
Great! Next year in Europe? No way I can get funding to come to Las Vegas, but might be possible if in Europe... - Jan Aerts
Sure thing. It would be good to pull together a few events this year, some live, some on the web. The biohackathon was a big inspiration! - Matt Wood
Matt Wood
News! This is my last week at Sanger. I'm leaving to pursue some very exciting new projects in 2009.
!! News indeed. Keep us posted. - Neil Saunders
It was a teaser, not news :) Will we see official announcement in a week, or maybe the projects are secret? - Pawel Szczesny
Thanks folks - more details on new stuff coming soon. : ) - Matt Wood
Matt ... since I have some idea, can't wait for you to reveal all. Now, you owe me a case study *grins* - Deepak Singh
Wow! Looking forward to details. - Chris Lasher
Richard Akerman
Having an interesting exchange with @guistini - how much CPU would it take to do a basic index of the full scholarly lit (full-text articles)?
That is, let's assume for sake of argument we have in a single place the entire full-text of every scholarly article (in English, for sake of argument) back to the first journal (the Royal Society?) How big is that? 60 million articles? (I'm pulling a number out of the air.) How much CPU time to do a basic Lucene index, and how much time to do a full semantic index? - Richard Akerman
The second question is "how much CPU do you need to serve up the search index", to which I would think "no more than Google currently has" would set the ceiling. Not sure what the floor would be. - Richard Akerman
Building an index of the full Medline corpus (which is just abstracts) takes a little over 2 days on a quad-core, high-memory (32 GB) machine, as I recall. I used MG4J over Lucene, though: cutting-edge compression and indexing. - Matt Wood
William Hayes told me it was about 250 CPU-hours (an admittedly hazy term) to build a semantic index using MEDLINE (18 million abstracts) and the small true open access bits of PMC (about 50k articles) - Richard Akerman
Not enough to make the CPU/server requirements a serious barrier to just doing it, would be my answer. 250 CPU hours/a couple of days on a fast machine feels about right to me. Wouldn't have thought serving it would be terribly hard: probably less than a terabyte of total data if it's just the text, and presumably a reasonably parallelizable process? - Cameron Neylon
I like where this thread is going. - Mr. Gunn
I think it may be an interesting point to make (perhaps an article?) to show just how (relatively) little compute power and storage it would take - achievable with Amazon S3/EC2 - to store and index the entire scientific (journal article full-text) literature back to the Royal Society days. - Richard Akerman
@Cameron we're already into multi-TB with just the PMC complete set (about a million articles) - it will depend on the storage formats being used - Richard Akerman
@Richard - yes obviously the format is important in that statement. Was thinking afterwards that I was probably underestimating the amount of text and other gubbins you'd need to store. - Cameron Neylon
Hi gents... I think the corpus of scholarly literature in the 21st century is upwards of 700-800 million articles. Could even be a billion documents when you consider articles in the deep web. In other words, massive. Dean Giustini - Dean Giustini
But this brings up the issue that Nicholas Baker (who is otherwise a jerk) and critics of Google put forth--that such initiatives tend to damage the documents that are scanned in such projects and that such projects tend to lead to the physical disposal of the original materials. It is not just a matter of computing power. - Hope Leman
@Dean I think you need to scope "scholarly literature". I can't imagine the journal literature as we understand it (in English at least) is more than about 100 million articles - I'm pretty sure my organisation (and OhioLink, Scholars Portal) and others have a substantial portion (30%?) of it already spinning on local hard drives. I guess it's a set of questions: 1) how big is it if you get all the *currently available* (digital) journal lit in one place 2) how big if you digitise everything else... - Richard Akerman
... 3) how big if you add in books 4) how big if you add in "everything scholarly" in text 5) how big if you add in data (I know for this last item once you add data we get into the petabytes *very* quickly) - Richard Akerman
The main bottleneck is not so much "basic" indexing (which can often be done almost as fast as the data can be obtained) but all the additional processing required to make the index useful (smart tokenization, stemming, synonym substitution, detecting duplicates, pageranking etc). Going beyond 100M documents or 1TB of data a distributed approach seems like a must, see e.g. http://queue.acm.org/detail... which discusses hardware requirements for Nutch (based on Lucene). - Eric Jain
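To make the distinction between "basic" indexing and the extra processing concrete, here is a toy inverted index in Ruby. It is only a sketch under stated assumptions: the documents are made up, tokenization is a plain lowercase split on non-alphanumeric characters, and `crude_stem` is a hypothetical suffix stripper, not the Porter stemmer or anything Lucene/Nutch actually ships. Synonym substitution, duplicate detection and ranking are left out entirely.

```ruby
# Toy inverted index: tokenize, crudely stem, map each term to document ids.
# crude_stem is a hypothetical illustration, NOT a real stemming algorithm.

def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

def crude_stem(token)
  # Strip one common suffix, but keep short words intact.
  stem = token.sub(/(ing|ed|s)\z/, "")
  stem.length >= 3 ? stem : token
end

def terms_for(text)
  tokenize(text).map { |t| crude_stem(t) }
end

# Build term => [doc ids] posting lists, in document order.
def build_index(docs)
  index = Hash.new { |h, k| h[k] = [] }
  docs.each do |id, text|
    terms_for(text).uniq.each { |term| index[term] << id }
  end
  index
end

# AND query: intersect the posting lists of every query term.
def search(index, query)
  postings = terms_for(query).map { |term| index.fetch(term, []) }
  postings.reduce(:&) || []
end

# Hypothetical example documents.
docs = {
  1 => "Indexing genes and proteins",
  2 => "Gene expression in yeast",
  3 => "Protein structure indexed by fold"
}
index = build_index(docs)
p search(index, "gene")              # prints [1, 2]
p search(index, "indexed proteins")  # prints [1, 3]
```

Query terms pass through the same tokenizer and stemmer as the documents, so "indexed proteins" matches the document titled "Indexing genes and proteins". At the 100M-document scale Eric mentions, the same structure would need compressed posting lists and a distributed layout.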
@Eric great pointer - will have to think about how much Nutch experience (which was mostly for Internet Archive?) would map to indexing the scholarly literature. - Richard Akerman
Matt Wood
Confirmed: I'll be speaking at RailsConf! 'Orchestrating the Cloud': using Cloud approaches and Ruby to get stuff done: http://en.oreilly.com/rails20...
"You lucky, lucky <beep>" (hanging upside down in a cell) - Jan Aerts
Why thank you - looking forward to it. : ) - Matt Wood
Neil Saunders
Last day of being a postdoc! Then 2 weeks "leave" - one day, I'll take leave for rest/vacation, rather than to make time for more tasks...
Have a great final day! - Matt Wood
Onwards and upwards! - Chris Cotsapas
Nina Jansen
I will start working on Monday :-)
Congrats! Where will you be working? - Matt Wood
Matt Wood
Awesome! Proposal accepted for RailsConf 2009: "Orchestrating the Cloud", a guide to cloud approaches with Ruby and Rails.
In case you missed it, RailsConf accepted _everyone's_ proposal today! An accident, apparently. The 'testing' jokes rang loud and clear on Twitter: http://search.twitter.com/search... - Matt Wood
Ah darn. Of course in this case, an inevitability - Deepak Singh
Andrew Clegg
Here's one for the programming fans. I'm good at Java & competent at Perl. I'm itching to learn a new language that's less wordy than Java and less easy than Perl to write bad code with. I'm thinking Python, Ruby, or maybe Groovy or Scala to leverage my Java. Suggest a language and persuade me :-)
Python. Ease of learning, fast prototyping, lots of free packages available, "endorsed" by Google... and, one of the most important: it's a pleasure to write Python code. And if you need to do math with it, there's a good package: http://www.sagemath.org/ - Arnaldo M Pereira
Any experience of Jython? I like the idea of being able to use all the Java libraries as well... - Andrew Clegg
Ruby. Pure object orientation, elegant idioms, powerful metaprogramming. And just plain fun. - Louis Simoneau
Never heard of it, until now. The idea seems weird to me.. - Arnaldo M Pereira
If you've spent 6 months learning a specific Java toolkit (in my case Apache CXF for web services) the idea of being able to keep that investment seems appealing. But only if the implementation's good... - Andrew Clegg
Groovy would be the easiest one to pick up given your Java experience. It's much more concise than Java but it's easy to leverage the Java APIs if you need them. It'll definitely give you the most seamless fit with Java. - Tom Walsh
Python and Ruby are broadly similar in ease of use and elegance. In general I think Python has a stronger base for the sciences, while Ruby is a little more focused on the web. For a while it looked as if Ruby was going to overtake Python, but recently Python has gained traction and is moving up the popularity ladder fairly quickly. There's a significant advantage in using a popular language: better support and less reinventing the wheel. - Ian York
Ruby: because whitespace shouldn't matter :) - Chris Miller
+1 What Chris said, I hate the idea of significant whitespace in a language, and it's always put me off learning Python I'm afraid :( - Daniel Swan
You can't go wrong with ruby or python. Personally, for some of the reasons listed, I prefer Ruby - Deepak Singh
Python. And I don't need to persuade you; the language does. You wouldn't go wrong with Ruby either. - Paulo Nuin
In all seriousness, if you're familiar with Perl, Ruby won't be much of a stretch. In a lot of fundamental ways it's like Perl, minus the ugly syntax that gets in your way, plus real OO. - Chris Miller
+1 Ruby. Although you won't go wrong with Python either. - Jan Aerts
C. Yes, I know, in this world of modern 'frameworks', OOP, scripting and web applications it's a little bit provocative. But 1) I've got the feeling that fewer and fewer people know C, and learning it can be an asset; 2) the BLAST algorithm was written in C, and BLAT, and...; 3) managing the memory yourself can be a challenge. Oh? Did you say less wordy than Java?... hum ;-) - Pierre Lindenbaum
You can write very bad code in ANY language. Of course, with Java/C# the code is wordy, so you think a little more before writing too much crap, but that's it. - General Kafka
For something mind-transforming, learn a functional programming language such as ML. - General Kafka
Instead of a new language entirely, you could learn some new facets of development: additional skills in testing, automation, patterns and anti-patterns are useful in any language. - Matt Wood
Surprised no one has mentioned C or Assembly. Once you have a good foundation on exactly what a language is abstracting you'll find that learning new languages is trivial. Following Matt's guidance, learning basics of things like file systems and network protocols is a good idea. - Paul J. Davis
Oh, missed Pierre's comment. But I think we agree on the why. - Paul J. Davis
+1 Ruby. Maybe R if you want to do a lot of statistics :) There was a poll on Bioinformatics.org asking *Which computer language are you most interested in learning (next) for bioinformatics R&D?* http://www.bioinformatics.org/poll... - Khader Shameer
You all are going to hate it when we start sharing all the LabVIEW code we're writing :) - Steve Koch
Interesting that Ruby and Python are about half and half here but Python's waaay ahead in that poll. Re. C, dabbled ages ago, but it's not where I'm at now really. And too wordy :-) @Kafka yeah I was thinking about that, hence mentioning Scala. But you can do functional in Python and Ruby too right? - Andrew Clegg
Doesn't it depend why you're learning it? I think if you already know Perl, Python or Ruby shouldn't be much of a leap, so personally, I would only bother if it was needed for a specific project. As suggested, maybe C or network protocols if you want to get a better grasp of computing. R / Bioconductor looks good on a bioinformatics CV and is genuinely handy to know. Not a new language, but have you had a look at Moose.pm? Takes some of the *urgh* out of OO Perl. - Cass Johnston
Groovy, Ruby and Python all support some functional programming idioms; Groovy's FP goodness is one of the things that makes it much more palatable than Java for those of us who prefer dynamic languages. If you really want to experience FP then Scala looks like the one to go for. Clojure (Lisp on the JVM) is also worth a look and there's always Haskell if you're willing to leave the Java world completely. - Tom Walsh
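Andrew's question about doing functional programming in Python can be made concrete with a minimal sketch (not from anyone in the thread): Python's built-in `map` and `filter` plus `functools.reduce`, applied to a made-up DNA string to count G and C bases. The sequence and the GC-count task are hypothetical toys chosen only for illustration.

```python
from functools import reduce

# Hypothetical example sequence, used only for illustration.
seq = "ATGCGCATTA"

# map: turn each base into 1 if it is G or C, otherwise 0
gc_flags = list(map(lambda base: 1 if base in "GC" else 0, seq))

# filter: keep only the G and C bases themselves
gc_bases = list(filter(lambda base: base in "GC", seq))

# reduce: fold the 0/1 flags into a single GC count, starting from 0
gc_count = reduce(lambda total, flag: total + flag, gc_flags, 0)

# The same count written as a generator expression, the more usual Python style
gc_count_alt = sum(1 for base in seq if base in "GC")

gc_fraction = gc_count / len(seq)
print(gc_count, gc_fraction)
```

Idiomatic Python code tends to favour comprehensions and generator expressions over `map`/`filter`/`reduce`; the classic FP functions are shown here because they are the idioms the thread is asking about.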
+1 for Cass's recommendation of Moose for Perl. - Tom Walsh
If anyone else is interested in Moose, there's a London.pm techtalk this month: http://londonpmtech.appspot.com/ - Cass Johnston
@Cass - London.pm techtalk looks v. cool and if I lived about 600 miles closer to London I'd definitely be there. Would be nice if the slides from the talks are online at some point. - Tom Walsh
@Tom Walsh: You on the london.pm mailing list? If not, I'll give you a yell when they post the slides. - Cass Johnston
@Cass Thanks for that Moose link. I might go along. One of my workmates uses it extensively and it's about time I learnt about it. - Andrew Clegg
Python is definitely my vote, but as Cass said above, it will depend a lot on what type of project you want to use it for. For instance, using Jython to call CXF libraries will not give you as much exposure to thinking about problems the way Python programmers learn to. That's the major advantage of learning a new language: you can take the concepts back to your regular daily work. - Brad Chapman
Cameron Neylon
going to make a concerted effort to bring twitter into my attention stream - I'm just not sure how to do that best in practice
Do you twitter from your phone? People who follow my twitter get texts when I update it. Right now that is only one person, but still. - Anthony Salvagno
Cameron, you can feed @ replies into FF using this Yahoo Pipe http://pipes.yahoo.com/pipes... - Mr. Gunn
No. In fact I've only just piped FriendFeed into Twitter, where I realised several people had tried to raise me over there where I wasn't monitoring. What I'd like to do is bring it into a separate FriendFeed stream that I can easily dip into when there is time but ignore when there isn't. I don't really want another client on my desktop either if I can avoid it. - Cameron Neylon
Anthony, the texting-on-update behavior is set per-user by the follower. - Mr. Gunn
Good point - it would be nice to have lists for our own feeds, as well as people. Could you make an imaginary friend, which just has your twitter stream attached to it, and add it to a 'when I have time' list? - Matt Wood
Also thinking about this. Can't you feed it into a separate room? - Matthew Todd
There is a problem with the feed of your own stream out of twitter - they seem to all be secured via username and password. What I would like to do is pull out the stream that I see when logged in at twitter. The other approach is to set up imaginary friends for all the people I want to follow but that seems like a faff - Cameron Neylon
Matt, that's exactly what I did. I added the pipe as an imaginary friend. - Mr. Gunn
Stephen Foskett
Does anyone out there use #Linode or #Slicehost? Care to comment on their service, performance, or availability?
Some experience with Slicehost - can certainly recommend it. - Matt Wood
Maxine
To Matt Wood - Hi Matt if you are reading this, could you make Euan Adie an administrator of this room? The Nature Network consolidated feed is not updating here whereas it is in Google Reader, so Euan might need to take a look. My admin rights aren't sufficient to make him a co-admin. Thanks! Maxine.
PS I am updating manually for now. - Maxine
All done - welcome our new admin, Euan. : ) - Matt Wood
Thanks, Matt! I was unable to fix this problem but Euan will be able to reveal all I am sure. - Maxine
Matt Wood
Sequencing throughput at Sanger could hit 100Tb a week in the next 4 months.
And what's the throughput of the automatic and manual annotation? - Egon Willighagen
Egon, for things that aren't covered by automatic annotation, I need between 6 hours and a week per protein sequence to do some serious annotation. So my weekly throughput is optimistically around 9000 bases a week ;). - Pawel Szczesny
Each sequenced sample is automatically analysed for base calling and quality: that runs a 1000 core cluster flat out. Secondary analysis (alignment, SNP calling, annotation) is performed after that. - Matt Wood
Matt Wood
To my Sanger homies: full Git support coming soon to a Genome Campus near you.
Keep us posted! - Jan Aerts
Matt, what does that mean? Do all local software/data repositories get converted to Git? What (graphical) clients are people expected to use? - Egon Willighagen
How does this integrate with non-Git specific software? A BioMart bridge? Or services? - Egon Willighagen
This is just an additional option for those using our central source code repositories. RCS, CVS and SVN have been in place for a while - good to see Git getting some traction in the life sciences. - Matt Wood
I've begun to love git - Rajarshi Guha
Frank
Finished working in academia and Newcastle forever - off to the pub
Congratulations Frank, and best of luck in the new job. - Duncan Hull
Good luck at the new startup in Cambridge. Hopefully see you at a local event soon! - Matt Wood
Paulo Nuin
Would there be an audience for a Beginning Ruby for Bioinformatics blog?
Perhaps it's worth finding out! I'd be interested in trying - any other volunteer authors? - Matt Wood
I would help out, even though I am no Ruby expert. - Paulo Nuin
I wonder if something along the lines of the Peepcode or Gitcast screencasts would be cool? - Matt Wood
Gitcasts lite would be nice. Essentially a world in which the examples are about bioinformatics-related topics and not blogs and shopping carts :) - Deepak Singh
That's the sort of thing I had in mind - teach Ruby from a bio perspective: define classes with biological relevance, using ActiveRecord in biology, show use of @jandot's Ensembl API etc. Perhaps some podcast discussions too. - Matt Wood
*Stands at the front of the line.* I say go for it - Deepak Singh
I'd be interested in learning some Ruby, especially if it's bioinformatics-related. - Walter Jessen
This is a great idea. - Michael Barton
Yep. And I'll try to find time to contribute :-) - Jan Aerts
That would be great. It would help me introduce the rest of my lab to Ruby and programming. -r - Rob Syme
I'm in a ruby bioinformatics lab - I may be able to contribute a guest post or two, and I'd definitely read the blog. - Chris Miller
Don't know about a blog, but I'd love to help write a wiki book. We could start by demonstrating every bioruby method by example. - Neil Saunders
The ruby_for_bioinformatics site could be a repository on GitHub, written in the format of Tom Preston-Werner's blog engine. That way everyone can contribute sections via Git. The site could then be automatically hosted on GitHub at ruby_for_bioinformatics.github.com. - Michael Barton
Explanation of TPW's blog engine http://bit.ly/rVdl - Michael Barton
As nice as github is, a wiki has a lower barrier to entry, which, in theory, means more contributions. - Chris Miller
Matt Wood
@mekentosj Agreed. Odd that MSFT have never taken up the Palette concept. Good to have some tools up as and when you need them. #keynote09
Office 2008 for mac has a palette - Deepak Singh
That's true - was thinking more generally in Windows. - Matt Wood
Hmmm maybe in Windows for Mac :) - Deepak Singh