Does anyone know of any MapReduce implementations of sequence motif finding algorithms? If not, what would be a suitable open-source motif finding package that could potentially be mapreducified? And are there any Amazon Machine Images with preinstalled motif finding packages?
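One plausible way to mapreducify the simplest motif-finding step is exact k-mer counting, which splits naturally into map and reduce phases. Below is a minimal local sketch in plain Python. It is not an existing package, and the function names (`map_kmers`, `shuffle`, `reduce_counts`, `count_kmers`) are hypothetical. Real motif finders also handle degenerate motifs and background models; this only shows the map/shuffle/reduce shape.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(seq_id, seq, k):
    """Map phase: emit (k-mer, 1) for every length-k window of one sequence.

    seq_id is the input key in MapReduce terms; it is unused here.
    """
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" not in kmer:  # skip windows containing ambiguous bases
            yield kmer, 1


def shuffle(pairs):
    """Shuffle phase: group emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(kmer, values):
    """Reduce phase: total occurrences of one k-mer across all sequences."""
    return kmer, sum(values)


def count_kmers(sequences, k):
    """Run map, shuffle and reduce locally over a {seq_id: sequence} dict."""
    mapped = chain.from_iterable(
        map_kmers(sid, s, k) for sid, s in sequences.items()
    )
    grouped = shuffle(mapped)
    return dict(reduce_counts(kmer, vals) for kmer, vals in grouped.items())


if __name__ == "__main__":
    seqs = {"s1": "TATAAT", "s2": "GGTATAAA"}
    counts = count_kmers(seqs, 4)
    # Most frequent k-mers first, ties broken alphabetically
    print(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:3])
```

On Hadoop, the map and reduce functions would become the mapper and reducer (for example via Hadoop Streaming), and the framework itself would perform the shuffle.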
Nature, Vol. 461, No. 7261. (26 August 2009), pp. 258-262. Cyanobacteria of the Synechococcus and Prochlorococcus genera are important contributors to photosynthetic productivity in the open oceans [1-3]. Recently, core photosystem II (PSII) genes were identified in cyanophages and proposed to function in photosynthesis and in increasing viral fitness by supplementing the host production of these proteins [4-7]. Here we show evidence for the presence of photosystem I (PSI) genes in the genomes of viruses that infect these marine cyanobacteria, using pre-existing metagenomic data from the global ocean sampling expedition [8] as well as from viral biomes [9]. The seven cyanobacterial core PSI genes identified in this study, psaA, B, C, D, E, K and a unique J and F fusion, form a cluster in cyanophage genomes, suggestive of selection for a distinct function in the virus life cycle. The existence of this PSI cluster was confirmed with overlapping and long polymerase chain reaction on...
- Neil Saunders
Added centroid banding to HMMoC adapter. MCMC alignment reconstruction should now run like the clappers, and may even be memory-efficient enough to run on genomes!
Scapegoats cause disease: "...disease has been 1 of the most powerful influences in competition between social groups." - http://www.mindhacks.com/blog...
Over the course of history, we have looked for scapegoats to blame for pestilence and disease. Cultural differences likely made it easy to place blame since practices that could influence the infectious spread often differed between sub-groups. I like the proposed link to J. Diamond's work.
- Noah Gray
don't scapegoats cause EVERYTHING? ;)
- Ian Holmes
Science, Vol. 325, No. 5943. (21 August 2009), pp. 1014-1017. Protein biosynthesis on the ribosome requires repeated cycles of ratcheting, which couples rotation of the two ribosomal subunits with respect to each other, and swiveling of the head domain of the small subunit. However, the molecular basis for how the two ribosomal subunits rearrange contacts with each other during ratcheting while remaining stably associated is not known. Here, we describe x-ray crystal structures of the intact Escherichia coli ribosome, either in the apo-form (3.5 angstrom resolution) or with one (4.0 angstrom resolution) or two (4.0 angstrom resolution) anticodon stem-loop tRNA mimics bound, that reveal intermediate states of intersubunit rotation. In the structures, the interface between the small and large ribosomal subunits rearranges in discrete steps along the ratcheting pathway. Positioning of the head domain of the small subunit is controlled by interactions with the large subunit and with the...
- Ian Holmes
Autism is a systemic body disorder that affects the brain. A toxic environment triggers certain genes in people susceptible to this condition.... Martha Herbert, MD, has painted a picture of autism that shows how core abnormalities in body systems like immunity, gut function, and detoxification play a central role in causing the behavioral and mood symptoms of autism.
- Ian Holmes
Nope. There are a variety of pipelines that perform similar tasks. Good starting point might be IMG documentation - http://img.jgi.doe.gov/w....
- Neil Saunders
Worth remembering that there is very little "best practice" in any bioinformatics. For a long time, we made it up as we went along. It's only this new generation of bioinformaticians that have any formal software engineering education and bandy around fancy terms like "best practice" to make us feel bad ;-)
- Neil Saunders
I think it's more like the Perl culture: "There's more than one way to do it!" Best practice in bioinformatics is currently ad hoc. Just as Damian Conway's Perl Best Practices is one of the best guides to good Perl coding, I hope we will soon have a book on "Best Practices in Bioinformatics", maybe written by a group of authors from the LifeScientists room. What do you say?
- Khader Shameer
@Khader that's why we need flexible guidelines, not constrained best practice. Several minimal guidelines have already been worked out for different aspects of the life science domain. MIBBI (http://www.mibbi.org/index...) could be a good starting point here.
- Abhishek Tiwari
I think very often in bioinformatics, TIMTOWTDI. It's not like software development, with a "task" and an "optimal solution". What I think matters most is that however you do it, it's documented and repeatable.
- Neil Saunders
I completely agree with you, Neil, but some effort towards developing well-defined, documented workflows/protocols (can we call these "Best Practices"?) for generic tasks (e.g. annotation) would be useful for the community. I think several 'standards' (e.g. MIRIAM/MIBBI) have been developed to provide a common framework for routine tasks. I believe TLS is an ideal place to reach a consensus about such practices and work on a wikibook of best practices in bioinformatics.
- Khader Shameer
And I agree with you. I'm all for standards and best practice. I'm also a realist and a practical bioinformatician :-)
- Neil Saunders
@Abhishek: Best practices are not always "constrained", and constrained practices are impossible given the complexity of biological systems; flexibility should be there. But my point is that even though MIBBI and other standards (http://www.mibbi.org/index...) have been available for a long time, I've never seen them in research papers. Is it due to poor visibility of such projects, or a lack of interest in promoting such initiatives?
- Khader Shameer
Khader, that's a good question. There seems to be a disconnect between standards developers and the people who should be using the standards. I think it's a publishing problem. Developers publish in computational journals and use computational jargon; users don't read those journals or understand the jargon.
- Neil Saunders
Khader, in my opinion the main purpose of guidelines is to avoid disagreement, while best practices try to bring agreement to the community. Also, people are using these guidelines. It's just a lack of awareness; otherwise more and more people would adopt them. Take any BioModels database model or CellML repository model: they are well annotated according to the MIRIAM guidelines. Allyson...
- Abhishek Tiwari
I find the line "it's not like software development" to pretty much sum up some of the problems in bioinformatics. Why isn't it?!?
- Neil Swainston
It's complicated :-) In part, it's because researchers are more interested in quick answers (= quick fixes) than good code. In part because it's only in recent times that bioinformaticians receive formal software training. In part, because biological problems are more complex than input -> process -> output and you don't always know exactly what you want to achieve when you start. And I guess, biological information has a lot of "context", not easily captured by simple routines.
- Neil Saunders
Hi Neil. Yep, all that you say is true. Just from a personal perspective, I've found that being "disciplined" in writing code (making nice, clean, interfaces to modules, unit testing, documenting) means that in the middle-to-long-run, quick answers are easier to come by. By building up a reasonably reliable library of classes (I'm a Java-geek), sticking the bits of Lego together is...
- Neil Swainston
Neil, I absolutely agree. It took me some time to get to the point of trying to "do things right" from the outset - libraries, documentation etc. and I'm glad I got there. I think a lot of the problems stem from how academic research is conducted. "Can you just give me a table by tomorrow?" "Sure, let me write a library." "No, I just want a table." Hack together perl script, deliver table, discard, move on. Rinse and repeat, until contract expires. Leave mess behind.
- Neil Saunders
Couldn't put it better myself! I guess I'm lucky in so far as that I do have the luxury of longer timescales... until my contract expires.
- Neil Swainston
Thanks Abhishek for the pointers to applications of different standards. My point is that best practices and standards share the same goal: reaching a consensus on how to do repetitive experiments and workflows. But as the two Neils are discussing, individual bioinformatics projects mainly aim to get a good fix rather than an excellent code base. Still, I hope some degree of consensus can be reached if people follow standards as a first step.
- Khader Shameer
Science isn't set up to reward coding standards. Funding agencies reward quick biological results, not infrastructure and software development. I'd argue that for every 5 biological grants, the NIH should be funding one software/database/computational infrastructure grant. The amount of data is only getting bigger.
- Chris Miller
I'd agree with that, Chris. Career wise, it's pretty much immaterial whether I churn out a hack or something "good" and reusable. It's quite annoying. Grrrr!!
- Neil Swainston
@Michael / Neil: I agree that "Science isn't set up to reward coding standards", but as a subject at the interface of science and technology, it is high time bioinformatics embraced standards. On Michael's question, I was trying to make the point that if there were a standard/best practice/generic protocol for microbial genome annotation, he could have just followed...
- Khader Shameer
I think genome annotation is an excellent example of how bioinformatics is not like software development. You don't just run a program and annotate a genome. There are lots of biological features: protein-coding genes, non-protein coding genes, motifs - all with their own associated metadata, all with various, disparate tools written specifically for each type of feature. Annotation is...
- Neil Saunders
too right Neil. is there a best practice for violin-making, vision quests, or coming-of-age experiences? ;)
- Ian Holmes
:-) Exactly. The end result is what matters.
- Neil Saunders
srsly tho -- there are plenty of papers describing microbial genome annotation. it's still an open research area, but there are commonalities (repeats, transposons, genes, typical errors, ...) so I guess the rough union of those vague concepts would constitute the current best practice. not exactly a recipe...
- Ian Holmes
:D best practice for violin-making, vision quests, or coming-of-age experiences :D Neil, in the current era of bioinformatics, with web services and workflows, having an SOP/BP always helps you kick-start the work in minimal time, rather than going through every genome project paper for annotation flowcharts.
- Khader Shameer
@Ian: OK, finally that's something Michael, or anyone interested in annotation, can take away from this thread.
- Khader Shameer
Khader, what we're saying is that in this case, there isn't an SOP/BP, because it just isn't that kind of procedure. But there is, as Ian says, plenty of advice available. I guess, in terms that CS people might understand, it's not agile. You actually have to put some work into understanding what's going on and what you want to do.
- Neil Saunders
@Neil - ^(chicken|egg)? - It could and should be that kind of procedure though. All the advice in the world isn't going to help the people that actually *use* your annotations. The current 'system' for annotating anything is so mindlessly broken I'm surprised it works at all. Now all it needs is a catchy name. Blight of Bioinformatics maybe?
- Paul J. Davis
Thanks for the comments everyone. I'm going to read as many genome papers as possible and try and put what I read together.
- Michael Barton
Neil Saunders, I agree a lot of advice is available and it is definitely helpful. For example, I was not aware of MIARE (thanks to Abhishek) and am now implementing it in our RNAi screen. But I can't agree with you if you define bioinformatics projects as non-agile. Everything from a simple BLAST-based sequence analysis to large-scale data analysis follows an agile approach. Think of n...
- Khader Shameer
Thanks Paul, for the links to the articles.
- Khader Shameer
Khader, your very use of the word "agile" sums up what this is all about. Clearly you are "new school" bioinformatics and appreciate software development. "Old school" bioinformatics would never even use the word :-) As I keep saying, I don't disagree with anyone here who calls for better practices, standards or "agility". Just be aware that there are still plenty of old-timers around for whom bioinformatics means "hack together something that works."
- Neil Saunders
Summoning the FF hive-mind to give examples of fads in science. A fad is defined as a notion that is highly popular for a short term, although its popularity stems from the claim to be a long term or ultimate solution to whatever problem is being addressed. Discuss.
you need to update that. Personal genomics is way up on the hype cycle :)
- Deepak Singh
well it was in 2007 :) .. I should update it one of these days
- Pedro Beltrao
systems biology. (That said, these things always work in cycles - the overhyping, the cries of false promises, the retreat from the limelight, and the slow integration into the mainstream)
- Chris Miller
Chris, you have to look at Pedro's diagram then
- Deepak Singh
I don't think HTS, GUTs or Strings can be called fads because they have been around too long for that. There are fads *within* those subjects for sure, but I think something that has been popular for more than 20 years is not a fad.
- Matt Leifer
+1 Jason. Chaos and fractals. Also cellular automata (though all 3 are very interesting fads)
- Ian Holmes
Here's another good one that may be a bit more obscure: the catastrophe theory that was proselytized by Zeeman. (Now, the mathematical theory of catastrophes is well established. The fad was to apply it to everything.)
- Jason Miller
Wow. nice thread. Now..muhahaha... you have helped write half of a new blog entry. TYVM. Especially Pedro.
- Iddo Friedberg
YOU_NAME_IT@home projects... or a bunch of crappy PCs contributing 24/7 to global warming.
- Martin Jambon
Not sure that some of these things are fads. I guess the ultimate test of a phenomenon being a fad is that it ends up being discarded, or adopted at a much lower key than it was initially played with. Maybe a good test would be to look at historic NSF/NIH/DOE roadmaps and see whether the highly funded projects and topics of a decade or two ago actually "made it".
- Iddo Friedberg
yes i forgot about catastrophe theory! nice one. Can we add category theory too?
- Ian Holmes
A bit outdated, but what about "solving the protein folding problem" or discovering a protein with a "novel 3D fold".
- Mickey Kosloff
Mickey - Your 'protein folding' wasn't so much a fad as a 'much harder problem' than people thought. Many people are still working on it, and it will have a huge impact when solved.
- Jason Miller
I do maintain that "it's not rocket science" should be replaced with "it's not protein folding". Getting a ship to the moon's a lot easier problem to solve :)
- Deepak Singh
@Jason - my personal view is that current state of the art structure prediction approaches (which is what most people are working on) are not what people *used* to refer to as "solving the protein folding problem". More specifically, before the CASP experiments started people used to frequently publish papers that stated that they solved this problem (again). My impression is that CASPs seem to have killed that particular fad.
- Mickey Kosloff
Mickey, not the protein folding community (which is different from the structure prediction community). What happened was structure prediction got good enough and CASP (which I don't really like) gave people bragging rights, so that's where the effort went. Some day people will get back to real physics :)
- Deepak Singh
@deepak. What happened with CASP was that a lot of crystallographers and biochemists got sick of theorists claiming that they had solved protein folding "in principle", which would happen like clockwork every few months in the 90s. CASP was put together as a challenge to put your money where your mouth is. Unsurprisingly, the claims of having solved the protein folding problem dropped significantly after the first CASP.
- Bosco Ho
@ Deepak. I agree with Bosco, and perhaps I wasn't clear enough in my initial comment. The fad I referred to was not studying protein folding (either experimentally or computationally), but the (frequent) declarations of solving the protein folding problem. This was before my time, but people that have been involved in CASPs since the 90s still remember this "fad" well. I participated...
- Mickey Kosloff
From my recollection most of the physical folding people never even took part because all you could even try and look at were small models (people like Peter Wolynes and Dev Thirumalai). It was the comparative modeling/threading folks who were participating and trying to out CASP each other with entire labs shutting down for half the year. That's just bad science. I think the first few CASPs had a purpose, and with all the structures being solved made a lot of sense, but CASP outlived its use in about 2004.
- Deepak Singh
Now we're hijacking Iddo's thread for some CASP-bashing, which might require a separate thread (if not a separate room) :-)
- Mickey Kosloff
"Let's face it: all the cool kids use it."
- Yann Abraham
I would add at least one more: R code can be embedded in TeX documents.
- Daniel Mietchen
I like how "R is used by the majority of academic statisticians" and "Let's face it: all the cool kids use it" are two entirely separate points ;-)
- Ian Holmes
Here is the text of a letter from UC Officials: CHANCELLORS ACADEMIC COUNCIL CHAIR CROUGHAN Colleagues: After speaking at length with all of you and a number of other people with an interest in the issue, we have decided that faculty furlough days will not occur on instructional days (days for which a faculty member is scheduled to give lectures, lead classes or workshops, have scheduled office hours, or have other scheduled face-to-face responsibilities for students). The furloughs that have been necessitated by the severe University underfunding by the State are causing significant problems for faculty who have restrictions on research and service as well as increased teaching workloads; employees who have fewer days to do their work and sometimes fewer colleagues to help them; administrators who have reduced staff and budgets to accomplish their complex tasks; on top of lower salaries for everyone. Students too will suffer the effects of the underfunding--larger and fewer classes,...
- Jonathan Eisen
on 8/24 by exec VC/provost George Breslauer. Details still being finalized. Announcement is referenced here though they don't specifically mention the mitigation scheme http://www.dailycal.org/article...
- Ian Holmes
let me know if you're interested -- the tools should hopefully be self-documenting, so you may be able to use them independently, but i'd be happy to help
- Ian Holmes
Looking forward to it. Your protein paper was fantastic. Helped me settle on MAFFT.
- Bosco Ho
Thanks for the kind words, Bosco. And thanks, Ian, for the information and help. If needed I will contact you.
- Paulo Nuin
combine this with the no-SQL movement, and I'm feeling pretty darn well vindicated
- Ian Holmes
I hadn't heard of the noSQL movement before. Fun reading for later. It is certainly something I would have supported in my pre-Xfam days. Now I do see it is useful for the things that I can't use cat, grep, join, sort, perl -lane, awk, ... for.
- Paul Gardner
sure it's useful; the nice thing is it's no longer the only power tool in the box...
- Ian Holmes
IMO we can expect a lot more scrutiny of the precarious research/teaching social contract
- Ian Holmes
Agree with Ian. Forward-looking institutions might think about gathering evidence to show whether they've held up their side of the contract...
- Bill Hooker
Article really doesn't break down costs but instead seems to bash professors/colleges/research. I think the title doesn't match the content but I like this quote: "That’s why it’s essential, when making the ever more costly choices required in education, to carefully scope out each college. Call the admissions office and inquire about the student/teacher ratio and the percentage of classes taught by graduate students."
- Andrew Lang
relevant Lincoln Stein quote: "I get up every morning and I write for two hours" (via Sean Eddy, discussing how he wrote his books)
- Ian Holmes
btw i'd dispute the coffee thing. coffee is only necessary for work if you are a caffeine addict (which i am). whenever i've been able to ditch the habit, it's actually made me marginally more efficient
- Ian Holmes
Controversial use of Pfam. Mmmm, I was thinking about using domain presence/absence to resolve deep evolutionary relationships a while ago. It's still a cool idea; it just needs to be implemented carefully.
- Paul Gardner
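A minimal sketch of the presence/absence idea, assuming each genome has already been reduced to the set of Pfam domains detected in it. Pairwise Jaccard distances between those sets could then feed any standard distance-based tree method. The genome names and domain identifiers here are hypothetical placeholders, not real annotations, and the functions `jaccard_distance` and `distance_matrix` are illustrative only.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of domain identifiers."""
    union = a | b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(a & b) / len(union)


def distance_matrix(profiles):
    """Pairwise Jaccard distances between genomes' domain-presence sets.

    profiles maps a genome name to the set of domains present in it.
    Returns the sorted genome names and a symmetric nested dict of distances.
    """
    names = sorted(profiles)
    dist = {name: {name: 0.0} for name in names}
    for x, y in combinations(names, 2):
        d = jaccard_distance(profiles[x], profiles[y])
        dist[x][y] = dist[y][x] = d
    return names, dist


if __name__ == "__main__":
    # Hypothetical genomes and domain identifiers, for illustration only.
    profiles = {
        "genomeA": {"dom1", "dom2", "dom3"},
        "genomeB": {"dom1", "dom2", "dom4"},
        "genomeC": {"dom5"},
    }
    names, dist = distance_matrix(profiles)
    for x in names:
        print(x, [round(dist[x][y], 2) for y in names])
```

Jaccard distance ignores domain copy number, gene loss and lateral transfer, and weights every domain equally, which is one reason a careful implementation (or a stochastic model that includes pathway information) would need to go well beyond this.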
a long-running interest of mine is to work on stochastic models of domain presence/absence that include pathway information. let me know if you ever feel like collaborating on this :)
- Ian Holmes
It sounds like an awesome project. Properly incorporating pathways & co-occurrence would be tricky. I'd love to get involved but I'm not sure I have the wherewithal or the ability to break out of my bad production cycle. Rfam 10.0 is developing slowly...
- Paul Gardner
That was hilarious. See the mRNA swimming eel-like towards the ribosome. The tRNAs all queuing in the correct order (maybe they're British?). The zoom out looked like a big bear eating tRNAs and shitting out blue peptides.
- Paul Gardner
Having come back to re-learn Java after learning the basics in the mid-90s, I'd say it's not all that bad. Although I do often find myself wanting to port over the Python standard library, since there are a few annoying inconsistencies in the Java standard library. Compared with Python, it also often forces you to deal with many more fine details and several layers of abstraction for apparently simple tasks. But maybe I'm doing it wrong :)
- Andrew Perry
That latter thing is my problem, which is why I am so much happier with Ruby. Of course, I am sure I am doing it wrong
- Deepak Singh
it's all about holding out as long as possible. that way you can half-learn a bunch more languages which is more fun - I retreat to a dark & gloomy corner caked with Perl and C++ debris whenever I need to get anything actually done ;)
- Ian Holmes
My problem is that I am a Perl and Fortran guy who didn't write any code for 3 years .. bad mistake. Should never ever have stopped
- Deepak Singh
Respect. I hear Fortran bindings for hadoop are just around the corner
- Ian Holmes