Deepak Singh
Trying to figure out how to engage the chemistry community. Hundreds of people are doing life science stuff in the cloud (even big companies) and with public data sets, but chemistry is a problem. Part of it is licensing. Any bright ideas?
I've been thinking about this a lot. The disparity is huge. So many life science/bio/physics on FF and elsewhere, so few chemists. After much thought I have concluded: that I don't know why this is. Tell you what. I'll start a new room for organic chemistry. - Matthew Todd
for the structure based drug design people, load up all of the PDB? In practice I don't know if it will be used (since most people are interested in one target, many ligands), but maybe it'll get people thinking... - Andrew Su
Making data in the cloud is one aspect. Deepak also notes licensing - this is a major point since many of the tools used in chemistry are commercial. But I think another factor is that in the life sciences the basic data is standardized. In chemistry, however, while at the most basic level all problems make use of the same data types (chemical structure), there are so many subsequent variations on that. - Rajarshi Guha
Also, do things like MD scale well in a cloud setup? But a cloud setup would be nice to generate a large multi-conformer version of PubChem (and doable freely even) - but that's just me :) - Rajarshi Guha
Not yet. MD requires the kinds of interconnects that we don't have yet. Docking, on the other hand, should be just fine - Deepak Singh
Yes - I can see that the embarrassingly parallel problems would do well in a cloud - there are a number of papers that talk about docking on grid systems, so I assume it's not a stretch to move them into the cloud. I suppose somebody just needs to set up images - though apart from Dock & Autodock they'd all face licensing issues - Rajarshi Guha
Yep. I was actually thinking of finding someone who'd do that. Probably should just talk to the Autodock folks since I know some of them - Deepak Singh
Love of Linux-based applications and clusters is the major hurdle for the computational chemist (those who are doing quantum chemistry, MD and other simulations). I hardly see any web applications for computational chemistry, and cloud computing is never heard of. At the same time, for chemoinformatics there are many web-based applications, and sooner or later you will see many cloud applications. - Abhishek Tiwari
Abhishek, people are running Cytoscape, R, HMMer, Matlab, Mathematica, etc on EC2 today. You have a Linux box with root access. In fact most scientific computing on AWS is not web based, but I suspect that's part of the lack of knowledge. I suspect fewer data resources that are generally open might be the other - Deepak Singh
Deepak, what about putting it to use in the Open Notebook Science projects? E.g. the solubility project... online validation of the chemistry behind the Ugi reactions... prediction of solubility, etc... - Egon Willighagen
How many people here are academic, how many are in industry? Believe me, open data (no legal protection mechanism) is the last thing industry needs; work on the licensing and intellectual property question! Data clouds raise severe data security questions, and some people argue that they are years away from practical use, just for this single reason. - joergkurtwegner
Joerg, that would make sense if there were not a bunch of financial services institutions and biomedical types doing good work today (from pilots and POC's to production work), and I am not talking academics. I am just curious why you have entire conferences in BioIT having panel discussions and think tank meetings on this subject, but not the chemistry side. - Deepak Singh
Deepak, the problem with any chemistry is that you need a patent to make money, the last thing you need is that others are getting smart about an (unvalidated) chemistry idea you are working on. It takes a while to check the paper2reality rate. I am a rookie in the cloud field, without numbers, names, and the ideas behind the financial return strategies, I have problems seeing it flying. My guess would be, that for all of them the model is quite different to pharma. - joergkurtwegner
I completely understand the IP concerns and questions and that is definitely part of the equation, but it's probably the only community I can think of right now that is not active - Deepak Singh
I think that the transfer of the old Inpharmatica databases and chemogenomics tools to EBI, where they will go public will be an important step to more openness. - Ola
@Ola: Interesting move, yes, and will it change the chemistry paper2physical compound rate? I do not think so ... still too little chemistry connections, physical compounds, and reactions in the open. - joergkurtwegner
Yes, Joerg - while much of this discussion is about cheminformatics (probably correctly), I am also interested in why there are so few distributed chemistry collaborations - i.e. those involving the actual synthesis of chemical compounds and their evaluation. Does it come down to data sharing tools? - Matthew Todd
@Matt: I think it is more complex than that. 1-conservative group dynamics of chemists, and there are positive trends like the chemistry collaboration enthusiasm of Mitch http://tinyurl.com/djh3sc and the distributed drug discovery initiative http://tinyurl.com/chuh6a, 2-infrastructure of available purchasable physical compounds http://tinyurl.com/df7324, 3-information sharing or there are still too many single (non collaborating) data silos of chemistry projects. There is CAS and then...nothing broadly accepted. - joergkurtwegner
Mat - great idea about setting up a room for organic chemistry! Egon set up an ONS solubility room as well http://friendfeed.com/rooms... - Jean-Claude Bradley
Mat - I think it comes down to critical mass - that's why I was so happy to see you join FF - Jean-Claude Bradley
JC/Joerg - I think it's acquired, not hereditary... It's a social thing. Doesn't biology also suffer from dominant suppliers, and the need of purchasing reagents/physical objects? Non-collaborating data silos - maybe, but it doesn't have to be that way, does it Anthony? - Matthew Todd
@Matt, Jean-Claude: Each single room itself is already some sort of data silo. Sure, FriendFeed itself is bridging, and ChemSpider could e.g. serve as a chemistry image link providing more information ... so we all agree on more collaboration? Still, Deepak's question remains: 'what do those people need' who have not yet joined any of those collaboration forms? - joergkurtwegner
Tools that can help them share data - BUT tools that are simple to use. i.e. that require MINIMAL IT/programming/wiki/html knowledge to do so. All members of this discussion room would, I think, be astonished at how little web 2.0 tools are embraced by chemists because of the usability barrier. - Matthew Todd
Joerg - yes I have mixed feelings about FF rooms because they tend to isolate BUT getting someone initially into an organic room might make them see the other members and start subscribing to their posts. - Jean-Claude Bradley
@Matthew, my feeling is that conservativeness is a major factor - traditional collaborations are alive and kicking. So one could say that there are already 'distributed' collaborations. But the stuff you are talking about requires some sort of data handling infrastructure to make it efficient. While a number of solutions come to mind (CDD for example), academic chemists are, IMO, not used to thinking at that scale (?). Also, given the fact that cheminformatics does not have much traction in academia, we - Rajarshi Guha
have a situation where many synthetic chemists (for example) maybe do not realize that such distributed projects could be very feasible. Of course, IP issues play a major role in this area (and likely trump everything else). - Rajarshi Guha
From the looks of it, there are a number of factors at play here: culture, lack of knowledge, IP concerns, etc. Need to chew on this a little, since almost all the fears are essentially FUD. - Deepak Singh
One suggestion: cut through directly to the consumer rather than the developer. Chemistry "services" are already very entrenched but there are many services that could get wide use and get attention. e.g. running some sort of in silico screening service where you can throw a PDB structure at PubChem/Chemspider or a set of compounds against some subset of PDB. The big difference between biology and chemistry is the effective lack of a chemoinformatics community building web based services that are easy to use - Cameron Neylon
...which should in no way be seen as a slight on the good work of all the people around here - but we have a huge proportion of the people working in this space right here - whereas there are, what, 10 times as many bio-inf people and a minute proportion of the global total. - Cameron Neylon
@Cameron, good points. It's also understandable why this is the case - it's only been recently that we have large and freely accessible data sources. One could also say that free availability of tools also plays a role - but still, many commercial vendors will collaborate in such efforts. In contrast, in bioinformatics, data & tools were primarily public from the start. - Rajarshi Guha
There also happen to be fewer people doing cheminformatics, so not too surprising that we don't see many cheminformatics services :) - Rajarshi Guha
Absolutely - that was kind of what I meant - there isn't the same size community of people interested in building services so what about reaching directly to the users? - Cameron Neylon
Only just found your blog posts too for some reason - they don't seem to have come through in my reader yet. Anyway I think there are fundamental social differences between chemistry and biology especially synthetic chemistry. You only have to walk into a chemistry lab. One person making one compound in one fume cupboard with virtually no interaction with anyone else...very very odd in today's world in my view - Cameron Neylon
@Cameron, wrt users, good point. But there are two aspects to that. First, knowing what users want. In many ways, it's a bootstrapping process in that users need to know what's available, what can be done etc. After which those things can be done. All this requires tight binding between cheminfo and syn/med/... chem. This is the case in industry, not in academia. At the same time, while this type of infrastructure is important, it's also not the only thing that cheminfo is about. Indeed, a lot of the ... - Rajarshi Guha
@Cameron. "One person making one compound in one fume cupboard" I know what you mean, but I'm not so sure. My group shares resources, and we talk to each other daily, and at group meetings. We share a space with other groups. Maybe we don't talk shop enough? Maybe we're just old-fashioned and suspicious of computers... - Matthew Todd
It is a bootstrapping process. You can have all the data you want out there, but in the end it is about applications. In the bio space, lots of apps that are readily available (including as AMI's), so the friction levels are much lower. - Deepak Singh
... things that might be useful to the users are not really research topics - so given the small number of academic groups, I can see why they may not always be interested in putting together s/w tools. Personally, I think deployment/dissemination is as important as research - but it seems a lot of what you're suggesting might be best handled by contract s/w dev. And that's the vicious cycle - the smaller community makes it so that it's difficult to find someone with those skills. - Rajarshi Guha
More generally, and picking up on Rajarshi's interesting blog post, there's more to chemistry than docking and pharma. For a big data set, try SciFinder - the accumulated knowledge of how to make compounds. i.e. Take Chemspider and link all the entries with yields/reagents. I would like: an open database of chemical reactions that helps plan syntheses. - Matthew Todd
@Matthew - eminently doable - except where's the data? We can't pull it out of the commercial databases. So that leaves some form of public effort. So it's back to a social solution for data gathering. OK, with the data, you then want to plan reactions. There's quite a bit of literature on that - and a few groups working on it. Do you get in touch with them? Or get some sort of commercial solution? Do you want it to work out of the box? Or do you see it as a research project? Where is the money to ... - Rajarshi Guha
... implement it? And if it's not a research project, you need to get someone to do the s/w dev. I suppose my point is that the problems facing what you're saying are not (primarily) due to technical issues. (Also, why doesn't FF let me type more!!) - Rajarshi Guha
Again, kind of what I meant with a docking service. It isn't really a research project that any academic would want to take on but it would be incredibly useful, simple. A Blast for chemistry if you like. I guess what I am trying to get towards is that I'm not sure that just data will do it for chemistry - we need the kind of turnkey things that make things that are a hassle just straightforward - Cameron Neylon
That pretty much sums it up :) - Rajarshi Guha
One of the things people forget is the extent to which all that genome sequence really only does one thing. It reduces the hassle involved in getting biological projects moving and keeping them moving. It's not a radical thing - just many steps become a bit easier. Almost purely a convenience thing. - Cameron Neylon
Cameron, couldn't agree more. - Deepak Singh
@Cameron, definitely true. But going from a sequence to a chemical structure means having to account for a lot more complexity (which is why I personally don't like the term 'BLAST for chemistry' - though efforts are being made in that direction for similarity searches). - Rajarshi Guha
So I guess the question is where are the similar pressure points in chemistry? But at the same time things that would grab attention and get publicity? - Cameron Neylon
On my wishlist: 1) pharmacophore searching against a multi-conformer (using a relatively large energy window) version of PubChem. 2) assay recommendation services - given a compound, has it been tested in any assay? What is its activity in assays it hasn't been tested in? Should / can I test it in some specific assays? Etc. But ideally, it'd be the bench chemists who come up with the wish list :) - Rajarshi Guha
And if you had to select a bunch of apps (pure computation, no services necessary), which ones would you want (let's assume that licensing is not an issue)? - Deepak Singh
Docking would be an obvious candidate, though DockBlaster seems to address that currently. But I'm not sure what you mean by 'apps' - do you mean specific s/w? Or scientific applications (which is what I think Cameron was pointing towards) built on top of s/w components? - Rajarshi Guha
For now, specific applications. Autodock, dock, glide, etc. - Deepak Singh
glide, eHits, the Ftree stuff from Inteligand, the OpenEye ecosystem, markush searching/filtering s/w. I should admit, I actually have used very little commercial software - but these would be representative of what I'd like to play with - Rajarshi Guha
I guess I meant "Blast" as in the somewhat pejorative view that Deepak wrote about last week or so. Something that is really obvious and really useful that becomes so embedded it is part of the background. - Cameron Neylon
Bench chemist asks (repeatedly): 1) How do I make this molecule? 2) Has anyone made it before? and 3) How will I know when I've made it? Beyond that, I agree, the question "what has this molecule been tested against?" is a good one. The question "where can I buy the starting materials?" is less significant, particularly if you don't pay the bills. - Matthew Todd
I guess I don't even really know what the specific apps I'd like would be. I guess that probably illustrates the problem :-) - Cameron Neylon
Doesn't SciFinder/CAS/Reaxys etc help for 2) (not considering cost) and possibly 1)? - Rajarshi Guha
Rajarshi - yes, Scifinder etc help with 2) but aren't these expensive, monolithic databases with no user input? However, I ought to add an extra question we ask: 4) Who else in the world is currently making this compound? - that's something that no database can currently tell you, and could be of huge value. - Matthew Todd
Yes, they're expensive and closed - but also thorough and, I'd guess, authoritative. So any other similar resource will have to provide similar value. Maybe over time this is possible, but it'll take quite a bit of time! As for 4, yes I can see that'd be useful - but then one faces all the issues that ONS tries to address, so that one's definitely a problem that needs a social solution :) - Rajarshi Guha