Open Notebook Science Solubility Project

Room for chatting and information aggregation around the Solubility project of the Open Notebook Science community.
Egon Willighagen
Fwd: Open Notebook Science Solubility: the SPARQL end point - http://chem-bla-ics.blogspot.com/2009... (via http://friendfeed.com/egonw...)
Brilliant Egon - Andrew Lang
Andrew Lang
Webservice that calculates Hansen solubility parameters from SMILES. http://www.pirika.com/hiroka...
hsplight.png
super Andy! - Jean-Claude Bradley
Andrew Lang
Solvents clustered by Abraham solvation parameters using happieclust. ref: http://friendfeed.com/danielm...
abraham.png
This should help us pick sets of diverse solvents. - Andrew Lang
Is there a reason to use approximate clustering over an exact method? The dataset isn't very large if I recall. Is there a big difference with say standard hierarchical clustering methods? - Rajarshi Guha from iPhone
For small datasets it should do exact clustering. Just to make sure, I used the flag --all-pairwise. It didn't seem to change anything, but you're right - I needed to make sure. - Andrew Lang
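For a dataset this small, an exact method is cheap. As a rough illustration (not how happieclust works internally), here is a minimal exact single-linkage agglomerative clustering over solvent descriptor vectors such as Abraham parameters. The function names are illustrative, and any vectors you pass in are your own; none are taken from the thread.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(points, n_clusters):
    """Exact single-linkage agglomerative clustering.

    points: dict mapping a solvent label to its descriptor vector.
    Merges the two closest clusters until n_clusters remain, checking every
    pair of members each time (O(n^3) overall, fine for a few dozen solvents).
    Returns a list of sets of labels.
    """
    clusters = [{label} for label in points]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Single linkage: cluster distance = closest pair of members.
            d = min(euclidean(points[a], points[b])
                    for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]  # safe: combinations() guarantees i < j
    return clusters
```

With four made-up 2-D points, A=(0, 0), B=(0.1, 0), C=(5, 5) and D=(5.1, 5), asking for two clusters groups A with B and C with D. Picking one solvent from each cluster is one way to get a diverse set.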
Jean-Claude Bradley
Prediction of solubility of drugs and other compounds in organic solvents - recent article by Abraham on non-aqueous solubility prediction - thanks Andy http://dx.doi.org/10...
Andrew Lang
Google now indexing thumbs. Final page of solubility book: Solubilities of inorganic and organic compounds: a compilation of ..., Volume 1 By Atherton Seidell
thumb.png
it actually looks like a toe - Jean-Claude Bradley
Correct. 'Tis pink and looks like a form of footwear against a desert-like background to me, so might this be "Toe in the Sahara, with shoe", featuring Sting and @cromercrox :- http://www.last.fm/music... - Graham Steel
or does this prove Megan Fox made this scan http://www.nydailynews.com/gossip... - Jean-Claude Bradley
Has this already been digitized by typing monkeys? - Egon Willighagen
it is automatically generated Egon - just waiting for me to get the preface done.... - Jean-Claude Bradley
Jean-Claude... you mean, you are getting the data rescued already? - Egon Willighagen
not sure what you mean by rescued Egon - Jean-Claude Bradley
@JC I think Egon is talking about people transcribing Seidell's solubility book. @Egon I think JC is talking about the ONS solubility book. :) - Andrew Lang
thanks for the clarification Andy - Marshall already uploaded most of the carboxylic acids and aldehydes - yes I was referring to our own book Egon - Jean-Claude Bradley
Ah... JC, sorry... I did not realize you were compiling your own book :) @Andrew... yes, I was talking about transcribing values from the Seidell book... I might know someone who wants to help with that (or at least try it; he's not a chemist)... - Egon Willighagen
My mistake Egon about the confusion with the book - yes, we have one coming out soon. As for help with adding data from the Seidell book, I think we have most of the relevant compounds. It would also require a chemist to translate the way names were written back then, and much of it requires conversion from g/100 g solvent or g/100 g solution to molar, etc. - Jean-Claude Bradley
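The conversion Jean-Claude mentions can be sketched as follows. Going from a mass ratio to molarity needs the solute's molar mass and the density of the saturated solution. The density is an input the caller has to supply (if it is not measured, any value used is an approximation). The function names are illustrative only.

```python
def molar_from_g_per_100g_solvent(s, molar_mass, solution_density):
    """Convert s grams of solute per 100 g of solvent to mol/L.

    molar_mass: solute molar mass in g/mol.
    solution_density: density of the saturated solution in g/mL (must be
    measured or estimated; it is not implied by the mass ratio).
    """
    moles = s / molar_mass
    solution_mass_g = 100.0 + s  # solvent plus dissolved solute
    volume_l = solution_mass_g / solution_density / 1000.0  # mL -> L
    return moles / volume_l


def molar_from_g_per_100g_solution(w, molar_mass, solution_density):
    """Convert w grams of solute per 100 g of solution to mol/L."""
    moles = w / molar_mass
    volume_l = 100.0 / solution_density / 1000.0  # mL -> L
    return moles / volume_l
```

The two forms are related by w = 100·s/(100 + s), so 25 g/100 g solvent is 20 g/100 g solution. With a hypothetical molar mass of 100 g/mol and a solution density of 1.25 g/mL, both functions give 2.5 mol/L.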
Andrew Lang
$\log P = 1.46(\pm 0.02) + 0.11(\pm 0.001)\,N_{\mathrm{C}} - 0.11(\pm 0.001)\,N_{\mathrm{HET}}$
Just noticed - this is from vcc lab. - Andrew Lang
like a dream come true - Jean-Claude Bradley
@Andrew... no, it's not that easy... it's that *difficult*... the problem is so complex, it's very hard to do better... - Egon Willighagen
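The equation Andrew quotes is a linear model on two atom counts, which is why it is so easy to evaluate. Here is a minimal sketch with the standard errors dropped. It assumes N_C is the number of carbon atoms and N_HET the number of heteroatoms, because the post does not define them. As Egon notes, being easy to evaluate says nothing about accuracy.

```python
def count_based_logp(n_carbon, n_hetero):
    """log P from the linear count model quoted above, standard errors dropped.

    Assumes n_carbon = number of carbon atoms and n_hetero = number of
    heteroatoms; the post itself does not define N_C and N_HET.
    """
    return 1.46 + 0.11 * n_carbon - 0.11 * n_hetero
```

For example, 7 carbons and 2 heteroatoms give 1.46 + 0.77 - 0.22 = 2.01.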
Andrew Lang
Should we contribute the solubility data to Wolfram Alpha? http://www.wolframalpha.com/partici...
rdf? - Andrew Lang
seems like an obvious thing to do if possible - but is it true that there is no way to provide a source for data? - Jean-Claude Bradley
I'm uneasy with the whole interface for this reason. - Matthew Todd
I'm uneasy with the lack of attribution for specific data points and the lack of an interface for "corrections" as opposed to just comments. The data has to scale for this to work and I just don't see how they can do that without an open dataset - Cameron Neylon
content submission policy - "When you submit a fact, set of facts, dataset, formula, or any other information to be considered for incorporation [into] Wolfram|Alpha, you are giving it to Wolfram Alpha LLC ("we"/"us") free and clear, to do with anything and everything we choose. Your submission has to include a transfer/disclaimer of all intellectual property rights because ..." - Andrew Lang
Andrew - what does that mean? Data flow openly in, but not openly out? - Matthew Todd
@Matthew: I'd say that's exactly what it means. They'll happily take but won't give back. Hell, I could even live with that -- if there were sources provided for the data, rather than a blanket "Trust Us". - Bill Hooker
Yes. I don't like it much either - I made my suggestion before I found the submission policy - maybe we shouldn't - just as a matter of principle. - Andrew Lang
I think there is a balance here between being helpful and trying to persuade them to open up the innards more, and working just to see stuff disappear into the bowels of the system. I would start by being positive and see what the response is, really. After all, it is open data, so they can do with it what they like - it's a question of how much effort we are prepared to put in, really - Cameron Neylon
Cameron++ This is a nice example of why people use copyleft data licenses :) If you truly want your data to be Open (CC0, PD, ...), you would not care if WA would remove source info and make the data proprietary... - Egon Willighagen
Egon agreed. Which is why I phrased it like I did. But equally an example of community enforcement. WA are free to use the data, I would be happy for them to do so. But I'm not going to put much work into assisting them or working with them unless I can see that data being allowed back out in a useful form. To be fair to them they are allowing export of the "Mathematica Form" of their data objects which is presumably what they are holding in their databanks - Cameron Neylon
Egon, and to be precise I object to them removing source data and making it proprietary because I think it means the service won't ultimately be as useful as it could be. If it were open and user editable then with a growing user curated data set and what appears to be a pretty good natural language parser and reasoning system we could do great things. Closed it won't be as good so I do object and will say so. I just don't think a license on our data is the right way of enforcing their good behaviour :-) - Cameron Neylon
Wrote up my thoughts... discussion most welcome: http://chem-bla-ics.blogspot.com/2009... - Egon Willighagen
Don't mind at all if WA wants to vacuum up all the data it pleases, but as Cameron says it alters the motivation for being part of the experiment. I also can't be bothered with it if the data are not sourced and credited. Question to student = x. Student answers y. Student is asked "How do you know?", student replies "Because WA says so" etc - Matthew Todd
Matthew, indeed! Google has the advantage that it keeps track of the source... this is what worries me about many chemistry databases too: where is the link to the (primary) source... WA is not that different from other recent efforts... BTW, I did see sources for some questions... e.g. the answer 42 was sourced to D. Adams... - Egon Willighagen
:) The one thing nobody needs a source for... - Matthew Todd
Given the amount of curation that goes on with the ChemSpider database it is totally unrealistic to expect that data are immutable "facts" - very dangerous foundation indeed - Jean-Claude Bradley
Let's say we put some solubility info in there - what would be an example of a query or task that could be performed by WA? - Jean-Claude Bradley
@ JC: Right - what are the really interesting questions you could use answers to? How about a prediction of other solvents you could try that might dissolve/precipitate a molecule with known (in)solubility in several given solvents. i.e. an extrapolation. Maybe I'm over-burdening WA with expectations, but I have in mind what Michael Nielsen was previously talking about - *discoveries* that have been made through the semantic web, or linked data. Isn't that what would separate WA from a search engine? - Matthew Todd
@Egon -- "If you truly want your data to be Open (CC0, PD, ...), you would not care if WA would remove source info" -- not quite. I don't care if they come and get my data and do whatever with it -- as you say, that's why it's Open. What I am balking at is the idea that the community should actively provide them with data, only to see it disappear into a black hole. If they want community input they should be prepared to engage fully with the community. - Bill Hooker
I'm also unhappy - our own studies show that there is no quality control - they hoover up everything and use suspect algorithms to deduce from it. There are sources they have used - like MSDS collections - that I did not use because I assumed I would breach copyright. Maybe they have paid, but maybe they have stolen - peter murray-rust
@Bill: yes, I can relate to that. There is so much to do, and independent of license choice, I too do not want to spend (too much) time on a proprietary black hole. - Egon Willighagen
I actually chatted with Theodore Gray at Google and offered to help with the quality of data on Wolfram Alpha. They do not seem to deal at all with stereochemistry (See the blogpost: http://tinyurl.com/yk7mkyt). I sent an email and can't get a response to even allow us to help improve the quality. If anybody has an inroad to Wolfram Alpha and can introduce me to someone who is... more... - Antony Williams
I emailed Theodore Gray after scifoo and he set me up with some W|A peeps, we're converting our calc labs to W|A, but I don't know any of the data people. Separately, I did get a reply from the data people regarding the solubility data: "Thank you for your suggestion regarding Wolfram|Alpha. We are interested in hearing more about your data, please provide us with source links if possible or attach a sample of your excel file." I replied but didn't hear back. - Andrew Lang
Egon Willighagen
Fwd: A First General Solubility Model from ONS Challenge Data - http://usefulchem.blogspot.com/2009/09/first-general-solubility-model-from-ons.html (via http://ff.im/8SMsA)
I replied on JC's blog but maybe we can discuss here. - Andrew Lang
yes, pset predictions would be useful. I think the main concern is the use of a stepwise-like feature selection procedure. Depending on the size of the descriptor pool, I'd probably just do a brute-force all-combinations search. Crude but easy - Rajarshi Guha
Is there a tool to do an all-combinations search? - Andrew Lang
I don't know off the top of my head - but depending on what environment you're working in, it's just a matter of making all combinations and then looping over them - Rajarshi Guha
:) would you believe I use Mathematica. - Andrew Lang
IIRC, the Subsets function will enumerate all combinations - Rajarshi Guha
You're right! Thanks Rajarshi. - Andrew Lang
'Out of Memory' when I tried all combinations, so now I'm trying biased subsets of combinations of descriptors - not ideal I know. Was getting better R^2 values last night but fell asleep. Will try to get something up tonight. Also as Egon suggested, using 200 data points in modeling, reserving the remainder for testing. - Andrew Lang
Page updated: http://onschallenge.wikispaces.com/Solubil.... With suggestions from Rajarshi and Egon, the model has improved - an expert could improve it further still. :) - Andrew Lang
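Rajarshi's brute-force suggestion can be sketched in plain Python. This is not Andrew's Mathematica workflow. The code fits an ordinary least-squares model with an intercept for every k-descriptor combination, trains on the first n_train rows (Andrew's split corresponds to n_train=200), and keeps the combination with the best R^2 on the held-out rows. The function names are illustrative. One possible cause of the 'Out of Memory' error is building every subset at once. `itertools.combinations` yields them one at a time, although the number of fits still grows as C(n, k).

```python
from itertools import combinations


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear descriptors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def r_squared(coef, rows, y):
    """Coefficient of determination of a fitted model on the given rows."""
    pred = [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, pred))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def best_subset(data, y, names, k, n_train):
    """Try every k-descriptor combination; fit on the first n_train rows,
    score R^2 on the remaining rows. Returns (score, names, coefficients)."""
    best = None
    for combo in combinations(range(len(names)), k):  # lazy: subsets are not stored
        sub = [[row[i] for i in combo] for row in data]
        try:
            coef = fit_ols(sub[:n_train], y[:n_train])
        except ValueError:
            continue  # skip degenerate combinations
        score = r_squared(coef, sub[n_train:], y[n_train:])
        if best is None or score > best[0]:
            best = (score, [names[i] for i in combo], coef)
    return best
```

Scoring on the held-out rows rather than the training rows is the point of reserving data. A training-set R^2 would keep rising as descriptors are added, even when the model is only fitting noise.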
Andrew Lang
1-octadecylamine on W|A. Looks like they included gravity in the minimization algorithm? http://www.wolframalpha.com/input...
1-octadecylamine.gif
Andrew Lang
Translation Needed: "...indices for cohesive interactions in solids..." Context is descriptors. Is this melting point?
Stuff like LJ potentials, Coulomb parameters, crystal lattice energy, H-bonding terms etc - Rajarshi Guha
Thanks Rajarshi. - Andrew Lang
Egon Willighagen
This is how I'll share the results of checking the spreadsheet data: Tweet send using the #jtwitter library with a #Bioclipse script #2 - http://egonw.posterous.com/tweet-send-using-the-jtwitter-library-with-a-0 (via http://ff.im/6Jr8E)
cool - by the way I mentioned RDF during my talk yesterday and didn't get the blank stares I usually do :) - Jean-Claude Bradley
I wonder if we could automatically twitter dosol, dougi, and solsum updates for people to follow. - Andrew Lang
Andy that might be useful if we don't flood the feed - maybe just post if there is a new top priority request? - Jean-Claude Bradley
@Andrew: yes, that's the kind of thing I am thinking about too... BTW, I have started adding support for #MyExperiment in #Bioclipse, so that Bioclipse 'workflows' can be shared on their social network... - Egon Willighagen
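Jean-Claude's suggestion to post only when there is a new top-priority request amounts to change detection. Here is a minimal sketch. `post_update` is a hypothetical callback standing in for whatever Twitter client would actually send the message (for example, the jtwitter library Egon mentions); no real API call is shown. The priority convention (lower number = more urgent) is also an assumption.

```python
class PriorityAnnouncer:
    """Emit a message only when the top-priority request changes."""

    def __init__(self, post_update):
        self.post_update = post_update  # hypothetical callback, e.g. a Twitter client wrapper
        self.last_top = None

    def check(self, requests):
        """requests: list of (priority, description); lower number = more urgent (assumed).
        Returns the message that was posted, or None if nothing was posted."""
        if not requests:
            return None
        top = min(requests)[1]
        if top == self.last_top:
            return None  # unchanged: stay quiet to avoid flooding the feed
        self.last_top = top
        message = "New top-priority solubility request: " + top
        self.post_update(message)
        return message
```

Calling `check` after every dosol/dougi/solsum refresh is then harmless, because repeated refreshes with the same top request produce no posts.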
Egon Willighagen
Fwd: Beautiful Data just arrived in paperback: http://twitpic.com/dlr9g. All proceeds to Creative Commons and the Sunlight Foundation. (via http://friendfeed.com/themza...)
Egon Willighagen
Fwd: Hacking up #bioclipse support for #onssolubility... below is a screenshot of the preference page to set Google credentials... (via http://friendfeed.com/egonw...)
onsSolPrefPageBioclipse.png
Egon Willighagen
Andrew Lang
Solubility data uploaded to Google Fusion Tables - Displaying solid carboxylic acids in methanol.
ONSData.png
sweet - does it have an API? - Jean-Claude Bradley
no API as of yet - today was the first day I could actually import the data. It has been very unGoogle-like in exhibiting bugs. For the future I can see this is the tool we'll use for the solubility data but just something fun to play with right now. - Andrew Lang
Michael R. Bernstein
WebLab: a data-centric, knowledge-sharing bioinformatic platform -- Liu et al., 10.1093/nar/gkp428 -- Nucleic Acids Research - http://nar.oxfordjournals.org/cgi... (via http://friendfeed.com/webmave...)
Seems that the website (http://weblab.cbi.pku.edu.cn/) is down. ::edit:: got through on a second try - Jason Winget
Andrew Lang
Call to action - wikipedia editor questions the reliability of data from open science - please go to his talk page and add your support for the ONSchallenge: http://en.wikipedia.org/wiki...
quote from editor: "Although I am sure that this school project was fun for the kids, Wikipedia needs to have data here from verifiable sources. Reference to a university site (Oral Roberts or Harvard) is not good enough. Otherwise your work risks being deleted. NIST, CRC etc, now they are authorities. I really encourage you to consult someone before launching on what looks like a well intentioned but naively planned project." - Andrew Lang
Please help. Go to the talk page and make a comment. Thanks! - Andrew Lang
There is probably a value in thinking carefully about the response here. I have heard some criticisms of the methodology being used and have suggested that people make those criticisms in the project notebooks but this hasn't happened as yet. Strictly the WP guidelines do require a traditionally published (non-web) verifiable source and one could make an argument that these results have... more... - Cameron Neylon
I think it has been peer reviewed - http://onschallenge.wikispaces.com/judges - Andrew Lang
...but not in the traditional way...I agree with you but it's a subtle argument and it could get lost - Cameron Neylon
I would also say that at the end of the day there is much more backup for our data than there is for a NIST or CRC value in many cases - at least you can tell what was measured. Just not sure whether that will be enough for the traditionalists. It's worth rehearsing the arguments, I guess, is what I am saying. - Cameron Neylon
Can you link the actual page that's in dispute? Major flaw in WP talk imo, there are no auto links to the pages being discussed. - Bill Hooker
@Bill Here's my edits - he's questioning them all: http://en.wikipedia.org/wiki... - Andrew Lang
Let's keep in mind that none of the other properties even have references (see for example benzoic acid: http://en.wikipedia.org/wiki... ) - Jean-Claude Bradley
Just noticed in the urea talk page someone complaining in 2007 that all the solubility values were wrong and unreferenced :-) - Cameron Neylon
Added my comment. It seems to me that, as a source to be quoted in WP, open notebooks are a new category (see http://en.wikipedia.org/wiki...). They are not "third-party published sources with a reputation for fact-checking and accuracy", but they are "produced by an established expert on the topic" in the sense that JC runs the project and we judges are all scientists. More to the point, WP has never been asked to consider this kind of source before. - Bill Hooker
Thanks Bill. - Andrew Lang
@Cameron Good find! I can see the ppt slide now. Here's the quote from 2007 about the solubility in water - it seems to have never been resolved - "The article provides some specific information about the solubility of urea without giving a source. The values were out by at least a factor of 10 (probably g/L rather than g/100mL), which I have corrected by knocking off the last zero, but... more... - Andrew Lang
I added my 2 cents: http://en.wikipedia.org/w... . Still do not get why none of you guys - so much engaged in doing open science - seem to have ever shown up at Citizendium, a place where expertise is actually valued. It is currently very small compared to Wikipedia, but so is Open Science compared to the rest. See also http://en.wikiversity.org/wiki... . - Daniel Mietchen
Thanks for adding a post to the discussion on wikipedia Daniel. I will check out Citizendium. - Andrew Lang
I believe the reason most of us stick to Wikipedia is that we believe that it is (a) the appropriate general resource and (b) since it is the source most people, and Google, go to, information will be found there. - Deepak Singh
Certainly nothing wrong with reading or editing Wikipedia entries. But the sort of problem discussed in this thread would be much rarer over there at Citizendium (they have others), and since it tends to affect scientists quite often, I am wondering why they do not give this alternative a try. Andrew Su has done that, and he was disappointed (a feeling I share for his case, since I... more... - Daniel Mietchen
Daniel - thanks for the comments and for the link to Wikiversity and Citizendium - we'll certainly check it out. A first look does not show any entries for common chemicals like methanol. Additional portals are always of interest though. The reason Wikipedia is useful is that it turns out to be a significant way people looking for specific non-aqueous solubility find our results. - Jean-Claude Bradley
Andrew Lang
Bing - "Solubility of benzoic acid" search results roll-over gives solubility values in THF, acetone, and acetonitrile from ONSchallenge data.
bing.png
bing2.png
Andrew Lang
Bill Hooker
What's up with David's experiment 094 (http://onschallenge.wikispaces.com/Exp094)? Is most of the error coming from poor NMR, or do we need to routinely cross-validate SAMS against internal standard? I see he is repeating in expt 102 (http://onschallenge.wikispaces.com/Exp102)...
It was a little troubling to me too until I read JC's comment "[I don't think that this experiment proves very much either way - the NMR is very poor quality and the solvent peak overlaps with the reference compound. This should be redone with a reference compound that does not overlap with any peaks JCB]" - Andrew Lang
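For comparison with SAMS, here is the usual internal-standard (qNMR) relation. The analyte concentration is the standard's concentration scaled by the ratio of integrals, each normalized by the number of protons giving the peak: c_a = c_std · (I_a/N_a) / (I_std/N_std). This is a minimal sketch with illustrative names. It assumes fully relaxed, well-separated peaks, and the solvent/reference overlap in experiment 094 breaks exactly that assumption.

```python
def conc_from_internal_standard(i_analyte, n_h_analyte, i_std, n_h_std, c_std):
    """Analyte concentration from 1H NMR integrals against an internal standard.

    i_analyte, i_std: peak integrals; n_h_analyte, n_h_std: number of protons
    giving each peak; c_std: known standard concentration (any unit, and the
    result comes back in the same unit). Assumes non-overlapping peaks.
    """
    if i_std <= 0:
        raise ValueError("reference integral must be positive")
    return c_std * (i_analyte / n_h_analyte) / (i_std / n_h_std)
```

Running the same sample through both this calculation and SAMS would be one way to do the cross-validation Bill asks about.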
Andrew Lang
Thinking this may replace java applets for webpage chemistry visualization - http://tools.google.com/dlpage... Anyone had any experience with it?
Michael R. Bernstein
Andrew Lang
ONS Challenge on Wikipedia - http://en.wikipedia.org/wiki...
ONS.png
Contribute please. - Andrew Lang
Page just got flagged for deletion. Please go add a few lines so that they know it is not just me editing the page. Thanks! - Andrew Lang
Done, Andrew - OK, only a one liner and link, but done. Note to self, must wiki more often. - Graham Steel
Thanks Graham. If everyone can do just that it should be ok. - Andrew Lang
Made a few small changes. - Bill Hooker
Added a link to WP Open Data entry... - Egon Willighagen
thanks Bill and Egon! - Jean-Claude Bradley
Andrew Lang
Jmol as a 3D solubility data visualization tool.
methanol.jpg
Descriptor chemical space for methanol solubility data rendered in Jmol. - Andrew Lang
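One way to get arbitrary descriptor values into Jmol is to write them as a pseudo-molecule in XYZ format: an atom-count line, a comment line, then one "element x y z" line per compound, with three descriptors as the coordinates. This is a guess at the approach, not a description of how this view was built. The element symbols only serve as colour codes for an assumed solubility binning, and the bin edges are arbitrary placeholders.

```python
def descriptors_to_xyz(points, comment="descriptor space"):
    """points: list of (d1, d2, d3, solubility_molar) tuples.

    Returns XYZ-format text in which each compound becomes one pseudo-atom at
    (d1, d2, d3). The element symbol encodes an assumed solubility bin so that
    Jmol's element colouring distinguishes the bins.
    """
    def element_for(s):
        if s < 0.1:
            return "O"  # low-solubility bin (placeholder cut-off)
        if s < 1.0:
            return "C"  # intermediate bin
        return "N"      # high-solubility bin

    lines = [str(len(points)), comment]
    for x, y, z, s in points:
        lines.append(f"{element_for(s)} {x:.4f} {y:.4f} {z:.4f}")
    return "\n".join(lines) + "\n"
```

Saving the returned text to a `.xyz` file and opening it in Jmol gives a rotatable 3D scatter plot. No bonds are drawn unless Jmol's automatic bonding happens to connect nearby points, so scaling the descriptors matters.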
Egon Willighagen
JC, Cameron, others... This open website is a project of a PhD student I spoke with yesterday over a beer, here at BMC/Uppsala/SE, and when he mentioned it, I immediately thought about the protocols used for Ugi reactions and solubility measurements... this site isolates the protocol descriptions in a social-web-like way. The PhD student was very interested in feedback from chemists on his work... Can you help him out? - Egon Willighagen
No this cannot replace a lab notebook where a detailed record of a specific experiment with all generated data is made available. We could however generate general solubility measurement protocols from the experience gained from lots of experiments. This is very similar to the myExperiment free text protocols. - Jean-Claude Bradley
No, surely it cannot replace the lab notebook... an experiment is never an exact copy of a protocol... - Egon Willighagen
Sorry, just trying to catch up here. It seems like reasonably nice functionality for a protocol site, but it's not clear to me what the killer feature is here that would bring more people in. It looks nicer than OWW, for instance, but doesn't let people edit directly. Not sure whether that is what people want, but again, what is bringing people in to make comments? One thing that is appealing is the hint that they are interested in linking materials across multiple protocols. - Cameron Neylon
That is technically appealing, but I don't really see how it will, in and of itself, bring in enough users to make it work. Like most of these things, the challenge lies in getting together a community big enough to make the content happen. - Cameron Neylon
Thanx for the feedback. I will point the author to your comments. - Egon Willighagen
Andrew Lang
Sophomore Student Wins Award Typically Given to Ph.D. Students - March 11, 2009 - http://www.oru.edu/news...
Rajarshi Guha
I #%##&# hate spreadsheets for data storage
I've been trying to get my head around Python the last few weeks but perhaps we need duck typing for data storage and processing. Looks like a spreadsheet, quacks like a spreadsheet, but is actually just a display layer on top of something nicer for more powerful manipulation - Cameron Neylon
Bottom line: spreadsheets are not going away - if we want to interact with the (vast) majority of scientists, we need to find ways of translating and communicating with spreadsheets - Cameron Neylon
Well, I contacted Rajarshi about this and he said it was a temporary moment of frustration brought on by extra spaces at the end of entries - we just have to make sure to check often after students make additions - Jean-Claude Bradley
Jean-Claude... tied up in grant applications now, but will soon make the RDF generation live... and make a web front end... that can also include live validation of the spreadsheet... so that students can have their additions automatically validated... - Egon Willighagen
Oh, I appreciate the frustration - I am hacking through spreadsheets and all sorts of nonsense repetitive processing at the moment as well. Immensely frustrating. Just wanted to make the point that this is a really important circle to square - I don't really have any technical ideas on how at the moment, though. But hopefully this process will actually provide some guidance as the conversion from spreadsheet to RDF to whatever else flows through - Cameron Neylon
Actually, if it was Excel I wouldn't mind so much. The problem is validation of input - unlike a RDBMS (or Excel) I can't add constraints for a column in Google spreadsheets which would simplify things a lot. Egon - how will RDF help here? - Rajarshi Guha
The RDF would help, because the OWL behind it could put restrictions on field content... but more practically, it's just a nice spinoff when creating the RDF... as I need to do some sanity checking at that stage anyway... - Egon Willighagen
Yes, those spaces :) - I trim everything to remove them - actually I got the idea for trimming from Rajarshi's code, I think. - Andrew Lang
Yes, in the end that's what I do - but if the spreadsheet is the "container of record", it'd be nice to have validation/constraints in the sheet itself. - Rajarshi Guha
Egon - thanks it will be interesting to see if that can help - Jean-Claude Bradley
Rajarshi - is there an API to add to or edit Google Spreadsheets yet? - Jean-Claude Bradley
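Until the sheet itself can enforce constraints, the checks discussed above can run on an exported copy before any processing. These checks cover trimming stray spaces, rejecting empty cells, and requiring a non-negative numeric concentration. Here is a minimal sketch over CSV text. The column names (solute, solvent, concentration) and the rules are hypothetical, not the actual ONS spreadsheet layout.

```python
import csv
import io

# Hypothetical column names; the real ONS spreadsheet layout may differ.
REQUIRED = ("solute", "solvent", "concentration")


def validate_rows(csv_text):
    """Trim stray whitespace and apply simple column constraints.

    Returns (clean_rows, errors). Assumes one record per line, so the
    reported line numbers match the exported file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None:
        return [], ["empty input"]
    # Stray spaces in header cells would break lookups by column name.
    reader.fieldnames = [name.strip() for name in reader.fieldnames]
    missing = [c for c in REQUIRED if c not in reader.fieldnames]
    if missing:
        return [], ["missing columns: " + ", ".join(missing)]

    clean, errors = [], []
    for line_no, raw in enumerate(reader, start=2):  # line 1 is the header
        # Key None holds surplus cells on over-long rows; ignore it.
        row = {k: (v or "").strip() for k, v in raw.items() if k is not None}
        if not row["solute"] or not row["solvent"]:
            errors.append(f"line {line_no}: empty solute or solvent")
            continue
        try:
            conc = float(row["concentration"])
        except ValueError:
            errors.append(f"line {line_no}: concentration {row['concentration']!r} is not a number")
            continue
        if conc < 0:
            errors.append(f"line {line_no}: negative concentration")
            continue
        row["concentration"] = conc
        clean.append(row)
    return clean, errors
```

Reporting errors by line, rather than silently dropping rows, lets whoever added the data fix the sheet itself, which keeps it usable as the "container of record".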
Egon Willighagen
Download and process the ONS Solubility RDF data in Bioclipse... - http://gist.github.com/68201
cool Egon! - Jean-Claude Bradley
It's just the first synthesis step in a multi-step natural product synthesis... but an important coupling indeed. - Egon Willighagen
Cameron Neylon
Carlos Torres
I have already contacted some of you through other means.
got your email Carlos - looks like a very interesting study - will you make your findings public? - Jean-Claude Bradley
Yes, I will publish an article in an academic journal and at the ICMPIM 2009 Conference. I also plan to share my findings with the people engaged in open notebook science. - Carlos Torres