Noel O'Boyle
It all adds up to a new descriptor -
Cool. Is there a standard set of SMARTS, in the CDK for example? All I know of are the Kier-Hall SMARTS, but they're not that great, right? Is there a program that generates the counts against a set of SMARTS? - Andrew Lang
I think the "standard" set is problem dependent. The Kier & Hall SMARTS derive from the electrotopological indices. But given a set of SMARTS, it should be a trivial OB/CDK/... program to generate counts - Rajarshi Guha
Hmmm...not sure what you're asking about here. You can check out the data files for MR, LogP and TPSA in our repo at - Noel O'Boyle
Like LogP (cool by the way): I see a list of SMARTS. Are these all the ones they tried or just the ones they ended up with after feature selection? If it is just the ones they ended up with, is there a list of all SMARTS - as a starting place for modeling? - Andrew Lang
Ah, to be honest, I don't know how they do it. You'd have to read the papers. But I'd guess they have some a priori ideas about suitable groups, or maybe they slowly subdivide particular groups and see whether the additional parameters are worth the additional performance. No clue really. - Noel O'Boyle
Andrew... ChemoJava is the GPL extension of the CDK... so, if you feel useful, we can use these SMARTS for descriptors there... - Egon Willighagen
The license for the OB data files is a grey area. But they don't include API calls so I think the copyright holders can assert any license, and certainly we would all favour liberal licenses. - Noel O'Boyle
Well, the header is there for a reason... until that header is removed by the original source, or I get a public statement that I can use it under any license/waiver, I can only assume I cannot... - Egon Willighagen
I appreciate that, but the headers were added to all files probably without any thought. I'll look into it... - Noel O'Boyle
(git blame)++ - Egon Willighagen
Interesting. I would have thought there would be a standard set of SMARTS that you would use as a starting point in all situations and then use feature selection (and maybe add a few custom ones) to get a good model. Would it be easy to collect the complete set of SMARTS used in the CDK by all descriptors? Or is there a better way to collect such a set? - Andrew Lang
I found the contribution files in JOELib: JOELib2-alpha-20090613\src\joelib2\data\plain. I would like to build models using SMARTS, but I need a SMARTS Descriptor Calculator type program. - Andrew Lang
The problem I have when reading fragment-based papers is I'm not sure about counts. For example, if I have a paper with fragments OH and C(O)O, would a compound with SMILES CC(O)OC have both of those fragments or just one? - Andrew Lang
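A "SMARTS descriptor calculator" of the kind asked for here can be sketched in a few lines. This is only an illustration, not a program anyone in the thread wrote: the names `pybel_matcher` and `smarts_counts` are hypothetical, and the default matcher assumes Open Babel 3.x with its Python bindings installed (`from openbabel import pybel`). The matcher is passed in as a parameter so that the same loop could sit on top of another toolkit.

```python
def pybel_matcher(smiles, smarts):
    """Return all matches of one SMARTS pattern in one molecule.

    Assumes Open Babel 3.x; the import is done lazily so the rest of
    the sketch can be used with a different matcher.
    """
    from openbabel import pybel  # optional dependency (assumption)
    mol = pybel.readstring("smi", smiles)
    return pybel.Smarts(smarts).findall(mol)


def smarts_counts(smiles, smarts_list, matcher=pybel_matcher):
    """Build one descriptor row: the number of matches for each SMARTS."""
    return [len(matcher(smiles, smarts)) for smarts in smarts_list]


if __name__ == "__main__":
    # Illustrative patterns only, not a recommended "standard" set.
    patterns = ["[OX2H]", "C(O)O"]
    print(smarts_counts("CC(O)OC", patterns))
```

Each molecule gives one row of counts, so a table of such rows could be fed to whatever feature selection and modeling method is preferred.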
And the sad thing is, most cheminformatics papers do not allow you to in fact reproduce it and figure it out for yourself :( - Egon Willighagen
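The ambiguity in the CC(O)OC question can be made concrete with a small sketch. It compares two plausible counting conventions, which a given paper may or may not follow: counting every match, or letting larger fragments claim atoms first so that no atom is counted twice. The atom indices and matches below were worked out by hand for this one molecule rather than computed by a toolkit, and the function names are hypothetical.

```python
# Atoms of CC(O)OC in SMILES order (hand-assigned):
#   0: C   1: C   2: O (hydroxyl)   3: O (ether)   4: C
# Heavy-atom matches per fragment, found by inspection (an assumption).
MATCHES = {
    "C(O)O": [(1, 2, 3)],
    "OH": [(2,)],
}


def count_overlapping(matches):
    """Count every match, even when fragments share atoms."""
    return {frag: len(hits) for frag, hits in matches.items()}


def count_exclusive(matches):
    """Larger fragments claim atoms first; a match that touches an
    already claimed atom is skipped."""
    claimed = set()
    counts = {frag: 0 for frag in matches}
    by_size = sorted(
        matches,
        key=lambda f: max((len(h) for h in matches[f]), default=0),
        reverse=True,
    )
    for frag in by_size:
        for hit in matches[frag]:
            if claimed.isdisjoint(hit):
                counts[frag] += 1
                claimed.update(hit)
    return counts


if __name__ == "__main__":
    print(count_overlapping(MATCHES))  # {'C(O)O': 1, 'OH': 1}
    print(count_exclusive(MATCHES))    # {'C(O)O': 1, 'OH': 0}
```

Under the first convention the compound has both fragments; under the second the hydroxyl oxygen is absorbed into C(O)O and the OH count drops to zero. Which one a fragment-based paper used is exactly the detail that needs to be stated for the model to be reproducible.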