Aaron Sterling
Chemistry question: A research group tested 9712 compounds from the LIGAND database. They measured the "treewidth" of each compound. (Treewidth is a measure on graphs used in the theory of algorithms.) All but one of the compounds tested had treewidth <= 3. This fact might be useful for algorithm design. Is the LIGAND database representative?
is this the KEGG LIGAND db? And if so, is this the compound/glycan etc subsets or all of them? And when you say 'representative', what would it be representative of? If it were the glycan subset it's definitely not representative fo chemical space in general. If compound subset, I think it's pretty diverse, but again, I think 'representative' needs a context. - Rajarshi Guha
Has the treewidth a relation to the graph? As described in Wikipedia? - Egon Willighagen
@Rajarshi: Yes, it is KEGG LIGAND.Hm, the authors cite the same information in three separate papers, but they do not appear to state their choosing procedure. @Egon: Yes, same as treewidth/tree decomposition in Wikipedia. Link: http://citeseerx.ist.psu.edu/viewdoc... - Aaron Sterling
From the paper "... in the field of chemoinformatics, methods using graph algorithms have not entered the mainstream ...". O RLY? - Rajarshi Guha
Yeah, wondering what mainstream they are referring too... fair, I have not seen a lot of chemical graphs on mobile phones, and this is a Japanese paper after all... seriously, it sounds like metabolomics dudes to me. Japan is good at that. - Egon Willighagen
Aaron, please educate me. Why would the threewidth potentially be useful for algorithm design? Unlike my other replies about the paper, this is a serious question. - Egon Willighagen
Some NP-complete problems become tractable when a graph has bounded treewidth. Graph Isomorphism (though probably not NP-complete) is tractable in that case. The Subgraph Isomorphism Problem (which seems more significant for chemoinformatics) is still NP-complete though. So it's not a magic bullet. *If* it's a useful measure, and *if* there's a tractable algorithmic problem that has chemical application, *then* it might be significant. I don't know about either if. - Aaron Sterling
OK, that makes sense. There CDK has one graph isomorphism algorithm implemented, the UniversalIsomorphismTester, which we know works inefficient for subgraph isomorphism... despite the first probably no being NP-complete, while the latter is NP-complete :) - Egon Willighagen
Hmm. So it seems like treewidth could be a filter for molecule graphs that could be handled in a reasonable time? I guess it could be pre-computed for a set of molecules and stored alongside them in a db. - gilleain
Yes, but how would that be different from other 'filters' as we use in fingerprints, some of which can have more than a few values? - Egon Willighagen
Beside the correlation, we could also look at information content (entropy) and see what it adds to existing methods (or fingerprints)... - Egon Willighagen
Can you add some references? How does a treewidth compare to a hashing of fingerprints, aka optimizing them for preventing ties? - joergkurtwegner