ISMB/ECCB
PT21: Jose Caldas - Probabilistic retrieval and Visualization of Biologically Relevant Microarray Experiments
Trying to find a method to relate results in large array databases based on expression information rather than annotation. - Oliver Hofmann
Standard approaches use measures like Spearman correlation coefficients, but it would be interesting to use sets of experiments as a query rather than a single array. - Oliver Hofmann
Query with a binary phenotype comparison and try to get back other, similar comparisons. This requires an encoding of the phenotype comparison, such as a vector of t-tests (0/1 vector for differentially expressed genes). - Oliver Hofmann
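A minimal sketch of the 0/1 encoding mentioned above, not the speaker's implementation. It assumes Welch's t-statistic per gene and a hypothetical cutoff of |t| > 2.0; the function names and toy expression values are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two lists of expression values."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def encode_comparison(case, control, t_cutoff=2.0):
    """Return a 0/1 vector: 1 where |t| exceeds the (assumed) cutoff."""
    return [1 if abs(welch_t(c, k)) > t_cutoff else 0
            for c, k in zip(case, control)]

# Toy data: one inner list per gene, one value per sample.
case = [[5.1, 5.3, 5.2], [2.0, 2.1, 1.9], [7.9, 8.1, 8.0]]
control = [[2.0, 2.2, 2.1], [2.0, 1.9, 2.1], [8.0, 7.9, 8.1]]

vector = encode_comparison(case, control)
print(vector)
```

Only the first toy gene differs between the two phenotypes, so only its entry is set to 1.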
Using differential GSEA (shorter vector, more robust) - Oliver Hofmann
GSEA is gene set enrichment analysis and compares gene sets with -1,0,+1 vectors - sebi
Uses standard GSEA, number of genes in leading edge as vector value, ignoring the directionality - Oliver Hofmann
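A rough sketch of how a leading-edge count per gene set could be computed, ignoring directionality as noted above. It assumes a simplified, unweighted running-sum enrichment score rather than the full GSEA statistic, and the gene and gene-set names are hypothetical.

```python
def leading_edge_count(ranked_genes, gene_set):
    """Count gene-set members in the leading edge of an unweighted running sum."""
    n = len(ranked_genes)
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    if n_hit == 0 or n_hit == n:
        return 0
    up, down = 1.0 / n_hit, 1.0 / (n - n_hit)
    running, best, peak = 0.0, 0.0, -1
    for i, is_hit in enumerate(hits):
        running += up if is_hit else -down
        if abs(running) > abs(best):
            best, peak = running, i
    # Positive score: members up to the peak; negative score: members after it.
    edge = hits[:peak + 1] if best >= 0 else hits[peak + 1:]
    return sum(edge)

# Genes ranked by differential expression (hypothetical identifiers).
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
gene_sets = {"cell_cycle": {"g1", "g2", "g5"}, "apoptosis": {"g6"}}

# One value per gene set; the sign of the enrichment score is discarded.
vector = [leading_edge_count(ranked, gene_sets[name]) for name in sorted(gene_sets)]
print(vector)
```

The resulting vector has one entry per gene set, which is much shorter than a per-gene vector.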
Latent Dirichlet Allocation for the retrieval algorithm - Oliver Hofmann
Latent Dirichlet Allocation (LDA) from text analysis works intuitively: it builds up topics from documents represented as bag-of-words text data. - sebi
Bag-of-words from gene sets, represented as combinations of sets, relevant distance measures can be applied - sebi
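A small sketch of the retrieval step, assuming each comparison has already been reduced to a vector of topic proportions by LDA. The talk notes do not say which distance measure was used; Jensen-Shannon divergence is swapped in here as one plausible choice, and the comparison names and proportions are invented.

```python
from math import log2

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def retrieve(query, collection):
    """Return comparison names sorted from most to least similar to the query."""
    return sorted(collection, key=lambda name: js_divergence(query, collection[name]))

# Hypothetical topic proportions for three stored comparisons.
topics = {
    "comparison_A": [0.7, 0.2, 0.1],
    "comparison_B": [0.1, 0.1, 0.8],
    "comparison_C": [0.6, 0.3, 0.1],
}

ranking = retrieve([0.75, 0.15, 0.10], topics)
print(ranking)
```

Because the divergence is symmetric and bounded, it gives a stable ordering even when some topic proportions are zero.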
750+ binary phenotype comparisons from 288 experiments, focus on 105 comparisons for this analysis. Gene sets assigned to topics are coherent across a wide range of biological processes - Oliver Hofmann
Cell cycle gene sets in "topic 2": BRCA genes show up in 4 comparisons all related to cancer, as expected. - sebi
Retrieval performance better than random, but seems to have low-to-moderate recall for reasonable precision numbers (?) - Oliver Hofmann
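For reference, a minimal sketch of how precision and recall at a cutoff k can be computed for a ranked retrieval result. The comparison identifiers and relevance labels are hypothetical and are not figures from the talk.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall over the top-k retrieved items."""
    top = ranked[:k]
    true_positives = sum(1 for item in top if item in relevant)
    precision = true_positives / k if k else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical ranked output and ground-truth relevant comparisons.
ranked = ["c1", "c2", "c3", "c4", "c5"]
relevant = {"c1", "c4", "c9", "c10"}

precision, recall = precision_recall_at_k(ranked, relevant, k=2)
print(precision, recall)
```

In this toy case half of the top two results are relevant, yet only a quarter of all relevant comparisons are recovered, which is the kind of precision and recall trade-off the note above describes.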