Trying to find a method to relate results in large array databases based on expression information rather than annotation.
- Oliver Hofmann
Standard approaches exist, such as Spearman correlation coefficients, but it would be interesting to use sets of experiments as a query rather than a single array.
- Oliver Hofmann
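For reference, a minimal sketch of the single-array baseline mentioned above: Spearman correlation between two expression profiles, computed as the Pearson correlation of their ranks. The function names are illustrative and not from the talk, and the profiles are assumed to be equal-length lists over the same genes.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Two made-up expression profiles over the same four genes.
query_array = [1.0, 2.0, 3.0, 4.0]
other_array = [1.0, 4.0, 9.0, 100.0]
rho = spearman(query_array, other_array)  # monotone relationship, rho close to 1
```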
Query with a binary phenotype comparison and try to get back other, similar comparisons. Requires encoding the phenotype comparison, e.g. as a vector of t-test results (a 0/1 vector marking differentially expressed genes).
- Oliver Hofmann
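A rough sketch of the 0/1 encoding described above, assuming a per-gene Welch t-statistic and a fixed cutoff on its absolute value. The cutoff, the data layout, and the function names are assumptions for illustration; the talk did not specify how the t-tests were thresholded.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic for two samples (each needs at least 2 values)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    if se == 0:
        return 0.0  # assumption: no variation at all is treated as "not differential"
    return (mean(a) - mean(b)) / se


def encode_comparison(expr, group_a, group_b, t_cutoff=3.0):
    """Encode one binary phenotype comparison as a 0/1 vector over genes.

    expr: dict mapping gene name -> list of expression values, one per sample
    group_a, group_b: sample indices for the two phenotypes
    t_cutoff: hypothetical threshold on |t| (not taken from the talk)
    """
    genes = sorted(expr)
    vector = []
    for gene in genes:
        a = [expr[gene][i] for i in group_a]
        b = [expr[gene][i] for i in group_b]
        vector.append(1 if abs(welch_t(a, b)) > t_cutoff else 0)
    return genes, vector


# Made-up data: six samples, first three vs. last three.
expr = {
    "GENE_A": [1.0, 1.2, 0.9, 5.0, 5.3, 4.8],
    "GENE_B": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
}
genes, vector = encode_comparison(expr, [0, 1, 2], [3, 4, 5])
```

In practice a p-value with multiple-testing correction would likely replace the raw cutoff; the fixed threshold just keeps the sketch dependency-free.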
Using differential GSEA (shorter vector, more robust)
- Oliver Hofmann
GSEA is gene set enrichment analysis; gene sets are compared using -1/0/+1 vectors.
- sebi
Uses standard GSEA, number of genes in leading edge as vector value, ignoring the directionality
- Oliver Hofmann
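To make the vector construction concrete, a small sketch assuming the leading-edge genes per gene set have already been produced by a separate GSEA run. The gene-set and gene names are placeholders, not real results.

```python
def leading_edge_vector(leading_edges, gene_set_order):
    """Build one comparison's vector: leading-edge gene count per gene set.

    leading_edges: dict mapping gene-set name -> iterable of leading-edge genes
    gene_set_order: fixed list of gene-set names shared by all comparisons
    Direction of enrichment (up or down) is deliberately ignored.
    """
    return [len(set(leading_edges.get(name, ()))) for name in gene_set_order]


# Placeholder names only; a real run would supply these from GSEA output.
gene_set_order = ["SET_1", "SET_2", "SET_3"]
leading_edges = {
    "SET_1": ["g1", "g2", "g3"],
    "SET_3": ["g7"],
}
vector = leading_edge_vector(leading_edges, gene_set_order)  # [3, 0, 1]
```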
Latent Dirichlet Allocation for the retrieval algorithm
- Oliver Hofmann
Latent Dirichlet Allocation (LDA), from text analysis, works intuitively: it uses documents to build up topics from bag-of-words text data.
- sebi
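For readers new to LDA, a toy collapsed Gibbs sampler is sketched below. It is a generic textbook-style illustration, not the implementation used in the talk; the hyperparameters `alpha` and `beta`, the iteration count, and all names are assumptions. Documents are lists of integer word ids.

```python
import random


def lda_gibbs(docs, n_topics, n_words, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; returns per-document topic mixtures."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]            # topic counts per document
    topic_word = [[0] * n_words for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                           # total tokens per topic
    assign = []                                            # topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed topic mixture for each document; each row sums to 1.
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]


# Four tiny documents over a vocabulary of four word ids.
docs = [[0, 0, 1, 1, 0], [0, 1, 1, 0], [2, 3, 3, 2, 2], [3, 2, 3]]
theta = lda_gibbs(docs, n_topics=2, n_words=4)
```

Because the sampler is stochastic, the topic labels and exact mixtures depend on the seed; only the normalization of each row is guaranteed.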
Bag-of-words is built from gene sets; comparisons are represented as combinations of sets, so relevant distance measures can be applied.
- sebi
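One way this could look in code, as a sketch: a per-gene-set count vector is expanded into a bag of "words" (gene-set ids), and comparisons are then ranked by a distance between their topic mixtures. The topic mixtures are assumed to come from a fitted LDA model, and Jensen-Shannon divergence is just one plausible choice; the notes do not say which distance was used.

```python
import math


def counts_to_bag(counts):
    """Expand a count vector into a bag of word ids (one id per gene set)."""
    bag = []
    for set_id, count in enumerate(counts):
        bag.extend([set_id] * count)
    return bag


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def rank_by_similarity(query_mixture, database):
    """Return comparison names sorted from most to least similar to the query.

    database: dict mapping comparison name -> topic mixture (list of floats)
    """
    return sorted(database, key=lambda name: js_divergence(query_mixture, database[name]))


# Hypothetical topic mixtures for two stored comparisons.
database = {"comparison_1": [0.7, 0.3], "comparison_2": [0.1, 0.9]}
ranking = rank_by_similarity([0.8, 0.2], database)
```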
750+ binary phenotype comparisons from 288 experiments, focus on 105 comparisons for this analysis. Gene sets assigned to topics are coherent across a wide range of biological processes
- Oliver Hofmann
Cell cycle gene sets in "topic 2": BRCA genes show up in 4 comparisons all related to cancer, as expected.
- sebi
Retrieval performance better than random, but seems to have low-to-moderate recall for reasonable precision numbers (?)
- Oliver Hofmann
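For clarity on the metrics, a minimal sketch of precision and recall at a cutoff k for one ranked retrieval result. The comparison names and the relevant set are made up; how relevance was defined in the talk is not stated in these notes.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall over the top-k retrieved items.

    ranked: list of retrieved comparison names, best match first
    relevant: set of names considered true matches for the query
    """
    top = ranked[:k]
    hits = sum(1 for name in top if name in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


ranked = ["c1", "c2", "c3", "c4"]
relevant = {"c1", "c4"}
precision_recall_at_k(ranked, relevant, 2)  # (0.5, 0.5)
precision_recall_at_k(ranked, relevant, 4)  # (0.5, 1.0)
```

Sweeping k from 1 to the size of the database traces out the precision-recall trade-off referred to above.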