uses a novel device called a probabilistic arithmetic automaton - won't go into details
- Mikhail Spivakov
Need to compute the distribution of occurrences by chance. Not a straightforward task; they recently proposed a new approach that builds a probabilistic arithmetic automaton.
- Roland Krause
The problem is that computing p-values is infeasible due to the large number of motifs.
- Roland Krause
matches occur in clumps. use compound Poisson approximation (almost exact): calculate the exact distribution of clump sizes, approximate the number of clumps by a Poisson distribution
- Mikhail Spivakov
Use of a Compound Poisson Approximation on a set of clumps (sets of overlapping motifs)
- Roland Krause
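A minimal sketch of how a compound Poisson count distribution can be evaluated, assuming the number of clumps is Poisson with rate `lam` and clump sizes are independent with a known distribution. It uses the standard Panjer recursion; the function names and parameters are hypothetical and not taken from the talk or the paper.

```python
import math

def compound_poisson_pmf(lam, clump_size_pmf, max_count):
    """Return [P(N = 0), ..., P(N = max_count)] for N = total occurrences.

    Assumptions (illustrative, not the authors' implementation):
    - number of clumps ~ Poisson(lam)
    - clump_size_pmf[k - 1] = P(clump size = k) for k >= 1
    """
    pmf = [0.0] * (max_count + 1)
    pmf[0] = math.exp(-lam)  # no clumps means no occurrences
    for n in range(1, max_count + 1):
        total = 0.0
        # Panjer recursion: p_n = (lam / n) * sum_k k * q_k * p_{n-k}
        for k in range(1, min(n, len(clump_size_pmf)) + 1):
            total += k * clump_size_pmf[k - 1] * pmf[n - k]
        pmf[n] = lam / n * total
    return pmf

def p_value(lam, clump_size_pmf, observed):
    """P(N >= observed) under the compound Poisson model."""
    if observed <= 0:
        return 1.0
    below = sum(compound_poisson_pmf(lam, clump_size_pmf, observed - 1))
    return max(0.0, 1.0 - below)
```

If every clump has size 1, this reduces to an ordinary Poisson distribution, which is a handy sanity check.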
Use of a suffix tree of the sequence, iterate over the motifs, use the lower bound for pruning, walk the tree and identify overrepresented motifs.
- Roland Krause
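A rough sketch of the walk-and-prune idea, with two simplifications that are assumptions of this example rather than the method from the talk: it walks an implicit suffix trie (lists of start positions) instead of a real suffix tree, and it prunes on a plain occurrence-count threshold instead of a p-value lower bound. The pruning is safe because extending a word can never increase its occurrence count. All names are hypothetical.

```python
from collections import defaultdict

def frequent_substrings(text, max_length, min_count):
    """Count all substrings up to max_length occurring at least min_count times.

    Illustrative only: 'min_count' stands in for the statistical bound
    used for pruning in the talk.
    """
    results = {}

    def extend(prefix, starts):
        # Group the occurrences of 'prefix' by the character that follows them.
        groups = defaultdict(list)
        for s in starts:
            end = s + len(prefix)
            if end < len(text):
                groups[text[end]].append(s)
        for char, group in groups.items():
            if len(group) < min_count:
                continue  # prune: no extension of this word can reach min_count
            word = prefix + char
            results[word] = len(group)
            if len(word) < max_length:
                extend(word, group)

    extend("", range(len(text)))
    return results
```

Counts here include overlapping occurrences, which is why the clump-based statistics are needed afterwards.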
so re-evaluate the motifs that produce a good p-value under the iid model on a Markovian text model
- Mikhail Spivakov
designing a good benchmark set is hard
- Marcel Martin
Future work could incorporate Markovian models directly or use phylogenetic information.
- Roland Krause
Q. Is the implementation available? A: Given in the paper.
- Roland Krause
question: is the tool available? yes, URL in the paper
- Marcel Martin
you have to take into account overlapping motifs for doing proper statistics
- Marcel Martin
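To make the overlap point concrete, here is a small sketch that finds all (possibly overlapping) occurrences of a single word and groups them into clumps, where a clump is a maximal run of mutually chained overlapping occurrences. The function names are hypothetical, and a single exact word stands in for a motif.

```python
def find_occurrences(text, motif):
    """Start positions of all occurrences of motif, overlaps included."""
    positions = []
    start = text.find(motif)
    while start != -1:
        positions.append(start)
        start = text.find(motif, start + 1)  # step by 1 to keep overlaps
    return positions

def clump_sizes(positions, motif_length):
    """Sizes of clumps, given sorted start positions of one fixed-length word."""
    sizes = []
    clump_end = -1  # end (exclusive) of the latest occurrence in the current clump
    for pos in positions:
        if sizes and pos < clump_end:
            sizes[-1] += 1   # overlaps the previous occurrence: same clump
        else:
            sizes.append(1)  # starts a new clump
        clump_end = pos + motif_length
    return sizes
```

For example, `AA` occurs four times in `AAAAGAA`, but only in two clumps, so treating the four occurrences as independent events would overstate the evidence.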
Q. Are the data in JASPAR or TRANSFAC? A. Had an expert looking at it.
- Roland Krause
# JASPAR and TRANSFAC do not really cover Mycobacterium motifs
- Roland Krause
Q: Performance of the algorithm on short motifs. A. Length 10 is the upper bound for the algorithm, whose performance depends strongly on motif length.
- Roland Krause
Q: applying to protein models? A: problematic because alphabet is larger and indels would need to be modelled
- Marcel Martin
Q: how is the iid text model obtained? how do you justify that the text fulfills the model? A: the iid model is estimated from the text. dependencies between characters are incorporated by using the Markovian model
- Marcel Martin
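As an illustration of "estimated from the text", the sketch below derives an iid model (character frequencies) and a first-order Markov model (transition frequencies) from a sequence. These are plain maximum-likelihood estimates without pseudocounts; the function names are hypothetical, and the order of the Markov model actually used is not stated in these notes.

```python
from collections import Counter

def estimate_iid(text):
    """Character probabilities: relative frequency of each character."""
    counts = Counter(text)
    return {char: count / len(text) for char, count in counts.items()}

def estimate_markov1(text):
    """First-order transition probabilities P(b | a) as a dict keyed by (a, b)."""
    pair_counts = Counter(zip(text, text[1:]))
    context_counts = Counter(text[:-1])  # every character that has a successor
    return {
        (a, b): count / context_counts[a]
        for (a, b), count in pair_counts.items()
    }

def word_probability_iid(word, iid):
    """Probability of seeing 'word' at a fixed position under the iid model."""
    prob = 1.0
    for char in word:
        prob *= iid.get(char, 0.0)
    return prob
```

The iid model ignores neighbouring characters entirely, which is the reason promising motifs are re-checked under the Markovian model.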
Q. (Marcel Schulz) Differences in Markov models of different orders. A. Shorter orders give spurious results.
- Roland Krause
Q: why only a part of the motif space? A: tried to come up with a plausible set that includes most motifs
- Marcel Martin