Mitochondrial genome is useful for tracking maternal lineages, and it accumulates mutations slowly -- ideal for building trees
- sebi
seems like there was no interbreeding between Neandertals and modern humans
- Marcel Martin
Deep sequencing with high-throughput next-generation sequencing; this used to be done by direct PCR
- sebi
ancient DNA fragments are just 60nt in length
- Marcel Martin
Roche/454 and Illumina sequencing was used, no need to fragment DNA (more fragmented than one would like anyway)
- Marcel Martin
many C to T transitions, also G to A
- Marcel Martin
COX2 protein has 5 differences between chimp and human, 4 of 5 differences happened in the last 600,000 yrs, so Neandertals also have 4 different AAs compared to Homo sapiens
- sebi
Maybe fast evolving sites? Reverting to previous (=monkey), more advantageous AAs?
- sebi
sequencing errors: 3% of all Cs are read as Ts, same for G->A. Reason: C is deaminated to U, which is then read as T
- Marcel Martin
higher probability for deamination at the ends of fragments. Perhaps because cytosine deamination is 100x faster in single-stranded DNA and the ends of fragments are single-stranded
- Marcel Martin
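A minimal sketch of how the position dependence of C->T mismatches could be tabulated (not the speakers' pipeline). It assumes gap-free (reference, read) pairs of equal length, oriented from the 5' end; the function name and toy data are made up for illustration.

```python
from collections import Counter

def c_to_t_rate_by_position(pairs, window=5):
    """C->T mismatch rate for each of the first `window` read positions.

    pairs: iterable of (reference, read) strings, equal length, no gaps.
    Positions where the reference never shows a C yield None.
    """
    ref_c = Counter()   # reference C count per position
    c_to_t = Counter()  # C in reference read as T, per position
    for ref, read in pairs:
        for i, (r, q) in enumerate(zip(ref[:window], read[:window])):
            if r == "C":
                ref_c[i] += 1
                if q == "T":
                    c_to_t[i] += 1
    return [c_to_t[i] / ref_c[i] if ref_c[i] else None for i in range(window)]

# Hypothetical toy alignments: mismatches concentrated at position 0.
pairs = [
    ("CCGAC", "TCGAC"),
    ("CAGCC", "TAGCC"),
    ("CCTAC", "CCTAC"),
    ("ACGCT", "ACGCT"),
]
rates = c_to_t_rate_by_position(pairs, window=5)
```

The same counting applied from the 3' end (and to G->A) would cover the other side of the fragments.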
MIA: mapping iterative assembler. manuscript in preparation
- Marcel Martin
Measuring protein evolution with a ratio of non-synonymous differences to synonymous differences dN/dS: indicative of a small Neandertal population size
- sebi
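A rough sketch of the counting behind such a ratio, assuming two codon-aligned, gap-free coding sequences and the vertebrate mitochondrial genetic code (NCBI translation table 2). It only counts codons that differ synonymously or non-synonymously; a real dN/dS estimate also normalizes by the number of synonymous and non-synonymous sites, which is left out here. Function name and toy sequences are made up.

```python
BASES = "TCAG"
# Vertebrate mitochondrial code (NCBI table 2), codons in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def count_differences(seq_a, seq_b):
    """Return (non-synonymous, synonymous) counts of differing codons."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be codon-aligned and of equal length")
    nonsyn = syn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODE[codon_a] == CODE[codon_b]:
            syn += 1      # same amino acid, different codon
        else:
            nonsyn += 1   # amino acid changed
    return nonsyn, syn

# Hypothetical toy sequences: ATG/ATA (M/M), CTA/CTG (L/L), GGA/GAA (G/E).
nonsyn, syn = count_differences("ATGCTAGGA", "ATACTGGAA")
```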
CG dinucleotide content in HG: 1%, expected: 4.5%
- Marcel Martin
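For illustration, the observed vs. expected comparison can be sketched as follows, assuming "expected" means the product of the C and G base frequencies (independent neighbouring bases). The function name and toy sequence are made up.

```python
def cpg_observed_expected(seq):
    """Return (observed CG dinucleotide frequency, expected frequency).

    Expected assumes independent bases: freq(C) * freq(G).
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence too short")
    cg_count = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    observed = cg_count / (n - 1)  # n - 1 dinucleotide positions
    expected = (seq.count("C") / n) * (seq.count("G") / n)
    return observed, expected

observed, expected = cpg_observed_expected("ACGTCGGA")
```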
CpG islands: regions on DNA that contain many CpGs. 28000 islands annotated in HG. almost all of them are near gene promoters
- Marcel Martin
mDIP: methyl-DNA immunoprecipitation assay, similar to ChIP-chip. 244k DNA methylation array
- Marcel Martin
island methylation score (IMS): average signal for all probes mapped to an island. bimodal distribution. housekeeping genes are methylated (ie, on one side of the distribution)
- Marcel Martin
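A tiny sketch of the IMS as described above, i.e. a per-region mean over the probes mapped to that region. The input layout (a list of (region, signal) pairs) and the toy values are assumptions for illustration, not the speakers' data format.

```python
from collections import defaultdict
from statistics import mean

def methylation_scores(probes):
    """Average probe signal per region.

    probes: iterable of (region_id, signal) pairs.
    """
    by_region = defaultdict(list)
    for region, signal in probes:
        by_region[region].append(signal)
    return {region: mean(signals) for region, signals in by_region.items()}

# Hypothetical probe signals for two regions.
scores = methylation_scores([
    ("island_A", 2.0), ("island_A", 1.0),
    ("island_B", -1.5), ("island_B", -0.5),
])
```

A histogram of these scores over all regions is where the bimodal distribution would show up.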
approx 15 samples (different tissues). almost all are not methylated (~70%)
- Marcel Martin
Reference: Michael Brandeis et al., "Sp1 elements protect a CpG island from de novo methylation", Nature 371, September 1994
- Marcel Martin
designed a new tiling array that covers all predicted UMRs (unmethylated regions)
- Marcel Martin
conclusions: 4400 predicted regions were confirmed as UMRs. 923 of the UMRs are placed near known TSS. no one-to-one correspondence between CpG islands and nonmethylated regions. also: yes, there is tissue-specific methylation (didn't go into detail)
- Marcel Martin
use a novel device called probabilistic arithmetic automata - won't go into details
- Mikhail Spivakov
Need to compute the distribution of occurrences by chance. Not a straightforward task; they recently proposed a new approach based on building a probabilistic arithmetic automaton.
- Roland Krause
The problem is that computing p-values is infeasible due to the large number of motifs.
- Roland Krause
matches occur in clumps. use compound Poisson approximation (almost exact): calculate the exact distribution of clump sizes, approximate the number of clumps by a Poisson distribution
- Mikhail Spivakov
Use of a Compound Poisson Approximation on a set of clumps (sets of overlapping motifs)
- Roland Krause
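A sketch of the compound Poisson idea, under assumptions: the clump-size distribution and the expected number of clumps are taken as given inputs (in the talk they come from the authors' own method), and the total-occurrence distribution is computed with the standard Panjer recursion, which is my choice here and not necessarily what the tool does. All names and numbers are made up.

```python
import math

def compound_poisson_pmf(lam, clump_size_pmf, n_max):
    """P(total occurrences = n) for n = 0..n_max.

    lam: expected number of clumps (Poisson rate).
    clump_size_pmf: dict {clump size: probability}, sizes >= 1.
    """
    if any(size < 1 for size in clump_size_pmf):
        raise ValueError("clump sizes must be >= 1")
    pmf = [0.0] * (n_max + 1)
    pmf[0] = math.exp(-lam)  # zero clumps means zero occurrences
    for n in range(1, n_max + 1):
        # Panjer recursion for a Poisson-distributed number of clumps.
        pmf[n] = (lam / n) * sum(
            size * q * pmf[n - size]
            for size, q in clump_size_pmf.items()
            if size <= n
        )
    return pmf

def p_value(lam, clump_size_pmf, observed):
    """P(total occurrences >= observed)."""
    pmf = compound_poisson_pmf(lam, clump_size_pmf, max(observed - 1, 0))
    return 1.0 - sum(pmf[:observed])

# Hypothetical inputs: one expected clump, sizes 1 or 2 equally likely.
sizes = {1: 0.5, 2: 0.5}
pmf = compound_poisson_pmf(1.0, sizes, 2)
pval = p_value(1.0, sizes, 2)
```

With all clumps of size 1 this reduces to a plain Poisson distribution, which makes a convenient sanity check.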
Use of a suffix tree of the sequence, iterate over the motifs, use the lower bound for pruning, walk the tree and identify overrepresented motifs.
- Roland Krause
so re-evaluate the motifs that produce a good p-value under the iid model, this time on a Markovian text model
- Mikhail Spivakov
designing a good benchmark set is hard
- Marcel Martin
Future work could incorporate Markovian models directly or use phylogenetic information.
- Roland Krause
Q. Is the implementation available? A: Given in the paper.
- Roland Krause
question: is the tool available? yes, URL in the paper
- Marcel Martin
you have to take into account overlapping motifs for doing proper statistics
- Marcel Martin
Q. Are the data in JASPAR or TRANSFAC? A. Had an expert look at it.
- Roland Krause
JASPAR and TRANSFAC do not really cover Mycobacterium motifs
- Roland Krause
Q: Performance of the algorithm on short motifs? A. Length 10 is the upper bound for the algorithm, which is quite dependent on motif length.
- Roland Krause
Q: applying to protein models? A: problematic because alphabet is larger and indels would need to be modelled
- Marcel Martin
Q: what about the iid text model? how do you justify that the text fulfills the model? A: the iid model is estimated from the text. dependencies between characters are incorporated by using the Markovian model
- Marcel Martin
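To make the answer concrete, a minimal sketch of estimating both text models from a sequence: character frequencies for the iid model, and order-1 transition frequencies for a Markovian model. The order, the function names, and the toy text are assumptions; the talk did not state which order was used.

```python
from collections import Counter, defaultdict

def iid_model(text):
    """Character probabilities estimated from relative frequencies."""
    counts = Counter(text)
    return {char: k / len(text) for char, k in counts.items()}

def markov1_model(text):
    """Order-1 transition probabilities P(next | current) from the text."""
    transitions = defaultdict(Counter)
    for current, nxt in zip(text, text[1:]):
        transitions[current][nxt] += 1
    return {
        current: {nxt: k / sum(row.values()) for nxt, k in row.items()}
        for current, row in transitions.items()
    }

# Hypothetical toy text: C is always followed by G here, which the
# iid model cannot express but the Markov model captures.
text = "ACGCGCGT"
iid = iid_model(text)
markov = markov1_model(text)
```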
Q. (Marcel Schulz) Differences between Markov models of different orders? A. Lower orders give spurious results.
- Roland Krause
Q: why only a part of the motif space? A: tried to come up with a plausible set that includes most motifs
- Marcel Martin