Marcel Martin › Comments

ISMB/ECCB
HL62: Richard Green - A Complete Neandertal Mitochondrial Genome Sequence Determined by High-Throughput Sequencing
Neandertals: closest extinct relatives; existed from about 400,000 to 30,000 years ago - Marcel Martin
chimps are our closest living relatives, but diverged longer ago - Marcel Martin
the first DNA from an extinct species was recovered in 1985, from the quagga - Marcel Martin
Focus on the Neandertal mitochondrial genome, because it is present in about 1000 copies per cell vs. 2 copies of the nuclear genome - sebi
mitochondrial genome is easier to recover since there are many copies per cell - Marcel Martin
Mitochondrial genome is useful for tracking maternal lineages, and it accumulates mutations slowly, which makes it ideal for building trees - sebi
seems like there was no interbreeding between Neandertals and modern humans - Marcel Martin
Deep sequencing with high-throughput next-generation sequencing; this used to be done by direct PCR - sebi
ancient DNA fragments are just 60 nt in length - Marcel Martin
Roche/454 and Illumina sequencing were used; no need to fragment the DNA (it is already more fragmented than one would like) - Marcel Martin
many C to T transitions, also G to A - Marcel Martin
COX2 protein has 5 amino acid differences between chimp and human; 4 of these 5 changes happened in the last 600,000 years, so Neandertals also differ from Homo sapiens at 4 amino acids - sebi
Maybe fast-evolving sites? Or reversions to earlier (= monkey), more advantageous amino acids? - sebi
sequencing errors: 3% of all Cs are read as Ts, and likewise for G->A. Reason: C is deaminated to U, which is then read as T - Marcel Martin
higher probability of deamination at the ends of fragments, perhaps because cytosine deamination is 100x faster in single-stranded DNA and fragment ends are single-stranded - Marcel Martin
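The damage pattern in the two comments above (C read as T, more often near fragment ends) can be checked by tabulating the C->T mismatch rate per read position. A minimal sketch, assuming ungapped forward-strand alignments given as (reference, read) string pairs that both start at the 5' end of the fragment; the function name and input format are hypothetical and not taken from the talk or from MIA:

```python
def c_to_t_by_position(pairs, max_pos=10):
    """Fraction of reference Cs read as T at each of the first max_pos positions.

    pairs: (reference, read) strings from an ungapped, forward-strand alignment,
    both starting at the 5' end of the fragment (hypothetical input format).
    """
    c_seen = [0] * max_pos  # reference has C at this position
    c_to_t = [0] * max_pos  # ... and the read shows T there
    for ref, read in pairs:
        if len(ref) != len(read):
            raise ValueError("reference and read must be aligned without gaps")
        for i, (r, q) in enumerate(zip(ref.upper(), read.upper())):
            if i >= max_pos:
                break
            if r == "C":
                c_seen[i] += 1
                if q == "T":
                    c_to_t[i] += 1
    # Positions where no reference C was seen get a rate of 0.0.
    return [t / c if c else 0.0 for t, c in zip(c_to_t, c_seen)]
```

A rate that is highest at the first positions and drops toward the fragment interior would match the end-of-fragment effect described above.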
MIA: Mapping Iterative Assembler; manuscript in preparation - Marcel Martin
Protein evolution measured with the ratio of non-synonymous to synonymous differences (dN/dS): indicative of a small Neandertal population size - sebi
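For reference, a minimal sketch of a dN/dS ratio, assuming the non-synonymous and synonymous site and difference counts have already been obtained (for example by Nei-Gojobori counting, not shown) and applying the Jukes-Cantor correction. This is a textbook formulation, not necessarily the one used in the talk:

```python
import math

def jukes_cantor(p):
    """Correct an observed proportion of differing sites for multiple hits."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)

def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Ratio of corrected non-synonymous to corrected synonymous divergence."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("no synonymous differences; dN/dS is undefined")
    return dn / ds
```

The usual link to population size: purifying selection is less efficient in small populations, so slightly deleterious non-synonymous changes are removed less effectively and the ratio tends to be higher.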
(only to get the blog on top of the ISCB portal site; the figures messed up our layout) - Reinhard Schneider
ISMB/ECCB
HL48: Israel Steinfeld - Architecture of CpG methylation in the human genome
DNA methylation: modification of cytosines in CpG dinucleotides, maintained across cell divisions - Marcel Martin
CpG dinucleotide frequency in the human genome (HG): observed 1%, expected 4.5% - Marcel Martin
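The "expected" CpG frequency is the product of the C and G frequencies, and the depletion is usually summarised as an observed/expected ratio. A minimal sketch using the commonly cited definition (#CpG x length) / (#C x #G); function names are illustrative:

```python
def cpg_obs_exp(seq):
    """CpG observed/expected ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count finds every CpG
    return cpg * n / (c * g)

def cpg_frequencies(seq):
    """Observed CpG frequency per dinucleotide vs. the product of C and G frequencies."""
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("need at least one dinucleotide")
    observed = seq.count("CG") / (n - 1)
    expected = (seq.count("C") / n) * (seq.count("G") / n)
    return observed, expected
```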
CpG islands: DNA regions with a high density of CpGs. 28,000 islands are annotated in the HG; almost all of them are near gene promoters - Marcel Martin
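A sliding-window sketch of CpG island detection. It assumes the classic Gardiner-Garden and Frommer thresholds (200 bp window, GC content above 50%, CpG observed/expected ratio above 0.6); the annotation behind the 28,000 islands may use different criteria and post-processing:

```python
def find_cpg_islands(seq, window=200, min_gc=0.5, min_obs_exp=0.6):
    """Return (start, end) intervals, end exclusive, built from merged passing windows.

    Thresholds follow the classic Gardiner-Garden & Frommer criteria (assumed here).
    """
    seq = seq.upper()
    passing = []
    for start in range(len(seq) - window + 1):
        w = seq[start:start + window]
        c, g = w.count("C"), w.count("G")
        gc = (c + g) / window
        obs_exp = w.count("CG") * window / (c * g) if c and g else 0.0
        if gc > min_gc and obs_exp > min_obs_exp:
            passing.append(start)
    # Merge overlapping or adjacent passing windows into islands.
    islands = []
    for start in passing:
        end = start + window
        if islands and start <= islands[-1][1]:
            islands[-1][1] = max(islands[-1][1], end)
        else:
            islands.append([start, end])
    return [tuple(iv) for iv in islands]
```

For example, a 300 bp CpG-rich stretch flanked by AT-rich sequence yields a single island slightly wider than the stretch itself, because windows only need a GC majority to pass.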
mDIP: methyl-DNA immunoprecipitation assay, similar to ChIP-chip, on a 244k DNA methylation array - Marcel Martin
island methylation score (IMS): average signal of all probes mapped to an island. Bimodal distribution. Housekeeping genes are methylated (i.e., on one side of the distribution) - Marcel Martin
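A minimal sketch of the per-island score described above: average the signals of all probes mapped to each island, then split the islands with a cutoff placed between the two modes of the bimodal distribution. The probe-to-island mapping, the dictionary layout and the cutoff value are hypothetical inputs:

```python
from collections import defaultdict
from statistics import mean

def island_scores(probe_signal, probe_island):
    """Average the signals of all probes mapped to each island.

    probe_signal: {probe_id: signal}; probe_island: {probe_id: island_id}
    (hypothetical input layout). Probes without an island are ignored.
    """
    signals = defaultdict(list)
    for probe, value in probe_signal.items():
        island = probe_island.get(probe)
        if island is not None:
            signals[island].append(value)
    return {island: mean(values) for island, values in signals.items()}

def call_methylation(scores, cutoff):
    """Label islands by comparing their score with a cutoff between the two modes."""
    return {island: "methylated" if s >= cutoff else "unmethylated"
            for island, s in scores.items()}
```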
approx. 15 samples (different tissues); most islands are not methylated (~70%) - Marcel Martin
Michael Brandeis et al., "Sp1 elements protect a CpG island from de novo methylation", Nature 371, September 1994 - Marcel Martin
DRIM (Discovering Rank Imbalanced Motifs): http://bioinfo.cs.technion.ac.il/drim - Marcel Martin
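DRIM looks for motifs whose occurrences are concentrated at the top of a ranked list of sequences, and it is reportedly built on the minimum hypergeometric (mHG) statistic; treat that as an assumption. The sketch computes only the mHG score of one ranked 0/1 vector (1 = the sequence at that rank contains the motif), not DRIM's exact p-value correction or its motif search:

```python
from math import comb

def hypergeom_tail(b, N, B, n):
    """P(X >= b) for X ~ Hypergeometric: n draws from N items of which B are marked."""
    top = min(n, B)
    return sum(comb(B, k) * comb(N - B, n - k) for k in range(b, top + 1)) / comb(N, n)

def mhg(labels):
    """Minimum hypergeometric score of a ranked 0/1 vector and the cutoff achieving it.

    labels[i] is 1 if the sequence at rank i contains the motif.
    """
    N, B = len(labels), sum(labels)
    best_p, best_n, ones = 1.0, 0, 0
    for n in range(1, N + 1):
        ones += labels[n - 1]  # marked items among the top n
        p = hypergeom_tail(ones, N, B, n)
        if p < best_p:
            best_p, best_n = p, n
    return best_p, best_n
```

Scanning every cutoff n is what makes the statistic "rank imbalanced": no fixed top-n threshold has to be chosen in advance.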
used machine learning to distinguish between methylated and nonmethylated islands - Marcel Martin
UMR: undermethylated region (?) - Marcel Martin
designed a new tiling array that covers all predicted UMRs - Marcel Martin
Conclusions: 4400 predicted regions were confirmed as UMRs, and 923 of these UMRs lie near known TSSs. There is no one-to-one correspondence between CpG islands and nonmethylated regions. Also: yes, there is tissue-specific methylation (not discussed in detail) - Marcel Martin
ISMB/ECCB
PT43: Tobias Marschall - Efficient Exact Motif Discovery
Automated, unsupervised motif discovery in a string without prior knowledge. - Roland Krause
Issues: how to measure over-representation, and how to find over-represented motifs. - Roland Krause
184 publications in PubMed for "motif discovery algorithm" - Roland Krause
aim to establish an (almost) exact method based on rigorous motif statistics - Mikhail Spivakov
given: query text, IUPAC motifs, random text model (background) - for now, iid - Mikhail Spivakov
Calculating a p-value for a given query text, an IUPAC motif and a random text model. - Roland Krause
want: a p-value for a motif - Mikhail Spivakov
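To make the inputs concrete: an IUPAC motif stands for a set of words, and occurrences are counted with overlaps allowed. A small sketch with hypothetical helper names; a zero-width lookahead makes Python's re module report overlapping matches:

```python
import re

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def iupac_to_regex(motif):
    """Turn an IUPAC motif into a regex with one character class per position."""
    return "".join("[" + IUPAC[ch] + "]" for ch in motif.upper())

def occurrences(text, motif):
    """Start positions of all, possibly overlapping, matches of an IUPAC motif."""
    # The lookahead matches the empty string, so finditer advances one position at a time.
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(text.upper())]
```

For example, occurrences("AAAA", "AA") returns [0, 1, 2], and iupac_to_regex("CARAA") returns "[C][A][AG][A][A]".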
use a novel device called probabilistic arithmetic automata - won't go into details - Mikhail Spivakov
Need to compute the distribution of the number of occurrences by chance. Not a straightforward task; they recently proposed a new approach based on probabilistic arithmetic automata. - Roland Krause
an exact calculation - Mikhail Spivakov
The problem is that computing exact p-values for every motif is infeasible due to the large number of motifs. - Roland Krause
matches occur in clumps. Use a compound Poisson approximation (almost exact): calculate the exact distribution of clump sizes and approximate the number of clumps by a Poisson distribution - Mikhail Spivakov
Compound Poisson approximation on clumps (sets of overlapping motif occurrences) - Roland Krause
clump = overlapping occurrences - Marcel Martin
The clump size distribution can be computed with the probabilistic arithmetic automata. - Roland Krause
nice: clumP SIze is abbreviated with an uppercase psi - Marcel Martin
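Grouping occurrences into clumps is simple once the start positions are known: for a motif of length m, an occurrence joins the current clump if it starts before the previous occurrence ends. A sketch with a hypothetical helper name, taking start positions such as those returned by an overlapping search:

```python
def clumps(starts, motif_len):
    """Group occurrence start positions into clumps of overlapping occurrences.

    All occurrences have length motif_len, so an occurrence overlaps the clump
    exactly when it starts before the previous occurrence ends.
    """
    groups = []
    for s in sorted(starts):
        if groups and s < groups[-1][-1] + motif_len:
            groups[-1].append(s)
        else:
            groups.append([s])
    return groups
```

The clump sizes are then [len(c) for c in clumps(starts, m)]; for example, clumps([0, 1, 2, 10, 12], 3) gives [[0, 1, 2], [10, 12]], i.e. sizes 3 and 2.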
How to bound the p-value to prune the search space? - Roland Krause
bound the number of occurrences using the number of clumps - Marcel Martin
p-value: the probability of observing at least k occurrences when k were found in the real data - Mikhail Spivakov
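Given an expected number of clumps lam and a clump-size distribution, the compound Poisson distribution of the total number of occurrences, and from it the p-value P(X >= k), can be computed with the Panjer recursion. The recursion is our choice for this sketch; in the talk the clump-size distribution comes from the probabilistic arithmetic automata, whereas here it is simply passed in:

```python
import math

def compound_poisson_pmf(lam, clump_size_pmf, n_max):
    """P(total occurrences = n) for n = 0..n_max via the Panjer recursion.

    lam: expected number of clumps (Poisson mean).
    clump_size_pmf[j]: P(clump size = j); index 0 must be 0.
    """
    if clump_size_pmf[0] != 0:
        raise ValueError("clumps contain at least one occurrence")
    g = [0.0] * (n_max + 1)
    g[0] = math.exp(-lam)  # no clumps at all
    for n in range(1, n_max + 1):
        acc = 0.0
        for j in range(1, min(n, len(clump_size_pmf) - 1) + 1):
            acc += j * clump_size_pmf[j] * g[n - j]
        g[n] = lam / n * acc
    return g

def p_value_at_least(k, lam, clump_size_pmf):
    """P(X >= k) = 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    g = compound_poisson_pmf(lam, clump_size_pmf, k - 1)
    return max(0.0, 1.0 - sum(g))
```

With all clumps of size 1 ([0.0, 1.0]) this reduces to a plain Poisson tail, which is a convenient sanity check.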
Motifs with the same composition have the same expectation. - Roland Krause
iterate over all possible compositions rather than over the motifs themselves, to take advantage of the shared expectation - Marcel Martin
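Why compositions suffice under the iid model: the expected number of occurrences of a motif of length m in a text of length n is (n - m + 1) times the product of its per-symbol match probabilities, and a product does not depend on order. A sketch with hypothetical helper names; it covers only the expectation, not the full occurrence distribution:

```python
from collections import defaultdict
from math import prod

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def symbol_prob(symbol, base_probs):
    """Probability that one iid character matches an IUPAC symbol."""
    return sum(base_probs[b] for b in IUPAC[symbol])

def expected_count(motif, text_len, base_probs):
    """Expected number of (possibly overlapping) occurrences under an iid model."""
    return (text_len - len(motif) + 1) * prod(symbol_prob(s, base_probs) for s in motif)

def group_by_composition(motifs):
    """Motifs with the same multiset of symbols share one expected count."""
    groups = defaultdict(list)
    for m in motifs:
        groups["".join(sorted(m))].append(m)
    return dict(groups)
```

Computing expected_count once per key of group_by_composition(...) then covers every motif in that group.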
but an iid model isn't very appropriate for DNA - Mikhail Spivakov
Build a suffix tree of the sequence, iterate over the motifs, use the lower bound for pruning, walk the tree and identify overrepresented motifs. - Roland Krause
so the motifs that yield a good p-value under the iid model are re-evaluated on a Markovian text model - Mikhail Spivakov
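A minimal sketch of a Markovian text model estimated from the text itself, used here only to get the expected count of a plain (non-IUPAC) word; an IUPAC motif would need a sum over its expansions. Order k = 0 reduces to the iid model. Function names and the use of empirical k-mer frequencies for the first k characters are assumptions of this sketch:

```python
from collections import Counter

def fit_markov(text, k):
    """Order-k Markov model estimated from text (k = 0 gives the iid model)."""
    text = text.upper()
    starts = Counter(text[i:i + k] for i in range(len(text) - k + 1))    # all k-mers
    contexts = Counter(text[i:i + k] for i in range(len(text) - k))      # k-mers followed by a character
    transitions = Counter(text[i:i + k + 1] for i in range(len(text) - k))
    return k, starts, contexts, transitions

def word_probability(word, model):
    """Probability that a given text position starts with word under the model."""
    k, starts, contexts, transitions = model
    word = word.upper()
    if len(word) < k:
        raise ValueError("word must be at least as long as the model order")
    p = starts[word[:k]] / sum(starts.values())
    for i in range(k, len(word)):
        ctx = word[i - k:i]
        if contexts[ctx] == 0:
            return 0.0
        p *= transitions[word[i - k:i + 1]] / contexts[ctx]
    return p

def expected_count(word, text_len, model):
    return (text_len - len(word) + 1) * word_probability(word, model)
```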
designing a good benchmark set is hard - Marcel Martin
other tools: Weeder, MEME - Marcel Martin
Benchmark sets are not easy; used a set by Sandve et al. http://www.pubmedcentral.nih.gov/article... - Roland Krause
Outperforms Weeder and MEME at the cost of a higher running time (~12 hours). - Roland Krause
the algorithm is not as fast as the other tools, but it is easily parallelizable - Marcel Martin
on the 4.4 Mbp genome of M. tuberculosis, found motifs in ~250 CPU hours in a parallelized setting - Mikhail Spivakov
best motif: AGACSCARAA (or something like that), also found in the literature - Marcel Martin
computationally demanding, but possible with modern computers - Mikhail Spivakov
List of models: the first one is described, the others are under investigation. - Roland Krause
in the future: use modern hardware (e.g. GPUs) - Mikhail Spivakov
optimise with respect to Markovian models directly (rather than iid) - Mikhail Spivakov
Future work could incorporate Markovian models directly or use phylogenetic information. - Roland Krause
Q. Is the implementation available? A: Given in the paper. - Roland Krause
question: is the tool available? yes, URL in the paper - Marcel Martin
overlapping motif occurrences must be taken into account for proper statistics - Marcel Martin
Q. Are the data in JASPAR or TRANSFAC? A. Had an expert look at it. - Roland Krause
JASPAR and TRANSFAC do not really cover Mycobacterium motifs - Roland Krause
Q: Performance of the algorithm on short motifs? A. Length 10 is the upper bound; the algorithm depends strongly on motif length. - Roland Krause
Q: applying to protein models? A: problematic because alphabet is larger and indels would need to be modelled - Marcel Martin
Q: how is the iid text model obtained? How do you justify that the text fits the model? A: the iid model is estimated from the text; dependencies between characters are incorporated by using the Markovian model - Marcel Martin
Q. (Marcel Schulz) How do Markov models of different orders differ? A. Lower orders give spurious results. - Roland Krause
Q: why only a part of the motif space? A: tried to come up with a plausible set that includes most motifs - Marcel Martin
ISMB/ECCB
HL10: Aaron Darling - Evolution of genome structure: what statistics can tell us about the biology of chromosomes
Got into this area while studying Yersinia pestis - Allyson Lister
However, he thinks there is a lot of structure in the process, even if we get a pattern that suggests uniform breakage - Allyson Lister
Inversion events in bacteria are generally scattered across the chromosome. - Allyson Lister
Mauve is an excellent package for visualizing rearrangements - David Sexton
"Seevolution" visualizes inversions - Marcel Martin
Breakpoints near the origin of replication are 5-6x more frequent than near the terminus of replication. - Allyson Lister
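A sketch of how breakpoint positions could be placed on an origin-to-terminus axis to compare frequencies near the origin and near the terminus. It assumes a circular chromosome with the terminus diametrically opposite the origin (real termini can be offset); the positions, origin coordinate and bin count are hypothetical inputs:

```python
def ori_ter_position(pos, ori, genome_len):
    """0.0 at the origin, 1.0 at the terminus (assumed opposite the origin)."""
    d = abs(pos - ori) % genome_len
    d = min(d, genome_len - d)  # shorter way around the circular chromosome
    return d / (genome_len / 2)

def bin_breakpoints(breakpoints, ori, genome_len, n_bins=5):
    """Count breakpoints per bin along the origin-to-terminus axis."""
    counts = [0] * n_bins
    for bp in breakpoints:
        x = ori_ter_position(bp, ori, genome_len)
        counts[min(int(x * n_bins), n_bins - 1)] += 1  # x == 1.0 goes into the last bin
    return counts
```

Dividing the first bin's count by the last bin's count gives an origin-vs-terminus ratio of the kind quoted above.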
ISMB/ECCB
HL08: Nickolai Alexandrov - Insights into corn genes derived from large-scale cDNA sequencing
ab initio gene prediction has an accuracy of only 45% for protein-coding genes - Diego M. Riaño-Pachón
many more discoveries by mapping transcripts to genomic sequences - Diego M. Riaño-Pachón
corn genes fall into two groups by GC content; the reason is unknown - Marcel Martin
Corn genome several times bigger than rice genome - Diego M. Riaño-Pachón
2400 Mb - Marcel Martin
There are a couple of million ESTs available - Diego M. Riaño-Pachón
fewer than 50,000 full-length cDNAs, around 10,000 of them high quality - Diego M. Riaño-Pachón
thorough quality checking was done - Marcel Martin
longer 5' UTRs than in Arabidopsis and rice - Diego M. Riaño-Pachón
number of genes estimated by the same method that was used for the human genome; result: approx. 50,000 genes - Marcel Martin
Estimated 50,000 genes in the corn genome, just a bit more than in rice (~42,000) - Diego M. Riaño-Pachón
intron length distributions - Marcel Martin
Bimodal distribution of CDS GC content, similar observations in all grasses - Diego M. Riaño-Pachón
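The bimodality is visible in a plain histogram of per-CDS GC content. A minimal sketch; the input list of coding sequences and the bin count are hypothetical:

```python
def gc_content(seq):
    """Fraction of G and C in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def gc_histogram(cds_list, n_bins=10):
    """Number of coding sequences per GC-content bin; two peaks suggest bimodality."""
    counts = [0] * n_bins
    for cds in cds_list:
        # A GC content of exactly 1.0 falls into the last bin rather than overflowing.
        counts[min(int(gc_content(cds) * n_bins), n_bins - 1)] += 1
    return counts
```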
high-GC genes tend to be intron-less. HGT from bacteria? - Diego M. Riaño-Pachón
genes with more introns have higher expression levels; the reason is not understood - Diego M. Riaño-Pachón