In 2005 we abandoned the capillary electrophoresis monopoly; first there were a couple of alternatives, and now there are 21 different sequencing technologies. This resulted in a jump in the rate of change of sequencing capacity.
- Barb Bryant
He thinks that many of the sequencing companies will find a niche :)
- arne
Cost of personal genome: 2007: $57M; 2009 $1500, for 40-fold coverage.
- Barb Bryant
Sidetrack: One friend said that when he started his PhD it took 6 months to sequence a bacterium and 6-60 months to analyze it. Now it takes 6 minutes to sequence it and still 6-60 months to analyze it.
- arne
Limitation is a scale of several hundred nm on the chip (positively charged molecules on a hydrophobic background)
- Dawei lin
7% of the human genome is still missing because of technical challenges
- Dawei lin
trio genomics information (father, mother, child) is increasingly important in genomics research
- Dawei lin
For $400M, Dupont made 27 changes to the 4.6 Mbp E. coli, to make a chemical.
- Barb Bryant
Another application: bio-petroleum from microbes.
- Barb Bryant
Identify enzymes that synthesize alkane. Many cyanobacteria made trace amounts; others made none. Did genome sequence "subtraction" to find which genes were in the former. Isolated & tested these genes. Overproduced them; it worked. Green chemistry.
- Barb Bryant
So: subtract my genome from Church's, then overproduce those genes --> TOTAL BRILLIANCE!
- Barb Bryant
Example of freeing up a codon by changing those codons to a different one.
- Barb Bryant
Isn't this just the analysis, not the sequence? (Or did I miss a link?)
- arne
See the 'Datasets' header -> you can get 500k Affy data as well as exome
- Christiaan Klijn
Metabolic engineering example. Historically, you'd get obsessed with one step in the pathway and overproduce one enzyme. But then you'd get product inhibition, or the product might be toxic.
- Barb Bryant
It would be nice to have a mapping to the reference genome as well, but I guess that can be done
- arne
DNA Nanostructures: (DNA origami). Proposes a combination of DNA and proteins.
- arne
DNA nanostructures help solve structures of membrane proteins.
- Barb Bryant
First practical application: Made a long rod that was stiffer than other DNA. Used in NMR for membrane proteins. (Cool idea, but it has been tried with proteins before.)
- arne
caDNAno is a freely available software tool
- Dawei lin
Shows picture of stages of cancer progression (ref Vogelstein, colon); poses the question of how metastasis occurs -- does this involve genetic or epigenetic changes?
- Barb Bryant
Tan Ince cultured two kinds of normal human mammary epithelial cells. He transformed them with oncogenes, resulting in different types of tumors.
- Barb Bryant
Concludes that the nature of the normal cell of origin is a strong determinant of the phenotype of the primary tumor, and whether it metastasizes. The playing field is tilted in the beginning.
- Barb Bryant
Self-renewing stem cells produce either more stem cells or transit amplifying cells which in turn lead to post-mitotic differentiated cells. Only the self-renewing stem cell could seed a new tumor.
- Barb Bryant
How do cancer cells acquire all of these capabilities (invasion, intravasation, transport, metastasis...)? Are there additional mutations required? Is it epigenetic?
- Barb Bryant
epithelial-mesenchymal transition -- cells on the perimeter of the tumor are mesenchymal. This may be due to signals from the surrounding stroma.
- Barb Bryant
There are probably 1000 proteins that shift in EMT. Various transcription factors (TFs) induce EMTs.
- Barb Bryant
EMT program highly complex and occurs normally during development.
- Mickey Kosloff
It seems likely that most of the invasion-metastasis program can happen without need for additional mutations; rather use signaling from microenvironment.
- Barb Bryant
P. Gupta transformed human primary melanocytes (pigmentation in the skin) with a cocktail of oncogenes. Found that in contrast to transformed epithelial cells, there was much higher likelihood of metastasis. Again, cell of origin is important in future behavior.
- Barb Bryant
One TF, Slug, was found to enable melanoma metastasis. (Even though the primary tumors grew a little faster.)
- Barb Bryant
Another TF, FOXC2, when expressed in epithelial cells induces migration and invasion. A subset of breast cancers have high levels of nuclear FOXC2, and these are more aggressive breast cancers.
- Barb Bryant
Speculates that different networks of EMT-inducing factors might program metastasis in different cell types.
- Barb Bryant
Stem cells identified by high CD44 and low CD24. (CD's are markers on cell surface which can be assayed fairly easily.)
- Barb Bryant
There are various ways to make cells acquire stem cell characteristics.
- Barb Bryant
Mentions Kornelia Polyak. There are stem-like cells in primary human breast samples. The stem cell program in normal human mammary gland is coopted by cancer cells.
- Barb Bryant
More proof that EMT creates stem cells.
- Barb Bryant
Most current chemotherapies preferentially kill non-cancer-stem-cells. The remaining stem cells can repopulate the tumor and are often more resistant to therapies.
- Barb Bryant
Gupta & Onder tested CSCs and non-CSCs with a bunch of drugs. There are some CSC-targeted agents (Salinomycin, Abamectin). Of 16,000 compounds only about a dozen preferentially killed CSCs as opposed to non-CSCs. Many were the other way round.
- Barb Bryant
This probably won't be the "answer". Christine Chaffer noticed that there were some floating cells in 2D cultured human mammary epithelial cells. She grew these up; these look more like CSCs.
- Barb Bryant
Interestingly, she found that non-CSCs could generate CSCs.
- Barb Bryant
Hm, isn't this kind of pouring cold water on the excitement about CSCs as drug targets? Or maybe you have to target both CSCs and non-CSCs simultaneously.
- Barb Bryant
Q: cancer biologists like to study druggable genome. But transcription factors seem most important. A: expression of TFs is controlled by cytoplasmic factors. Might want to go after those. Drugging the TF itself might be hard, but the signaling pathways might be more druggable.
- Barb Bryant
Q: has it been shown that the change in the two forms of cadherins matches the change in CD expression, and are these correlated with morphology? A: I showed that: CD44 high cells shut down E-cadherin; they express vimentin and other mesenchymal markers. I don't know whether CD44 is useful for non-mammary epithelial tissues.
- Barb Bryant
Q: So do normal non-SCs generate SCs? A: Yes. Same differences as in cancer.
- Barb Bryant
Spontaneous de-differentiation into SCs. Interesting phenomenon.
- Steve Chervitz Trutane
motivation is to understand genetic basis of human diseases
- Dawei lin
Genetic basis of human diseases - important disease mechanisms and bio pathways remain unidentified
- Venkata P. Satagopam
gap in knowledge of human disease biology contribute to high failure rates in drug development
- Dawei lin
Why understand genetic mechanisms? (1) Important mechanisms remain unidentified. (2) Gaps in knowledge contribute to the high failure rate in drug development.
- arne
It will be a long way to know if the two motivating hypotheses are true
- Dawei lin
One of the major studies on T2D. It scanned 100k people over 10 years
- Dawei lin
10 years later 50% progressed to have the disease
- Dawei lin
10 years of diabetes research: the outcome is that 50% of people with a good lifestyle improved
- Venkata P. Satagopam
lifestyle has a bigger impact than Metformin
- Dawei lin
Diabetes study with 10-year follow-up of diabetes incidence and weight loss, "T2D". Randomized into treatments: lifestyle, metformin, placebo. Best drug makes relatively little difference in incidence; lifestyle intervention is better than drug but still doesn't help a whole lot.
- Barb Bryant
best prevention was extensive lifestyle changes (50% -> 40% incidence)
- Mickey Kosloff
Diabetes is not only a matter of life style
- arne
success rate in the current pharma industry is <5% of molecules entering clinical trials
- Venkata P. Satagopam
key attributes of genetic mapping: (1) unbiased by prior assumptions about pathways (2) saturation mutagenesis reveals pathways
- Dawei lin
many mutants -> reveals coherence of pathways
- Ted Laderas
These days we have other methods that are unbiased like expression profiling, but genetic mapping has some unique characteristics relative to these (he’ll explain in a minute).
- Barb Bryant
Drosophila mutations initially looked random; years later, almost all of them turned out to be related to pathways.
- Dawei lin
bottleneck is functional determination - biochemical approaches
- Ted Laderas
A lot of current knowledge can be traced back to genetic mapping
- Dawei lin
A slide based on Glazier et al., Science 2002
- Dawei lin
genetic mapping of human single gene disorders ...over 15 years Botstein paper in 1980, first genetic map in 1985 ....
- Venkata P. Satagopam
It took 10 years to find a marker for Huntington's disease
- Dawei lin
Once you find a linked region from genetic mapping, it still takes a long time to find the specific gene responsible.
- Barb Bryant
in the 1990's the idea was that common diseases were caused by rare mutations with large effects
- arne
"Chromosome shlepping" - Eric Lander's term for identifying the very gene responsible within some genomic region.
- Roland Krause
It is robust for finding Mendelian diseases but not for common diseases
- Dawei lin
another approach: population genetics - QTL approach
- Ted Laderas
phenotypic variation is often continuous and may involve variation in many genes
- Dawei lin
Galton invented regression analysis to analyze measurements of phenotypic data (heights of parents and offspring).
- Roland Krause
The biometric unit --- almost nothing was Mendelian
- arne
Most traits are continuously variable
- Ted Laderas
Francis Galton was a cousin of Darwin. Darwin didn’t explain the source of variation. Galton focused on this; he measured the heights of parents and their offspring, and found a relationship. He invented regression analysis to draw the line. The slope of the line is related to the heritability of the trait.
- Barb Bryant
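To make the regression idea concrete, here is a minimal sketch of fitting a least-squares line to mid-parent versus offspring heights. The heights are made-up numbers (not Galton's data) and the function name `regression_slope` is hypothetical.

```python
def regression_slope(xs, ys):
    """Return (slope, intercept) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up mid-parent and offspring heights in cm (illustrative only).
midparent = [165.0, 168.0, 170.0, 172.0, 175.0, 178.0, 180.0]
offspring = [167.0, 168.5, 170.5, 171.0, 173.5, 175.0, 176.5]

slope, intercept = regression_slope(midparent, offspring)
```

A slope below 1 is the "regression toward the mean" pattern; under a simple additive model, the slope of offspring on mid-parent value is an estimate of heritability.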
It was studied by the cousin of Darwin, Francis Galton (1885)
- Dawei lin
phenotypic variation is often continuous ... some history ... Francis Galton (1885), Ronald Fisher (1918), Hermann Muller (1920)
- Venkata P. Satagopam
This gave rise to the biometric movement – measure every living thing. Traits were related to genetic relatedness; and it wasn’t Mendelian. This led to the biometric-Mendelian debate.
- Barb Bryant
Ronald Fisher was actually a geneticist, who also invented the p-value and Fisher's exact test
- Dawei lin
Ronald Fisher (the one with the exact test) was also a geneticist.
- Roland Krause
Solved by assuming that phenotype often is an effect of several Mendelian genes.
- arne
Fisher: individual genes are mendelian, effects of genes additive
- Ted Laderas
Hermann Muller 1920 (Nobel Prize for X-ray induced mutations). His PhD thesis was not on a Mendelian trait but on truncate wing, which wasn’t Mendelian. He did genetic mapping.
- Barb Bryant
Hermann Muller decided to use broken wing of fruit fly to study non-Mendelian diseases
- Dawei lin
Muller 1920 paper: 4 chromosomes in fly – 3 contain genes that influence the trait truncate wing. Muller wrote about implications for human traits, like psychological traits. Said that traits were going to be too complicated. Said you could figure out by looking at population, but not looking at Mendelian inheritance in families.
- Barb Bryant
Muller (1920) suggested that such traits need to be studied in a population.
- Dawei lin
mendelian fallacy - sub-populations are easily divisible in terms of risk
- Ted Laderas
Prediction will only be useful if there is an intervention that you would not use without the prediction. Otherwise, you should use the intervention anyway.
- Roland Krause
Huntington will not be a representative example - for most diseases/people identified risk will be <<100% even with full genetic information
- Mickey Kosloff
Cautionary tale - PSA prediction results in over-treatment, hasn't been shown that people live longer because of test
- Mickey Kosloff
Very cautious about PSA - no improvements on the mortality but many operations performed.
- Roland Krause
genetics offers a path to discover the underlying biology of human diseases; the great value will derive from pathophysiology and treatment
- Venkata P. Satagopam
When grouping mutations into pathways, up to 85% of GBM have a mutation in the most important pathways, while individual genes are down to a few %
- arne
Each oncogene may have relatively low frequency across patients; but when you group genes across pathways, a pathway may explain a large fraction of patients with a given type of cancer.
- Barb Bryant
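The gene-versus-pathway point above can be sketched with a toy calculation. The patient IDs, gene names, and pathway membership below are hypothetical placeholders, not real GBM calls.

```python
# Hypothetical toy data: patient -> set of mutated genes.
mutations = {
    "P1": {"geneA"},
    "P2": {"geneB"},
    "P3": {"geneC"},
    "P4": {"geneD"},
    "P5": set(),
}

# Hypothetical pathway membership, for illustration only.
pathway = {"geneA", "geneB", "geneC", "geneD"}


def gene_frequency(mutations, gene):
    """Fraction of patients carrying a mutation in one specific gene."""
    return sum(gene in genes for genes in mutations.values()) / len(mutations)


def pathway_frequency(mutations, pathway):
    """Fraction of patients with at least one mutated gene in the pathway."""
    return sum(bool(genes & pathway) for genes in mutations.values()) / len(mutations)
```

In this toy set each gene is hit in only 20% of patients, while the pathway as a whole is hit in 80%.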
can see a change in pathway activation between primary tumor and mets
- Mickey Kosloff
Dominant alterations change between cancer types and states.
- Roland Krause
GBM: copy number changes are rare (and noisier). Ovarian: more regular and higher.
- arne
profiles of copy number variations differ between types of cancers
- Mickey Kosloff
Metastatic tumor samples have more copy number changes than primary tumors. Not surprising. But maybe primary samples with more copy number changes than others are more likely to metastasize? Generally, better outcome with fewer somatic copy number changes.
- Barb Bryant
BRCA1 and BRCA2 mutations convey germline inherited cancer risk
- Barb Bryant
These genes act in the homologous repair pathway. Half of all patients have mutations in some homologous repair pathway gene.
- Barb Bryant
and more generally, homologous repair genes are altered in > 50% of ovarian cancer
- Mickey Kosloff
Tumor suppressor genes can be inactivated in various ways: germline mutation, somatic mutation, epigenetic silencing, etc.
- Barb Bryant
There are drugs under development that might work particularly well in patients with defects in this particular pathway.
- Barb Bryant
Cancer genomics portal: www.cbio.mskcc.org/cancergenomics
- Barb Bryant
Instead of going through all the models that are possible, you derive statistical properties across a set of good models for each of the Wij weights in the model.
- Barb Bryant
This is sort of like partition functions in statistical physics
- Barb Bryant
after step 1 - generation of probability distributions then step 2- decimation
- Shannon McWeeney
So you have a probability distribution for each Wij, which represents the interaction between element i and element j. I'm not really getting how you "update" these probability distributions in the iterative steps. I do understand that at the end you take the most "certain" (narrowest) distribution and fix its value (some Wij) at the most probable value, then update all the other Wij's given this fixation. And so on. To get your final model in a sort of greedy fashion.
- Barb Bryant
And by the way, the underlying model is a simple differential equation sort of thing: change of one variable xi is a sigmoidal function of weighted (Wij) sum of all variables xj, less a decay term.
- Barb Bryant
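A minimal sketch of that kind of model, as I understood it: dx_i/dt = sigmoid(sum_j Wij * xj) - decay_i * xi. The logistic form of the sigmoid, the Euler integration, and all numbers below are my assumptions for illustration, not details from the talk.

```python
import math


def sigmoid(z):
    """Logistic function (one possible sigmoidal choice, assumed here)."""
    return 1.0 / (1.0 + math.exp(-z))


def derivative(x, W, decay):
    """dx_i/dt = sigmoid(sum_j W[i][j] * x[j]) - decay[i] * x[i]."""
    return [
        sigmoid(sum(w_ij * x_j for w_ij, x_j in zip(W[i], x))) - decay[i] * x[i]
        for i in range(len(x))
    ]


def simulate(x0, W, decay, dt=0.01, steps=5000):
    """Integrate the network forward in time with simple Euler steps."""
    x = list(x0)
    for _ in range(steps):
        dx = derivative(x, W, decay)
        x = [x_i + dt * dx_i for x_i, dx_i in zip(x, dx)]
    return x


# Toy two-node network: node 1 activates node 2, node 2 represses node 1.
W = [[0.0, -2.0],
     [1.5, 0.0]]
decay = [1.0, 1.0]
x_final = simulate([0.1, 0.1], W, decay)
```

Changing an entry of `W` and re-running `simulate` is the kind of in-silico perturbation that such a fitted model would allow.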
Question: Interacting networks tend to be modular, with strongly-interacting subnetworks that interact weakly with each other. ...
- Barb Bryant
Chris: Is the modular approach really useful in confronting the data? [Is that what he said?]
- Barb Bryant
Question: can you get at causal relationships?
- Barb Bryant
Chris: yes - if the network model allows you to predict correctly the result of a particular perturbation applied to a particular node, then you can simulate using that model.
- Barb Bryant
Question: with a big network, how many experiments will you need to model?
- Barb Bryant
Chris: Good question. Could use an entropy measure. Help us figure this out. Help us design the experiments. It's important because of the costs of experiment. This is going to be broadly applicable in cell biology.
- Barb Bryant
bb - he said one should see if approach is useful by confronting with real data
- Shannon McWeeney
Chris gets at the difference between a model that tells a story and a model that is truly predictive.
- Barb Bryant
Question: yes, but, what are the semantics of the graph? What kinds of interaction? Answer: The semantics are in the mathematics of your model.
- Barb Bryant
Question: mean field approach is interesting. Compared to Monte Carlo approach, you are assuming some decoupling. Loss of posterior coupling between weights - is that an issue?
- Barb Bryant
Chris: If you look at a coupled system overall, the extent to which the algorithms work depends on correlations within the system. Long-range (in terms of network distance) correlations are problematic. There are some clever approaches to handle some of this. Mentions non-ergodic space; deal with parts of space separately or iteratively.
- Barb Bryant
Protein Folding Requires Crowd Control in a Simulated Cell. Benjamin R. Jefferys, Lawrence A. Kelley and Michael J. E. Sternberg. J. Mol. Biol. (2010) 397, 1329–1338
- arne
HL23: Menachem Fromer - A probabilistic approach to the design of interfaces in proteins with multiple partners: Tradeoff between stability and promiscuity
Protein structure search - a computationally hard problem
- Mickey Kosloff
Looking for similar structures that are not easily found using SCOP and CATH.
- Mickey Kosloff
1st line tool - structural alignments. However, these are computationally expensive and not feasible for the full PDB. Alternative - filter methods that can look at the full-size PDB.
- Mickey Kosloff
disclaimer of un-objectivity - I'm a co-author on related paper with Rachel (Kosloff & Kolodny, Proteins 2008) that she just mentioned as motivation to look at the whole PDB rather than a non-redundant subset.
- Mickey Kosloff
FragBag = new filter method, based on library of fragments
- Mickey Kosloff
order of fragments in structure is not considered (similar to how Google indexes web pages). You lose information but gain speed dramatically. Worked for Google, so might work for structures.
- Mickey Kosloff
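A rough sketch of the bag-of-fragments idea: count how often each library fragment occurs and ignore where it occurs. The step that assigns each backbone segment to a library fragment ID is assumed to have happened already, and the use of cosine distance and the function names are my assumptions.

```python
import math
from collections import Counter


def bag_of_fragments(fragment_ids, library_size):
    """Count occurrences of each library fragment, ignoring their order."""
    counts = Counter(fragment_ids)
    return [counts.get(i, 0) for i in range(library_size)]


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical fragment assignments along three chains (library of 4 fragments).
bag_a = bag_of_fragments([0, 2, 2, 1, 3], library_size=4)
bag_b = bag_of_fragments([3, 1, 2, 0, 2], library_size=4)  # same fragments, new order
bag_c = bag_of_fragments([0, 0, 0, 0, 0], library_size=4)  # very different content
```

Chains `a` and `b` get identical vectors despite the different fragment order, which is the information that is traded away for speed.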
Uses SAS score as similarity measure = RMSD*100/length
- Mickey Kosloff
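The SAS formula as a small helper, assuming "length" means the number of aligned residues (my reading; the function name is hypothetical):

```python
def sas_score(rmsd, aligned_length):
    """SAS = RMSD * 100 / length; lower values indicate a better alignment."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return rmsd * 100.0 / aligned_length
```

For the same RMSD, a longer alignment gets a lower (better) score: `sas_score(2.0, 100)` is 2.0 while `sas_score(2.0, 50)` is 4.0.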
Uses SAS to find best of six different structure alignment methods as gold standard.
- Mickey Kosloff
checks if FragBag finds nearest neighbors found by gold standard. analysis with ROC curves.
- Mickey Kosloff
rank of methods by area under ROC curve: sequence alignment is worst (as expected); structural alignments range from > 0.7 to 0.9. FragBag does as well as CE etc., even though it's a lot less expensive in computational resources.
- Mickey Kosloff
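For reference, area under the ROC curve can be computed directly as the probability that a true neighbor is ranked above a non-neighbor. This is a generic sketch, not the evaluation code used in the talk; scores and labels below are made up.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up similarity scores; label 1 = neighbor according to the gold standard.
perfect = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
partial = roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect filter.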
FragBag also finds pairs of structures with same CATH classification
- Mickey Kosloff
Additional features: enables querying the PDB with a combination of non-contiguous sub-structures
- Mickey Kosloff
Also enables visualizing protein structure space (shows rotating 3D projection of 30,000 structures) - you get the (known) separation of structural classes (alpha+beta, alpha, beta, alpha/beta)
- Mickey Kosloff
superimposes SCOP folds on this picture, visualizes co-localization of these folds.
- Mickey Kosloff
answer to Q: Can search entire PDB on laptop very quickly.
- Mickey Kosloff
LAST: like BLAST but faster. Handles repetitive regions and A+T bias much better than BLAST. BLAST etc. use a fixed seed length (LAST uses an adaptive length).
- arne
Not just interesting, but most likely great. Svante is a fantastic speaker
- arne
If you’re interested in human history, the genome is a great source of information. To reconstruct history, we compare sequences of people (and other species) living today. We use models of how DNA changes over time to understand the differences that exist today. This is an indirect way to study history, because we are reconstructing from the present what we think has happened in the past.
- Barb Bryant
Human FoxP2 in mouse: The mouse can not speak ! Large scale phenotype study (323 phenotypic traits). -> More cautious in a novel area (stays close to the wall). No difference after 3 minutes. Second phenotype: Altered vocalization !!!
- arne
Truncated proteins might interfere with physiological function (dominant negative). The cell removes such transcripts through nonsense-mediated decay (NMD).
- Roland Krause
one family with many domain architectures, all sharing a kinase domain
- Ruchira S. Datta
Multidomain sequences evolve via gene duplication and domain shuffling.
- Gabriele Sales
multidomain sequences evolve via gene duplication and domain shuffling
- Ruchira S. Datta
The same domain may appear in multiple, unrelated proteins.
- Gabriele Sales
A definition will be presented that is in line with Fitch's proposition of homology.
- Roland Krause
can have case where genes share common ancestry, but domain architecture has changed
- Ruchira S. Datta
Difference between sequences related by vertical descent and related by domain insertion.
- Roland Krause
Two kinds of relations among genomes: relation by vertical descent or relation by domain insertion.
- Gabriele Sales
similarly can have the converse: through domain shuffling, genes that are not homologous can come to have the same domain architecture
- Ruchira S. Datta
Is it possible to distinguish these two cases?
- Gabriele Sales
Given two sequences with similarity: Can one distinguish the two scenarios?
- Roland Krause
orthologs are a subset of homologs, and homologs intersect with the set of significantly similar sequences
- Ruchira S. Datta
A Venn diagram, including orthologs, homologs, distant homologs and significantly similar sequences with modification.
- Roland Krause
also have distant homologs which don't appear to be significantly similar
- Ruchira S. Datta
inferences that can be drawn from vertical descent (similar molecular functions) and domain insertion (binding partners) are different
- Allyson Lister
Biological interpretation of vertical descent: molecular function; regulation; comparative mapping; processes of gene duplication and genome rearrangement.
- Gabriele Sales
Interpretations of domain insertion: protein specialization; ligand specificity; localization; process of domain shuffling.
- Gabriele Sales
vertical descent implies similar: molecular function, regulation, comparative mapping, and is useful for processes of duplication and genome rearrangement
- Ruchira S. Datta
domain insertion leads to relationships of protein specialization, ligand binding, and cellular localization
- Ruchira S. Datta
In animals and plants multidomain sequences become more important than in bacteria.
- Gabriele Sales
The more higher eukaryotes are sequenced, the more this problem needs to be addressed.
- Roland Krause
therefore, among similar sequences, want to distinguish which are related by vertical descent, and which by domain insertion
- Ruchira S. Datta
people look at sequence similarity E-value, and at alignment coverage
- Ruchira S. Datta
Alignment length is typically used to distinguish domain rearrangements. Needs a decent model.
- Roland Krause
Good example that sequence similarity or E-values are not capable of distinguishing the two cases.
- Roland Krause
The goal of this method is to identify sequence pairs related by VD and DI, and it should work on a broad range of families
- Allyson Lister
And needs to be computationally feasible.
- Roland Krause
To test, they looked at 20 well-studied families related by vertical descent.
- Allyson Lister
They had a much larger set of negative examples (40,000).
- Allyson Lister
PSI-BLAST performs worse than BLAST for variable-architecture multi-domain proteins(!), as it pulls in non-homologous parts of sequences.
- Roland Krause
All methods do well with conserved multidomain proteins. They were more challenged by Variable multidomain, where Psi-BLAST doesn't do as well as BLAST. Both methods are extremely challenged when all the sequences were put into the analysis together.
- Allyson Lister
Pairwise comparisons are not sufficient. Try networks instead.
- Gabriele Sales
Pairwise sequences might not be enough, use the structure of the similarity networks.
- Roland Krause
Two sequences are compared in the context of their respective neighborhoods (i.e. other sequences that show similarity).
- Gabriele Sales
Domain architecture is implicitly present in the network.
- Allyson Lister
Open question. The model is explicitly based on insertion and deletion. What about de novo sequence formation?
- Gabriele Sales
Comment by Kevin Karplus: Use log scale for false positives in the ROC plots.
- Roland Krause
mRNA level of a regulator is an (imprecise) indicator of regulator activity
- Oliver Hofmann
So you can model RNA level of a gene as a variable in a model
- Allyson Lister
The expression of a target gene can often be predicted by the expression of its regulators
- Diego M. Riaño-Pachón
Regulators: transcription factors, signal transduction, chromatin remodeling . . .
- Diego M. Riaño-Pachón
Broad view of a regulator gene: TFs, signal transduction proteins, RNA processing factors, anything that *might* play a direct or indirect role in gene regulation
- Oliver Hofmann
Second assumption: co-regulated genes have similar regulatory mechanisms, group genes into modules and predict expression profile for the entire module
- Oliver Hofmann
Modules provide increased statistical power.
- Gabriele Sales
A critical aspect is the structure of the regulatory program.
- Allyson Lister
(Segal 2003 Nat Genetics): notion of a regression tree for regulatory programs
- Oliver Hofmann
has disadvantages such as poor regulator selection lower in the tree; misses a lot of regulators due to lack of statistical power
- Oliver Hofmann
another disadvantage: arbitrary choice among correlated regulators
- Gabriele Sales
Instead: Lasso (L1) regression approach
- Oliver Hofmann
Simple linear regression doesn't work well because the minimization procedure assigns a nonzero weight to each regulator.
- Gabriele Sales
This can be solved by using regularization: an extra term penalizing nonzero weights.
- Gabriele Sales
The Lasso regression pushes the effect of many regulators towards zero, likely keeping only the significant regulators
- Diego M. Riaño-Pachón
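To illustrate how the L1 penalty zeroes out weak regulators, here is a minimal lasso sketch using coordinate descent with soft-thresholding. The objective scaling, (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1, the toy data, and the function names are my assumptions; this is not the speaker's implementation.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; small values become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by cycling over weights."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of regulator j with the partial residual
            z = 0.0    # squared norm of regulator j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n)
    return w


# Toy data: 4 samples x 3 regulators (columns are mutually orthogonal).
X = [[1.0, 1.0, 1.0],
     [-1.0, 1.0, -1.0],
     [1.0, -1.0, -1.0],
     [-1.0, -1.0, 1.0]]
# Target = 2 * regulator 0 + a small (0.1) contribution from regulator 1.
y = [2.1, -1.9, 1.9, -2.1]

weights = lasso_coordinate_descent(X, y, alpha=0.5)
```

Ordinary least squares on this toy data would give regulator 1 a small nonzero weight (0.1); with `alpha=0.5` the lasso sets it to exactly zero and shrinks the weight of regulator 0 from 2.0 to about 1.5.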
Elastic net regression to avoid arbitrary regulator / feature choice
- Oliver Hofmann
Cluster genes to modules, learn regulatory program for module, repeat for all modules, iterate after re-assignment of genes to modules based on how well a program predicts the expression of a gene in the module
- Oliver Hofmann
The initial assignment of genes to modules is often poor, so it must be refined by iterating later in the analysis
- Diego M. Riaño-Pachón
It's a special kind of Bayesian network
- Allyson Lister
Test using eQTL data set (Brem, 2002 Science), two different yeast strains
- Oliver Hofmann
eQTL dataset (Brem et al., 2002) from yeast. Microarray measurements of 112 individuals over 6000 genes.
- Gabriele Sales
Adapt the regulatory network approach by including the genotype / markers. How do markers affect the expression level of a given module?
- Oliver Hofmann
She then shows us one of the modules that comes out, the telomere module (40/42 of genes are in the telomere).
- Allyson Lister
Example: the telomere module. Enriched for telomere maintenance and helicase activity.
- Gabriele Sales
Controlled by a region on chr XII (which includes a regulator gene)
- Oliver Hofmann
23 modules out of 165 have 'chromosomal features', 16 chromatin regulators
- Oliver Hofmann
Puf3 module: 147/153 genes are pulldown targets of the mRNA binding protein Puf3.
- Gabriele Sales
But Puf3 is not the most significant regulator of the module.
- Gabriele Sales
P-bodies are places where mRNAs are stored temporarily, and while they are there they are translationally repressed.
- Allyson Lister
What regulates sequence-specific localization of mRNAs to P-bodies?
- Gabriele Sales
They did a microscopy experiment to test this, which demonstrates that PUF3 is specifically localized to p-bodies.
- Allyson Lister
Microscopy experiment: fluorescence shows that puf3 localizes in P-bodies.
- Gabriele Sales
So what regulates the regulators of the P-bodies?
- Oliver Hofmann
What regulates the p-bodies? What is one level higher up in the hierarchy? A locus on chromosome 14, but this is a large region and covers 30 genes and 300 polymorphisms.
- Allyson Lister
Therefore she came up with the idea of regulatory potential. The motivation is that not all SNPs are equally likely to be causal
- Allyson Lister
how to rank the polymorphisms (SNPs) by their regulatory potential?
- Diego M. Riaño-Pachón
Each regulator has its own prior dictated by the regulatory features (inside a gene? protein coding region? strong conservation? TF binds to module gene?)
- Gabriele Sales
Start by learning regulatory programs as described earlier. Second, learn regulatory weights (betas). Then compute the regulatory potential of each SNP in the genome. Then iterate.
- Allyson Lister
Regulatory potentials do not change the selection of strong regulators, but help to disambiguate between multiple weak regulators
- Oliver Hofmann
Regulatory potentials do not change selection of strong regulators. They only help disambiguate between weak ones.
- Gabriele Sales
Strong regulators teach us what to look for in the putative weak regulators
- Oliver Hofmann
Some important features learned by the algorithm: conservation, cis-regulation.
- Gabriele Sales
Important factors: cis-relationship, conservation, stop-codons, (combination of gene functions such as RNA modification, DNA binding, ...)
- Oliver Hofmann
Statistical evaluation: uses PGV: % of genetic variation explained by the predicted regulatory program for each gene. It's a form of test data validation.
- Allyson Lister
These factors are *learned* from the model, not set
- Oliver Hofmann
Explained about 50% of the variation in about 50% of the genes
- Allyson Lister
Regulatory potentials are specific to organisms or even datasets.
- Gabriele Sales
Understanding the process underlying differentiation with the ImmGen consortium
- Oliver Hofmann
Can identify shared regulatory programs for all 60 cell types, but lumps together very different cell types
- Oliver Hofmann
By using G-Regulators, you allow programs to depend on genetic variation, but you don't have G-regulators here, you have cell types
- Allyson Lister
One network for each cell type overfits the data, but can bias towards shared regulation. Use the _ontogeny_ to guide conserved regulation.
- Oliver Hofmann
Two extremes: lumping every cell type together you "average" effects; working on a single cell type you overfit.
- Gabriele Sales
use ontogeny to guide conserved regulation. You do this by looking at differences, and penalize every place you change the regulatory power - penalize changes/divergences in the regulatory
- Allyson Lister
Expression changes and underlying phenotype. What are the mechanisms underlying them?
- Allyson Lister
Example: transformation of FL to DLBCL occurs in 40-60% of patients, and diverse mechanisms seem to drive transformation.
- Allyson Lister
Represent each module as a metagene expression profile, and use machine learning to id modules distinguishing FL-t (pre-transformation) from transformed DLBCL
- Allyson Lister
Represent each module as a metagene, use ML technique to learn classifiers to distinguish FL-t (pre-transformed) from DLBCL
- Oliver Hofmann
Can you use a module-based approach to understand metabolic syndrome?
- Allyson Lister
Phenotype network, where the nodes are modules and the edges are learned regulatory programs.
- Allyson Lister
An important module in this case is the biosynthesis liver module: the genes are almost disjoint but are all in the same module, therefore you would have missed it without these modules.
- Allyson Lister