Ruchira S. Datta
Alex Ensminger: The adaptive narrative of isogenic confinement #LAMG10
Legionella pneumophila causes Legionnaires' disease: severe pneumonia, 8,000-18,000 hospitalizations per year in the US - Ruchira S. Datta
intracellular pathogen of mammalian macrophages and amoebae - Ruchira S. Datta
individual gene knockouts often don't give phenotypes, difficult to work with - Ruchira S. Datta
reservoir: lives intracellularly in amoeba. contaminates water, enters human host. no host-host transmission, so this is an evolutionary dead end for the Legionella - Ruchira S. Datta
passed Legionella through a uniform mammalian host cell (macrophage) population for several hundred generations - Ruchira S. Datta
see: what mutations adapt to macrophages? verify whether Legionella is an "accidental" pathogen--not particularly well-adapted to infect people - Ruchira S. Datta
start w/ streptomycin resistant strain derived from clinical isolate, integrated luminescent (lux) operon, infect cultures of mouse bone marrow macrophages, keep them up - Ruchira S. Datta
infect macrophages 1:1 with two strains, compare input and output ratios (w/ luminescence as distinguishing mark). this measures competitive index. - Ruchira S. Datta
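A minimal sketch of the competitive index arithmetic described above, assuming the two strains are counted separately (e.g., luminescent vs non-luminescent colonies). The function name and the counts are hypothetical, not figures from the talk.

```python
def competitive_index(input_a, input_b, output_a, output_b):
    """Competitive index of strain A vs strain B.

    CI = (output A / output B) / (input A / input B).
    CI > 1 means strain A outcompeted strain B during the infection.
    """
    if min(input_a, input_b, output_b) <= 0:
        raise ValueError("input counts and output_b must be positive")
    return (output_a / output_b) / (input_a / input_b)


# Hypothetical counts: a 1:1 inoculum, strain A recovered 4x more often.
ci = competitive_index(input_a=1e5, input_b=1e5, output_a=8e6, output_b=2e6)
print(ci)  # 4.0
```

Dividing by the input ratio corrects for an inoculum that is not exactly 1:1.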
evolved strain infects macrophages better, but the natural host, amoebae, worse - Ruchira S. Datta
what genetic changes are responsible for these adaptations? - Ruchira S. Datta
approach: sequence individual clones (do several to get representative sample) - Ruchira S. Datta
checked which mutations occurred at each day, their prevalence - Ruchira S. Datta
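A rough sketch of how per-day prevalence could be tabulated from sequenced clones, assuming each clone is recorded as the set of mutations it carries. The days, clone counts, and frequencies are made up for illustration; only the gene names come from the talk.

```python
from collections import Counter


def mutation_prevalence(clones_by_day):
    """Fraction of sequenced clones carrying each mutation, per sampling day."""
    prevalence = {}
    for day, clones in clones_by_day.items():
        counts = Counter(m for clone in clones for m in clone)
        prevalence[day] = {m: n / len(clones) for m, n in counts.items()}
    return prevalence


# Hypothetical data: four clones sequenced on each of two days.
clones_by_day = {
    10: [{"oivA"}, set(), {"oivA"}, set()],
    50: [{"oivA", "fleN"}, {"oivA"}, {"oivA", "fleN"}, {"oivA"}],
}
prev = mutation_prevalence(clones_by_day)
print(prev[10])  # {'oivA': 0.5}
```

A mutation whose fraction reaches 1.0 across the sampled clones is consistent with a selective sweep, though a handful of clones only approximates the population.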
hard to say full name of the gene oivA, so now pronounces it "oy vey" - Ruchira S. Datta
oivA comes up first, does selective sweep - Ruchira S. Datta
fleN comes up next - Ruchira S. Datta
another appears in one clone but not another; was synonymous anyway, maybe just random - Ruchira S. Datta
noticed sdbA rose to near fixation, dropped out, then rose to near fixation again. hypothesis: another population was temporarily beating sdbA mutation, but later was outcompeted again - Ruchira S. Datta
the adaptive narrative points to the existence of different genomes in the population; iterative sequencing fills in the narrative - Ruchira S. Datta
the growth advantage of the evolved strain can be recapitulated by adding three mutations back to the ancestral strain - Ruchira S. Datta
evolved sdbA appears to ablate its function (a translocated substrate) - Ruchira S. Datta
other mutations: lysine biosynthesis - Ruchira S. Datta
several of the adaptive mutations result in lysine auxotrophy - Ruchira S. Datta
Coxiella burnetii, a bioterrorism agent developed by the US, is a closely related obligate intracellular mammalian pathogen, and is predicted to be a lysine auxotroph. - Ruchira S. Datta
it's often thought that pathogen loses lysine biosynthesis because host cell makes it. however, it may have a direct role in virulence - Ruchira S. Datta
the lysine biosynthesis pathway also produces meso-DAP at an intermediate step. meso-DAP is incorporated into the peptidoglycan layer and may be recognized by the immune system. the human host has an immune system, the amoeba doesn't - Ruchira S. Datta
will soon be measuring meso-DAP concentration directly - Ruchira S. Datta
are the competitive advantages of some strains lost in Nod1-deficient macrophages, which cannot recognize meso-DAP? - Ruchira S. Datta
will do more experimental evolution to check this out - Ruchira S. Datta
interested in getting clinical samples, seeing whether selective events occur during clinical outbreak. check matched clinical and environmental samples. will do whole genome resequencing - Ruchira S. Datta
will join U. of Toronto as prof in March 2011. currently postdoc in Ralph Isberg's lab - Ruchira S. Datta
Nina Salama: got type IV effector mutants. what about in the apparatus? A: don't even get many. do not get icm-component type IV secretion mutants. Salama: in Helicobacter it's thought that it gets in through the Type IV. A: this is secreting 250 substrates, so can't turn it off - Ruchira S. Datta
Jonathan Eisen: is antibiotic resistance evolving in amoeba reservoir or being invented anew? A: think have resistance reservoir in amoebae - Ruchira S. Datta
Ruchira S. Datta
Kimmen Sjölander: PhyloFacts-Microbe: Phylogenomic prediction of microbial (meta)genome function
# NB Kimmen is my PI, this is our work - Ruchira S. Datta
using structural phylogenomic analysis to improve the functional annotation of (meta)genome datasets - Ruchira S. Datta
even orthologous genes can have distinct functions - how can we flag potentially functionally divergent orthologs - Ruchira S. Datta
want to use 3d structure, conservation of specific residues, etc. - Ruchira S. Datta
how to interpret genetic variation across different strains? - Ruchira S. Datta
how to identify novel functional subtypes? even w/o experimental data, can predict, e.g., using homology model and seeing change in geometry of active site, conservation patterns - Ruchira S. Datta
can we flag sequence misannotations - Ruchira S. Datta
can we apply structural phylogenomic methods to human microbiome and environmental sample sequences? how do we scale? - Ruchira S. Datta
can we change the one-time static annotation to dynamic reannotation as new data becomes available, as Steve Salzberg suggested? - Ruchira S. Datta
how accurate are existing methods of phylogenetic reconstruction of protein superfamilies? as Jonathan Eisen pointed out, need to sample intermediate taxa to get accurate trees - Ruchira S. Datta
phylogenomic tools for investigating and interpreting (meta)genome datasets. most tools designed for these data answer only "what species are here?" phylogenomics also wants to answer "what's going on?" are pathways accomplished cooperatively? - Ruchira S. Datta
primary sources of annotation error: neofunctionalization stemming from gene duplication. domain shuffling: BLAST is a local alignment algorithm and doesn't require alignment along entire length of the proteins. the hit could be aligned along one domain and annotated based on another. finally, percolation of annotation errors - Ruchira S. Datta
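A minimal sketch of one guard against the domain-shuffling error: before transferring an annotation from a local-alignment hit, check that the alignment spans most of both proteins. The function name and the 0.7 coverage cutoff are assumptions for illustration, not something stated in the talk.

```python
def covers_full_length(q_start, q_end, q_len, s_start, s_end, s_len, min_cov=0.7):
    """True if a local alignment covers at least min_cov of query AND subject.

    Coordinates are 1-based and inclusive, as in tabular BLAST output.
    """
    q_cov = (q_end - q_start + 1) / q_len
    s_cov = (s_end - s_start + 1) / s_len
    return q_cov >= min_cov and s_cov >= min_cov


# A 300-residue query hits only one domain of a 900-residue multidomain protein.
single_domain_hit = covers_full_length(1, 120, 300, 400, 520, 900)
# Two proteins of similar length aligned nearly end to end.
global_like_hit = covers_full_length(1, 280, 300, 5, 290, 300)
print(single_domain_hit, global_like_hit)  # False True
```

Requiring coverage on both sequences matters: a hit can cover all of a short query while touching only one domain of a longer subject whose annotation describes a different domain.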
papers: "Getting Started in Structural Phylogenomics", INTREPID (sequence-based functional site identification, has webserver), Discern (sequence & structure-based functional site identification, no webserver yet), SATCHMO-JS simultaneous alignment & tree construction. have subfamily-specific structure and function variation. - Ruchira S. Datta
protein structure prediction can span large evolutionary distances. e.g., built model of VirB4 which helped Pat Zambryski elucidate Type IV secretion system - Ruchira S. Datta
Kimmen started PANTHER at UC Santa Cruz, then a duplication event occurred w/ Kimmen continuing w/ PhyloFacts and Paul Thomas continuing w/ PANTHER. neofunctionalization and subfunctionalization occurred. Kimmen participated in CASP, went to Celera Genomics to functionally annotate the human genome, and constructed PhyloFacts at UC Berkeley - Ruchira S. Datta
have pipeline for constructing protein families in PhyloFacts - Ruchira S. Datta
how to differentiate orthologs and paralogs? - Ruchira S. Datta
orthology prediction is critical to many areas of bioinformatics: phylogenomic inference of protein function (per Eisen 1998), prediction of protein-protein interaction and pathway participation, reconstructing Tree of Life, and phylogenetic profiles - Ruchira S. Datta
PHOG: orthology prediction webserver - Ruchira S. Datta
superorthologs (Zmasek and Eddy): *no* duplication events along the path between two sequences (not just that the most recent common ancestor is speciation). - Ruchira S. Datta
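The super-ortholog definition above can be sketched as a check on a reconciled gene tree, assuming every internal node is already labeled as a speciation or a duplication. The tree, the node names, and the parent-pointer representation are hypothetical; this is not PHOG's implementation.

```python
def path_to_root(node, parent):
    """List of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def are_superorthologs(a, b, parent, event):
    """True if every internal node on the tree path between a and b is a speciation."""
    up_a = path_to_root(a, parent)
    up_b = path_to_root(b, parent)
    ancestors_b = set(up_b)
    mrca = next(n for n in up_a if n in ancestors_b)
    # Internal nodes on the path: from a's parent up to the MRCA, plus b's side below it.
    internal = up_a[1:up_a.index(mrca) + 1] + up_b[1:up_b.index(mrca)]
    return all(event[n] == "speciation" for n in internal)


# Toy gene tree: a duplication occurred in the fish lineage after it split off.
parent = {
    "human_A": "n1", "mouse_A": "n1", "n1": "root",
    "fish_A1": "n2", "fish_A2": "n2", "n2": "root",
}
event = {"root": "speciation", "n1": "speciation", "n2": "duplication"}

print(are_superorthologs("human_A", "mouse_A", parent, event))  # True
print(are_superorthologs("human_A", "fish_A1", parent, event))  # False
```

In the toy tree, human_A and fish_A1 are still orthologs in the usual sense (their most recent common ancestor is a speciation), but the duplication on the path means they are not super-orthologs.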
PHOG uses a structural phylogenomic approach to identifying orthologs. gather families at both the domain level and the domain architecture class level using FlowerPower. will focus on PHOG-T, thresholded version, which allows tuning of specificity vs sensitivity - Ruchira S. Datta
benchmarked against curated orthologs in TreeFam, precision-recall curve outperformed other methods compared against - Ruchira S. Datta
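A small sketch of how a thresholded predictor can be scored against a curated set to trace a precision-recall curve. It assumes each candidate pair carries a tree distance and that pairs at or below the threshold are called orthologs; the pairs and distances are invented, not benchmark data.

```python
def precision_recall(scored_pairs, curated, threshold):
    """Precision and recall of pairs whose distance is <= threshold."""
    predicted = {pair for pair, dist in scored_pairs.items() if dist <= threshold}
    true_pos = len(predicted & curated)
    precision = true_pos / len(predicted) if predicted else 1.0
    recall = true_pos / len(curated) if curated else 1.0
    return precision, recall


# Hypothetical candidate pairs with tree distances, and a curated reference set.
scored_pairs = {
    ("h1", "m1"): 0.1,
    ("h2", "m2"): 0.3,
    ("h3", "m9"): 0.5,  # not in the curated set
    ("h4", "m4"): 0.8,
}
curated = {("h1", "m1"), ("h2", "m2"), ("h4", "m4")}

for t in (0.2, 0.6, 1.0):
    p, r = precision_recall(scored_pairs, curated, t)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
```

A tight threshold gives few but reliable calls; loosening it recovers more curated pairs at the cost of admitting false ones.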
PHOG webserver shows available information on the orthologs in the group, e.g., GO annotations w/ experimental evidence - Ruchira S. Datta
used this to annotate H. pylori - Ruchira S. Datta
Discern uses both sequence and structural information to predict enzyme active sites. outperforms other methods. - Ruchira S. Datta
for metagenome analysis of short, fragmentary sequences, need new method: FAT-CAT Fast Approximate Tree Classification - Ruchira S. Datta
FAT-CAT uses HMMs placed along the nodes of the tree to classify sequences to placement in the tree. use this to identify both taxonomic distribution and functional assignment. - Ruchira S. Datta
want to do 3-dimensional chess for biology: many groups are constructing interactomes / functional networks for single genomes, some are using homology across two or three species - Ruchira S. Datta
we want to create the hyperdimensional network of proteins: link all the functional networks across all organisms. use different kinds of edges, e.g., homology, protein-protein interaction - Ruchira S. Datta
use rules to propagate inferences w/ different kinds of links - Ruchira S. Datta
also want to construct PhyloFacts Pathogen Commons - Ruchira S. Datta
Iddo Friedberg: computing resources are a constraint, use something like SETI @ Home? A: probably not as requires a lot of memory. but can talk offline about this - Ruchira S. Datta
Ruchira S. Datta
Athanasios Lykidis: Genomics of bacterial lipid metabolism #LAMG10
work of postdoc Parwez Nawabi - Ruchira S. Datta
work part of Energy Biosciences Institute - Ruchira S. Datta
lipids are a structurally diverse class of molecules w/ multiple cellular functions. they provide structural stability and energy storage, e.g., triglyceride is one of the two major carbon storage molecules, the other being glycogen. they are themselves information processing molecules, or precursors to those. they also regulate enzyme activities / catalysis - Ruchira S. Datta
they are funded by DoE and focus on energy storage. a lot of work has been done on lipids in health and nutrition, in the last 3-4 years their role in bioenergy has stimulated a lot of interest - Ruchira S. Datta
fatty acids as biofuel precursors. a lipid is a hydrocarbon with a carboxy group. if we decarboxylate, we get a fuel molecule, e.g., octane or diesel. or, if we methylate, get fatty acid methyl esters, which are used as additives in petroleum-derived diesel - Ruchira S. Datta
biodiesel comes from fatty acids in food, e.g., in the US mostly from soybean oil. - Ruchira S. Datta
reaction: triglyceride + 3 methanol <-> glycerol + 3 methyl esters. used for hundreds of years e.g., in soapmaking - Ruchira S. Datta
food oil not enough to satisfy demand. but problem w/ using fungi & bacteria: extracting the triglycerides from the cell. - Ruchira S. Datta
will talk about thioesterases and methyltransferases, bypassing the fatty acid accumulation step - Ruchira S. Datta
in all organisms from humans to E. coli, have: carbon source -> pyruvate -> acetyl-CoA -> isoprenoids (being looked at by many labs, Keasling, etc--will not talk about this) or, acetyl-CoA -> Malonyl-CoA -> Malonyl-ACP -> (FASI, FASII) -> Acyl-ACP -> Acyl-P -> LysoPtdOH -> PtdOH -> CPD-DAG or DAG, leading to phospholipids or triglycerides. (fatty acid biosynthesis). When have enough, don't store the carbon in triglycerides (shut down the pathway). - Ruchira S. Datta
one way to shut down the pathway is to divert Acyl-ACP to free fatty acids using thioesterases. But then, what to do w/ these free fatty acids? they are not biofuels. Can convert to Acyl-CoA to fatty alcohols or to triglycerides. Another route just started this last year: divert Acyl-ACP to fatty CHO to Alkanes. Keasling Lab: go directly from Acyl-ACP to alkanes. - Ruchira S. Datta
Lykidis will present how to go from free fatty acids to methyl esters - Ruchira S. Datta
thioesterases: Voelker & Davis J. Bact 1994 - Ruchira S. Datta
depending on the origin of the thioesterase the profile of the fatty acids will change, e.g. from A. thaliana get acyl chains of 16 carbons - Ruchira S. Datta
found thioesterases in bacteria - Ruchira S. Datta
showing phylogenetic distribution of bacterial acyl-ACP thioesterases: in Mycobacterium, Bacteroides, Prevotella, Flavobacteriaceae, Enterobacteriaceae, Desulfovibrio - Ruchira S. Datta
sometimes have paralogous copies, may have different physiological roles - Ruchira S. Datta
diverse family in terms of sequence identity, down to ~30-35% - Ruchira S. Datta
when express these genes in E. coli, get activity. E.g., Clostridium acetobutylicum thioesterase releases free fatty acids, in particular a lot of 3-hydroxy free fatty acids. These are intermediate between fatty acid biosynthesis and degradation, and this is newly observed - plant enzymes don't generate these. - Ruchira S. Datta
Most of the bacterial thioesterases generate 3-hydroxy free fatty acids, e.g., Mycobacterium marinum thioesterase - Ruchira S. Datta
one class of genes has much less sequence identity to others in the family, 12%, much shorter than others, doesn't align properly w/ automatic aligners, but overlaps at the active site. these produce huge amounts of 3-hydroxy free fatty acids - Ruchira S. Datta
thus, have identified bacterial thioesterases w/ distinct specificities toward 3-hydroxy fatty acids - Ruchira S. Datta
Sjögren,...,Kenne 2003: "Antifungal 3-Hydroxy Fatty Acids from Lactobacillus plantarum MiLAB 14". - Ruchira S. Datta
thus, lactobacillus in the vagina may prevent colonization by fungi by producing 3-hydroxy fatty acids - Ruchira S. Datta
Lykidis more interested in producing methyl esters. - Ruchira S. Datta
1970, Akamatsu and Law: The Enzymatic Synthesis of Fatty Acid Methyl Esters by Carboxyl Group Alkylation - Ruchira S. Datta
in last 40 years, no genes for this identified - Ruchira S. Datta
do fatty acid methyltransferases exist? of course you cannot use BLAST, since none have been identified... - Ruchira S. Datta
however, in plants, there are carboxyl methyltransferases, e.g., jasmonic acid carboxyl methyltransferase - Ruchira S. Datta
the A. thaliana genome contains 25 carboxyl methyltransferase genes - Ruchira S. Datta
25 bacterial genomes have a carboxyl methyltransferase gene (or occasionally two) - Ruchira S. Datta
checked Mycobacterium marinum and Mycobacterium smegmatis methyltransferases. Were able to find activity for the M. marinum gene but not the M. smegmatis gene, so will discuss the former further - Ruchira S. Datta
the M. marinum gene is very specific: strong preference for 3-hydroxy free fatty acids, and almost all secreted into the medium (the supernatant) - Ruchira S. Datta
in vitro assay of activity demonstrates that the M. marinum gene is a fatty-acid methyltransferase - Ruchira S. Datta
what if we coexpress this methyltransferase w/ thioesterase? using the bacterial thioesterase that makes 3-hydroxy fatty acids, get 3-hydroxy fatty acid methyl esters - Ruchira S. Datta
AdoMet (S-adenosylmethionine) is synthesized via methionine adenosyltransferase (MAT) - Ruchira S. Datta
MetJ controls Met biosynthesis: homoserine -> ... -> homocysteine and serine -> ... -> 5-Methyl-THF -> methionine -> (via MAT) S-adenosylmethionine (SAM). A metJ knockout increases methyl ester production. - Ruchira S. Datta
want to identify or design thioesterases w/ specificities that match fatty acid methyltransferases - Ruchira S. Datta
Nina Salama: does M. marinum secrete these in vivo? A: don't know, all assays were in E. coli - Ruchira S. Datta
Ruchira S. Datta
Jizhong Zhou: Rapid Genome Evolution and Adaptation to Salt Selection #LAMG10
will talk about this, and also about issues w/ pyrosequencing in metagenomics - Ruchira S. Datta
high throughput approaches: open format detection, or closed format -- microarrays - Ruchira S. Datta
connection issues prevented blogging first part of talk, on advantages of microarrays (GeoChip) over pyrosequencing - Ruchira S. Datta
long-term evolution experiment, under salty conditions and control conditions - Ruchira S. Datta
the transcription profiling changes became stable at 1000 generations - Ruchira S. Datta
at this point distinctly different gene expression profiles btw evolved salty and control strains - Ruchira S. Datta
analyzed which genes were enriched. Glu and Ala, the major osmoprotectants, were produced at higher levels - Ruchira S. Datta
did whole genome resequencing of the D. vulgaris strains: the ancestral strains, the evolved strain under control conditions, the evolved strain under salty conditions - Ruchira S. Datta
some genes that changed: LytS. GAF domains are ubiquitous in signaling and sensory proteins across the domains of life - Ruchira S. Datta
the SNP in LytS in the evolved strain is a loss-of-function mutation and contributes to salt resistance - Ruchira S. Datta
evolved strain more resistant than engineered mutant, complementation partially rescued - Ruchira S. Datta
mBio: ASM's first broad-scope, open access online journal. ASM has 11 journals now, but wants a more integrated journal, with potential impact - Ruchira S. Datta
I ask: advantages of GeoChip vs PhyloChip? A: PhyloChip requires PCR amplification, which can lead to sampling bias. GeoChip only does amplification when necessary, o/w quantify directly from chip w/o amplification. Also distribution of genes of interest in specific processes. - Ruchira S. Datta
Ruchira S. Datta
Richard Roberts: ComBrEx
easy case to make: we need to understand what these proteins are doing - Ruchira S. Datta
need to show that this is the way to do it - Ruchira S. Datta
that will only happen if everyone who can participate does so - Ruchira S. Datta
this is collaborative and not competitive: there is so much to be done that we can split up the work - Ruchira S. Datta
there is no answer to the general problem of how to get good functional annotation - Ruchira S. Datta
currently funded with stimulus money, hope NIH will issue RFA to which we can respond - Ruchira S. Datta
Ruchira S. Datta
Martin Steffen: What You Can Do On Combrex #LAMG10
submit bids, predictions, annotations regarding prior experimental validation, or nominate genes for "high priority status" - Ruchira S. Datta
to submit bid, the form is already mostly filled in, except "Requested US$" (state in direct costs, though will also supply indirect costs). most bids would be around $10,000 - Ruchira S. Datta
fill in template of six items (can be very brief); rationale/motivation, current knowledge, experimental method, budget justification, prior experience (>3 PubMed ids), anything else. - Ruchira S. Datta
to submit prediction from the gene page, choose from dropdown prediction (vs annotation or nomination for top100_list). For annotation, send at least the UniProt accession and the PubMed ID; can also have couple of sentences of text. - Ruchira S. Datta
FAQ: are indirect costs covered? yes, in addition to requested funds. - Ruchira S. Datta
can submit own prediction and bid to validate it? yes, and can choose to keep prediction and bid private - Ruchira S. Datta
are bids treated equally if i am validating someone else's predictions? yes - Ruchira S. Datta
do i have to bid on a gene from E. coli or H. pylori, or can i bid on any organism? Any. - Ruchira S. Datta
how long will it take to get a response for my bid? not long - 2 weeks, has been so far. - Ruchira S. Datta
can i use His tags for purification? Yes, but ultimately validation needed with WT sequence. can use epitope affinity purification. Valerie deCrecy-Lagard: they can cause problems, cleave in the middle of protein. Alex Yakunin: most have His tags. Tadhg Begley: lot of extra work w/o much payoff. Andrei Osterman: have checked in many cases, His tags do not alter functions. Martin Steffen: in some cases they can change function. - Ruchira S. Datta
Barry Wanner: complement mutant w/ the tagged version of the protein, check function preserved. someone else: sometimes do change wild-type growth rate. John Hunt: the cleavage process also can change function, why this bias against His tag - Ruchira S. Datta
question: this is a higher standard than in the biochemistry journals that we publish in - Ruchira S. Datta
Tadhg Begley: unless there are clear biochemical considerations of His tag disrupting function, this is overkill - Ruchira S. Datta
Q: how long are the bids for? A: for six months of work, but will consider other possibilities - Ruchira S. Datta
Rich Roberts: we do know that it may be possible to add a His tag without problems. our philosophy is that if it's possible to do it w/o His tag, we would prefer that. put on a transcription-translation system. Tadhg Begley: in wild type we often see mixture of methionines, could apply the same argument. - Ruchira S. Datta
Rich Roberts: people have not done things rigorously, taken shortcuts, and this has led to the problems in the databases. if it's possible to do it properly, why not do it properly? we would like to see things done properly and rigorously whenever possible. - Ruchira S. Datta
Ruchira S. Datta
Andrei Osterman: ComBrEx Experimental Discussion Group #LAMG10
what is a testable prediction of gene function? what is sufficient to be considered a valid demonstration of function? - Ruchira S. Datta
what is a testable prediction? it depends on what your definition of what "is" is - Ruchira S. Datta
what ComBrEx means by gene function: 1. functional activity of *likely* biological relevance 2. specificity rather than generality 3. testable by biochemical methods - Ruchira S. Datta
anything where that very same sequence was not tested experimentally, is still a prediction. the unit of prediction is an individual protein (rather than family) - Ruchira S. Datta
ComBrEx paradigm: supporting evidence: computational evidence, pathway reconstruction, genome context, genetic (in vivo) results, even high-throughput experimental data, all lead to prediction for gene X. the experimental verification is biochemical assay on pure protein - Ruchira S. Datta
what is sufficient to be considered a valid demonstration of biochemical function? - Ruchira S. Datta
1. Enzymes: in vitro biochemical assay - definitive, may use kcat/Km - Ruchira S. Datta
2. Transporters and carrier proteins: uptake assays and the like - definitive - Ruchira S. Datta
3. Transcriptional regulators: 1. identify and confirm the binding site (gel-shift, etc.), 2. identify the effector (optional) - verified in same way. - Ruchira S. Datta
How to match computational and experimental scientists to focus on specific functional areas? ComBrEx will act as MatchMaker. Also expand the field and support newcomers. Leverage available resources (e.g., clone collections); if need to request these, then cc Rich Roberts. Make a case for the community and the funding agencies. - Ruchira S. Datta
How to incorporate microarray data systematically? these are thought of as supporting evidence, not as validation. high-throughput enzyme screen *may* be considered as validation. - Ruchira S. Datta
need to make the data accessible and use them to support predictions - Ruchira S. Datta
What criteria should be used for prioritization of predictions and validations? 1. Top 100 List 2. High-priority organisms: E. coli = H. pylori > other prokaryotes >> eukaryotes. 3. Others - bidder makes the case 4. expert assessment 5. novelty, impact, feasibility - Ruchira S. Datta
Tadhg Begley: get supplement to NIH grant to start off larger ComBrEx effort - Ruchira S. Datta
Valerie deCrecy-Lagard: how will publication work? Martin Steffen: for previously published predictions, use citations. o/w, collaborate. Michael Galperin: NIH ethics guidelines stipulate that you should discuss authorship before launching the project. - Ruchira S. Datta
Peter Karp: why H. pylori? experimental community large? gene expression datasets? Martin Steffen: 1) model organism, want to finish 2) the anti-E. coli, hardly anything experimentally validated - Ruchira S. Datta
Valerie deCrecy-Lagard: there is high-throughput data and also some experimentalists - Ruchira S. Datta
Rich Roberts: biochemists are protein family-specific, organism is irrelevant - Ruchira S. Datta
I ask: which is more important, assayability or finishing the organism? A: ComBrEx will use combination of criteria. Also, e.g., if bidder says "I can do it in two days for free", that's a good argument, or if bidder says "This prediction is actually biologically very important", etc. - Ruchira S. Datta
Peter Karp: where do Alexander Yakunin's techniques fit? they're not computational, but also not full experimental function. A: he does two kinds of work. he does high-throughput screen, it's a phosphatase for example. that falls in supporting evidence. on the other hand, he does subsequently drill down to specific substrates and assays, that is an experimental verification. - Ruchira S. Datta
Simon Kasif: in morning session, there was talk about misannotation, due to misuse of BLAST as an annotation tool. if you're going to test a single protein with an assay, how will that solve the misannotation problem? you define function per-protein, not per family. A: to Osterman personally, unit of prediction is family. for purposes of ComBrEx, unit of prediction is individual protein. then propagating to similar sequences is someone else's job. - Ruchira S. Datta
Simon Kasif: don't you want to test a representative set of proteins in the family? A: ComBrEx may test e.g. 3 representatives of a single family. - Ruchira S. Datta
Ruchira S. Datta
Iddo Friedberg: CAFA: Critical Assessment of Functional Annotation #LAMG10
Friedberg 2006 Briefings in Bioinformatics (before next-gen sequencing): more and more diverse clusters with no relation to each other, i.e., sequence diversity is increasing - Ruchira S. Datta
"Critical Assessments" are community-engaging experiments that drive their fields forward. e.g., CASP: Critical Assessment of Structure Prediction. In late 80's, every structural bioinformatician claimed to have solved the protein folding problem. John Moult had the idea of having crystallographers sequester their structures and see how well the structural bioinformaticians did. it turned out the folding problem had not been solved. - Ruchira S. Datta
others: CAPRI, CAMDA, CAGI - Ruchira S. Datta
only thing not done is CACA: Critical Assessment of Critical Assessments. Iddo: "that was really bad" audience member: "it was really shitty!" - Ruchira S. Datta
CAFA: 10,000 sequences in SwissProt that are not experimentally annotated were released today - Ruchira S. Datta
10,000 sequences from TrEMBL that are "hard" will be released by Oct 15 - Ruchira S. Datta
submissions due Jan 15, 2011 - Ruchira S. Datta
wait for experimental annotations to happen between Jan and June - Ruchira S. Datta
July 2011: CAFA meeting in Vienna in conjunction w/ ISMB - Ruchira S. Datta
assessments by Amos Bairoch & Sean Mooney - Ruchira S. Datta
Kimmen Sjölander: how will predictions be assessed? A: semantic similarity measures, Jaccard, and Manhattan block distance on GO - Ruchira S. Datta
Andrei Osterman: why not just right or wrong? A: different degrees of wrong. e.g., predicting it's an enzyme when it's not is completely wrong, serine protease instead of other protease is wrong but less wrong. need quantifiable measure. - Ruchira S. Datta
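A minimal sketch of a graded score of the kind described in the answer above: Jaccard similarity between predicted and true term sets after adding each term's ancestors, so that a near miss scores higher than a distant one. The toy ontology uses plain labels rather than real GO identifiers, and this is an illustration, not the CAFA scoring procedure.

```python
def with_ancestors(terms, parents):
    """Close a set of ontology terms under the parent relation."""
    closed = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents.get(term, ()))
    return closed


def jaccard(a, b):
    """Size of the intersection over size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical mini-ontology: child -> list of parents.
parents = {
    "serine protease": ["protease"],
    "cysteine protease": ["protease"],
    "protease": ["enzyme"],
    "enzyme": ["molecular function"],
    "transporter": ["molecular function"],
}

truth = with_ancestors({"cysteine protease"}, parents)
near_miss = jaccard(with_ancestors({"serine protease"}, parents), truth)
far_miss = jaccard(with_ancestors({"transporter"}, parents), truth)
print(near_miss, far_miss)  # 0.6 0.2
```

Both predictions are wrong at the leaf level, but the wrong-protease call shares most of its ancestors with the truth, which is what makes it "less wrong" numerically.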
Ruchira S. Datta
Iddo Friedberg: ComBrEx Computational Breakout Group Report
how to select targets? ideally: large families that are assayable. other criteria: practical => fundable, impact, biomedical, biodefense, fuels, environment. Need to find intersection between experimentally assayable and computationally predictable. Universal proteins? - Ruchira S. Datta
what is a good prediction? granularity: the more specific the better. tweakable specificity/sensitivity tradeoff. a prediction should have: provenance (traceable to origin), confidence score - Ruchira S. Datta
i object: people will think confidence scores are comparable - Ruchira S. Datta
martin steffen: apply methods across a common gold standard set, and use that to generate confidence - Ruchira S. Datta
gary gippert: idea of uber-confidence score from ComBrEx itself - Ruchira S. Datta
what is a good method? keep history of method's performance - somewhat controversial, would this keep people from entering? it doesn't in CASP - Ruchira S. Datta
controlled vocabulary: Gene Ontology is the only game in town, but need to be able to quickly update the gaps in GO. many parts of the ontology are not descriptive enough, terms are missing. some of the relationships are misannotated. - Ruchira S. Datta
maybe Enzyme Commission numbers also for enzymes - Ruchira S. Datta
martin steffen: 2.7 million genes have no GO annotation - Ruchira S. Datta
what to do about that? - Ruchira S. Datta
martin steffen: they may be assigned to clusters - Ruchira S. Datta
then start annotating it - Ruchira S. Datta
michael galperin: we expect to find new functions that are not represented in GO - Ruchira S. Datta
peter karp: not every function prediction can have GO terms attached to it, as not all have an assigned GO term, but when possible it's very useful to attach GO terms so people can query for their interests - Ruchira S. Datta
barry wanner: GO is geared at eukaryotes, not very well aimed for infectious disease - Ruchira S. Datta
if GO does not have the terms, we should update it. the other alternatives: use your own standard vocabulary, or annotate in english, will not be helpful - Ruchira S. Datta
john hunt: ComBrEx should be a leader in making better controlled vocabulary. - Ruchira S. Datta
expand GO - Ruchira S. Datta
GOlets: provisional submission to GO, while GO is considering them ComBrEx can use them - Ruchira S. Datta
hard or easy targets? manage a portfolio, combination of "hard" and "easy" targets. Friedberg enjoys the challenge of hard targets, using e.g. micromotifs, genome location, etc. But not everyone may be interested in these. Sjölander had suggested the "portfolio" idea. - Ruchira S. Datta
Gary Gippert: question of outreach - how to share the findings so users of public databases can also benefit. - Ruchira S. Datta
also question of how to pair the appropriate computationalists and experimentalists. - Ruchira S. Datta
Ruchira S. Datta
Kimmen Sjölander: PhyloFacts-Microbe: Phylogenomic prediction of microbial (meta)genome function
as student w/ Haussler, worked on CASP, got into structural genomics and then functional prediction at Celera genomics - Ruchira S. Datta
at Berkeley, working on PhyloFacts 3.0 - Ruchira S. Datta
PHOG aims at super-orthologs as defined by Zmasek and Eddy - Ruchira S. Datta
# NB PHOG is my work - Ruchira S. Datta
PHOG uses a structural phylogenomic approach to predict orthologs, and is not restricted to whole genomes - Ruchira S. Datta
this avoids sparse taxon sampling, giving more accurate trees - Ruchira S. Datta
PHOG was benchmarked on TreeFam manually curated orthologs to human in mouse, zebrafish and fruitfly - Ruchira S. Datta
has tunable parameter with precision-recall, vs OrthoMCL and InParanoid - Ruchira S. Datta
used this to make function predictions on H. pylori at tight, medium, and loose values of the tunable parameter (tree-distance threshold) - Ruchira S. Datta
have also worked with Andrej Sali and Ursula Pieper, to build homology models of H. pylori proteins - Ruchira S. Datta
orthologs don't always preserve function, also predict key residues so can see where those changed and may have changed function - Ruchira S. Datta
# INTREPID, Discern - Ruchira S. Datta
Iddo asks: these are for bacterial gene families? thought PhyloFacts was human-centered? A: was previously, now microbial - Ruchira S. Datta
Karp: many predictions are known already? Sjölander and I: we asked, ComBrEx wanted us to submit all - Ruchira S. Datta
Morgan Langille: useful anyway, as don't know where the "known" annotation came from - Ruchira S. Datta
Sjölander: many annotations of function are based on single Pfam domains. if the clustering is based on global, domain-architecture clustering, okay to transfer. if the clustering is based on a single Pfam domain, becomes questionable. but anecdotally, single domain trees actually cluster similarly to the global ones - Ruchira S. Datta
Ruchira S. Datta
ComBrEx Discussion Group: Computation
Iddo Friedberg: how to prioritize which computational predictions we would like verified? - Ruchira S. Datta
Jonathan Eisen: not sure that's a good question, but would try to think of benefits of different approaches, how many people would an approach help, when would it help them. E.g., filling out an organism is useful for those who want to do modeling of metabolic processes in an organism, but the last 500 genes of E. coli don't have homologs and thus won't have benefits in as many other organisms. - Ruchira S. Datta
Loren Hauser: concentrate on large paralogous family for which have assays. - Ruchira S. Datta
Kimmen Sjölander: could potentially spend 90% of time trying to get the last 10% of genes in a specific organism. look at genes of biomedical importance: pathogenesis, virulence, ... - Ruchira S. Datta
Iddo Friedberg: the F-word - fundable - Ruchira S. Datta
Loren Hauser: transporters. if you take the ABC transporters, and uptake assay the periplasmic binding proteins, will get a large amount of biological space. similarly w/ transcription factor ligands - Ruchira S. Datta
Peter Karp: not clear transcription factor binding predictions will go anywhere - Ruchira S. Datta
Loren Hauser: not the DNA binding site, but the other ligand - they're two-domain proteins - Ruchira S. Datta
Jonathan Eisen: rather than deciding specifics, decide on rules. Prematurely deciding specifics here may exclude people. Let the community decide what they can do. - Ruchira S. Datta
Iddo Friedberg: maybe, what people are already working on? - Ruchira S. Datta
Jonathan Eisen: one possibility, but can be interactive - community also wants guidance on what to work on. - Ruchira S. Datta
Iddo Friedberg: those predictions that are testable - Ruchira S. Datta
Loren Hauser: they naturally overlap - Ruchira S. Datta
Morgan Langille: the biologists should be the ones to decide - Ruchira S. Datta
Peter Karp: it may be useful to the effort to have early successes - Ruchira S. Datta
start with low-hanging fruit - Ruchira S. Datta
Loren Hauser: that comes back to having an assay - Ruchira S. Datta
Jonathan Eisen: there needs to be feedback - Ruchira S. Datta
Iddo Friedberg: find intersection of low-hanging experimental fruit and low-hanging computational predictions - Ruchira S. Datta
Jonathan Eisen: need to interface the computational w/ the experimental. e.g., universal might be good computational, but most universal proteins that were easy to assay may already have been done. - Ruchira S. Datta
John Hunt: lab has worked on two of the universal proteins. Have looked at bioinformatics predictions. Different from enzymes. Specificity of predictions is problematic. In structural genomics, always look for substrates anyway for crystallization. For one protein, found it was an RNase through Yakunin's comprehensive battery of assays. No prediction came close. The other was an ATPase, and what was helpful was transcriptional co-regulation. Turned out to be transcription factors. - Ruchira S. Datta
Garry Gippert: is it substrate specificity that we're asking about? - Ruchira S. Datta
Iddo Friedberg: what level of granularity are we interested in - Ruchira S. Datta
Garry Gippert: anyone interested in eukaryotes? - Ruchira S. Datta
Sjölander: we are, but ComBrEx is for prokaryotes - Ruchira S. Datta
Eisen: homology in eukaryotes might be useful prioritization criteria - Ruchira S. Datta
Michael Galperin: huge number of Pfam domains occur across the three domains of life - Ruchira S. Datta
Iddo Friedberg: GO sucks, but it's the only game in town. - Ruchira S. Datta
Sjölander: Patsy Babbitt talked about GO holes. Report these back to GO. - Ruchira S. Datta
Karp: Presumably there's already a way that anyone can do that. - Ruchira S. Datta
Karp: What do the predictions look like right now? - Ruchira S. Datta
Kasif: Hard as what is currently in Genbank is also included. Also, website function not fully up yet. - Ruchira S. Datta
Gippert: if annotation is improved, how does that propagate? - Ruchira S. Datta
Sjölander: that would require a computational infrastructure which ComBrEx isn't funded for - Ruchira S. Datta
Karp: for model organisms, the model organism databases will pick them up through their usual literature mining process - Ruchira S. Datta
Kasif: a propagation is itself a prediction. First, ComBrEx is not funded to make predictions, and also problematic recruiting people to make predictions if I myself am making better predictions. For the Green set, using a very, very conservative method to propagate a prediction. That's all we're willing to do--don't compete in making predictions. - Ruchira S. Datta
Simon Kasif on projected prediction: David Horn, Prof. Vitkup cluster of HP0042, HP0043, and HP0045 - Ruchira S. Datta
Kasif: undergraduates check: are these predictions, new, more-specific, or already known in NCBI - Ruchira S. Datta
Sjölander: if i were a biologist, i would want to see the provenance of the prediction, the evidence supporting it - Ruchira S. Datta
Kasif: some methods can do that, Horn/Vitkup uses support vector machines where it's more difficult - Ruchira S. Datta
Kasif: also, want to make links to people's websites where they can describe their provenance - Ruchira S. Datta
Friedberg: but when they lose funding, their website disappears - Ruchira S. Datta
Karp: biologist would want to know the class of the prediction methods - Ruchira S. Datta
I: also, biologist would assume confidence values are comparable to each other - Ruchira S. Datta
some discussion explaining the meaning of the prediction website, which is not completely intuitive - Ruchira S. Datta
Hunt: have been looking for things like this in order to decide about crystallizing proteins. would want first to BLAST, then, find out how well other experimentalists have been able to validate the prediction - Ruchira S. Datta
Friedberg: hard to jumpstart that - Ruchira S. Datta
Hunt: but need long-term perspective. also provenance is very important to us: want the paper that established the annotation. - Ruchira S. Datta
Alexandra Schnoes: I thought in most of these there is no paper? - Ruchira S. Datta
Kasif: out of all Vitkup's predictions and all Horn's predictions, the ones Peter or I would find interesting would be only three. - Ruchira S. Datta
I: in our predictions, there is a paper: the paper describing the ortholog - Ruchira S. Datta
Hunt: but does that paper describe the specific strain? - Ruchira S. Datta
I: we're relying on GO, so no - Ruchira S. Datta
Iddo: not dealing with orphans? - Ruchira S. Datta
Sjölander: we can deal with what others call orphans, using advanced remote homolog detection. also, lay out the papers listed by UniProt on the tree so people can see where they lie - Ruchira S. Datta
Hunt: UniProt literature very incomplete - Ruchira S. Datta
Gippert: über-score assigned by ComBrEx, not by methods themselves? - Ruchira S. Datta
I: karma points for methods - Ruchira S. Datta
Sjölander: but want to encourage submission of predictions, assessment may discourage that - Ruchira S. Datta
Gippert: has it discouraged CASP? - Ruchira S. Datta
Sjölander: CASP brings funding with it. in ComBrEx, funding is for experimentalists, not computationalists - Ruchira S. Datta
Friedberg: want all kinds of predictions, right? - Ruchira S. Datta
Michael Galperin: targeting all bacterial and archaeal species is unmanageable task - Ruchira S. Datta
Kasif: Gold is what has already been completely validated. Green: e.g., 98% identity to Gold Gene - Ruchira S. Datta
Sjölander: does that include 80% coverage in both directions - Ruchira S. Datta
some discussion of whether this criterion was included or not - Ruchira S. Datta
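A sketch of what a conservative propagation test along these lines could look like, assuming the rule combines the 98% identity figure with the 80% coverage in both directions raised in the question. Whether coverage is actually part of the ComBrEx rule was left open in the discussion, and the `Alignment` fields are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Alignment:
    identical: int        # identical aligned positions
    aligned: int          # number of aligned columns
    query_len: int        # full length of the candidate protein
    subject_len: int      # full length of the Gold protein
    query_aligned: int    # candidate residues covered by the alignment
    subject_aligned: int  # Gold residues covered by the alignment

def is_green_candidate(aln, min_identity=0.98, min_coverage=0.80):
    """True when identity and coverage of BOTH sequences pass the cutoffs."""
    identity = aln.identical / aln.aligned
    query_cov = aln.query_aligned / aln.query_len
    subject_cov = aln.subject_aligned / aln.subject_len
    return (identity >= min_identity
            and query_cov >= min_coverage
            and subject_cov >= min_coverage)

full_length = Alignment(297, 300, 310, 320, 300, 300)
fragment = Alignment(99, 100, 100, 400, 100, 100)
print(is_green_candidate(full_length))  # True
print(is_green_candidate(fragment))     # False: covers only 25% of the Gold protein
```

The second case shows why coverage in both directions matters: a short fragment can be nearly identical to one domain of a Gold protein without sharing its full function.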
Iddo Friedberg: what experimentalists are working on anyway should be included in priorities of ComBrEx - Ruchira S. Datta
I: ComBrEx website had list of experimentalists with classes of proteins they're interested in - Ruchira S. Datta
Karp: there are thousands of biologists, no way to assess that - Ruchira S. Datta
Eisen: if i were a program officer, i would say, who gives a shit about the low hanging fruit? it will get done anyway. need things that wouldn't get done otherwise - Ruchira S. Datta
I: also sanity check - Ruchira S. Datta
Karp: early successes - Ruchira S. Datta
Hunt: if go too low, won't impress study sections, and also won't be able to publish - Ruchira S. Datta
Sjölander: like stocks or bonds, have some in each - Ruchira S. Datta
Schnoes: in structural genomics initiative, a lot of effort was expended to find proteins that gave maximum bang for the buck. would be good to do the same. - Ruchira S. Datta
Friedberg: in function space, find targets that cover the biggest area - Ruchira S. Datta
Schnoes: like GEBA, fill in tree. here, fill in holes in functions. - Ruchira S. Datta
Eisen: but in GEBA, found could do nothing w/o cultures, and needed partner who would grow up the organism, for free as had no budget for that. DSMZ did that, we gave them priority list, they didn't grow all of them - Ruchira S. Datta
Schnoes: are there experimentalists committed already? - Ruchira S. Datta
Kasif: 30 experimentalists committed, 1/3 submitted bids, 6 are working. we're well-connected with the community. - Ruchira S. Datta
Schnoes: maybe we should start by working w/ the experimentalists we already have. - Ruchira S. Datta
Eisen: can submit everything, but can also give top 5, or 20, or 50. if the experimentalists have to go through the entire list themselves, that will be too hard - Ruchira S. Datta
Friedberg: not choose things experimentalists are going to do anyway - Ruchira S. Datta
Sjölander: disagree, if computationalists submit a lot of predictions that no one does they'll lose heart. do some where experimentalists have infrastructure for that gene family in place - Ruchira S. Datta
Hunt: would be impressed by one nontrivial function prediction that was verified, as long as number that were also tried w/o verification was also reported. also, clean up data structures for predictions. - Ruchira S. Datta
Friedberg: was asked whether non-protein coding genes also of interest. answer is no, not in scope. - Ruchira S. Datta
Karp: what is the workflow: submit prediction, submit bid, then it succeeds or fails. what happens then? - Ruchira S. Datta
Friedberg: two kinds of failure: failure of prediction but success of experiment - found it does something else. or, failure of experiment to determine function - Ruchira S. Datta
Karp: will failure be recorded in ComBrEx? that would be useful. - Ruchira S. Datta
I: failure could help train the active learner: the cost of asking that question could go up - Ruchira S. Datta
Karp: say an experimentalist successfully validates a function. what happens next? - Ruchira S. Datta
Kasif: protocol set by ComBrEx, personally not completely happy with it. have asked Martin Steffen and Rich Roberts to determine that, as they're the biologists - Ruchira S. Datta
Karp: is success determined before or after publication? - Ruchira S. Datta
Kasif: ask the SAB - Ruchira S. Datta
Karp: what happens to the successful annotation afterward? - Ruchira S. Datta
Kasif: ComBrEx can do the very conservative propagation, if agreed by group - Ruchira S. Datta
Karp: to where? - Ruchira S. Datta
Kasif: to ComBrEx site - Ruchira S. Datta
Karp: should also report, e.g., to UniProt - Ruchira S. Datta
Ruchira S. Datta
Simon Kasif: Breaking the Power-Law Curse: A Computational Bridge to Experiments
the origins of ComBrEx: computers compute rough hypotheses in the form of gene function predictions, predictions are stored on a portal, and biologists/biochemists identify high priority targets. ComBrEx: direct funding to bring these together. - Ruchira S. Datta
were able to bring together people from many disciplines: genomic annotators, structural biologists/biophysicists, computational biologists, bioinformaticians, evolutionary biologists, etc. - Ruchira S. Datta
will store biochemical function predictions from any team with a published method. numerous predictions from multiple teams have been submitted and four sets have been manually annotated. - Ruchira S. Datta
also manual recommendations by registered scientists - Ruchira S. Datta
ComBrEx committed to transparency: want to be able to trace every annotation to its original experimental source or sources - Ruchira S. Datta
power law of citations of genes: most genes have few citations and a few have many - Ruchira S. Datta
bacteria top 10: recA, rpoB, rpoD, rpoS, dnaK, crp, rpoC, rpoA, ftsZ, rne - Ruchira S. Datta
do we want to flatten this distribution? - Ruchira S. Datta
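One simple way to put a number on how skewed such a distribution is: the share of all citations held by the top k genes. The counts below are hypothetical Zipf-like values, not real citation data.

```python
def top_share(citations, k):
    """Fraction of all citations held by the k most-cited genes."""
    ranked = sorted(citations, reverse=True)
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical skewed counts: the gene at rank r gets about 1000 / r citations.
skewed = [1000 // r for r in range(1, 1001)]
# A fully "flattened" distribution for comparison: every gene cited equally.
flat = [5] * 1000

print(f"top 10 share, skewed: {top_share(skewed, 10):.3f}")
print(f"top 10 share, flat:   {top_share(flat, 10):.3f}")
```

In the flat case the top 10 of 1000 genes hold exactly 1% of citations; the further the skewed value sits above that, the stronger the concentration on a few genes.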
do we use computers to recommend genes for exploration (testing) using active learning methodologies? - Ruchira S. Datta
Inspirations: Paul Erdős offered small prizes to anyone who could prove mathematical conjectures of different perceived complexities. Or Amazon's Mechanical Turk. - Ruchira S. Datta
can programs compute questions (hypotheses to be tested)? gold genes: strong experimental annotation; green genes: have good experimental annotation (lack details); blue genes: have some predictions. black genes: have no predictions. - Ruchira S. Datta
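The four colour classes can be read as a simple decision rule. This is a sketch of that reading only; the argument names are hypothetical and the real ComBrEx criteria for each class are more detailed than three flags.

```python
def gene_colour(strong_experimental, some_experimental, n_predictions):
    """Map evidence for a gene to the colour classes described in the talk."""
    if strong_experimental:
        return "gold"   # strong experimental annotation
    if some_experimental:
        return "green"  # good experimental annotation, details lacking
    if n_predictions > 0:
        return "blue"   # only computational predictions
    return "black"      # no predictions at all

print(gene_colour(False, False, 3))  # blue
```

Under this reading, blue and black genes are the natural pool from which a program would propose questions for experimentalists.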
Google is excited about ComBrEx: Udi Manber (father of knol) says ComBrEx will increase ratio of real knowledge vs porn or other nonsense on Internet. - Ruchira S. Datta
what is gene function? EC number, Gene Ontology annotation, text, context (protein cluster) -- proteins that share or might share a function; probabilistic models, position in network... - Ruchira S. Datta
alternatives to BLAST: profiles (0th order Markov models - Stormo et al), HMMs (1st order Hidden Markov models) Haussler, Krogh, Mian, Sjölander 1994, or general Bayes networks which can capture complex relationships between residues: Delcher & Kasif - Ruchira S. Datta
probabilistic Bayesian Networks capture dependencies among residues - Ruchira S. Datta
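A toy illustration of why a profile is called a 0th-order model: each column is scored independently, so it cannot see dependencies between positions. The four-letter alphabet, the two-column alignment, and the pseudocount are assumptions chosen to keep the example small.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # toy alphabet; a protein profile would use 20 residues

def build_profile(alignment, pseudocount=1.0):
    """Per-column residue frequencies with a uniform pseudocount."""
    profile = []
    total = len(alignment) + pseudocount * len(ALPHABET)
    for j in range(len(alignment[0])):
        counts = Counter(seq[j] for seq in alignment)
        profile.append({a: (counts[a] + pseudocount) / total for a in ALPHABET})
    return profile

def log_odds(profile, seq, background=0.25):
    """Sum of independent per-column log-odds scores."""
    return sum(math.log(col[a] / background) for col, a in zip(profile, seq))

# The two columns covary perfectly: only "AC" and "CA" are ever observed.
alignment = ["AC", "AC", "CA", "CA"]
profile = build_profile(alignment)

print(log_odds(profile, "AC"))  # observed combination
print(log_odds(profile, "AA"))  # never observed, yet scores the same
```

Because the profile only knows per-column frequencies, the unobserved combination "AA" scores exactly as well as "AC". A model with dependencies between residues could tell them apart.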
BLAST is worse than even perceptron: take line and don't even rotate it, just shift it up and down! - Ruchira S. Datta
the number of experiments we need to do can be substantially and provably reduced if the model is used to compute the QUESTIONS, as in Active Learning - Ruchira S. Datta
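A minimal sketch of letting the model choose the question, using uncertainty sampling, which is one common active learning strategy and not necessarily the one described in the talk. The gene names and probabilities are hypothetical.

```python
import math

def entropy(p):
    """Binary entropy in bits of a predicted probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def next_question(predictions):
    """Pick the gene whose predicted function the model is least sure about.

    predictions: dict mapping gene -> model probability that the
    hypothesised function is correct.
    """
    return max(predictions, key=lambda gene: entropy(predictions[gene]))

predictions = {"geneA": 0.97, "geneB": 0.55, "geneC": 0.10}
print(next_question(predictions))  # geneB
```

An experiment on geneB is expected to teach the model the most, since the outcomes for geneA and geneC are already fairly predictable.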
see Delcher, Kasif et al, ISMB 1993, inspired by Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference by Judea Pearl - Ruchira S. Datta
PBNs are a natural abstraction of biophysical models - Ruchira S. Datta
functional linkage networks: function is predicted by guilt by association, Marcotte,...Eisenberg 1999 - Ruchira S. Datta
Letovsky & Kasif, ISMB 2003: probabilistic functional annotation via automated propagation of functional gene ontology annotation using random fields P(node = functional GO label | the GO annotations of neighbors) - Ruchira S. Datta
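A much simplified stand-in for the conditional P(node = label | annotations of neighbors): a smoothed fraction of annotated neighbors carrying the label. This is not the random field model from the paper; the network, the labels, and the `prior` and `strength` parameters are all hypothetical.

```python
def neighbour_label_probability(graph, labels, node, term, prior=0.5, strength=1.0):
    """Smoothed fraction of annotated neighbours of `node` that carry `term`.

    graph:  dict node -> list of neighbouring nodes
    labels: dict node -> set of terms (unannotated nodes are absent)
    """
    annotated = [n for n in graph[node] if n in labels]
    hits = sum(1 for n in annotated if term in labels[n])
    return (hits + prior * strength) / (len(annotated) + strength)

graph = {"x": ["a", "b", "c", "d"]}
labels = {"a": {"term_A"}, "b": {"term_A"}, "c": {"term_B"}}  # "d" is unannotated

print(neighbour_label_probability(graph, labels, "x", "term_A"))  # 0.625
print(neighbour_label_probability(graph, labels, "x", "term_B"))  # 0.375
```

With no annotated neighbors the estimate falls back to the prior, which keeps isolated nodes from getting confident labels.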
Bayes network based integrative models: proposed independently by a number of groups - Ruchira S. Datta
data sources to combine: operon or metabolic chain predictions, phylogenetic profiles, gene co-expression profiles, gene co-evolution profiles, domain fusion, presence/absence of domains, protein-protein interaction data, protein clusters; in future: RNAseq, Selex-Seq, etc. - Ruchira S. Datta
Anton,...,Roberts introduced whole genome role assignment - Ruchira S. Datta
active learning is popular in drug screening and has been used by many pharmaceutical companies. results in hit performance. - Ruchira S. Datta
simple active learning in ComBrEx: choose a cluster based on "phylogenetic spread" - Ruchira S. Datta
inside protein clusters, do crude active learning based on tree distances. phylogenomics much better: Eisen, Brenner, Sjölander - Ruchira S. Datta
as active learner, where in the tree to go next? Mike Steel in microbiome community has come up with criterion - Ruchira S. Datta
Ruchira S. Datta
Patricia Babbitt: Annotation error in public databases
thesis project of Alexandra Schnoes - Ruchira S. Datta
heroic piece of work: not just a lot of code and automated analysis, but also a lot of manual curation - Ruchira S. Datta
Babbitt Lab works on functionally diverse enzyme superfamilies. interested in "privileged" scaffolds, structural templates for evolution of new functions. develop models for the evolution of new function & apply them for molecular functional inference and to guide protein engineering - Ruchira S. Datta
these proteins are so diverse, covering all the biosphere, that can't get good alignments - Ruchira S. Datta
many of the specificity families evolve at different rates, vastly complicating the problem. - Ruchira S. Datta
"Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies" PLoS CompBio, 5: e1000605 (2009) - Ruchira S. Datta
evaluated annotation accuracy for >7000 sequences in public databases annotated to a function in a highly curated Gold Standard set, SFLD. checked NR, TrEMBL, SwissProt, and KEGG. - Ruchira S. Datta
have well-curated gold standard set, with structures and metadata for each, mechanistic knowledge, superfamilies represent all available organisms - Ruchira S. Datta
functionally diverse; can check annotations at different granularity. these look alike so can be hard to annotate and easy to misannotate. - Ruchira S. Datta
enolase superfamily conserves a subset of active site residues; subgroups constrain further, e.g., MR subgroup; actual reaction specificity is a hard problem - Ruchira S. Datta
Alexandra searched public dbs for annotations to the Gold Standard families. 1) does it match known sequence patterns for superfamily and family? does it pass cutoffs to the hand-curated HMM for that family? - Ruchira S. Datta
Misannotation patterns for KEGG looked very similar to those for NR and TrEMBL. That spells trouble for systems biology. - Ruchira S. Datta
SwissProt was far and away the best, in this particular test. - Ruchira S. Datta
some families w/ some dbs had nearly 100% annotation error - Ruchira S. Datta
need family-specific thresholds for annotation transfer. looked for patterns in the errors, didn't find any. - Ruchira S. Datta
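A sketch of why a single global cutoff overpredicts where family-specific cutoffs do not. The family names, scores, and cutoff values are hypothetical; in practice each cutoff would come from a hand-curated model for that family.

```python
# Hypothetical per-family score cutoffs.
FAMILY_CUTOFFS = {"family_1": 350.0, "family_2": 120.0}

def assign_family(hits, cutoffs):
    """Assign the family whose own cutoff is passed by the widest margin."""
    margins = {
        fam: score - cutoffs[fam]
        for fam, score in hits.items()
        if fam in cutoffs and score >= cutoffs[fam]
    }
    return max(margins, key=margins.get) if margins else None

def assign_family_global(hits, cutoff):
    """Naive alternative: one cutoff for every family, best raw score wins."""
    passing = {fam: score for fam, score in hits.items() if score >= cutoff}
    return max(passing, key=passing.get) if passing else None

hits = {"family_1": 300.0, "family_2": 130.0}
print(assign_family(hits, FAMILY_CUTOFFS))  # family_2
print(assign_family_global(hits, 200.0))    # family_1
```

The sequence scores highest against family_1 in raw terms but falls short of that family's own cutoff, so the global rule assigns a function the family-specific rule rejects.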
surprisingly, fraction of misannotations is not decreasing over time, but increasing - Ruchira S. Datta
largest percent of errors: overprediction - Ruchira S. Datta
Pfam is an extremely useful resource; Babbitt Lab starts from a Pfam set in own analyses - Ruchira S. Datta
MR_MLE family (PFAM): the name of this is a compilation of two different functions. No single enzyme does both of those, but people may not be aware of this. - Ruchira S. Datta
misannotation levels found are certainly an undercount, as in many cases did not have enough data to definitively say what the correct function was, but could find no evidence for the annotation - Ruchira S. Datta
developing database for interactive viewing of misannotations - Ruchira S. Datta
Michael Galperin asks: when have two families and separate them using cutoffs, how do you know the other family doesn't have that activity? A: in some cases, experiment. but in other cases, don't know--never will until experiment is done. reaction may have been invented more than once, or may have some level of ability to do that reaction. - Ruchira S. Datta
Q: all computational annotations should be "related to". this happened because NIH doesn't fund manual curation. A: need more experimental data. - Ruchira S. Datta
Barry Hanner: before correcting, make people aware, e.g., adding tracks as Bruno Sobral suggested. - Ruchira S. Datta
Ruchira S. Datta
John Gerlt: Enzyme Function Initiative (EFI)
how to discover unknown functions? operon context, physical library screening as heard in last two talks. - Ruchira S. Datta
will focus on third approach: in silico screening, using experimental structures or homology models - Ruchira S. Datta
start w/ sequence, get either experimental or predicted structure, do ligand docking, thus infer enzymatic reaction - Ruchira S. Datta
3 component subprojects: amidohydrolase superfamily, enolase superfamily, and computational. Gerlt focuses on enolase superfamily. - Ruchira S. Datta
enolase includes TIM Barrel domain, doing acid/base chemistry, and capping domain w/ substrate specificity. these close on each other, the shape of the pocket determines substrate specificity. - Ruchira S. Datta
conserved enolization, but diverse products. mediation of Mg2+ and sometimes H+ in intermediates. - Ruchira S. Datta
Patricia Babbitt formed enolase superfamily network, clustered it into possible isofunctional families. but many families of unknown function. - Ruchira S. Datta
enolase specificity is determined by two loops in the capping domain, the 20s loop and the 50s loop - Ruchira S. Datta
homologs of unknown functions have different sequences in these loops. find their geometry, e.g., by homology modeling, then see what fits in there. - Ruchira S. Datta
e.g., NSAR: unknown protein catalyzes racemization of N-Succ-L-Arg to N-Succ-D-Arg. This is a novel function in biology and don't know its role in the cell. - Ruchira S. Datta
function predicted from PSI-2 structure: GalrD, binds galactarate. Other enzymes in the superfamily work on the other end of this compound. - Ruchira S. Datta
project also involves amidohydrolase superfamily, with > 23000 divergent members, diverse substrates and reactions. - Ruchira S. Datta
this approach doesn't always work, thus Enzyme Function Initiative - Ruchira S. Datta
glue grant U54 includes 7 new investigators besides those in the previously described project - Ruchira S. Datta
deliverables: develop a robust sequence/structure-based strategy, disseminate this to the community, and collaborate with the community. - Ruchira S. Datta
cores: superfamily/genome core (Patsy Babbitt), protein core (Steve Almo), computation core, and microbiology core - Ruchira S. Datta
also have superfamily bridging projects, for specific superfamilies. trying to predict substrates. - Ruchira S. Datta
other superfamilies often found no substrates, added isoprenoid synthase superfamily which extends molecules so will be able to identify substrates - Ruchira S. Datta
funnel: high throughput analyses, medium throughput analyses, and low throughput analyses - Ruchira S. Datta
Ruchira S. Datta
Alexander F. Yakunin: Screening of purified unknown proteins for enzymatic activity
from Center for Structural Proteomics in Toronto - SPiT - Ruchira S. Datta
there are many approaches to experimentally determining protein function - Ruchira S. Datta
SPiT produces many structures of proteins with unknown or putative functions - Ruchira S. Datta
family approach: try to get 1 structure per protein family - Ruchira S. Datta
for this, need to clone: 20-24 orthologous genes, to purify 10-12, to crystallize 1-2 proteins - Ruchira S. Datta
SPiT enzymology group (funded by Genome Canada since 2003) had hundreds of purified proteins to screen, 3196 different enzymes (from last edition of IUBMB, 1992), and several/many specific assays - Ruchira S. Datta
limited set of assays to screen for general activity, then secondary screens to identify specific substrates or substrate profiles; determine substrate affinities and catalytic activities to the substrates; other analyses - Ruchira S. Datta
hydrolases are well-represented both in E. coli and human, and already have water.. - Ruchira S. Datta
also, screen for phosphatases: many E. coli metabolites contain P - Ruchira S. Datta
for the initial screen, use relaxed conditions - Ruchira S. Datta
secondary phosphatase screen with 75 natural substrates of phosphatases - Ruchira S. Datta
get a graph of a substrate profile: affinity to various substrates - Ruchira S. Datta
secondary phosphodiesterase screen: Tchigvintsev A. et al JMB, 2010 - Ruchira S. Datta
screening for nucleases: use 5' 32P-isotope labeled single stranded (ss) and double stranded (ds) RNAs and DNAs. Beloglazova N. et al JBC 2008 - Ruchira S. Datta
screen for esterases: carboxylesterases, lipases, thioesterases, phospholipases, using their substrates - Ruchira S. Datta
esterase substrate profiling using various esters: methyl-, ethyl-, butyl-, isopropyl-, ... over 40 from Sigma. - Ruchira S. Datta
screening for glycosyl hydrolase activity using 26 chromogenic substrates from Sigma. - Ruchira S. Datta
5-8% of purified unknown proteins are colored! suggests redox activity, cofactor activity or electron carrier activity - Ruchira S. Datta
then assay for electron carrier activity, e.g., rubredoxins, ferredoxins - Ruchira S. Datta
TLC screening of the E. coli Glk: a promiscuous kinase: plate shows ATP and ADP production vs various substrates - Ruchira S. Datta
Martin Steffen: what cofactor makes a protein blue? A: don't know yet. copotin (sp?) - Ruchira S. Datta
Tony Michael: difference between what a protein can do and what it does do. e.g. glutamylase possibility shown by Janet Thornton, but mass spec showed presence in very low concentrations. need to put in conjunction w/ metabolomics. A: yes, need to integrate more information in future, optimize specific conditions - Ruchira S. Datta
Ruchira S. Datta
Manuel Ferrer: High Throughput Arrays for Protein Identification: Helicobacter pylori ATCC 26695 as example
microbial genomics is like Google Maps: need to zoom out and in from land to environment to organism - Ruchira S. Datta
going from genomes to metagenomes, dramatically increased number of sequences - Ruchira S. Datta
SwissProt: <1 of 85,000 proteins experimentally characterized - Ruchira S. Datta
many-to-many mutations result in new biochemistry not known from the sequence - Ruchira S. Datta
functions vary based on collateral factors: genetic determinants, proteomic determinants, and metabolic determinants. the functional consequences of this microbial reprogramming are partially unknown. - Ruchira S. Datta
have array with dyes which switch on due to enzymatic activity - Ruchira S. Datta
metabolic analysis: weakly aniline region, quenched dye, substrate recognition zone binding substrate, and linker to glass slide. when enzyme catches the substrate, it is released from the substrate recognition zone and the dye is unquenched - Ruchira S. Datta
instead of slide, can use microtiter plate - Ruchira S. Datta
metabolic network: these molecules are in pathways, see which ones light up based on the detected enzymatic activity - Ruchira S. Datta
can use for phenotyping of microbial knockout mutants, protein function identification - Ruchira S. Datta
did this to H. pylori strain ATCC 26695 - Ruchira S. Datta
once substrate was identified, created nanoparticle w/ gold layer, silica layer, and metabolic complex, for tagging - Ruchira S. Datta
Thiele et al (Palsson lab) 2005 had 476 reactions, found over 200 more - Ruchira S. Datta
have substrates for which H. pylori is known to have activity, the protein responsible was not known. assayed: in some cases single protein was involved, in some cases many proteins - Ruchira S. Datta
this requires at least a few mg of proteins without interference by contaminants - Ruchira S. Datta
identified functions of at least a dozen previous hypothetical proteins in H. pylori - Ruchira S. Datta
Valerie deCrecy-Lagard: how can you decide something is an elongation factor just because it binds RNA and GTP? A: it could be something else, this is a tentative assignment. deCrecy-Lagard: elongation factors are known in H. pylori. A: there could be more than one, like DNA ligase. this is just a tentative affiliation. - Ruchira S. Datta
Tadhg Begley: worried about this approach, as when functionalize substrates you often lose the ability of the enzyme to turn over the molecule. the bound substrate is acting as a noncompetitive inhibitor. A: totally correct. the method has its own biases. Martin Steffen: this is a basis for further experimentation. - Ruchira S. Datta
Q: compared with traditional computational functional annotation? A: no, haven't tried that. Q: could do reverse BLAST as Peter Karp suggested. - Ruchira S. Datta
Ruchira S. Datta
Bruno Sobral: Prokaryotic Annotation Status
How to Build a Bad Biological Database: make submission difficult. don't let your file formats interconvert. keep your database independent. totally trust your automated systems (hard to deal with this when funding agencies don't fund curators). do not develop good visualization tools. - Ruchira S. Datta
researchers' core requirement: data/tool integration, followed by workspaces to upload data and share w/ friends, literature search, data sharing, wizards/ease-of-use, ... based on in-depth user surveys with "wet" id researchers/countermeasure developers. therefore focused on top three. - Ruchira S. Datta
typical bioinformatics website: you get there and it's unclear what to do. have "needle in the haystack" problem. need to do web analytics to see that users are doing the equivalent of "buying the product" - Ruchira S. Datta
have layered cyberinfrastructure behind their website PATRIC, specific to infectious disease researchers' needs - Ruchira S. Datta
primordial ooze ("warm little pond") cyberinfrastructure concept: modular loosely-coupled components are developed that can be incorporated and reconfigured into the broader infrastructure. following the model of the bazaar vs the cathedral - Ruchira S. Datta
the definition of a pathogen is contextual, so target all bacteria - Ruchira S. Datta
there are 21,490 unique last authors in the target genome community, so will have more focus wrt establishing relationships as these don't scale - Ruchira S. Datta
the importance of throughput: price per basepair is falling dramatically - Ruchira S. Datta
collaborating w/ NCBI, UniProt, Jackson Labs; NaCTeM, SEED, New City Media, Virus BRC, Eukaryote BRC - Ruchira S. Datta
to the user looks like everything in one place - Ruchira S. Datta
PATRIC 2.0 website: - Ruchira S. Datta
future directions: assembler front end to go directly from sequencing reads to annotated genome; SNP pipeline for epidemiology/population studies; submit MicroArray and RNAseq data against reference genome for analysis; NCBI compliant output for direct submission to GenBank; automatic generation of metabolic models for submitted genomes - biolog data/phenotypes. mapping of expression and RNAseq data against reference genomes. - Ruchira S. Datta
automated metabolic modeling: >2500 metabolic models - Ruchira S. Datta
model similar to SEED, see their paper in Nature Genetics. - Ruchira S. Datta
coming soon to PATRIC 2.0: MG-RAST. In PATRIC, focus on science related to pathogens: surveillance, reservoir detection, and horizontal gene transfer - Ruchira S. Datta
as we go to terabase sequences, will be able to assemble genomes out of that. e.g., Rickettsia fell out while assembling tick genome, where it is in the gut - Ruchira S. Datta
coming soon: Pan-Genomes in PATRIC 2.0. give broader view of, e.g., annotation and ortholog groups, essential vs non-essential vs strain-specific - Ruchira S. Datta
turning research into website analysis tools. working with the community tells us what we should do to improve the website. - Ruchira S. Datta
PATRIC annotation is done by RAST. It takes a couple of minutes per genome; one can download RAST and run it on one's laptop. - Ruchira S. Datta
all tabs are contextualized, e.g., have been looking at a genome - then everything is specific to that - Ruchira S. Datta
using JBrowse, so can add tracks: will put it in there with attribution. prefer programmatic/API style (don't want to deal with submission) - Ruchira S. Datta
Ruchira S. Datta
Steven Brenner: Function and dysfunction in protein annotation
2 million new putative proteins added to UniProt in the past year - Ruchira S. Datta
0.5% of these are experimentally characterized - Ruchira S. Datta
M. genitalium, early and very small genome annotated, was annotated somewhat independently by several groups. 1) TIGR sequences genome and makes initial annotation 2) GeneQuiz consortium automatically annotates, crows triumphantly 3) Eugene Koonin at NCBI manually annotates 4) GeneQuiz again automatically annotates (overwriting its annotations from 2)). Brenner compared 1, 3, and 4 - Ruchira S. Datta
examples of GeneQuiz errors found by Eugene Koonin: mg010 was annotated as DNA primase, but Koonin noted that it was truncated. another GeneQuiz annotated as histidine permease due to top BLAST hit, but Koonin noted these change their functions rapidly. - Ruchira S. Datta
sets of annotations by the 3 groups were incompatible - Ruchira S. Datta
groups have been under pressure to increase coverage, therefore the fraction of correct annotations has gone down (since there was no way to check which are correct) - Ruchira S. Datta
where do errors in M. genitalium genome annotation quality come from? 1) poor sequence comparison. probably alleviated by now. 2) propagation of erroneous data. 3) incorrect inferences of function from homology - this is hard to deal with. - Ruchira S. Datta
What's wrong with BLAST? (and COGs and ...) - Ruchira S. Datta
e.g., BLAST Q8X1T6 (Aspergillus nidulans): see top hit and many further down are annotated as this. one hit among top is adenine deaminase. but this is the actual experimentally characterized one. - Ruchira S. Datta
need to look at phylogenomics (cf. Jonathan Eisen). after duplication in the gene tree, there was a functional change. also known as genosperology (Steven Brenner), and phylogenetic genome annotation (because now phylogenomics means something different: whole-genome-based species phylogeny) - Ruchira S. Datta
BLAST is essentially measuring distance along the tree, but there may be substantial rate variations causing problems with this: the nearest hit in the tree may have a duplication event in between - Ruchira S. Datta
BLAST will systematically give you greater and greater confidence in the wrong annotation - Ruchira S. Datta
challenge of phylogenetic annotation: expert knowledge depends on the expert and is time-consuming - Ruchira S. Datta
their method: start with BLAST, then HMMER, and finally SIFTER--go up and down the tree - Ruchira S. Datta
tested SIFTER carefully on adenine deaminase family predictions. there are 5 proteins w/ experimental evidence in GO database, and others in the literature. got one error. also experimentally verified one prediction. SIFTER got 5% wrong. - Ruchira S. Datta
have now introduced approximations to make SIFTER scale to large families. tested on larger family: 18000 proteins in Nudix protein family. general role: nucleotide pool sanitation; e.g., MutT degrades bad nucleotides. - Ruchira S. Datta
at 99% specificity: BLAST has 2% of specific annotations correct, and 2% of general ones; SIFTER has 24% of specific annotations correct, and 62% of general ones. - Ruchira S. Datta
if you try to predict everything, SIFTER has 47% accuracy, BLAST has 34% accuracy - Ruchira S. Datta
GO was significantly depleted for describing this family: much in the literature was not present in GO, known specific terms were only described with general terms, and there were actually errors in GO. an annotator read the paper and typed symmetrical activity rather than asymmetrical activity. - Ruchira S. Datta
Brenner's own grad student made 3 mistakes (detected because SIFTER got them right). just copying from literature into controlled vocabulary is error-prone. - Ruchira S. Datta
of 124 proteins annotated based on literature, 11 proteins did not contain a correct, specific annotation and in fact had an incorrect annotation at a specific level. - Ruchira S. Datta
if I find a Nudix enzyme of kcat/KM of 800 on 8-oxo-dGTP, how do I annotate? audience member suggests: look for the functional context. Brenner: almost every organism needs this, but also have paralogs, activity is much lower than expected. Michael Galperin: the answer is not to annotate. - Ruchira S. Datta
Michael Galperin: two of the examples may be a bit biased. adenine or adenosine deaminase--who cares? significant cries of protest from audience. Also, with Nudix, a lot of experimental data is itself wrong. Better not to annotate. - Ruchira S. Datta
Ruchira S. Datta
Peter Karp: Identification of Annotation Errors and Incompleteness Through Network Analysis
besides sequences of unknown function, there are functions of unknown sequence. e.g., dozens of enzymes in E. coli whose enzymatic activity has been verified but the gene performing them is not known. - Ruchira S. Datta
other problems: correct annotation errors, make vague annotations more precise, and missing multi-functional genes - computational analysis systematically misses these. - Ruchira S. Datta
use systems level knowledge of the organism to critique the genome annotation. - Ruchira S. Datta
Briefings in Bioinformatics 11:40-79 2010 Pathway Tools software. identifies these problems using: 1) pathway holes 2) dead-end metabolites 3) reachability analysis 4) universal bacterial genes - Ruchira S. Datta
important addition to COMBREX is integrating the data for organisms of interest into a central database. why is there no curated database for H. pylori? - Ruchira S. Datta
Method 1: Pathway Holes. Definition: Reactions in the metabolic pathways for which no enzyme is identified in the genome. - Ruchira S. Datta
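A minimal sketch of the pathway-hole definition above, not the Pathway Tools implementation. The pathway names, reaction IDs, gene names, and the `find_pathway_holes` helper are all hypothetical; the only idea taken from the talk is "a reaction in a pathway with no enzyme identified in the genome".

```python
from typing import Dict, List, Set


def find_pathway_holes(
    pathways: Dict[str, List[str]],
    enzymes_by_reaction: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Return, per pathway, the reactions that have no enzyme assigned."""
    holes: Dict[str, List[str]] = {}
    for pathway, reactions in pathways.items():
        # A reaction is a hole if it is absent from the map or maps to no genes.
        missing = [r for r in reactions if not enzymes_by_reaction.get(r)]
        if missing:
            holes[pathway] = missing
    return holes


# Hypothetical toy annotation: rxn2 has no gene assigned.
pathways = {
    "pwy_A": ["rxn1", "rxn2", "rxn3"],
    "pwy_B": ["rxn4", "rxn5"],
}
enzymes_by_reaction = {
    "rxn1": {"geneX"},
    "rxn3": {"geneY", "geneZ"},
    "rxn4": {"geneW"},
    "rxn5": {"geneV"},
}

holes = find_pathway_holes(pathways, enzymes_by_reaction)
print(holes)
```

Each hole found this way would then be a query for the candidate search described in the talk; that scoring step is not sketched here.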
1) query UniProt for all sequences having EC# of pathway hole. 2) BLAST these against target genome. 3)-4) consolidate hits and evaluate evidence by combining information. Use Bayes classifier to evaluate P(protein has function X | e-value, average rank, alignment length, number of queries that hit it, adjacent reactions, pwy direction) - Ruchira S. Datta
why should the hole filler find things beyond the original genome annotation? reverse BLAST searches are more sensitive: search against the target genome rather than searching the target genome sequences against the large database, due to database length. reverse BLAST searches find second domains ignored in original annotation. integration of multiple evidence types. Bioinformatics 23:i205 2007: genome neighbors did not increase accuracy, but increased scope. - Ruchira S. Datta
BMC Bioinformatics 5:76 2004: Caulobacter crescentus Pathway Holes; has 130 pathways containing 582 reactions. 92 of those had 236 holes. - Ruchira S. Datta
previous genes that became hole fillers had no assigned function, mis-annotated function, or vague function - Ruchira S. Datta
related method: query PGDB for enzymes lacking sequence. these may or may not be pathway holes -- they might be single reactions. standing list of unsequenced E. coli enzymes:, now have 34 - Ruchira S. Datta
method 2: dead end metabolites: a small molecule that is only consumed or only produced by the metabolic network (including transport) - Ruchira S. Datta
also metabolite that is transported into the cell but never consumed by it. - Ruchira S. Datta
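A rough sketch of the dead-end definition above, under assumptions not in the talk: reactions are (substrates, products) pairs, transport is modelled as an ordinary reaction, and external compounds carry a hypothetical `_ext` suffix so they can be ignored as boundary species. All compound and reaction names are made up.

```python
from typing import Dict, List, Set, Tuple

Reaction = Tuple[List[str], List[str]]  # (substrates, products)


def dead_end_metabolites(reactions: Dict[str, Reaction]) -> Dict[str, Set[str]]:
    """Find internal metabolites that are only produced or only consumed."""
    consumed: Set[str] = set()
    produced: Set[str] = set()
    for substrates, products in reactions.values():
        consumed.update(substrates)
        produced.update(products)

    def internal(names: Set[str]) -> Set[str]:
        # Assumption: external (boundary) compounds are tagged with "_ext".
        return {m for m in names if not m.endswith("_ext")}

    return {
        "only_produced": internal(produced - consumed),
        "only_consumed": internal(consumed - produced),
    }


toy_network = {
    "import_A": (["A_ext"], ["A"]),
    "r1": (["A"], ["B"]),
    "r2": (["B"], ["C"]),            # C is made but never used or exported
    "import_D": (["D_ext"], ["D"]),  # D is transported in but never consumed
    "r3": (["E"], ["B"]),            # E is used but never made or imported
}

dead_ends = dead_end_metabolites(toy_network)
print(dead_ends)
```

In this toy network, `D` shows up as "only produced" because the import reaction is its sole source, which corresponds to the transported-but-never-consumed case.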
dead-end metabolites in EcoCyc. often these are due to missing transporter. past transport reactions added to EcoCyc: acetoacetate, dimethylsulfoxide, lipoate, 3-phenylpropionate, phenylethylamine, psicoselysine and fructoselysine,... had missed these from the literature, but found them once knew to look for them. - Ruchira S. Datta
ran on E. coli again 2 weeks ago. found 145 dead-ends total, of which 16 dead-ends are in metabolic pathways (Ian Paulsen resolved one within the last week). - Ruchira S. Datta
Method 3: Reachability Analysis of Metabolic Networks. Romero and Karp, Pacific Symposium on Biocomputing, 2001. - Ruchira S. Datta
given: a PGDB for an organism and a set of initial metabolites, identify: a set of products that *can be* synthesized by the small-molecule metabolism of the organism. helps to characterize growth media and QC PGDB. currently can't handle metabolites required for their own synthesis. - Ruchira S. Datta
Algorithm: forward propagation through the system. start with reactants, then "fire" the reactions, produce compounds, and iterate. - Ruchira S. Datta
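The forward propagation described above can be sketched roughly as follows. This is an illustration under simple assumptions (irreversible reactions, set-based bookkeeping), not the Romero and Karp code; the function names and the toy network are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Reaction = Tuple[List[str], List[str]]  # (reactants, products)


def reachable_compounds(
    reactions: Dict[str, Reaction], seed: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Fire every reaction whose reactants are all available; iterate to a fixed point."""
    reached = set(seed)
    fired: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for name, (reactants, products) in reactions.items():
            if name not in fired and all(r in reached for r in reactants):
                fired.add(name)
                reached.update(products)
                changed = True
    return reached, fired


def missing_reactants(
    reactions: Dict[str, Reaction], reached: Set[str], fired: Set[str]
) -> Dict[str, List[str]]:
    """For each unfired reaction, list the reactants that were never reached."""
    return {
        name: [r for r in reactants if r not in reached]
        for name, (reactants, _) in reactions.items()
        if name not in fired
    }


toy_network = {
    "r1": (["glucose"], ["g6p"]),
    "r2": (["g6p", "atp"], ["fbp"]),
    "r3": (["fbp", "x"], ["y"]),  # needs x ...
    "r4": (["y"], ["x"]),         # ... but x is only made from y
}

reached, fired = reachable_compounds(toy_network, seed={"glucose", "atp"})
blocked = missing_reactants(toy_network, reached, fired)
print(sorted(reached), sorted(fired), blocked)
```

The `x`/`y` pair in the toy network is a compound required for its own synthesis: neither reaction ever fires unless one of them is in the seed set, which is the limitation noted above.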
there are 41 essential compounds in E. coli - Ruchira S. Datta
visually display the fired and unfired reactions on the network. unfired reactions show which reactants were missing, trace backward to see where that went wrong - Ruchira S. Datta
when first ran in 2001, 21 of the 41 essential compounds were not reachable. fixed problems. - Ruchira S. Datta
Method 4: Known Universal Bacterial Genes. bring to user's attention that gene thought to be universally present has no homologs in the target genome. NCBI's Bill Klimke is assembling this list. - Ruchira S. Datta
limitations of evidence codes: Gene Ontology and EcoCyc make heavy use of evidence codes. had been assuming that experimental evidence code implies solid functional annotation - Ruchira S. Datta
but no - even IDA (Inferred from Direct Assay) codes can be attached to genes whose function is imprecisely defined. - Ruchira S. Datta
Note distinction between molecular function and biological process. - Ruchira S. Datta
Even molecular function can be vague, "protein binding", "ATP binding", "nucleotide triphosphatase activity" - Ruchira S. Datta
e.g., YedY is a periplasmic molybdoprotein with reductase activity shown for several substrates, but nobody knows what the actual substrate is in vivo - Ruchira S. Datta
173 E. coli y-genes have experimental evidence codes, 37 y-genes have IDA codes, a minority do have known functions but are awaiting name assignments. - Ruchira S. Datta
multiple participants in a complex - may not know the exact role of individual constituents - Ruchira S. Datta
Q: status of E. coli? A: hard to say, because even genes with experimental evidence codes don't really have known functions. about 20% of E. coli genes have unknown function - Ruchira S. Datta
Jonathan Eisen: how is status of E. coli changing over time? A: learning more, but statistics fluctuating due to changing definitions? Jonathan Eisen: due to newly predicted genes? A: not many of those - Ruchira S. Datta
Ruchira S. Datta
Richard Roberts: COMBREX: A community-based project to improve functional annotation
essay in PLoS Biology 2004: "Identifying Protein Function--A Call for Community Action" - Ruchira S. Datta
his complaint: people were just using computational annotations w/o worrying about whether they were true. requires experimental validation - Ruchira S. Datta
microbial genome growth is exponential - Ruchira S. Datta
in every genome, large proportion of genes have unknown function. no way to do computational systems biology - Ruchira S. Datta
H. pylori 26695: 552 unknown genes (35%), known or good predictions: 1021; total: 1573 - Ruchira S. Datta
H. pylori is pathogen, but also interesting organism - Ruchira S. Datta
Martin Blaser has found, in Western world we get rid of H. pylori, and this is associated with increasing rates of asthma. Maybe want to reinfect ourselves, carefully. - Ruchira S. Datta
We still don't know everything about E. coli, e.g., in MG1655, 13% (550) genes unknown, out of total 4144. - Ruchira S. Datta
why so little progress in function determination? - Ruchira S. Datta
the inherent difficulty of the problem. 500 unknown genes - no single lab has the expertise - Ruchira S. Datta
cross-disciplinary set of skills required, e.g., for a helicase expert, they have all the reagents required and all the skills for helicase - Ruchira S. Datta
the appeal of genome-wide studies. but do we get a lot of understanding out of this? Roberts argues we get very, very little understanding out of it. we need to do what is necessary so we actually can understand these data. - Ruchira S. Datta
the lack of appropriate funding mechanisms. biochemistry is considered boring and old-fashioned. microbes were understudied: only recently, w/ human microbiome, did NIH realize that microbes are very important for human health. - Ruchira S. Datta
elements of a solution: high throughput by parallelization - Ruchira S. Datta
bioinformaticians should make *high quality* predictions. the bioinformatics community has done a good job of being inventive. very fortunate in Roberts's own field, the restriction enzymes, as they typically have no homologs but they sit next to DNA methyltransferases. thus, infer that something is a restriction enzyme from the fact that it has no homologs and sits next to a DNA methyltransferase. - Ruchira S. Datta
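A toy sketch of the genome-context rule described above (no homologs, sits next to a DNA methyltransferase). The gene records, field names, and the `candidate_restriction_enzymes` helper are hypothetical. For simplicity the gene list is treated as linear, so circular chromosomes are ignored.

```python
from typing import Dict, List, Union

Gene = Dict[str, Union[str, bool]]


def candidate_restriction_enzymes(genes: List[Gene]) -> List[str]:
    """Flag genes with no homologs whose immediate neighbor is a DNA methyltransferase."""
    candidates: List[str] = []
    for i, gene in enumerate(genes):
        if gene["has_homologs"]:
            continue
        # Immediate upstream and downstream neighbors, where they exist.
        neighbors = genes[max(i - 1, 0):i] + genes[i + 1:i + 2]
        if any(n["annotation"] == "DNA methyltransferase" for n in neighbors):
            candidates.append(str(gene["id"]))
    return candidates


# Hypothetical genes in chromosomal order.
genes = [
    {"id": "g1", "has_homologs": True, "annotation": "DNA polymerase"},
    {"id": "g2", "has_homologs": False, "annotation": "hypothetical protein"},
    {"id": "g3", "has_homologs": True, "annotation": "DNA methyltransferase"},
    {"id": "g4", "has_homologs": False, "annotation": "hypothetical protein"},
    {"id": "g5", "has_homologs": True, "annotation": "ABC transporter"},
    {"id": "g6", "has_homologs": False, "annotation": "hypothetical protein"},
]

candidates = candidate_restriction_enzymes(genes)
print(candidates)
```

Here `g6` also lacks homologs but is not flagged, since it has no methyltransferase neighbor; the rule only yields predictions, which would still need testing at the bench.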
then, assemble a database of those predictions. - Ruchira S. Datta
biochemists should test predictions in their field of expertise. any biochemist has in their refrigerator a set of substrates, reagents, etc. for an established expert lab, it doesn't take them that much time or effort to test the predictions. - Ruchira S. Datta
offer financial incentives to participate: give $5-10K to a lab to test individual predictions. e.g., even a rotation student could write a proposal and get a grant to bring into the lab. - Ruchira S. Datta
then the student can publish the paper, e.g., in PLoS One. have thought of starting an open access journal just for this purpose. Google has nice templates that make this very easy. - Ruchira S. Datta
how can we improve predictions? - Ruchira S. Datta
1. recognize that most current annotation is based on overall sequence similarity to previously annotated genes. - Ruchira S. Datta
2. recognize that much annotation may be wrong because of 1. - Ruchira S. Datta
3. provide a firm basis for similarity-based annotation by identifying the biochemically-characterized gene that led to the annotation. - Ruchira S. Datta
4. provide a means for reporting mis-annotations. - Ruchira S. Datta
5. we need a "gold standard" set of genes/proteins - Ruchira S. Datta
they went around to different databases. no database actually has the forensic record that the experimental characterization was done in one particular strain. - Ruchira S. Datta
The "Gold Standard" protein dataset: A set of proteins whose function has been determined biochemically and whose sequence is known to be accurate. - Ruchira S. Datta
1. For each protein in the set we find the reference that describes the determination of function. - Ruchira S. Datta
2. We make sure that the *strain* is well defined and noted. - Ruchira S. Datta
3. We make sure that the DNA/protein sequence determination was done on the gene in which the biochemical function was characterized. - Ruchira S. Datta
Peter Karp of EcoCyc has helped identify candidates to include in the Gold Standard. - Ruchira S. Datta
UniProt will prepare templated candidate entries. they have not kept track of which paper had the original functional annotation, which strain it was in. - Ruchira S. Datta
Curators will manually check: a) that the strain information is accurate, b) that the biochemical characterization is accurate, c) that the gene was in fact sequenced in the strain from which the candidate protein was isolated. This info will be deposited in UniProt and also in COMBREX database. - Ruchira S. Datta
How you can help: send in candidate gold standard genes/proteins; volunteer to serve as a manual curator (email Rich Roberts:; browse the COMBREX website and report genes you think should be in the gold standard. - Ruchira S. Datta
COMBREX - which genes first? now soliciting as many predictions as possible. - Ruchira S. Datta
then use criteria: how many organisms contain the gene (bacteria, plants, humans)? is it currently unknown in E. coli? is it currently unknown in H. pylori? the availability of a clone or purified protein, are structures available for members of the protein cluster, is it known or predicted to be essential, other criteria - Ruchira S. Datta
experimentalists can then submit bids. COMBREX doesn't want many experimentalists working on the same prediction, waste of manpower and resources. - Ruchira S. Datta
the Structural Genomics Initiative has lots of purified protein sitting in their freezers as only one in ten purified proteins forms crystals. can look at TargetDB to find these. - Ruchira S. Datta
who can help? computational biologists, biochemists, geneticists? university students, high school students in science fairs, professors emeriti, funding agencies - Ruchira S. Datta
currently have stimulus money for two years, looking to extend that. Howard Hughes is interested, see this as good way of teaching. Wellcome Trust may be interested. Will look in Middle East. This can be a worldwide effort, collaborative rather than competitive. This effort is just the way science should be. - Ruchira S. Datta
Kimmen Sjölander asks: will list the top targets? A: there will be a list of the 100 top targets on the website. - Ruchira S. Datta
Jonathan Eisen: your list of organisms is phylogenetically narrow, only Proteobacteria? A: no, include every sequenced genome in bacteria or archaea, 1200. E. coli & H. pylori focus as NIH likes pathogens. - Ruchira S. Datta
Ruchira S. Datta
Martin Steffen introduces the COMBREX Workshop #LAMG10
Rich Roberts, Chief Scientific Officer of New England BioLabs, came up with idea for COMBREX in last decade and wrote a paper about it. Simon Kasif joined him in leading the project. - Ruchira S. Datta
Iddo Friedberg
Fwd: Gautam Dantas on antibiotic resistance. #LAMG10
first penicillin use in 1942. First resistance report in 1929 - Iddo Friedberg
Dantas talking about how pathogens evolve resistance: LGT or mutations. Claims we under-estimate the true percentage of LGT - Iddo Friedberg
50% of antibiotics are natural products of Actinomyces - Iddo Friedberg
Can microbes use antibiotics as a sole source of carbon? - Iddo Friedberg
Ruchira S. Datta
Valerie de Crecy Lagard: A Tale of Two Proteins Discovering the Function of Two Protein Families of Unknown Function Conserved Throughout the Tree of Life
transition between the evolution and function sessions - Ruchira S. Datta
20-60% of genes in any given genome are orphans, of unknown function. this problem is increasing - Ruchira S. Datta
also have problem of misannotations: see Schnoes,...,Babbitt 2009 - Ruchira S. Datta
problems of misannotations: over-annotation, experimental mistakes, and paralog problem - Ruchira S. Datta
many examples in tRNA modifications: many orphan enzymes - Ruchira S. Datta
N6-threonylcarbamoyladenosine is widespread - Ruchira S. Datta
it was proposed that threonine and bicarbonate were precursors - Ruchira S. Datta
Galperin & Koonin NAR 2004 Oct 12;32(18):5452-63. YrdC and YgjD were in top ten conserved families of unknown function - Ruchira S. Datta
t6A occurs universally but its biosynthesis genes are missing everywhere, so they should be in all genomes. so looked for genes of unknown function present in all small genomes (such as Mycoplasma) as well as E. coli, B. subtilis, and yeast - Ruchira S. Datta
YrdC: history of many different assigned functions - Ruchira S. Datta
YrdC/Sua5 has homology to carbamoyl binding enzymes - Ruchira S. Datta
COG0009 hinted that this was ATP binding, and this was experimentally verified - Ruchira S. Datta
sua5 knockout mutant was unavailable as it was considered essential, but in fact the mutant is sick but can be grown. t6A biosynthesis was present in wild type but not in the sua5 knockout El Yacoubi et al NAR 2009 - Ruchira S. Datta
the gene is essential in E. coli, were able to complement the essentiality phenotype. - Ruchira S. Datta
The paralog problem: 30% of sequenced bacteria have two COG0009 members YciO and YrdC. But YciO is not a functional homolog of YrdC: it is not able to complement the YrdC knockout mutant in yeast. This COG should be split. - Ruchira S. Datta
purified sua5 and yrdC, but assays for ATP and GTP hydrolysis failed to detect any activity so far - Ruchira S. Datta
Alexey Murzin, the SCOP guru, had predicted two families were involved in t6A biosynthesis by looking at structure - Ruchira S. Datta
he linked YgjD with YrdC as they are found fused in some organisms - Ruchira S. Datta
Aravind, Koonin J Mol Biol 1999 predicted YgjD ATP binding using PSI-BLAST, this was recently confirmed. - Ruchira S. Datta
Kae1/YgjD/Qri7: a history of misannotation and pleiotropic effects. wrong annotations led to a lot of lost postdoc-years. - Ruchira S. Datta
El Yacoubi 2010 submitted: checked kae1 knockout complementation w/ Qri7, YgjD - Ruchira S. Datta
anhydro-tetracycline induces growth. failed to get complementation except in E. coli - Ruchira S. Datta
COG0533 phylogeny: YeaZ looks like inactive paralog of YgjD. also clusters physically on chromosome with it. - Ruchira S. Datta
Rajagopala SV BMC genomics 2010, El Yacoubi 2010: if put *both* ygjD and yeaZ get weak complementation, none with either alone - Ruchira S. Datta
now have t6A biosynthesis genes, so can make mutants and thus elucidate t6A role in vivo - Ruchira S. Datta
essential in E. coli, associated with telomere defects, respiration defects - Ruchira S. Datta
modifications of anticodon stem loop change its interaction with messenger RNA - Ruchira S. Datta
get induced frameshift - Ruchira S. Datta
t6A minus strain misinitiates at GUG codons in yeast - Ruchira S. Datta
analysis of ANN codon distribution in yeast coding sequences. - Ruchira S. Datta
both mutants have telomerase defects in yeast - Ruchira S. Datta
structure of telomere looks like anticodon stem loop - Ruchira S. Datta
so on one hand, beginning to understand t6A biosynthesis and role of t6A in core biological processes. on the other hand, getting closer to discovering the functions of the universal orphan families YrdC/Sua5 and YgjD/Kae1. we have the biological function, but not the molecular function. - Ruchira S. Datta
Q: is t6A modification known to occur in f-met-tRNA A: no, it does not occur - Ruchira S. Datta
Tadhg Begley: when try reconstitution, tried crude extracts? A: did try, some complications - Ruchira S. Datta
Ruchira S. Datta
Federico Lauro: Integrative Approaches to the Studies of Natural Microbial Communities #LAMG10
Lauro is an oceanographer - Ruchira S. Datta
specifically, a microbial oceanographer. will talk about a microbial community that lives in Antarctica, in the Vestfold Hills, where there are a lot of little lakes formed 10,000 years ago at end of last Ice Age - Ruchira S. Datta
2000 years later, sea level rose, marine water invaded all the lakes - Ruchira S. Datta
about 5700 years ago, became lakes of marine origin again - Ruchira S. Datta
often frozen over - Ruchira S. Datta
stratification occurs every year due to freshwater influx from melting of snow - Ruchira S. Datta
Ace Lake is only 26m deep at deepest spot, but at a certain depth salinity rises steeply. some water never mixes all year, therefore it's a meromictic lake - Ruchira S. Datta
below 12.7m, the lake is anoxic - Ruchira S. Datta
top of lake is half salinity of seawater, bottom of lake is 1.5 times salinity of seawater. seawater is denser when colder - Ruchira S. Datta
the separation of layers is called the oxycline and the halocline (steep drop in oxygen and increase in salinity) - Ruchira S. Datta
also have change in turbidity at these layers, so bacteria and archaea at bottom cannot do any photosynthesis - Ruchira S. Datta
used JCVI GOS techniques to sample the lake - Ruchira S. Datta
several stages of filters: 20 micron pre-filter, 3-20 micron filter, 0.8-3 micron filter, <0.8 micron filter. (largest organism in lake is very tiny crustacean). sample each size fraction separately - Ruchira S. Datta
had both Sanger and 454, similar results - Ruchira S. Datta
>3K 16S and 18S sequences, very few 18S's. no eukaryotes below the oxycline/halocline. microbially dominated ecosystem - Ruchira S. Datta
high-level taxonomic binning, e.g., virus, eucarya, bacterial phyla - Ruchira S. Datta
plotted Simpson index of diversity in each size fraction, this falls at the 12.7m level - Ruchira S. Datta
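For reference, a small sketch of one common form of the Simpson index of diversity, 1 - sum(p_i^2), computed from taxon labels. Which variant the talk plotted is not stated, so the choice of this (Gini-Simpson) form is an assumption, and the taxon labels below are made up.

```python
from collections import Counter
from typing import Iterable


def simpson_diversity(labels: Iterable[str]) -> float:
    """Gini-Simpson index: probability that two random draws (with replacement) differ."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one observation")
    dominance = sum((c / total) ** 2 for c in counts.values())
    return 1.0 - dominance


# Hypothetical read assignments for three samples.
even_sample = ["alpha", "gamma", "delta", "epsilon"]
skewed_sample = ["chlorobi"] * 9 + ["delta"]
clonal_sample = ["chlorobi"] * 5

for name, sample in [("even", even_sample),
                     ("skewed", skewed_sample),
                     ("clonal", clonal_sample)]:
    print(name, round(simpson_diversity(sample), 3))
```

A sample dominated by a single taxon scores close to 0, which is the kind of drop one would expect at a layer dominated by one organism.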
proteobacteria in all zones, but near surface mostly alpha, SAR11 clade; deeper, SAR11 almost disappears, and gamma, epsilon and delta increase. - Ruchira S. Datta
similarly, viruses infecting small photosynthetic eukaryotes occur only near surface - Ruchira S. Datta
Chlorobia do anoxic photosynthesis - Ruchira S. Datta
lots of methanogenic Euryarchaeota in anoxic zone - Ruchira S. Datta
functional stratification: different kinds of transporters in surface vs the bottom - Ruchira S. Datta
bottom layer: most reads cannot be classified either functionally or phylogenetically, though they cluster with each other - Ruchira S. Datta
some other samples cluster w/ reads from surface, suggesting they may be from dead surface organisms that sink down - Ruchira S. Datta
various strategies for surviving in marine environment. e.g., oligotrophs eat very little and are wise, like Yoda. - Ruchira S. Datta
copiotroph eats as much as possible whenever possible, like Jabba the Hutt - Ruchira S. Datta
markers of trophic strategy distinguish oligotrophs from copiotrophs - Ruchira S. Datta
oligotrophs are frugal, not usually motile - Ruchira S. Datta
copiotrophs hunt food, attach to particles, secrete enzymes and degrade them; go find the particles they want to attach to - Ruchira S. Datta
w/ enough nutrients they grow out of control and are easy prey for phages - Ruchira S. Datta
PNAS (2009) 106:15527-15533 - Ruchira S. Datta
used self-organizing maps. could separate oligotrophs from copiotrophs in sequenced genomes. - Ruchira S. Datta
metagenomics: all the lake is oligotrophic, but the small fractions are more oligotrophic than the large fractions which are a bit more copiotrophic - Ruchira S. Datta
carbon cycle in Ace Lake: both aerobic respiration / aerobic carbon fixation (only at surface) - small contribution to pathways; most carbon fixation occurs anaerobically - Ruchira S. Datta
nitrogen is limiting in lake. had pathways for fixing nitrogen, but puzzlingly none for nitrification or ammonification. explained: having whole nitrogen cycle is energetically costly, instead keep the nitrogen in a reduced state - Ruchira S. Datta
sulfur cycle: both sulfur oxidation and sulfur reduction are very prevalent. dissimilatory sulfur reduction throughout the anoxic zone - Ruchira S. Datta
the 12.7m line keeps coming out. diversity in layer very low. complete absence of viral signatures. viruses should go through the filters, but may be attached to particles and thus captured by filter anyway. - Ruchira S. Datta
did SYBR Green staining. microscopic confirmation: see viruses at all size fractions except no viral particles seen at 12.7m - Ruchira S. Datta
ISME J (2010) 4:1002-1019: C-Ace, an almost clonal environmental genome - Ruchira S. Datta
assembled almost whole genome of dominant organism; one single species, may be one single clone. green sulfur bacteria - Ruchira S. Datta
Coolen et al 2006 had found same lipid signatures of this same green sulfur bacteria from 5.6kya in fossil record - Ruchira S. Datta
seasonal dynamics: in Antarctica, have oscillations of light every 6 mos rather than diurnally - Ruchira S. Datta
model population effects of diurnal and annual cycles as well as viral predator. this predicts booms and busts. hypothesize that are at a bust, thus only single bacterium in the lake - Ruchira S. Datta
need to integrate genomics, transcriptomics, proteomics: panta-omics ("WORST OMICS EVER") - Ruchira S. Datta
Eric Alm: checked metal cycle? A: have copper exporters, but haven't checked e.g. iron cycle. - Ruchira S. Datta
Jonathan Eisen: what do the green sulfur bacteria do over winter? A: still have to select the unfortunate person to go check it out - Ruchira S. Datta
Magdalene So: how do you avoid contaminating the lake? A: a lot of work, sterilize everything before sending it there. all water pumped out, we don't put back in the lake but take back to the lab. for another lake, pumped out 700kg hypersaline water, tried to transport in helicopter w/ 500kg capacity - Ruchira S. Datta
Ruchira S. Datta
Jonathan Eisen: The Importance of History (and other obsessions) #LAMG10
obsession: bacterial evolution - Ruchira S. Datta
evolution of Lake Arrowhead: BLAST the peptide "LakeArrowhead" - Ruchira S. Datta
build an evolutionary tree of LakeArrowhead. surprisingly, doesn't tell us much about the history of Lake Arrowhead - Ruchira S. Datta
the Tree of Life, or evolution of microbes: in 2002, known that there were at least 40 phyla of bacteria, yet most sequenced genomes came from 3 phyla. the Tree of Life was not happy - Ruchira S. Datta
why increase phylogenetic coverage? this is a common approach in eukaryotes, animals, plants, fungi - Ruchira S. Datta
in 2002 got NSF Tree of Life funded grant to sequence genomes from each of 8 previously unsequenced phyla - Ruchira S. Datta
after this, the Tree of Life was still pissed off: one per phylum is too sparse, Archaea severely undersampled, etc. - Ruchira S. Datta
joined JGI, with them started GEBA: Genomic Encyclopedia of Archaea and Bacteria. the solution: really fill in the Tree of Life - Ruchira S. Datta
overview: identify major branches in rRNA tree for which no genomes available, id those w/ cultured rep in DSMZ, the DSMZ grew these up and preprepped them, so sequenced these. - Ruchira S. Datta
the entire JGI conceptual and production pipeline went into this. before Human Microbiome Project, this was first large multi-bacterial genome project - Ruchira S. Datta
all data released as quickly as possible, "you can publish papers on it before we do--we encourage it" - Ruchira S. Datta
phylogenetic distance in rRNA Tree of Life is similar to that in Whole Genome Tree, although topology a bit different - Ruchira S. Datta
identify extent of lateral gene transfer. if this were rampant, then position in rRNA tree would not be good predictor of gene novelty. did protein family rarefaction curve: how many new protein families appear as add new genomes. the rate of recovery is higher at higher levels of distance in the rRNA tree. - Ruchira S. Datta
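A minimal sketch of a protein family accumulation (rarefaction-style) curve: how many distinct families have been seen as genomes are added. The genome names, family IDs, and both helper functions are hypothetical, and averaging over random orderings is just one common way to smooth such a curve, not necessarily what the GEBA analysis did.

```python
import random
from typing import Dict, List, Sequence, Set


def family_accumulation(genomes: Dict[str, Set[str]], order: Sequence[str]) -> List[int]:
    """Cumulative count of distinct protein families as genomes are added in `order`."""
    seen: Set[str] = set()
    curve: List[int] = []
    for name in order:
        seen |= genomes[name]
        curve.append(len(seen))
    return curve


def mean_rarefaction(
    genomes: Dict[str, Set[str]], n_permutations: int = 100, seed: int = 0
) -> List[float]:
    """Average the accumulation curve over random genome orderings."""
    rng = random.Random(seed)
    names = sorted(genomes)
    totals = [0] * len(names)
    for _ in range(n_permutations):
        rng.shuffle(names)
        for i, count in enumerate(family_accumulation(genomes, names)):
            totals[i] += count
    return [t / n_permutations for t in totals]


# Hypothetical genomes: two close relatives and one phylogenetically distant genome.
genomes = {
    "close_1": {"f1", "f2", "f3"},
    "close_2": {"f1", "f2", "f4"},
    "distant_1": {"f1", "f5", "f6", "f7"},
}

curve = family_accumulation(genomes, ["close_1", "close_2", "distant_1"])
mean_curve = mean_rarefaction(genomes)
print(curve, mean_curve)
```

In the toy data the second close relative adds one new family while the distant genome adds three, which is the pattern described above: more novelty at greater distance in the tree.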
this means synapomorphies exist, which Eric just basically told you as well: new genes were born in certain branches of the Tree of Life. if you haven't sampled that branch, you're missing all those genes. - Ruchira S. Datta
GEBA Lesson 3: phylogeny-driven genome selection improves functional annotation. Nikos Kyrpides and group took 56 GEBA genomes vs 56 randomly sampled new genomes. every measure of functional annotation improves more w/ phylogenetically diverse genomes vs narrow ones - Ruchira S. Datta
GEBA Lesson 4: metadata and individual genome papers important. is open access journal for people to publish metadata about how genome was sequenced and metadata about it. metadata standards being developed. - Ruchira S. Datta
GEBA Lesson 5: phylogeny-driven genome selection improves analysis of metagenomic data. - Ruchira S. Datta
uses of phylogenetic classification in metagenomics: assign reads to phylogenetic groups using multiple genes. phylogenetic binning - especially important if no reference genomes. phylogenetic ecology, as heard from Steve Kembel - Ruchira S. Datta
but all this is hampered by poverty of reference genomes - Ruchira S. Datta
improved metagenomic analysis w/ more reference genomes, though not as much as anticipated - Ruchira S. Datta
need to alter analysis methodology to make best use w/ metagenomic data - Ruchira S. Datta
phylogenetic binning using AMPHORA - Ruchira S. Datta
improving phylogeny for metagenomic reads: w/ reference trees - AMPHORA, Erick Matsen's PPlacer, Morgan Price's FastTree - Ruchira S. Datta
or variants: Steve Kembel, Thomas Sharpton, and maybe 30 groups working on this - Ruchira S. Datta
adding more protein families improves ability to incorporate the newly sequenced genomes. identify new markers satisfying criteria like evenness of copy number, universality, etc. - Ruchira S. Datta
other needs: more experimental data--will hear about tomorrow - Ruchira S. Datta
other future direction: the dark matter of the biological universe - Ruchira S. Datta
they had thought they were doing a good job of getting into the deep parts of the tree and sampling them well - Ruchira S. Datta
they sorted existing genomes by phylogenetic distance. added a lot of phylogenetic diversity using cultured organisms. however, would need 10,000 more genomes to sample the diversity present in uncultured organisms - Ruchira S. Datta
phylogenetic diversity is one of the important factors to consider when deciding what to sequence next, and will be very fruitful - Ruchira S. Datta
need field guide to the microbes. then we will have a happy Tree of Life - Ruchira S. Datta
Simon Kasif asks: how do you know you have a new protein family, and it's not just a fast-evolving one? A: great question, we really don't. for paper, did all-vs-all search of proteins, then used Markov clustering algorithm. tried many different parameters, didn't affect broad conclusions. also trying to search all genomes against known protein families. but the known protein families are a somewhat biased sample. we haven't ruled out rates of evolution corrupting this analysis, would love to discuss. - Ruchira S. Datta
Ruchira S. Datta
Eric Alm: Microbial genome changes on geological timescales: A story about how life learned to breathe and thrive as told by hundreds of microbial genomes
will talk now mainly on lab's work on macroevolution (they work with microevolution as well) - Ruchira S. Datta
Max Delbruck: "Any living cell carries with it the experience of a billion years of experimentation by its ancestors." - Ruchira S. Datta
infer the evolutionary history of each major gene family. then ask: do genomes harbor signatures of major biogeochemical events (Great Oxidation Event, metal solubility, etc.) - Ruchira S. Datta
work of student Lawrence David: AnGST (and AdaptML, HuGE) - Ruchira S. Datta
also collaborating with Manolis Kellis, assembling the Tree of Life - Ruchira S. Datta
reconstructing a gene's history: basic concepts. start with some idea of tree of species. then many possible gene trees, e.g. one that simply reflects speciation, one that reflects almost all the species due to speciation and loss, or one that is only in a small subset of species as the gene was born later (after the most recent common ancestor of all those species) - Ruchira S. Datta
in last scenario, could imagine many gene losses, but later birth of gene is more parsimonious explanation - Ruchira S. Datta
another possible scenario: two copies of subtree of species tree, due to duplication event at the least common ancestor of that subtree - Ruchira S. Datta
finally, can have duplication and gene loss, which is especially hard to see if the gene tree is incorrectly rooted -- need to infer root as well - Ruchira S. Datta
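The scenarios above can be illustrated with the classic LCA-mapping reconciliation, sketched here under simplifying assumptions: both trees are rooted, trees are nested tuples, and each gene-tree leaf is labelled with the species it comes from. This is a textbook duplication count, not the speaker's method, and it ignores transfers and does not count losses.

```python
def clades(tree):
    """Return the leaf set (frozenset) of every node in a nested-tuple tree."""
    out = []
    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        out.append(leaves)
        return leaves
    walk(tree)
    return out

def lca(species_clades, leaves):
    """Smallest species-tree clade that contains all of `leaves`."""
    return min((c for c in species_clades if leaves <= c), key=len)

def count_duplications(gene_tree, species_tree):
    """A gene node is a duplication if it maps to the same species node as one of its children."""
    species_clades = clades(species_tree)
    dups = 0
    def walk(node):
        nonlocal dups
        if isinstance(node, str):
            return lca(species_clades, frozenset([node]))
        child_maps = [walk(child) for child in node]
        here = lca(species_clades, frozenset().union(*child_maps))
        if here in child_maps:
            dups += 1
        return here
    walk(gene_tree)
    return dups

species = (("A", "B"), "C")
congruent = (("A", "B"), "C")                       # pure speciation
two_copies = ((("A", "B"), "C"), (("A", "B"), "C")) # duplication at the root
discordant = (("A", "C"), "B")                      # needs duplication plus losses (or a transfer)

print(count_duplications(congruent, species))   # 0
print(count_duplications(two_copies, species))  # 1
print(count_duplications(discordant, species))  # 1
```

The discordant case shows why rooting and transfer matter: without an HGT option, any topological disagreement is forced into a duplication-and-loss explanation.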
Baldauf et al PNAS 1996: ancient duplication of transcription elongation factor seen in all three domains--before LUCA (last universal common ancestor) - Ruchira S. Datta
because bacteria are an outgroup in this gene tree, we may infer that the Bacteria are an outgroup in the species phylogeny - Ruchira S. Datta
"Origins and evolution of the recA/RAD51 gene family: Evidence for ancient gene duplication and endosymbiotic transfer" - here tree reconciliation is a mess - Ruchira S. Datta
to find history of every gene family, we need to infer each horizontal gene transfer (HGT) event - Ruchira S. Datta
might use nucleotide composition, presence/absence data, topological methods, or "expert" knowledge - Ruchira S. Datta
nucleotide composition only worked early on when only a few genomes were known - Ruchira S. Datta
presence/absence: instead of tree, look at phylogenetic profile. e.g., 1 HGT vs 4 independent loss events: might choose HGT explanation instead - Ruchira S. Datta
this is a powerful technique, but fails on genes like recA: fairly universal, but topology of recA tree looks nothing like that of species tree. fails on ubiquitous but mobile genes (recA is present everywhere) - Ruchira S. Datta
topological methods: work when most of the genes are present, but fail on sparsely distributed genes, in this case never favor horizontal gene transfer over multiple losses. they work best with ubiquitous genes. - Ruchira S. Datta
"expert" knowledge depends on the expert! can we build the expert knowledge into an algorithm? - Ruchira S. Datta
the AnGST algorithm: Analyzer of Gene and Species Trees - Ruchira S. Datta
evaluate multiple scenarios to explain the gene and species trees: count the cost of the events, generalized parsimony algorithm. solve w/ dynamic programming - Ruchira S. Datta
AnGST output: a gene tree-species tree reconciliation - Ruchira S. Datta
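The cost-counting idea can be sketched on the simpler presence/absence case mentioned earlier (one HGT versus four losses). This is not AnGST, which reconciles full gene trees against the species tree; it is a small weighted-parsimony (Sankoff-style) dynamic program on a presence/absence profile. The tree, the profile, and the cost values are hypothetical, and charging one gain for the gene's birth at the root is an assumption of this sketch.

```python
INF = float("inf")

def best_scenario_cost(tree, present, gain_cost, loss_cost):
    """Minimum total cost of gains and losses explaining which leaves carry the gene.

    tree: nested tuples of leaf names; present: set of leaves that have the gene.
    A gain stands in for gene birth or HGT into a lineage.
    """
    def change(parent_has, child_has):
        if parent_has == child_has:
            return 0
        return gain_cost if child_has else loss_cost

    def walk(node):
        # Returns (cost if this node lacks the gene, cost if it has the gene).
        if isinstance(node, str):
            has = node in present
            return (INF if has else 0, 0 if has else INF)
        cost = [0, 0]
        for child in node:
            child_cost = walk(child)
            for state in (0, 1):
                cost[state] += min(change(state, t) + child_cost[t] for t in (0, 1))
        return tuple(cost)

    absent_at_root, present_at_root = walk(tree)
    # Assumption: a gene present at the root still pays for one birth event.
    return min(absent_at_root, present_at_root + gain_cost)

# Gene found only in A and F: either ancestral with 4 losses (B, C, D, E),
# or born in one lineage and transferred once to the other.
tree = (((("A", "B"), "C"), "D"), ("E", "F"))
profile = {"A", "F"}

print(best_scenario_cost(tree, profile, gain_cost=3, loss_cost=1))  # 6: birth + one transfer
print(best_scenario_cost(tree, profile, gain_cost=5, loss_cost=1))  # 9: birth + four losses
```

Which explanation wins depends entirely on the relative cost of a gain versus a loss, which is the parameter-fitting problem raised later in the talk.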
reference tree: start w/ reference Tree of Life from Peer Bork's group a few years ago, Ciccarelli et al - Ruchira S. Datta
needed to put times (Million years ago - Ma) on the tree - Ruchira S. Datta
not a lot to calibrate against, but used PhyloBayes to infer times - Ruchira S. Datta
problem: reference 'Tree of Life' is based on 1% of gene families. solution: repeated analysis using dozens of alternate trees and various evolutionary models - Ruchira S. Datta
problem: fitting model parameters: cost of HGT versus cost of loss? solution: choose to avoid abrupt changes in genome size. but of course sharp changes do happen, e.g. at start of parasitic lineages - Ruchira S. Datta
problem: large phylogenetic uncertainty in gene trees. solution: identify gene trees that correspond to the simplest reconciliation (conservative about identifying HGT, DUP, LOS events) - Ruchira S. Datta
processed 3870 microbial gene families (from EGGNOG) drawn from 100 eukaryotic, archaeal, and bacterial genomes - Ruchira S. Datta
inferred events; decorate Tree of Life with these. much birth early on. most duplication in eukaryotes; duplication events in bacteria are more recent (i.e., the duplicate copies don't tend to persist) - Ruchira S. Datta
quite a bit of HGT in eukaryotes. where from? alpha-proteobacteria - same bug that was the mitochondrial symbiont. similarly cyanobacteria - plant lineage - Ruchira S. Datta
plots events at times in billion years ago. birth of genes mostly during "archaean gene expansion" in the archaean eon. prominent functions: redox, electron transfer, e.g., top one is ubiquinone, ferrocytochrome c, FADH. expansion of genetic diversity fueled by invention of electron transfer - Ruchira S. Datta
oxygen came later, right at end of expansion 2.8 billion years ago, a little prior to most estimates of when O2 levels started to rise in the Great Oxidation Event - Ruchira S. Datta
genes metabolizing transition metals: metals soluble in reduced oceans such as manganese were prominent early on. an exception is iron--perhaps it became so important that it could not be avoided. - Ruchira S. Datta
Q: prior probability of HGT takes into account ecology or geology? A: a generalized parsimony approach that puts a penalty on HGT. - Ruchira S. Datta
Tony Michael: what came first, single- or double-membrane bacteria? A: i'm having some technical difficulty ;-) according to the reference tree used, would suggest single membrane came first, but haven't looked explicitly at the genes themselves. looking for the Tree of Life that optimizes the reconciliation of all the genes, will hold off until then. - Ruchira S. Datta
Q: these methods assume sequence profile and constant rate of change over time. suppose in one genus, recA develops new regulatory interactions that change its rate? A: we do estimate times, but not independently for each gene. have a couple of papers on problem. struggling as remapping onto likelihood and Bayesian, but not an issue with parsimony. - Ruchira S. Datta
Iddo Friedberg
Ruchira S. Datta
Michael Galperin: Evolution of Bacterial Signal Transduction #LAMG10
most signal transduction proteins are multidomain, and are very difficult to annotate - Ruchira S. Datta
sequence similarity does not indicate a common function, annotation of multidomain proteins is unreliable - Ruchira S. Datta
have large collection of "creative" annotations of multi-domain proteins, e.g., valium receptor in Methanosarcina??! - Ruchira S. Datta
Koretke,...,Brown: "Evolution of Two-Component Signal Transduction": co-evolution and co-transfer - Ruchira S. Datta
Ulrich et al proposed two-component systems evolved from one-component systems - Ruchira S. Datta
two-component systems consist of histidine kinase (periplasmic sensor domain + His kinase domain) and response regulator (CheY-like receiver domain and output domain, usually DNA binding) - Ruchira S. Datta
some domain: CHASE3 present both in histidine kinases and methyl-accepting chemotaxis proteins. i.e., signal may affect behavior either through transcriptional regulation or work directly on chemotaxis - Ruchira S. Datta
c-di-GMP is a universal bacterial second messenger: Römling,...,Galperin - Ruchira S. Datta
bacterial signal transduction system includes at least 5 types of receptors - Ruchira S. Datta
histidine kinases regulate transcription of specific operons, adenylate cyclases regulate transcription of numerous genes, methyl-accepting chemotaxis proteins regulate chemotaxis, etc. there may be others we don't know - Ruchira S. Datta
the same CHASE2 signal receptor is involved in different pathways - Ruchira S. Datta
the total number of signal transduction proteins grows as [Genome size]^2 - Ruchira S. Datta
some organisms have more than expected number - smart; some have less - stupid. thus define bacterial IQ - Ruchira S. Datta
e.g., Wolinella succinogenes is smart, Mannheimia succinoproducens is very stupid - Ruchira S. Datta
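A rough sketch of the "more than expected / less than expected" idea, assuming the expected count follows N = c * G^2 with the exponent fixed at 2 and c fitted in log space. The notes do not give the speaker's actual IQ formula, so the observed/expected ratio used here is an assumption, and the organisms and counts are hypothetical.

```python
import math

def fit_quadratic_constant(sizes_mb, counts):
    """Fit c in N = c * G**2 by least squares on log(N) with the exponent fixed at 2."""
    logs = [math.log(n) - 2 * math.log(g) for g, n in zip(sizes_mb, counts)]
    return math.exp(sum(logs) / len(logs))

def excess_ratio(size_mb, count, c):
    """Observed count divided by the count expected for this genome size."""
    return count / (c * size_mb ** 2)

# Hypothetical data: name -> (genome size in Mb, number of signal transduction proteins).
organisms = {
    "org_small": (2.0, 40),
    "org_typical": (4.0, 160),
    "org_smart": (4.0, 320),
    "org_dull": (4.0, 80),
}

sizes = [g for g, _ in organisms.values()]
counts = [n for _, n in organisms.values()]
c = fit_quadratic_constant(sizes, counts)

for name, (g, n) in organisms.items():
    # Ratio above 1: more signalling proteins than expected; below 1: fewer.
    print(name, round(excess_ratio(g, n, c), 2))
```

Two genomes of the same size can land on opposite sides of the curve, as with the two rumen bacteria raised in the questions.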
the highest IQ organisms (highest adaptability index) also have more complex metabolisms - Ruchira S. Datta
why quadratic dependence? growth of anything leads to the growth of the government. - Ruchira S. Datta
Eric Alm,..,Adam Arkin "The Evolution of Two-Component Systems in Bacteria Reveals Different Strategies for Niche Adaptation" also found HisK fraction growing quadratically with genome size - Ruchira S. Datta
conservation of signal transduction protein family profiles in Mycobacterium: the pathogens evolved from free-living ancestors by losing some genes - Ruchira S. Datta
M. marinum lives in open sea, large number of signal transduction proteins compared to other Mycobacteria - Ruchira S. Datta
pathogenic Mycobacteria lost signal transduction proteins evenly across different families - Ruchira S. Datta
the shape of the profile is the same in all members of the genus, similarly with Bacillaceae - Ruchira S. Datta
anomaly in Crenarchaea - due to misassignment of phylum - Ruchira S. Datta
Q: looked at uncultured bacteria? A: hard as can't count w/o complete genome, but have looked some - Ruchira S. Datta
Salama asks: sensor domain common to many systems, lost simultaneously? A: great question, haven't looked - Ruchira S. Datta
Q: most stupid organisms are pathogens? A: "don't forget that it was a joke!" most populous organisms are free-living marine, and also very stupid. - Ruchira S. Datta
Q: mechanism of domain shuffling w/o breaking something? A: not well known; may be through many rounds of recombination. addition of a new sensor that does nothing is evolutionarily harmless, so could add many and see what works. they don't impede each other's action. Mike Lau of MIT: not a lot of crosstalk - Ruchira S. Datta
Q: talked about two rumen bacteria of similar genomic size, one smart and one stupid. any comment? A: the smart one is very niche-adapted to rumen, the stupid one can survive in nature - Ruchira S. Datta
Ruchira S. Datta
Colin Manoil: Phenotyping a Non-Model Prokaryote at Genome Scale #LAMG10
although 1224 bacterial genomes have been sequenced, there are <10 that have been well-characterized experimentally - Ruchira S. Datta
one is at the mercy of the assumption that similarity to model species sequences preserves function - Ruchira S. Datta
project goal: "annotate" a non-model genome using phenotyping - Ruchira S. Datta
Francisella tularensis is an infectious gamma proteobacterium; work instead with surrogate Francisella novicida, which infects mice. Small genome. - Ruchira S. Datta
used transposon library to identify every non-essential gene, use highly parallel phenotyping of mutants to infer genotype-phenotype associations - Ruchira S. Datta
the transposon mutant analysis gave mutants in 1457 of 1742 genes, 3.1 mutants/gene, 97.5% confirmed by re-sequencing - Ruchira S. Datta
redundancy w/ 3 mutants was important for confidence in the genotype/phenotype associations - Ruchira S. Datta
phenotypes checked: carbon utilization, small molecule biosynthesis, and antibiotic and stress sensitivities - Ruchira S. Datta
e.g., growth on carbon sources: glucose, fructose, glycerol, glutamine - Ruchira S. Datta
e.g., specific for glycerol: had mutated glycerol kinase - Ruchira S. Datta
check phenotype profile of each mutant in each condition, measure relative growth - Ruchira S. Datta
of genes with mutant phenotypes, 55% had fully predicted functions, 25% had no predicted function, 20% had partially predicted function - Ruchira S. Datta
e.g., mutant phenotype was fructose utilization, gene annotation was already fructose kinase, then not that helpful - Ruchira S. Datta
but fructose utilization of gene annotated as hypothetical is more useful - Ruchira S. Datta
fructose utilization of gene annotated as sugar-proton symporter is more useful - Ruchira S. Datta
in existing proline biosynthesis annotation, the pathway appeared to be incomplete as proC was missing. but mutant phenotype showed it was indeed competent to produce proline. used alternative pathway: arginine via amidinotransferase to ornithine, and ornithine via ornithine cyclodeaminase to proline - Ruchira S. Datta
quinolone antibiotic resistance (e.g., ciprofloxacin, nalidixic acid): found genes involved in recombination, replication and repair, efflux, and several novel genes not previously known to be involved - Ruchira S. Datta
partial overlap with Jeffrey Miller's list of quinolone resistance - could be due to technical issues, but also due to only partial conservation of the antibiotic resistance mechanisms - Ruchira S. Datta
detergent sensitivity/resistance: ABC transporter responsible for keeping phospholipids out of outer leaflet in E. coli. this is the mla system. But here in this small genome, there are two mla-gene related systems for keeping phospholipids out of outer membrane. Also found novel genes associated with this phenotype. - Ruchira S. Datta
can cluster phenotypes by their mutant profiles. e.g., ciprofloxacin clusters with nalidixic acid. but aminoglycoside antibiotics cluster with certain stresses, 46°C or pH 8.3 - Ruchira S. Datta
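A minimal sketch of comparing conditions by their mutant profiles, assuming each condition is a vector of relative growth values across the same set of mutants. The numbers are hypothetical, and Pearson correlation is one reasonable similarity measure; the notes do not say which measure or clustering method was used.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def most_similar(name, profiles):
    """Condition whose mutant profile correlates best with `name`."""
    others = (k for k in profiles if k != name)
    return max(others, key=lambda k: pearson(profiles[name], profiles[k]))

# Hypothetical relative growth (1.0 = wild-type-like) of five mutants per condition.
profiles = {
    "ciprofloxacin": [0.2, 0.3, 1.0, 1.0, 0.9],
    "nalidixic_acid": [0.3, 0.2, 1.0, 0.9, 1.0],
    "heat_46C": [1.0, 1.0, 0.3, 0.2, 1.0],
}

print(most_similar("ciprofloxacin", profiles))  # nalidixic_acid
```

With real data the pairwise correlations would feed a hierarchical clustering of conditions, and the same calculation on the transposed matrix would group genes with similar phenotypes.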
limitations of this approach: polar effects (don't know about downstream genes on same operon). however, these genes should be working together anyway, so may not be as big an issue. essential genes and unpredicted genes are missing from the analysis. - Ruchira S. Datta
large-scale phenotyping is a rich source of functional information and is feasible for non-model prokaryotes - Ruchira S. Datta
requires a mutant library, of which there are 10 or 12 published ones? not limited by that, TnSeq should open up many possibilities - Ruchira S. Datta
Bruno Sobral: vocabulary to describe phenotypes? is this controlled, can it be queried? A: just using traditional genetic language. assigning a number to each phenotype of each gene. Bruno Sobral: assay has a name? A: yes, presence of stressor or absence of nutrient. Bruno Sobral: how often was the starting annotation actually useful? A: hard to estimate. on average, verifying predicted phenotype or attaching new one to hypothetical. - Ruchira S. Datta
Q: phenotype of proC mutant? A: not yet. Q: if have something particular interested in? A: send an email - Ruchira S. Datta
Jonathan Eisen: why didn't do the unpredicted genes? A: did include some in gaps between genes, had phenotypes, not very systematically - Ruchira S. Datta
Ruchira S. Datta
Magdalene So: Commensal Neisseria Genomes: Clues to Pathogen Behavior #LAMG10
two pathogenic species of Neisseria infect only humans: N. meningitidis and N. gonorrhoeae - Ruchira S. Datta
N. meningitidis has 5-10% mortality rate. There exists an incompletely effective vaccine which the people where this is endemic cannot afford. - Ruchira S. Datta
Gonorrhoea has become resistant to almost all antibiotics now, so the CDC calls it a superbug. #LAMG10 - Ruchira S. Datta
asymptomatic carriage of N. meningitidis: 20-50% - Ruchira S. Datta
asymptomatic carriage of N. gonorrhoeae: >5% - Ruchira S. Datta
wanted to study these asymptomatic commensals - Ruchira S. Datta
8 commensal Neisseria species (non-pathogenic) also isolated from humans - Ruchira S. Datta
tree shows commensal Neisseria came first, the two pathogens are relative newcomers - Ruchira S. Datta
170 Neisseria virulence genes; immune evasion, intracellular replication, capsule production - Ruchira S. Datta
many of these are present in commensal Neisseria species, which serve as reservoir of virulence genes - Ruchira S. Datta
Type 4 pilus: Wolfgang & Koomey, EMBO J 19:6408. Used for DNA uptake, attachment, motility - Ruchira S. Datta
assembly extrudes the pilus, and disassembly of pilin retracts the pilus - Ruchira S. Datta
N. gonorrhoeae crawl on polyacrylamide pillars, pulling them w/ pili and bending them w/ mechanical force - Ruchira S. Datta
known Type IV pilus biogenesis genes - Ruchira S. Datta
DNA Uptake sequence (DUS) GCCGTCTGAA: Neisseria won't take up DNA unless this sequence is present - Ruchira S. Datta
these are present in many Neisseria species. variant DUS GTCGTCTGAA used by a couple of species (N sicca and N mucosa), which also is effective for uptake. - Ruchira S. Datta
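A small sketch of scanning a sequence for an uptake sequence on both strands, using the variant DUS quoted in the notes. The toy sequence is made up, and the function names are hypothetical; the same function applies to any 10-mer of interest.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_motif(seq, motif):
    """Sorted 0-based start positions of `motif` on either strand (overlaps allowed)."""
    seq = seq.upper()
    hits = set()
    for pattern in {motif, reverse_complement(motif)}:
        start = seq.find(pattern)
        while start != -1:
            hits.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(hits)

VARIANT_DUS = "GTCGTCTGAA"  # variant DUS from the notes

# Toy sequence: one forward copy and one reverse-complement copy.
toy = "AAAA" + VARIANT_DUS + "CCCC" + reverse_complement(VARIANT_DUS) + "TT"

print(find_motif(toy, VARIANT_DUS))  # [4, 18]
```

Counting hits per genome in this way gives a quick picture of which species carry which DUS variant.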
looked for evidence of horizontal gene transfer using Recombination Detection Program (RDPv3.8). detected recombination breakpoints in 53 of 69 vir genes, with >50% of these involving both commensal and pathogenic species - Ruchira S. Datta
phylogenetic method: make gene trees of the virulence genes, check whether they are topologically different from other gene trees and p-value significance. most disagree. - Ruchira S. Datta
virulence is a dynamic state: there may not be a genetic border separating pathogens and commensals - Ruchira S. Datta
Wu et al New Engl J Med vol 360 2009: Emergence of Ciprofloxacin resistant N. meningitidis in N. America. This resistance gene came from a commensal Neisseria species present in throats of contacts. - Ruchira S. Datta
Type IV pili exhibit variation in the antigens they present. Silent copies of pilin genes combine with pilE (pil expression) - Ruchira S. Datta
Pilin antigenic variation occurs in vitro and in vivo, and is thought to be a strategy to evade immune recognition. - Ruchira S. Datta
cell pumps out DNA, autolysis releases the DNA, and the living cell takes up DNA via the Type IV pilus (Tfp) - Ruchira S. Datta
think the silent variant genes have a function, but don't know what - Ruchira S. Datta
commensals undergo limited pilin antigenic variation. how do they avoid our protective antibodies? - Ruchira S. Datta
retraction of the pili extends mechanical force into the cell, triggering many signaling events - Ruchira S. Datta
can trigger apoptotic cell death, which is pro-inflammatory - Ruchira S. Datta
but Tfp triggers anti-apoptotic pathways, activating ERK and preventing release of Cytochrome C and apoptosis - Ruchira S. Datta
Higashi et al showed this involvement in 2007 - Ruchira S. Datta
working hypothesis: Tfp was evolved by commensals to help them colonize silently, and this "cytoprotective" activity helps N. meningitidis and N. gonorrhoeae infect asymptomatically - Ruchira S. Datta
differentiate commensal and pathogenic Tfp to look for pathogenic traits? e.g., host signaling - Ruchira S. Datta
Q: how to define commensal vs pathogenic, since many of the "pathogens" are carried asymptomatically? A: do have good tests for species identity. whether it's the cause of, e.g., septicemia, can only infer from lack of other bacteria - Ruchira S. Datta
Jonathan Eisen: any distance cutoff for uptake of the vir genes btw different species? A: haven't looked at that - Ruchira S. Datta