Bioinformatics, Vol. 25, No. 24. (15 December 2009), pp. 3323-3324. Summary: The Exon Array Analyzer (EAA) is a web server that provides a user-friendly interface for identifying alternative splicing events analyzed with Affymetrix Exon Arrays. The EAA implements the Splice Index algorithm to identify differentially expressed exons. The use of various filters allows reduction of the number of false positive hits. Results are presented with detailed annotation information and graphics to identify splice events and to facilitate biological validations. To demonstrate the versatility of the EAA, we analyzed exon arrays of 11 different murine tissues using sample data provided by Affymetrix (http://www.affymetrix.com). Data from the heart were compared with other tissues to identify exons that undergo heart-specific alternative splicing, resulting in the identification of 885 differentially expressed probe sets in 649 genes. Availability: The web interface is available at...
- Daniel Swan
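The Splice Index named in the EAA summary above is commonly defined as the log2 ratio of gene-normalized exon intensities between two sample groups. The following is a minimal sketch under that common definition; the abstract does not give the EAA's exact formula, and the function names and intensity values here are hypothetical.

```python
import math


def normalized_intensity(exon_signal, gene_signal):
    """Exon signal divided by the signal of the gene it belongs to."""
    if exon_signal <= 0 or gene_signal <= 0:
        raise ValueError("signals must be positive (linear scale)")
    return exon_signal / gene_signal


def splice_index(exon_a, gene_a, exon_b, gene_b):
    """log2 ratio of normalized exon intensities, group A versus group B."""
    ni_a = normalized_intensity(exon_a, gene_a)
    ni_b = normalized_intensity(exon_b, gene_b)
    return math.log2(ni_a / ni_b)


# Hypothetical linear-scale intensities: the gene level is unchanged,
# but the exon is half as abundant in group A (e.g. heart) as in group B.
si = splice_index(exon_a=200.0, gene_a=800.0, exon_b=400.0, gene_b=800.0)
print(si)  # -1.0
```

A value near zero suggests the exon tracks its gene in both groups; a large absolute value flags a candidate splicing difference, which would then go through filters such as those the summary mentions.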
Integrating Proteomic, Transcriptional, and Interactome Data Reveals Hidden Components of Signaling and Regulatory Networks - http://www.citeulike.org/user...
Molecular Systems Biology, Vol. 5 (17 November 2009) Bacteria communicate using secreted chemical signaling molecules called autoinducers in a process known as quorum sensing. The quorum-sensing network of the marine bacterium Vibrio harveyi uses three autoinducers, each known to encode distinct ecological information. Yet how cells integrate and interpret the information contained within these three autoinducer signals remains a mystery. Here, we develop a new framework for analyzing signal integration on the basis of information theory and use it to analyze quorum sensing in V. harveyi. We quantify how much the cells can learn about individual autoinducers and explain the experimentally observed input–output relation of the V. harveyi quorum-sensing circuit. Our results suggest that the need to limit interference between input signals places strong constraints on the architecture of bacterial signal-integration networks, and that bacteria probably have evolved...
- Daniel Swan
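One standard information-theoretic way to quantify "how much the cells can learn" about an input from an output is the mutual information between the two. The sketch below computes it for a small discrete joint distribution. It is an illustration of the general measure only, not the framework developed in the paper, and the example distributions are hypothetical.

```python
import math


def mutual_information(joint):
    """Mutual information I(X;Y) in bits.

    joint[i][j] is the probability P(X=i, Y=j); entries must sum to 1.
    """
    px = [sum(row) for row in joint]        # marginal of the input X
    py = [sum(col) for col in zip(*joint)]  # marginal of the output Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * math.log2(p / (px[i] * py[j]))
    return mi


# Hypothetical two-state input (signal low/high) and two-state output.
perfect = [[0.5, 0.0], [0.0, 0.5]]    # output fully determined by input
useless = [[0.25, 0.25], [0.25, 0.25]]  # output independent of input
print(mutual_information(perfect))  # 1.0
print(mutual_information(useless))  # 0.0
```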
Science, Vol. 326, No. 5957. (27 November 2009), pp. 1263-1268. To understand basic principles of bacterial metabolism organization and regulation, but also the impact of genome size, we systematically studied one of the smallest bacteria, Mycoplasma pneumoniae. A manually curated metabolic network of 189 reactions catalyzed by 129 enzymes allowed the design of a defined, minimal medium with 19 essential nutrients. More than 1300 growth curves were recorded in the presence of various nutrient concentrations. Measurements of biomass indicators, metabolites, and 13C-glucose experiments provided information on directionality, fluxes, and energetics; integration with transcription profiling enabled the global analysis of metabolic regulation. Compared with more complex bacteria, the M. pneumoniae metabolic network has a more linear topology and contains a higher fraction of multifunctional enzymes; general features such as metabolite concentrations, cellular energetics, adaptability, and...
- Daniel Swan
Nucl. Acids Res. (24 November 2009), gkp1015. The primary objective of most gene expression studies is the identification of one or more gene signatures: lists of genes whose transcriptional levels are uniquely associated with a specific biological phenotype. Whilst thousands of experimentally derived gene signatures are published, their potential value to the community is limited by their computational inaccessibility. Gene signatures are embedded in published article figures, tables or in supplementary materials, and are frequently presented using non-standard gene or probeset nomenclature. We present GeneSigDB (http://compbio.dfci.harvard.edu/genesig...), a manually curated database of gene expression signatures. GeneSigDB release 1.0 focuses on cancer and stem cell gene signatures and was constructed from more than 850 publications from which we manually transcribed 575 gene signatures. Most gene signatures (n = 560) were successfully mapped to the genome to extract standardized...
- Daniel Swan
Lost in translation: an assessment and perspective for computational microRNA target identification. - http://www.citeulike.org/user...
Bioinformatics (Oxford, England), Vol. 25, No. 23. (29 September 2009), pp. 3049-3055. MicroRNAs (miRNAs) are a class of short endogenously expressed RNA molecules that regulate gene expression by binding directly to the messenger RNA of protein coding genes. They have been found to confer a novel layer of genetic regulation in a wide range of biological processes. Computational miRNA target prediction remains one of the key means used to decipher the role of miRNAs in development and disease. Here we introduce the basic idea behind the experimental identification of miRNA targets and present some of the most widely used computational miRNA target identification programs. The review includes an assessment of the prediction quality of these programs and their combinations. Panagiotis Alexiou, Manolis Maragkakis, Giorgos Papadopoulos, Martin Reczko, Artemis Hatzigeorgiou
- Daniel Swan
Combining tissue transcriptomics and urine metabolomics for breast cancer biomarker identification - http://www.citeulike.org/user...
Bioinformatics, Vol. 25, No. 23. (1 December 2009), pp. 3151-3157. Motivation: For the early detection of cancer, highly sensitive and specific biomarkers are needed. Particularly, biomarkers in bio-fluids are relatively more useful because those can be used for non-biopsy tests. Although the altered metabolic activities of cancer cells have been observed in many studies, little is known about metabolic biomarkers for cancer screening. In this study, a systematic method is proposed for identifying metabolic biomarkers in urine samples by selecting candidate biomarkers from altered genome-wide gene expression signatures of cancer cells. Biomarkers identified by the present study have increased coherence and robustness because the significances of biomarkers are validated in both gene expression profiles and metabolic profiles. Results: The proposed method was applied to the gene expression profiles and urine samples of 50 breast cancer patients and 50 normal persons. Nine altered...
- Daniel Swan
Using the ratio of means as the effect size measure in combining results of microarray experiments - http://www.citeulike.org/user...
BMC Systems Biology, Vol. 3, No. 1. (2009), 106. BACKGROUND: Development of efficient analytic methodologies for combining microarray results is a major challenge in gene expression analysis. The widely used effect size models are thought to provide an efficient modeling framework for this purpose, where the measures of association for each study and each gene are combined, weighted by the standard errors. A significant disadvantage of this strategy is that the quality of different data sets may be highly variable, but this information is usually neglected during the integration. Moreover, it is widely known that the estimated standard deviations are probably unstable in the commonly used effect size measures (such as standardized mean difference) when sample sizes in each group are small. RESULTS: We propose a re-parameterization of the traditional mean difference based effect measure by using the log ratio of means as an effect size measure for each gene in each study. The estimated...
- Daniel Swan
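The log ratio of means described in the abstract above can be illustrated for a single gene in a single study. The sketch below computes the effect size and a delta-method approximation of its variance, then pools several studies with plain inverse-variance weights. The variance approximation and the fixed-effect pooling are standard textbook choices assumed here for illustration (the abstract is truncated before it gives the authors' estimator), and all names and values are hypothetical.

```python
import math
from statistics import mean, variance


def log_ratio_of_means(treated, control):
    """Return (effect, approx_variance) for one gene in one study.

    effect = ln(mean_treated / mean_control); the variance uses the
    delta-method approximation s^2 / (n * mean^2), summed over both groups.
    Inputs are positive, linear-scale expression values.
    """
    m_t, m_c = mean(treated), mean(control)
    if m_t <= 0 or m_c <= 0:
        raise ValueError("group means must be positive")
    effect = math.log(m_t / m_c)
    var = (variance(treated) / (len(treated) * m_t ** 2)
           + variance(control) / (len(control) * m_c ** 2))
    return effect, var


def pool_fixed_effect(estimates):
    """Inverse-variance weighted mean of (effect, variance) pairs."""
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    pooled = sum(w * eff for w, (eff, _) in zip(weights, estimates)) / total
    return pooled, 1.0 / total


# Hypothetical expression values for one gene in one study.
effect, var = log_ratio_of_means([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(effect, var)
```

Because the effect is a ratio, it does not divide by an estimated standard deviation, which is the quantity the abstract describes as unstable when group sizes are small.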
Bioinformatics, Vol. 25, No. 22. (15 November 2009), pp. 3026-3027. Summary: Saint is a web application which provides a lightweight annotation integration environment for quantitative biological models. The system enables modellers to rapidly mark up models with biological information derived from a range of data sources. Availability and Implementation: Saint is freely available for use on the web at http://www.cisban.ac.uk/saint. The web application is implemented in Google Web Toolkit and Tomcat, with all major browsers supported. The Java source code is freely available for download at http://saint-annotate.sourceforge.net. The Saint web server requires an installation of libSBML and has been tested on Linux (32-bit Ubuntu 8.10 and 9.04). Contact: helpdesk@cisban.ac.uk; a.l.lister@ncl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. 10.1093/bioinformatics/btp523 Allyson Lister, Matthew Pocock, Morgan Taschuk, Anil Wipat
- Daniel Swan
BMC Bioinformatics, Vol. 10, No. 1. (2009), 354. BACKGROUND: The microarray data analysis realm is ever growing through the development of various tools, open source and commercial. However, there is an absence of predefined rational algorithmic analysis workflows or batch standardized processing to incorporate all steps, from raw data import up to the derivation of significantly differentially expressed gene lists. This absence obfuscates the analytical procedure and obstructs the massive comparative processing of genomic microarray datasets. Moreover, the solutions provided depend heavily on the programming skills of the user, whereas GUI-embedded solutions do not provide direct support of various raw image analysis formats or a versatile and simultaneously flexible combination of signal processing methods. RESULTS: We describe here Gene ARMADA (Automated Robust MicroArray Data Analysis), a MATLAB implemented platform with a Graphical User Interface. This suite...
- Daniel Swan
BMC Bioinformatics, Vol. 10, No. 1. (2009), 330. BACKGROUND: Microarray experiments are increasing in size and samples are collected asynchronously over a long time. Available data are re-analysed as more samples are hybridized. Systematic use of collected data requires tracking of biomaterials, array information, raw data, and assembly of annotations. To meet the information tracking and data analysis challenges in microarray experiments we reimplemented and improved BASE version 1.2. RESULTS: The new BASE presented in this report is a comprehensive annotable local microarray data repository and analysis application providing researchers with an efficient information management and analysis tool. The information management system tracks all material from biosource, via sample and through extraction and labelling to raw data and analysis. All items in BASE can be annotated and the annotations can be used as experimental factors in downstream analysis. BASE stores all microarray experiment...
- Daniel Swan
ArrayMining: a modular web-application for microarray analysis combining ensemble and consensus methods with cross-study normalization - http://www.citeulike.org/user...
BMC Bioinformatics, Vol. 10, No. 1. (2009), 358. BACKGROUND: Statistical analysis of DNA microarray data provides a valuable diagnostic tool for the investigation of genetic components of diseases. To take advantage of the multitude of available data sets and analysis methods, it is desirable to combine both different algorithms and data from different studies. Applying ensemble learning, consensus clustering and cross-study normalization methods for this purpose in an almost fully automated process and linking different analysis modules together under a single interface would simplify many microarray analysis tasks. RESULTS: We present ArrayMining.net, a web-application for microarray analysis that provides easy access to a wide choice of feature selection, clustering, prediction, gene set analysis and cross-study normalization methods. In contrast to other microarray-related web-tools, multiple algorithms and data sets for an analysis task can be combined using ensemble feature...
- Daniel Swan
Bioinformatics, Vol. 25, No. 21. (1 November 2009), pp. 2872-2877. Motivation: Whole transcriptome shotgun sequencing data from non-normalized samples offer unique opportunities to study the metabolic states of organisms. One can deduce gene expression levels using sequence coverage as a surrogate, identify coding changes or discover novel isoforms or transcripts. Especially for discovery of novel events, de novo assembly of transcriptomes is desirable. Results: Transcriptome from tumor tissue of a patient with follicular lymphoma was sequenced with 36 base pair (bp) single- and paired-end reads on the Illumina Genome Analyzer II platform. We assembled ~194 million reads using ABySS into 66 921 contigs 100 bp or longer, with a maximum contig length of 10 951 bp, representing over 30 million base pairs of unique transcriptome sequence, or roughly 1% of the genome. Availability and Implementation: Source code and binaries of ABySS are freely available for download at...
- Daniel Swan
Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis - http://www.citeulike.org/user...
Bioinformatics, Vol. 25, No. 22. (15 November 2009), pp. 2906-2912. Motivation: The molecular complexity of a tumor manifests itself at the genomic, epigenomic, transcriptomic and proteomic levels. Genomic profiling at these multiple levels should allow an integrated characterization of tumor etiology. However, there is a shortage of effective statistical and bioinformatic tools for truly integrative data analysis. The standard approach to integrative clustering is separate clustering followed by manual integration. A more statistically powerful approach would incorporate all data types simultaneously and generate a single integrated cluster assignment. Methods: We developed a joint latent variable model for integrative clustering. We call the resulting methodology iCluster. iCluster incorporates flexible modeling of the associations between different data types and the variance-covariance structure within data types in a single framework, while simultaneously reducing the...
- Daniel Swan
Bioinformatics, Vol. 25, No. 21. (1 November 2009), pp. 2855-2856. Motivation: With the availability of many 'omics' data, such as transcriptomics, proteomics or metabolomics, the integrative or joint analysis of multiple datasets from different technology platforms is becoming crucial to unravel the relationships between different biological functional levels. However, the development of such an analysis is a major computational and technical challenge as most approaches suffer from high data dimensionality. New methodologies need to be developed and validated. Results: integrOmics efficiently performs integrative analyses of two types of 'omics' variables that are measured on the same samples. It includes a regularized version of canonical correlation analysis to highlight correlations between two datasets, and a sparse version of partial least squares (PLS) regression that includes simultaneous variable selection in both datasets. The usefulness of both approaches has been...
- Daniel Swan
Wolfram|Alpha releases a webservice API
- Daniel Swan
It looks a little overpriced at the low end to me. I would have thought the way to do it would be to give a limited number of requests away for free to stimulate development, then charge more for the high-volume, because that probably has a revenue model behind it.
- Mr. Gunn
Nature, Vol. 461, No. 7266. (14 October 2009), pp. 881-881. Google Wave is the kind of open-source online collaboration tool that should drive scientists to wire their research and publications into an interactive data web, says Cameron Neylon. Cameron Neylon
- Daniel Swan
Nature, Vol. advance online publication (14 October 2009) DNA cytosine methylation is a central epigenetic modification that has essential roles in cellular processes including genome regulation, development and disease. Here we present the first genome-wide, single-base-resolution maps of methylated cytosines in a mammalian genome, from both human embryonic stem cells and fetal fibroblasts, along with comparative analysis of messenger RNA and small RNA components of the transcriptome, several histone modifications, and sites of DNA–protein interaction for several key regulatory factors. Widespread differences were identified in the composition and patterning of cytosine methylation between the two genomes. Nearly one-quarter of all methylation identified in embryonic stem cells was in a non-CG context, suggesting that embryonic stem cells may use different methylation mechanisms to affect gene regulation. Methylation in non-CG contexts showed enrichment in gene bodies and depletion...
- Daniel Swan
Molecular Systems Biology, Vol. 5 (13 October 2009) The advent of cost-effective genotyping and sequencing methods has recently made it possible to ask questions that address the genetic basis of phenotypic diversity and how natural variants interact with the environment. We developed Camelot (CAusal Modelling with Expression Linkage for cOmplex Traits), a statistical method that integrates genotype, gene expression and phenotype data to automatically build models that both predict complex quantitative phenotypes and identify genes that actively influence these traits. Camelot integrates genotype and gene expression data, both generated under a reference condition, to predict the response to entirely different conditions. We systematically applied our algorithm to data generated from a collection of yeast segregants, using genotype and gene expression data generated under drug-free conditions to predict the response to 94 drugs and experimentally confirmed 14 novel gene–drug...
- Daniel Swan
BMC Bioinformatics, Vol. 10, No. 1. (22 September 2009), 305. BACKGROUND: Chromatin immunoprecipitation on tiling arrays (ChIP-chip) has been employed to examine features such as protein binding and histone modifications on a genome-wide scale in a variety of cell types. Array data from the latter studies typically have a high proportion of enriched probes whose signals vary considerably (due to heterogeneity in the cell population), and this makes their normalization and downstream analysis difficult. RESULTS: Here we present strategies for analyzing such experiments, focusing our discussion on the analysis of Bromodeoxyuridine (BrdU) immunoprecipitation on tiling array (BrdU-IP-chip) datasets. BrdU-IP-chip experiments map large, recently replicated genomic regions and have similar characteristics to histone modification/location data. To prepare such data for downstream analysis we employ a dynamic programming algorithm that identifies a set of putative unenriched probes, which we use...
- Daniel Swan
BMC Genomics, Vol. 10, No. 1. (2009), 439. BACKGROUND: With the increasing number of expression profiling technologies, researchers today are confronted with choosing the technology that has sufficient power with minimal sample size, in order to reduce cost and time. These depend on data variability, partly determined by sample type, preparation and processing. Objective measures that help experimental design, given one's own pilot data, are thus fundamental. RESULTS: Relative power and sample size analysis were performed on two distinct data sets. The first set consisted of Affymetrix array data derived from a nutrigenomics experiment in which weak, intermediate and strong PPARalpha agonists were administered to wild-type and PPARalpha-null mice. Our analysis confirms the hierarchy of PPARalpha-activating compounds previously reported and the general idea that larger effect sizes positively contribute to the average power of the experiment. A simulation experiment was performed that mimicked...
- Daniel Swan