Angew. Chem. Int. Ed., Vol. 52, No. 19. (3 May 2013), pp. 5180-5183, doi:10.1002/anie.201300653 Melanie Schnell, Undine Erlekam, PR Bunker, Gert von Helden, Jens-Uwe Grabow, Gerard Meijer, Ad van der Avoird
- Egon Willighagen
Nature Biotechnology, Vol. 26, No. 3. (01 March 2008), pp. 274-275, doi:10.1038/nbt0308-274 The BLOSUM family of substitution matrices, and particularly BLOSUM62, is the de facto standard in protein database searches and sequence alignments. In the course of analyzing the evolution of the Blocks database, we noticed errors in the software source code used to create the initial BLOSUM family of matrices (available online at ftp://ftp.ncbi.nih.gov/repository/blocks/unix/blosum/blosum.tar.Z ) Mark Styczynski, Kyle Jensen, Isidore Rigoutsos, Gregory Stephanopoulos
- Egon Willighagen
A hidden part of MACiE history is that Gemma and Gail, on at least some occasions, used JChemPaint, which I extended to their needs, for validation of the content they were cooking up...
- Egon Willighagen
(yes, it's hidden because it was a very small role...)
- Egon Willighagen
Science, Vol. 340, No. 6133. (10 May 2013), pp. 707-711, doi:10.1126/science.1231566 Some goods, such as widgets, are freely bought and sold in markets without protest, whereas others, such as indulgences, are not. Some mice that have been bred for use in laboratory experiments turn out to be surplus to requirements and are subsequently sacrificed. Falk and Szech (p. 707) studied the effect that marketplace negotiation has had on experimental subjects' willingness to pay for the upkeep of these surplus mice. Individuals were willing to pay much more to save the mice, but market-like exchanges lowered these prices. Armin Falk, Nora Szech
- Egon Willighagen
Journal of Molecular Biology, Vol. 48, No. 3. (28 March 1970), pp. 443-453, doi:10.1016/0022-2836(70)90057-4 A computer adaptable method for finding similarities in the amino acid sequences of two proteins has been developed. From these findings it is possible to determine whether significant homology exists between the proteins. This information is used to trace their possible evolutionary development. The maximum match is a number dependent upon the similarity of the sequences. One of its definitions is the largest number of amino acids of one protein that can be matched with those of a second protein allowing for all possible interruptions in either of the sequences. While the interruptions give rise to a very large number of comparisons, the method efficiently excludes from consideration those comparisons that cannot contribute to the maximum match. Comparisons are made from the smallest unit of significance, a pair of amino acids, one from each protein. All possible pairs are...
- Egon Willighagen
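The "maximum match" described in the 1970 Journal of Molecular Biology abstract above can be illustrated with a small dynamic-programming sketch. This is only a minimal illustration, not the paper's exact procedure: the function name `max_match` and the scoring parameters (`match`, `mismatch`, `gap`) are assumptions introduced here. With the default scores (1 per matched pair, nothing for mismatches or interruptions) the result is the largest number of residues of one sequence that can be matched with those of the other.

```python
def max_match(seq_a: str, seq_b: str, match: int = 1,
              mismatch: int = 0, gap: int = 0) -> int:
    """Score of the best global alignment of seq_a and seq_b.

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    # score[i][j] = best score aligning seq_a[:i] with seq_b[:j]
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, cols):
        score[0][j] = score[0][j - 1] + gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # pair the two residues
                score[i - 1][j] + gap,       # interruption in seq_b
                score[i][j - 1] + gap,       # interruption in seq_a
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(max_match("ACDE", "ADE"))          # free interruptions
    print(max_match("ACDE", "ADE", gap=-1))  # penalised interruptions
```

Each cell is filled from its three neighbours, so the work grows with the product of the two sequence lengths rather than with the number of possible alignments, which is how comparisons that cannot contribute to the maximum are excluded.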
PLoS ONE, Vol. 8, No. 5. (7 May 2013), e61951, doi:10.1371/journal.pone.0061951 Protein sequence databases are the pillar upon which modern proteomics is supported, representing a stable reference space of predicted and validated proteins. One example of such resources is UniProt, enriched with both expertly curated and automatic annotations. Taken largely for granted, similar mature resources such as UniProt are not available yet in some other “omics” fields, lipidomics being one of them. While having a seasoned community of wet lab scientists, lipidomics lies significantly behind proteomics in the adoption of data standards and other core bioinformatics concepts. This work aims to reduce the gap by developing an equivalent resource to UniProt called ‘LipidHome’, providing theoretically generated lipid molecules and useful metadata. Using the ‘FASTLipid’ Java library, a database was populated with theoretical lipids, generated from a set of community agreed upon chemical bounds. In...
- Egon Willighagen
Formalization, Annotation and Analysis of Diverse Drug and Probe Screening Assay Datasets Using the BioAssay Ontology (BAO) - http://www.citeulike.org/user...
PLoS ONE, Vol. 7, No. 11. (14 November 2012), e49198, doi:10.1371/journal.pone.0049198 Huge amounts of high-throughput screening (HTS) data for probe and drug development projects are being generated in the pharmaceutical industry and more recently in the public sector. The resulting experimental datasets are increasingly being disseminated via publically accessible repositories. However, existing repositories lack sufficient metadata to describe the experiments and are often difficult to navigate by non-experts. The lack of standardized descriptions and semantics of biological assays and screening results hinder targeted data retrieval, integration, aggregation, and analyses across different HTS datasets, for example to infer mechanisms of action of small molecule perturbagens. To address these limitations, we created the BioAssay Ontology (BAO). BAO has been developed with a focus on data integration and analysis enabling the classification of assays and screening results by...
- Egon Willighagen
Nature, Vol. 497, No. 7448. (8 May 2013), pp. 199-204, doi:10.1038/nature12073 LP Gaffney, PA Butler, M Scheck, AB Hayes, F Wenander, M Albers, B Bastin, C Bauer, A Blazhev, S Bönig, N Bree, J Cederkäll, T Chupp, D Cline, TE Cocolios, T Davinson, H De Witte, J Diriken, T Grahn, A Herzan, M Huyse, DG Jenkins, DT Joss, N Kesteloot, J Konki, M Kowalczyk, Th Kröll, E Kwan, R Lutter, K Moschner, P Napiorkowski, J Pakarinen, M Pfeiffer, D Radeck, P Reiter, K Reynders, SV Rigby, LM Robledo, M Rudigier, S Sambi, M Seidlitz, B Siebeck, T Stora, P Thoele, P Van Duppen, MJ Vermeulen, M von Schmid, D Voulot, N Warr, K Wimmer, K Wrzosek-Lipska, CY Wu, M Zielinska
- Egon Willighagen
we're trying to get the reaxys api access they promise in all their literature... seems like they shouldn't hype it if we have to tell them why we're using it and wait now > 2weeks to get started with it :(
- Christina Pikas
i think they might be trying to get out of supporting academic users with their api... but it doesn't say that on their website.
- Christina Pikas
Walt, yes. If you unglue it, I'll contribute and encourage others to also, for sure. Be sure to set a minimum level that makes you happy, though.
- Heather Piwowar
Egon: Since the serials crisis is over and everybody has access to all the subscription serials they could possibly read, you must--MUST--be able to read Beall's article. Of course, I can't (not without paying $23.68, an oddly specific sum), but that's because I'm one of those nonexistent unaffiliated folks who don't matter.
- Walt Crawford
Heather: I will do that. So far, I've received 0 email and 0 comments on the post itself, but it's early yet.
- Walt Crawford
I'm just puzzled. He's a librarian right? At a university? Who presumably has to argue for a budget? Which he's just lost all leverage over for ever and for all time because "there is no problem"? Am I missing something?
- Cameron Neylon
He's at the University of Colorado Denver Auraria Library. If his ScholComm role is similar to the one at my uni, he doesn't actually have any collection development responsibilities or a budget to manage. How the UCD electronic resources and collection development librarians feel about what he's saying would be very interesting to know.
- Hedgehog
yes, he started out as a cataloger. He previously had a holy war against Dublin Core and argued for MARC.
- Sarah
Which may make me indirectly partly responsible for him (except that I never argued *against* DC), for which I apologize. Come to think of it: I never argued *for* MARC except to say that if you're going to call it MARC, you should know what you're talking about.
- Walt Crawford
I'd give a lot to read his tenure file.
- Steele Lawman
I think I am detecting a tendency towards high profile tilting against windmills as a consequence of "being on t'internets" which seems to lead to highly polarized positions being taken up. Profile building seems to require taking extreme, even archetypal positions.</potCallingKettleBlack>
- Cameron Neylon
Cameron, indeed. The more "extreme" your statement, the higher the impact. And because it is hard to find new scientific results that are extreme, people focus on things around science. (or make new scientific findings sound more extreme than they are... which is *very* common deep inside the publishing world, as we all know)
- Egon Willighagen
(2 May 2013) Twitter is a micro-blogging social media platform for short messages that can have a long-term impact on how scientists create and publish ideas. We investigate the usefulness of twitter in the development and distribution of scientific knowledge. At the start of the life cycle of a scientific publication, twitter provides a large virtual department of colleagues that can help to rapidly generate, share and refine new ideas. As ideas become manuscripts, twitter can be used as an informal arena for the pre-review of works in progress. Finally, tweeting published findings can communicate research to a broad audience of other researchers, decision makers, journalists and the general public that can amplify the scientific and social impact of publications. However, there are limitations, largely surrounding issues of intellectual property and ownership, inclusiveness and misrepresentations of science sound bites. Nevertheless, we believe twitter is a useful social media tool...
- Egon Willighagen
Journal of Cheminformatics, Vol. 5, No. 1. (2013), 23, doi:10.1186/1758-2946-5-23 Background: Making data available as Linked Data using Resource Description Framework (RDF) promotes integration with other web resources. RDF documents can natively link to related data, and others can link back using Uniform Resource Identifiers (URIs). RDF makes the data machine-readable and uses extensible vocabularies for additional information, making it easier to scale up inference and data analysis. Results: This paper describes recent developments in an ongoing project converting data from the ChEMBL database into RDF triples. Relative to earlier versions, this updated version of ChEMBL-RDF uses recently introduced ontologies, including CHEMINF and CiTO; exposes more information from the database; and is now available as dereferencable, linked data. To demonstrate these new features, we present novel use cases showing further integration with other web resources, including Bio2RDF, Chem2Bio2RDF,...
- Egon Willighagen
J. Chem. Inf. Model. (14 March 2013), doi:10.1021/ci300182p The concept of molecular similarity is one of the most central in the fields of predictive toxicology and quantitative structure-activity relationship (QSAR) research. Many toxicological responses result from a multimechanistic process and, consequently, structural diversity among the active compounds is likely. Combining this knowledge, we introduce similarity boosted QSAR modeling, where we calculate molecular descriptors using similarities with respect to representative reference compounds to aid a statistical learning algorithm in distinguishing between different structural classes. We present three approaches for the selection of reference compounds, one by literature search and two by clustering. Our experimental evaluation on seven publicly available data sets shows that the similarity descriptors used on their own perform quite well compared to structural descriptors. We show that the combination of similarity and...
- Egon Willighagen
Concurrency Computat.: Pract. Exper., Vol. 25, No. 4. (1 February 2013), pp. 481-496, doi:10.1002/cpe.2922 Science, especially experimental science, has always depended on the careful capture of plans, actions, raw and processed data and conclusions. With scientific research now so inextricably dependent on computers, the use of an electronic laboratory notebook (ELN) is almost essential. The meticulous notebooks of Michael Faraday and other scientists of his era remain as role models for the recording that is necessary, but they cannot provide the essential support for discussion, sharing, collaboration and formal verification. A blog (a contraction of Web log) can form the basis for implementing an electronic notebook but does not suffice to meet all the needs of an ELN. This paper describes the LabTrove ELN, which is blog based but provides numerous additional features, such as version control, security policies and a flexible metadata scheme, and facilities for interchanging...
- Egon Willighagen
Excellent ! Thanks. I tried something earlier but it wouldn't install - I think it wanted to put a dll someplace banned by work.
- Christina Pikas
since i'd like to use eclipse, that looks good, too
- Christina Pikas
I asked my coworkers, since we use Git - one of our new guys recommended SmartGitHg 4 (for Win/Mac/Linux). Also recommended TortoiseGit (Win only).
- Laura H.
I am thinking we should change the Open Access Spectrum so that rather than "author retains copyright" that column says "author retains rights to distribute under any chosen license". Copyright in and of itself is useless...
- Cameron Neylon
Nature lets authors retain copyright, but they require worldwide exclusive publishing rights
- DJF
...and exclusive commercial rights. So basically authors "own copyright" but actually have a set of rights that are pretty much limited to some personal uses.
- Cameron Neylon
Database, Vol. 2013 (1 January 2013), doi:10.1093/database/bat029 MetaboLights is the first general-purpose open-access curated repository for metabolomic studies, their raw experimental data and associated metadata, maintained by one of the major open-access data providers in molecular biology. Increases in the number of depositions, number of samples per study and the file size of data submitted to MetaboLights present a challenge for the objective of ensuring high-quality and standardized data in the context of diverse metabolomic workflows and data representations. Here, we describe the MetaboLights curation pipeline, its challenges and its practical application in quality control of complex data depositions. Database URL: http://www.ebi.ac.uk/metabol... Reza Salek, Kenneth Haug, Pablo Conesa, Janna Hastings, Mark Williams, Tejasvi Mahendraker, Eamonn Maguire, Alejandra González-Beltrán, Philippe Rocca-Serra, Susanna-Assunta Sansone, Christoph Steinbeck
- Egon Willighagen
Nucleic Acids Research, Vol. 41, No. D1. (01 January 2013), pp. D801-D807, doi:10.1093/nar/gks1065 The Human Metabolome Database (HMDB) (www.hmdb.ca) is a resource dedicated to providing scientists with the most current and comprehensive coverage of the human metabolome. Since its first release in 2007, the HMDB has been used to facilitate research for nearly 1000 published studies in metabolomics, clinical biochemistry and systems biology. The most recent release of HMDB (version 3.0) has been significantly expanded and enhanced over the 2009 release (version 2.0). In particular, the number of annotated metabolite entries has grown from 6500 to more than 40 000 (a 600% increase). This enormous expansion is a result of the inclusion of both ‘detected’ metabolites (those with measured concentrations or experimental confirmation of their existence) and ‘expected’ metabolites (those for which biochemical pathways are known or human intake/exposure is frequent but the compound has yet to...
- Egon Willighagen
PLoS ONE, Vol. 8, No. 5. (1 May 2013), e62325, doi:10.1371/journal.pone.0062325 Dispensing and dilution processes may profoundly influence estimates of biological activity of compounds. Published data show Ephrin type-B receptor 4 IC50 values obtained via tip-based serial dilution and dispensing versus acoustic dispensing with direct dilution differ by orders of magnitude with no correlation or ranking of datasets. We generated computational 3D pharmacophores based on data derived by both acoustic and tip-based transfer. The computed pharmacophores differ significantly depending upon dispensing and dilution methods. The acoustic dispensing-derived pharmacophore correctly identified active compounds in a subsequent test set where the tip-based method failed. Data from acoustic dispensing generates a pharmacophore containing two hydrophobic features, one hydrogen bond donor and one hydrogen bond acceptor. This is consistent with X-ray crystallography studies of ligand-protein...
- Egon Willighagen
A tandem regression-outlier analysis of a ligand cellular system for key structural modifications around ligand binding - http://www.citeulike.org/user...
Journal of Cheminformatics, Vol. 5, No. 1. (2013), 21, doi:10.1186/1758-2946-5-21 BACKGROUND: A tandem technique of hard equipment is often used for the chemical analysis of a single cell to first isolate and then detect the wanted identities. The first part is the separation of wanted chemicals from the bulk of a cell; the second part is the actual detection of the important identities. To identify the key structural modifications around ligand binding, the present study aims to develop a counterpart of tandem technique for cheminformatics. A statistical regression and its outliers act as a computational technique for separation. RESULTS: A PPARgamma (peroxisome proliferator-activated receptor gamma) agonist cellular system was subjected to such an investigation. Results show that this tandem regression-outlier analysis, or the prioritization of the context equations tagged with features of the outliers, is an effective regression technique of cheminformatics to detect key structural...
- Egon Willighagen
BMC Systems Biology, Vol. 7, No. 1. (2013), 15, doi:10.1186/1752-0509-7-15 BACKGROUND: The KEGG PATHWAY database provides a plethora of pathways for a diversity of organisms. All pathway components are directly linked to other KEGG databases, such as KEGG COMPOUND or KEGG REACTION. Therefore, the pathways can be extended with an enormous amount of information and provide a foundation for initial structural modeling approaches. As a drawback, KGML-formatted KEGG pathways are primarily designed for visualization purposes and often omit important details for the sake of a clear arrangement of its entries. Thus, a direct conversion into systems biology models would produce incomplete and erroneous models. RESULTS: Here, we present a precise method for processing and converting KEGG pathways into initial metabolic and signaling models encoded in the standardized community pathway formats SBML (Levels 2 and 3) and BioPAX (Levels 2 and 3). This method involves correcting invalid or incomplete...
- Egon Willighagen
(2 Apr 2013) The combination of the flexibility of RDF and the expressiveness of SPARQL provides a powerful mechanism to model, integrate and query data. However, these properties also mean that it is nontrivial to write performant SPARQL queries. Indeed, it is quite easy to create queries that tax even the most optimised triple stores. Currently, application developers have little concrete guidance on how to write "good" queries. The goal of this paper is to begin to bridge this gap. It describes 5 heuristics that can be applied to create optimised queries. The heuristics are informed by formal results in the literature on the semantics and complexity of evaluating SPARQL queries, which ensures that queries following these rules can be optimised effectively by an underlying RDF store. Moreover, we empirically verify the efficacy of the heuristics using a set of openly available datasets and corresponding SPARQL queries developed by a large pharmacology data integration project. The...
- Egon Willighagen
Genome Medicine, Vol. 5, No. 4. (2013), 40, doi:10.1186/gm444 BACKGROUND: A DNA methylation signature has been characterized that distinguishes rheumatoid arthritis (RA) fibroblast like synoviocytes (FLS) from osteoarthritis (OA) FLS. The presence of epigenetic changes in long term cultured cells suggests that rheumatoid FLS imprinting might contribute to pathogenic behavior. To understand how differentially methylated genes (DMGs) might participate in the pathogenesis of RA, we evaluated the stability of the RA signature and how DMGs are enriched in specific pathways and ontology categories. METHODS: To assess the RA methylation signatures the Illumina HumanMethylation450 chip was used to compare methylation levels in RA, OA and normal (NL) FLS taken at passage 3, 5, and 7. Then methylation frequencies at CpGs within the signature were compared between passages. To assess the enrichment of DMG in specific pathways DMG were identified as genes that possess significantly differential...
- Egon Willighagen