A tweet from Cameron Neylon earlier today about this "I'm sorry. What world does the publishers association actually live in? Because its not the one I inhabit".
- Graham Steel
This is some of the most egregious horseshit I've seen in a while. Probably best not to read while either drinking anything or in a public place as people will look at you funny as you tear your hair out...
- Cameron Neylon
"There is an entire publishing ecosystem built around this function." aka 'We managed to create a vendor lock-in around us' :)
- Egon Willighagen
Well, but you probably don't *want* to "improve the commercialisation of research"--you probably think it's too damn commercialized already. So you're in a world where research has to do with more than profit. Not the PA's world.
- Walt Crawford
Journal of Cheminformatics, Vol. 3, No. 1. (7 October 2011), 33. ABSTRACT: BACKGROUND: A frequent problem in computational modeling is the interconversion of chemical structures between different formats. While standard interchange formats exist (for example, Chemical Markup Language) and de facto standards have arisen (for example, SMILES format), the need to interconvert formats is a continuing problem due to the multitude of different application areas for chemistry data, differences in the data stored by different formats (0D versus 3D, for example), and competition between software along with a lack of vendor-neutral formats. RESULTS: We discuss, for the first time, Open Babel, an open-source chemical toolbox that speaks the many languages of chemical data. Open Babel version 2.3 interconverts over 110 formats. The need to represent such a wide variety of chemical and molecular data requires a library that implements a wide range of cheminformatics algorithms, from partial charge...
- Egon Willighagen
Elsevier, FooBar and Content-mining – yet another Digital Land Grab – wake up academia and fight. Or surrender for ever - http://blogs.ch.cam.ac.uk/pmr...
So important. #FRPAA doesn't solve this at all (PubMed Central can't be used for text mining). I'd like to hope #FRPAA is a positive incremental step toward text mining, but maybe energy will die without getting there. Should we be lobbying for inclusion of text mining rights in it now while energy is up, and/or making it clear that we interpret #FRPAA as including text mining rights, or ?
- Heather Piwowar
Note that FRPAA doesn't actually mandate use of PMC, so agencies have an opportunity to have different repos with different requirements, so there is the potential at least to do things better in the particular area of text mining.
- Cameron Neylon
Checked more articles and added average. The minimum charge is actually 31.50. The only way they can make that claim is by factoring in OA articles at zero dollars - and even then it's doubtful and the usage of "per article cost" misleading.
- Björn Brembs
I'm sure the figure they have is based on subscription costs, not the per-article charges. That argument is a little more subtle - they've expanded page numbers faster than they've expanded charges. Of course if technology was being used effectively the cost per page should have dropped by an order of magnitude.
- Cameron Neylon
The point is that they could have taken the average of just a few chosen papers - or just made the numbers up. There's no way to check, and what ways we have contradict their statement.
- Björn Brembs
I just realized that I must have commented on the wrong post with my first comment. Sorry, it was early morning on a weekend with a toddler asking for attention... Now if I could find the post I wanted to comment on, the one with the Open Letter from Elsevier...
- Björn Brembs
elsevier vp of global marketing communications says there's a study of 4000 researchers in which 90% reported "very high satisfaction" with access to research articles... I have requested more details.
You mean like how many of those researchers were *not* at well-funded first-world institutions? (Or how many were independent research...oh, I forgot, there's no such thing.)
- Walt Crawford
yeah like who the hell they asked and precisely what question did they ask them.... the #$%^ moderator on liblicense hasn't seen fit to let my post through yet. If she returns it (which she has done with mine before), i'll resend directly to elsevier
- Christina Pikas
Sounds like something for FakeLibStats on Twitter.
- Andy
i would be interested in how many of that 90% acknowledge that their library probably pays for access.
- Georgie Bestie
You could tweet https://twitter.com/#!... Is that the guy? I think he has noted this stat in some of his blog post comments without a cite.
- Just Joe
this wasn't tom reller, this was another guy... so I sent the message to liblicense at 9:48 this morning and it still hasn't been posted... who runs a listserv like that?
- Christina Pikas
the guy is probably female (oops) but here's the name: Chrysanne Lowe
- Christina Pikas
The STM study done recently came back with results that said no problems with academic or SME access and I have no idea whatsoever how they managed to get those results. I think the questions do need close looking at tho.
- Cameron Neylon
I had an e-mail off list from Richard Poynder who has been in contact with E off list. the survey details are at: http://www.publishingresearch.net/documen... .. the sample is apparently authors who have published in one of 18,000 journals and the question is given on p9
- Christina Pikas
i find this astounding that 78% of respondents in africa said that research journal articles were very or fairly easy to access
- Christina Pikas
I find the whole survey and results fishy. 19 of 20 authors find research articles easy to find? Some of those African authors might get free access to Elsevier articles through a program they have, but that doesn't explain the 78% number.
- Just Joe
Let's see: 82 people in Africa. 96 people in the Middle East. 151 people in Latin America. And all of those people are already published in some set of journals. I'm impressed...
- Walt Crawford
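To put a rough number on how little a regional sample of this size can support, here is a minimal sketch of a normal-approximation (Wald) 95% interval for a proportion. It assumes the 78% figure quoted above applies to the 82 African respondents and treats them as a simple random sample, which the selection of already-published authors almost certainly violates, so the real uncertainty is likely larger. The function name wald_interval is made up for this illustration.

```python
import math


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation (Wald) interval for a sample proportion.

    p_hat: observed proportion (0..1)
    n:     number of respondents
    z:     critical value (1.96 corresponds to roughly 95% coverage)
    Returns (lower, upper, margin_of_error).
    """
    standard_error = math.sqrt(p_hat * (1.0 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin, margin


# Figures taken from the thread: 78% "very or fairly easy", 82 respondents.
# Pairing these two numbers is an assumption made for this sketch.
low, high, moe = wald_interval(0.78, 82)
print(f"78% of n=82: +/- {moe:.1%} (about {low:.0%} to {high:.0%})")
```

Even under these generous assumptions the margin is close to nine percentage points either way, and that is before accounting for who was asked and how the question was worded.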
the corporate numbers are way high, too. the people we get here who previously worked at gov't labs or corporations (the big defense companies) always comment on access to the lit
- Christina Pikas
Notably follow up missing...'how much of your access is legal?'
- Cameron Neylon
heh. excellent point. probably the way to ask is "how much of this is through colleagues not at your institution?"
- RepoRat
You'd probably also need to ask people to exclude informal email exchange of PDFs, since published authors are more likely to be part of the invisible colleges. I also wonder whether the low response rate says something...
- Walt Crawford
interestingly, the earlier 2009 study of small company researchers found negligible (1%) use of local academic libraries (respondents wanted online access)
- Christina Pikas
I would love to do a study where we looked at researchers' personal libraries and quantified how much was actually legally obtained, how much was grey, and perhaps even how much was clearly black market (distinguishing the latter two is hard, but looking at most recent additions and checking library holdings shouldn't be too difficult?)
- Cameron Neylon
For a scientific publisher it is rather sad that they cite results based on this question... there is no establishment as to what 'easy' is, some of the problems outlined above... and we put trust in a publisher that cannot get its basic act together to 'improve' scientific dissemination for us? Elsevier can better just shut up, start giving big boons, because every reply only makes...
- Egon Willighagen
This is definitely in conflict with another survey: slide 35 - Learning from default mode network: the predictive value of resting state in traumatic brain injury.
- Björn Brembs
Nature Biotechnology, Vol. 30, No. 2. (15 January 2012), pp. 159-164. To better understand the molecular mechanisms and genetic basis of human disease, we systematically examine relationships between 3,949 genes, 62,663 mutations and 3,453 associated disorders by generating a three-dimensional, structurally resolved human interactome. This network consists of 4,222 high-quality binary protein-protein interactions with their atomic-resolution interfaces. We find that in-frame mutations (missense point mutations and in-frame insertions and deletions) are enriched on the interaction interfaces of proteins associated with the corresponding disorders, and that the disease specificity for different mutations of the same gene can be explained by their location within an interface. We also predict 292 candidate genes for 694 unknown disease-to-gene associations with proposed molecular mechanism hypotheses. This work indicates that knowledge of how in-frame disease mutations alter specific...
- Egon Willighagen
PLoS Comput Biol, Vol. 7, No. 12. (29 December 2011), e1002323. Combinatorial therapy is a promising strategy for combating complex disorders due to improved efficacy and reduced side effects. However, screening new drug combinations exhaustively is impractical considering all possible combinations between drugs. Here, we present a novel computational approach to predict drug combinations by integrating molecular and pharmacological data. Specifically, drugs are represented by a set of their properties, such as their targets or indications. By integrating several of these features, we show that feature patterns enriched in approved drug combinations are not only predictive for new drug combinations but also provide insights into mechanisms underlying combinatorial therapy. Further analysis confirmed that among our top ranked predictions of effective combinations, 69% are supported by literature, while the others represent novel potential drug combinations. We believe that our proposed...
- Egon Willighagen
Journal of Cheminformatics, Vol. 4, No. 1. (2012), 3. BACKGROUND: Representations of chemical datasets in spreadsheet format are important for ready data assimilation and manipulation. In addition to the normal spreadsheet facilities, chemical spreadsheets need to have visualisable chemical structures and data searchable by chemical as well as textual queries. Many such chemical spreadsheet tools are available, some operating in the familiar Microsoft Excel environment. However, within this group, the performance of Excel is often compromised, particularly in terms of the number of compounds which can usefully be stored on a sheet. SUMMARY: LICSS is a lightweight chemical spreadsheet within Microsoft Excel for Windows. LICSS stores structures solely as Smiles strings. Chemical operations are carried out by calling Java code modules which use the CDK, JChemPaint and OPSIN libraries to provide cheminformatics functionality. Compounds in sheets or charts may be visualised (individually or...
- Egon Willighagen
BMC Bioinformatics, Vol. 13, No. Suppl 1. (2012), S3. BACKGROUND: Semantic Web technologies have been developed to overcome the limitations of the current Web and conventional data integration solutions. The Semantic Web is expected to link all the data present on the Internet instead of linking just documents. One of the foundations of the Semantic Web technologies is the knowledge representation language Resource Description Framework (RDF). Knowledge expressed in RDF is typically stored in so-called triple stores (also known as RDF stores), from which it can be retrieved with SPARQL, a language designed for querying RDF-based models. The Semantic Web technologies should allow federated queries over multiple triple stores. In this paper we compare the efficiency of a set of biologically relevant queries as applied to a number of different triple store implementations. RESULTS: Previously we developed a library of queries to guide the use of our knowledge base Cell Cycle Ontology...
- Egon Willighagen
Plant Molecular Biology, Vol. 48, No. 1. (1 January 2002), pp. 155-171. Metabolites are the end products of cellular regulatory processes, and their levels can be regarded as the ultimate response of biological systems to genetic or environmental changes. In parallel to the terms `transcriptome' and `proteome', the set of metabolites synthesized by a biological system constitute its `metabolome'. Yet, unlike other functional genomics approaches, the unbiased simultaneous identification and quantification of plant metabolomes has been largely neglected. Until recently, most analyses were restricted to profiling selected classes of compounds, or to fingerprinting metabolic changes without sufficient analytical resolution to determine metabolite levels and identities individually. As a prerequisite for metabolomic analysis, careful consideration of the methods employed for tissue extraction, sample preparation, data acquisition, and data mining must be taken. In this review, the...
- Egon Willighagen
BMC Systems Biology, Vol. 6, No. 1. (2012), 8. BACKGROUND: The creation and modification of genome-scale metabolic models is a task that requires specialized software tools. While these are available, subsequently running or visualizing a model often relies on disjoint code, which adds additional actions to the analysis routine and, in our experience, renders these applications suboptimal for routine use by (systems) biologists. RESULTS: The Flux Analysis and Modeling Environment (FAME) is the first web-based modeling tool that combines the tasks of creating, editing, running, and analyzing/visualizing stoichiometric models into a single program. Analysis results can be automatically superimposed on familiar KEGG-like maps. FAME is written in PHP and uses the Python-based PySCeS-CBM for its linear solving capabilities. It comes with a comprehensive manual and a quick-start tutorial, and can be accessed online at http://f-a-m-e.org/ . CONCLUSIONS: With FAME, we present the community with an...
- Egon Willighagen
Nature Genetics, Vol. 44, No. 2. (27 January 2012), pp. 127-130. Jonathan Derry, Lara Mangravite, Christine Suver, Matthew Furia, David Henderson, Xavier Schildwachter, Brian Bot, Jonathan Izant, Solveig Sieberts, Michael Kellen, Stephen Friend
- Egon Willighagen
Douglas, I would recommend Jmol (Open Source). Bob implemented the DSSP algorithm, and Jmol has such a large user base that any irregularities have been detected by now. This is the scripting command listed there to calculate hydrogen bonding patterns with DSSP: calculate hBonds structure The scripting documentation provides more detail.
- Egon Willighagen
Nucleic Acids Research, Vol. 40, No. D1. (1 January 2012), pp. D9-D12. Wikipedia, the online encyclopedia, is the most famous wiki in use today. It contains over 3.7 million pages of content; with many pages written on scientific subject matters that include peer-reviewed citations, yet are written in an accessible manner and generally reflect the consensus opinion of the community. In this, the 19th Annual Database Issue of Nucleic Acids Research, there are 11 articles that describe the use of a wiki in relation to a biological database. In this commentary, we discuss how biological databases can be integrated with Wikipedia, thereby utilising the pre-existing infrastructure, tools and above all, large community of authors (or Wikipedians). The limitations to the content that can be included in Wikipedia are highlighted, with examples drawn from articles found in this issue and other wiki-based resources, indicating why other wiki solutions are necessary. We discuss the merits of...
- Egon Willighagen
Nature Genetics, Vol. 44, No. 2. (27 January 2012), pp. 121-126. Susanna-Assunta Sansone, Philippe Rocca-Serra, Dawn Field, Eamonn Maguire, Chris Taylor, Oliver Hofmann, Hong Fang, Steffen Neumann, Weida Tong, Linda Amaral-Zettler, Kimberly Begley, Tim Booth, Lydie Bougueleret, Gully Burns, Brad Chapman, Tim Clark, Lee-Ann Coleman, Jay Copeland, Sudeshna Das, Antoine de Daruvar, Paula de Matos, Ian Dix, Scott Edmunds, Chris Evelo, Mark Forster, Pascale Gaudet, Jack Gilbert, Carole Goble, Julian Griffin, Daniel Jacob, Jos Kleinjans, Lee Harland, Kenneth Haug, Henning Hermjakob, Shannan Sui, Alain Laederach, Shaoguang Liang, Stephen Marshall, Annette McGrath, Emily Merrill, Dorothy Reilly, Magali Roux, Caroline Shamu, Catherine Shang, Christoph Steinbeck, Anne Trefethen, Bryn Williams-Jones, Katherine Wolstencroft, Ioannis Xenarios, Winston Hide
- Egon Willighagen
The Chemistry Development Kit can do this with the StructureDiagramGenerator. A full code example can be found in this blog post. The basic use looks like:
    StructureDiagramGenerator sdg = new StructureDiagramGenerator();
    sdg.setMolecule(someMolecule);
    sdg.generateCoordinates();
    Molecule layedOutMol = sdg.getMolecule();
- Egon Willighagen
Yeah, but remember the PLoS letter to Science? Hardly anyone kept their promise. Not even PLoS bothers to keep a copy online. Bet the same thing will happen here -- they rush to get a free warmfuzzy but then the first time they face a teeny perceived career risk, they'll cave. (What, me jaded?)
- Bill Hooker
I think there are two differences here in the way this might play out - although I am pretty much as cynical as Bill. First this is just Elsevier - people will bend over backwards for Nature and Science but Elsevier...in many cases it's not going to be a big problem for them. The other is the question of whether the whole community has become much more interested and active. My sense is...
- Cameron Neylon
These things come in waves, bigger each time. Eventually you end up with a tsunami. Folks on the inside tell me that California faculty were ready to CRUCIFY NPG. That sure as heck didn't happen during the earlier Big Deal crisis in '04 or so. There's a sociology dissertation in that somewhere.
- RepoRat
I also heard yesterday that our local digital-humanities fanatic came back from MLA with the words "scholarly communication" on his lips. He'd never heard or heeded them before. This is how change happens: not with a bang but with a whole lotta whimpers.
- RepoRat
good news: page is revamped a bit, now more clear what signators are protesting. signed. Yeah, I don't think signators will necessarily live up to it. But I do think initiatives like these make people think twice and become more aware of their role in where we are.
- Heather Piwowar
I'm really and truly hoping that this time is different from the PLoS nonsense. (Not that PLoS is nonsense; the supposed pledge was.) Unfortunately, signing the earlier pledge--by tens of thousands of scientists--didn't apparently yield a tsunami of OA activism. We shall see.
- Walt Crawford
It's a variation on the Alma Swan tickybox problem. :) What encourages me this time is that more faculty are *spontaneously verbalizing pledges in public* rather than just ticking tickyboxen. That demonstrates more commitment, and creates more of a need to hold to one's word.
- RepoRat
"In the visualisation above, hydrogens should be ignored as they are not included in the paths (we could add an option to the SVG depiction to suppress these if necessary). Also, aromatic bonds are depicted as single bonds unless a complete aromatic ring is present in the fragment."
- Egon Willighagen
The visualisation is just a quick hack. The point is to get people thinking about using these fragments in some way, more than just for calculating the Tanimoto.
- Noel O'Boyle
Indeed... Jonathan implemented the Klekota-Roth fingerprint, which consists of some 4k(!) possibly biologically interesting substructures... to find out, one indeed only needs to regress that fingerprint against some bioactivity to find which of those 4k are really interesting... and as you write, that applies to any fingerprint!
- Egon Willighagen
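As a rough illustration of "regress that fingerprint against some bioactivity", here is a minimal sketch of the simplest possible version: a separate one-variable least-squares fit per fingerprint bit. For a 0/1 predictor the fitted slope equals the difference in mean activity between molecules that have the substructure and those that do not, so that is what the code computes. The function name bit_effects, the toy fingerprints and the activity values are all invented for this example; a real analysis would more likely use a regularised multivariate model and proper validation.

```python
from statistics import mean


def bit_effects(fingerprints, activities):
    """Per-bit univariate least-squares slope of activity on bit presence.

    fingerprints: list of sets, each holding the indices of the bits set
                  for one molecule (hypothetical substructure keys)
    activities:   list of floats, one measured activity per molecule
    Returns {bit_index: slope}. For a binary predictor the slope is
    mean(activity | bit set) - mean(activity | bit not set).
    """
    all_bits = set().union(*fingerprints)
    effects = {}
    for bit in all_bits:
        with_bit = [a for fp, a in zip(fingerprints, activities) if bit in fp]
        without_bit = [a for fp, a in zip(fingerprints, activities) if bit not in fp]
        if not with_bit or not without_bit:
            continue  # bit is set in every molecule or in none: no contrast
        effects[bit] = mean(with_bit) - mean(without_bit)
    return effects


# Toy data, invented for illustration only.
toy_fps = [{0, 2}, {0, 1}, {1, 2}, {2}]
toy_act = [7.0, 6.5, 3.0, 3.5]

effects = bit_effects(toy_fps, toy_act)
# Rank substructures by the size of their apparent effect on activity.
ranked = sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True)
for bit, slope in ranked:
    print(f"bit {bit}: slope {slope:+.2f}")
```

With some 4k substructures and a modest number of compounds, many bits will look interesting by chance, so any ranking like this is only a starting point for picking fragments worth a closer look.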
Good luck! (Organic) Chemistry is not particularly known for caring about any of this (nor semantics, nor computing)... but I second Rich's pointer to the Beilstein Journal of Organic Chemistry. And there is Chemistry Central too. What other gold OA options are there for chemists? There is Molecules as ChemComm replacement... others?
- Egon Willighagen
I have an idea for a project that might be a good fit with a friendly OA journal near you and could also bring down the costs of peer review in the longer term. It might be a way of shifting at least synthetic org chem into a new space.
- Cameron Neylon
Something like Molecules is nice, but the journal has/had no editorial standards, making text mining impossible. Of course, a new journal would not just standardize on format, but also require semantics, which for a ChemComm-like journal is very feasible...
- Egon Willighagen
Egon, that's what I'm thinking. A synth org chem process similar to that for Acta Cryst E. Basically a defined input format with specified allowed data formats that would then be automatically tested to see if they're consistent with the supposed structure.
- Cameron Neylon
Let me know if you seek someone for an editorial board... (btw, talking about edit boards, I was thinking of a ORC write up of this R package http://cran.r-project.org/web... ... would that fit the journal? I would need to write proper testing, which I have not gotten around to... eta would probably be summer)
- Egon Willighagen
That looks like a very interesting resource. Who's behind the wiki? Email address? The wiki is devoid of copyright/licensing info...
- Egon Willighagen
The JIF should leave you cold... but remember that Web of Science was originally just to learn what cites you! the JIF is just a derived journal descriptor, just as useful as any molecular descriptor...
- Egon Willighagen
I see it as a reflection of the audience (rather than the publisher) - that still hankers after a JIF and will use it for evaluation. I assume a JIF is a selling point for a journal and J Cheminf has to cater to an audience that desires one.
- Rajarshi Guha
Keep in mind that metrics like JIF and h-index are considered in evaluations like tenure. If we want cheminformatics to be taken seriously, we need these for better or worse.
- Geoffrey Hutchison
Geoff, for that we also need JChemInf to start promoting data citations, etc...
- Egon Willighagen
Molecular Genetics and Metabolism, Vol. 101, No. 2-3. (22 October 2010), pp. 134-140. Genetic databases contain a variety of annotation errors that often go unnoticed due to the large size of modern genetic data sets. Interpretation of these data sets requires bioinformatics tools that may contribute to this problem. While providing gene symbol annotations for identifiers (IDs) such as microarray probe set, RefSeq, GenBank, and Entrez Gene is seemingly trivial, the accuracy is fundamental to any subsequent conclusions. We examine gene symbol annotations and results from three commercial pathway analysis software (PAS) packages: Ingenuity Pathways Analysis, GeneGO, and Pathway Studio. We compare gene symbol annotations and canonical pathway results over time and among different input ID types. We find that PAS results can be affected by variation in gene symbol annotations across software releases and the input ID type analyzed. As a result, we offer suggestions for using commercial...
- Egon Willighagen
BMC Bioinformatics, Vol. 11, No. 1. (2010), 449. BACKGROUND: It is necessary to analyze microarray experiments together with biological information to make better biological inferences. We investigate the adequacy of current biological databases to address this need. DESCRIPTION: Our results show a low level of consistency, comprehensiveness and compatibility among three popular pathway databases (KEGG, Ingenuity and Wikipathways). The level of consistency for genes in similar pathways across databases ranges from 0% to 88%. The corresponding level of consistency for interacting genes pairs is 0%-61%. These three original sources can be assumed to be reliable in the sense that the interacting gene pairs reported in them are correct because they are curated. However, the lack of concordance between these databases suggests each source has missed out many genes and interacting gene pairs. CONCLUSIONS: Researchers will hence find it challenging to obtain consistent pathway information out of...
- Egon Willighagen
Journal of Computational Chemistry, Vol. 26, No. 10. (30 July 2005), pp. 1063-1068. Many applications require a method for translating a large list of bond angles and bond lengths to precise atomic Cartesian coordinates. This simple but computationally consuming task occurs ubiquitously in modeling proteins, DNA, and other polymers as well as in many other fields such as robotics. To find an optimal method, algorithms can be compared by a number of operations, speed, intrinsic numerical stability, and parallelization. We discuss five established methods for growing a protein backbone by serial chain extension from bond angles and bond lengths. We introduce the Natural Extension Reference Frame (NeRF) method developed for Rosetta's chain extension subroutine, as well as an improved implementation. In comparison to traditional two-step rotations, vector algebra, or Quaternion product algorithms, the NeRF algorithm is superior for this application: it requires 47% fewer floating point...
- Egon Willighagen