
Paulo Nuin › Comments

Steve Koch
Meeting with two Media Arts majors at UNM, discussing ideas for collaborations on science outreach. Thinking that a video explaining Open Science to the general public may be a good idea?
Definitely! - Mr. Gunn
We were mulling over a few ideas: (1) Like I mentioned in the title of this post: A combined video / screencast to explain what scientists are doing nowadays to make progress towards open science. Audience would be general public and scientists. Could use friendfeed as a starting point and then show things like open notebook science, etc... - Steve Koch
(2) Making a 5 minute video explaining the Open Notebook Science we're doing in Junior physics lab here. Target audience: other science instructors around the globe who'd be interested in using our experience to add open notebook science to their own courses. - Steve Koch
(3) (This is what inspired our meeting, but then we thought up 1 and 2 above while talking) Making science videos explaining the research in our lab. Target audience: general public. - Steve Koch
(4) (This just occurred to me; we didn't talk about it): We could collaborate on making high-quality protocol videos. Something like JoVE, but produced locally. I like all of these ideas and I'm glad I met these students! - Steve Koch
Very cool, Steve. What can I do to help? - Mr. Gunn
Why don't you make a similar video explaining Open Science to scientists? I bet you will find a larger (or smaller) audience. - Paulo Nuin
I agree it would be larger or smaller :). @Mr. Gunn, thanks! I don't know what help -- the barrier now is to know what to do. Which of those #1-4 seem most worthwhile to you and most in need of digital media experts? #4 is the easiest for me to imagine, since I actually know what I'm trying to convey. #1 is the hardest, since I'm not sure how to coherently explain that. - Steve Koch
I am going to put a bit in my NSF CAREER budget for these projects. Talking with the students today, it seems like they could get academic credit for the project in the Media Arts programs, and also get access to the equipment. So, even there, I'm not sure what I'd need to budget for. - Steve Koch
And BTW: Both of these students took my Physics 102 "Conceptual Physics" course and did very well. So, this is cool validation for the whole teaching/research synergy idea. - Steve Koch
What is Open Science? - Paulo Nuin
I'll give this some thought, Steve. My hands are full right now developing a presentation, but of course you're welcome to reuse any of my or Mary Canady's slides. - Mr. Gunn
@Paulo: what I mean by the term is the application of the free-as-in-speech principles of Free/Open Source Software to the entire enterprise of science. The Blue Obelisk folks refer to ODOSOS (http://blueobelisk.sourceforge.net/wiki... Open Data, Open Source, and Open Standards); to this I would add Open Access (publishing) and Open Licensing, to give the "five pillars" of Open Science. - Bill Hooker
Hmm. Open Data, Open Source, Open Access, and Open Standards are all forms of open licensing, aren't they? Or is there something specifically called Open Licensing that I'm not aware of? - Michael R. Bernstein
What I mean by Open Licensing is CC-style explicit permissions for re-use in copyright, and things like Science Commons' MTA or CAMBIA's BiOS licenses for stuff that's covered by patents. So yes, that's all part of Open Foo, but I think there's more to Foo than just the licensing and more to licensing than Data/Source/Access/Standards. - Bill Hooker
Ah. Open Content covers CC licenses, I'm not sure what to call MTA and CAMBIA BiOS. - Michael R. Bernstein
Other possibilities might be to include a citizen science project as an example of what can be done with greater collaboration -- such projects are good entryways into outreach since they seem to make science "closer" to a non- or emerging scientist. You could even go so far as to tailor portions of your own projects to younger audiences (K-12) -- and by doing so entice classrooms of... - Mickey Schafer
Sounds like a great idea and one I'm all in favour of. An idea, Steve. You might want to alert SciVee's Prof Phil Bourne http://www.scivee.tv/user/phil as I'm pretty sure he'd be [1] interested and [2] willing to lend a hand in hosting/promoting this type of project. - Graham Steel
Hey Mickey, Thanks for those ideas! Definitely interesting to me and will think about it. @Graham -- thanks for that link, I will definitely contact him when we have a clearer idea of what we're wanting to do. - Steve Koch
If you're looking for additional inspiration, try googling "digital ethnography" -- you should get to the Kansas State stuff (both their blog/web pages and youtube videos) -- you may or may not like the science-y part (anthropology -- the first video ever produced in the series was fairly annoying to me, though the subsequent ones are quite interesting), but they have some great teaching experiments using web re/sources that you might find useful. - Mickey Schafer
Second Mickey's recommendation. Very interesting stuff - e.g., a short presentation at: http://www.youtube.com/watch.... I really like the idea of doing something of similar impact for open science. The longer 55 minute Library of Congress talk from the same channel is well worth watching, IMO. - Michael Nielsen
That's spectacular! - Steve Koch
Steve - It'd be really nice to have something similar for open science. Some visual ideas I like are to make a movie of the version history on, e.g., OWW, or the Polymath wiki; visualizations of the evolving network of relationships from, e.g., GeneWiki... I'll bet things like diseases gradually being understood are actually visible in the network of links on Wikipedia. - Michael Nielsen
Steve - Jon Udell does an awesome Wikipedia movie at http://jonudell.net/umlaut.swf Things get really fun a couple of minutes in... - Michael Nielsen
Hey Michael, that too is very cool. And I really like your idea of adapting that for OWW or the other sites. I don't know if the students I talked with are good at that or not. Looking at the things you linked me made me realize that if I'm going to be heavily involved (say as "producer" with the $), then I think I'm missing some talents to lead those kinds of things. So, maybe I'm not... - Steve Koch
Demonstration is a more effective way of explaining a concept than just explanation is anyway! Hmm. That could've been more elegantly expressed. In any case, a demo of what you are doing with some discussion of implications (pedagogical to philosophical) also sounds like a great project. - Mickey Schafer
Hi All -- Met again with Noel and Leslie (the Media Arts peeps), and we've planned to start making the video on November 23. We're planning on short video showing what we're doing in Junior Lab with Open Notebook Science. General plan can be seen here: http://docs.google.com/Doc... and a list of potential interview questions is being generated here: http://docs.google.com/Doc... - Steve Koch
Muchas excellentos - Graham Steel
Neil Saunders
Question for bioinformaticians and computational biologists
How many of you have a professional programming qualification? Either from an educational institution (e.g. a CS degree) or something like Sun Java certification? Is it even possible, useful or desirable to get "recognised" qualifications in other programming languages? Or are most of us self-taught? - Neil Saunders
Useful for what? For landing a job with an established software company, a CS degree seems like a good idea. For a job at an outsourcing company in India, certification is a must. But if it's actual programming skills you're after, nothing beats practice (even if it's just with an open source hobby project). Ideally you get to work with people who have more experience than you do, so you're not just "self-taught", but also "group-taught". - Eric Jain
Useful for bioinformaticians and computational biologists. Just interested to know if this has affected career development for anyone; either within those subject areas, or if they've moved out into other jobs; e.g. people who've left academic life science research to become software developers. - Neil Saunders
I would think in most cases, commercial and academic, a good track record with some awesome projects would trump certification. FWIW, we don't have many people with certs, but Masters and PhDs are still common. - Matt Wood
Let me put it this way: You don't want to work for a place that filters candidates for a software development position based on their formal qualifications, rather than their experience. For what it's worth: I have some formal qualifications for being allowed near computers, but as far as I can tell that was never a factor (both in and outside of academia) for being invited to an interview or hired. Arguably that's a rather small and biased sample set, but there you go. - Eric Jain
I got RHEL certified at one point, but it was only because work paid for it to happen. But I'm a 'biologist turned informatician' and therefore have no qualifications in any CS related field - just experience! - Daniel Swan
I am self-taught, with no professional qualification. I thought of getting certifications but in the end decided not to, as I didn't think they would be useful. - Paulo Nuin
I have a double bachelor's degree in CS and Bio. I felt that it helped me get into a comp bio graduate school program, at any rate. As for certifications, I personally see them as a waste of time, as most biologists in academia couldn't care less how you munge their data, as long as you do it quickly and efficiently. I'd be interested to hear if things are different in industry. - Chris Miller
The only reason I know some people got certified was because it helped them focus down and learn something they wanted to. I haven't been in any situation during a hiring decision where certification comes into play. I'd rather be pointed to a website someone has developed or some code that's on SourceForge. - Deepak Singh
I've never heard of anyone asking for professional bioinformatics certification. Publish, show your previous work, yes. But certification? Never. - Andreas Matern from Alert Thingy
I have a bioinformatics masters degree, which was taken post-PhD. As part of that I was taught Java, but very much in a 'Java for Bioinformatics' style. Maybe my approach to projects earlier in my career would have benefited from some software engineering training, but you pick that stuff up as you go along :) - Simon Cockell
I have no certification, just a master's in Bioinformatics where I was taught Java, Linux and R. I taught myself Ruby. I've got no interest in certification and I think Chris Wansworth's short essay is a good guide to follow - https://gist.github.com/0a2655a... . I think this echoes the same sentiments expressed above. - Michael Barton
Paulo Nuin
Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. - http://www.citeulike.org/user...
Journal of molecular evolution, Vol. 39, No. 3. (September 1994), pp. 306-314. Two approximate methods are proposed for maximum likelihood phylogenetic estimation, which allow variable rates of substitution across nucleotide sites. Three data sets with quite different characteristics were analyzed to examine empirically the performance of these methods. The first, called the "discrete gamma model," uses several categories of rates to approximate the gamma distribution, with equal probability for each category. The mean of each category is used to represent all the rates falling in the category. The performance of this method is found to be quite good, and four such categories appear to be sufficient to produce both an optimum, or near-optimum fit by the model to the data, and also an acceptable approximation to the continuous distribution. The second method, called "fixed-rates model", classifies sites into several classes according to their rates predicted assuming the star tree.... - Paulo Nuin
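A minimal sketch of the "discrete gamma" idea from the abstract above, in Python. It assumes the usual parameterization in which site rates follow a gamma distribution with shape alpha and rate alpha (so the mean rate is 1), cuts that distribution into k equal-probability categories at its quantiles, and represents each category by its mean rate. The helper names (`reg_lower_gamma`, `gamma_quantile`, `discrete_gamma_rates`) are hypothetical, and the incomplete gamma function is computed with a plain power series rather than a library routine, so treat this as an illustration, not a reference implementation.

```python
import math


def reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) via its power series.

    Adequate for the small shapes and rates used here; a library routine
    (e.g. scipy.special.gammainc) would be the more robust choice.
    """
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, 10_000):
        term *= x / (a + n)
        total += term
        if term < total * 1e-16:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def gamma_quantile(p: float, alpha: float) -> float:
    """p-quantile of a gamma with shape alpha and rate alpha (mean 1), by bisection."""

    def cdf(r: float) -> float:
        return reg_lower_gamma(alpha, alpha * r)

    lo, hi = 0.0, 1.0
    while cdf(hi) < p:  # widen the upper bracket until it contains the quantile
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha: float, k: int = 4) -> list[float]:
    """Mean rate of each of k equal-probability categories (the 'discrete gamma')."""
    # Category boundaries are the 1/k, 2/k, ... quantiles of the continuous gamma.
    cuts = [0.0] + [gamma_quantile(i / k, alpha) for i in range(1, k)] + [math.inf]
    rates = []
    for lo, hi in zip(cuts, cuts[1:]):
        # With rate == shape, r * f(r; alpha, alpha) = f(r; alpha + 1, alpha), so the
        # partial mean over [lo, hi) is a difference of P(alpha + 1, alpha * r).
        upper = 1.0 if math.isinf(hi) else reg_lower_gamma(alpha + 1, alpha * hi)
        lower = reg_lower_gamma(alpha + 1, alpha * lo)
        # Each category has probability 1/k, so its mean is k times the partial mean.
        rates.append(k * (upper - lower))
    return rates


if __name__ == "__main__":
    print(discrete_gamma_rates(alpha=0.5, k=4))
```

With `k=4` the four category means should add up to 4 (average rate 1), a quick check that the discretization preserves the mean of the continuous distribution; smaller alpha spreads the means further apart, i.e. stronger rate variation across sites.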
Paulo Nuin
Bayesian Inference of Phylogeny and Its Impact on Evolutionary Biology - http://www.citeulike.org/user...
Science, Vol. 294, No. 5550. (14 December 2001), pp. 2310-2314. 10.1126/science.1065889 John Huelsenbeck, Fredrik Ronquist, Rasmus Nielsen, Jonathan Bollback - Paulo Nuin
Paulo Nuin
The Information Content of a Character under a Markov Model of Evolution - http://www.citeulike.org/user...
Molecular Phylogenetics and Evolution, Vol. 17, No. 2. (November 2000), pp. 231-243. The rate of evolutionary change associated with a character determines its utility for the reconstruction of phylogenetic history. For a given age of lineage splits, we examine the information content of a character to assess the magnitude and range of an optimal rate of substitution. On the one hand an optimal transition rate must provide sufficiently many character changes to distinguish subclades, whereas on the other hand changes must be sufficiently rare that reversals on a single branch (and hence homoplasy) are uncommon. In this study, we evolve binary characters over three tree topologies with fixed branch lengths, while varying transition rate as a parameter. We use the character state distribution obtained to measure the “information content” of a character given a transition rate. This is done with respect to several criteria—the probability of obtaining the correct tree using parsimony,... - Paulo Nuin
Paulo Nuin
Estimating the Transition/Transversion Ratio from Independent Pairwise Comparisons with an Assumed Phylogeny - http://www.citeulike.org/user...
Journal of Molecular Evolution, Vol. 44, No. 1. (24 January 1997), pp. 112-119. Abstract. A method is presented for estimating the transition/transversion ratio (TI/TV), based on phylogenetically independent comparisons. TI/TV is a parameter of some models used in phylogeny estimation intended to reflect the fact that nucleotide substitutions are not all equally likely. Previous attempts to estimate TI/TV have commonly faced three problems: (1) few taxa; (2) nonindependence among pairwise comparisons; and (3) multiple hits make the apparent TI/TV between two sequences decrease over time since their divergence, giving a misleading impression of relative substitution probabilities. We have made use of the time dependency, modeling how the observed TI/TV changes over time and extrapolating to estimate the “instantaneous” TI/TV—the relevant parameter for phylogenetic inference. To illustrate our method, TI/TV was estimated for two mammalian mitochondrial genes. For 26 pairs of cytochrome... - Paulo Nuin
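The pairwise quantity this abstract starts from, the apparent TI/TV between two aligned sequences, can be sketched as a simple count. This is only the raw, uncorrected ratio (the one the authors note is pulled down over time by multiple hits); it is not their phylogenetically independent, time-extrapolation method. The function names are hypothetical, and the sketch assumes two equal-length DNA sequences, skipping identical sites and any non-ACGT characters such as gaps.

```python
import math

PURINES = frozenset("AG")
NUCLEOTIDES = frozenset("ACGT")


def count_ti_tv(seq1: str, seq2: str) -> tuple[int, int]:
    """Count transitions and transversions between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Skip identical sites, gaps and ambiguity codes.
        if a == b or a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue
        # Transition: purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T).
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


def apparent_ti_tv(seq1: str, seq2: str) -> float:
    """Raw TI/TV ratio; infinite if no transversions were observed."""
    ti, tv = count_ti_tv(seq1, seq2)
    return ti / tv if tv else math.inf


if __name__ == "__main__":
    print(count_ti_tv("ACGTACGT", "GCATACGA"), apparent_ti_tv("ACGTACGT", "GCATACGA"))
```

For the pair in the usage line there are two A/G transitions and one T/A transversion, giving an apparent ratio of 2. Because multiple hits erase transitions faster than transversions, this raw ratio shrinks as divergence grows, which is exactly the bias the paper models.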
Paulo Nuin
Multiple Sequence Alignment in Phylogenetic Analysis - http://www.citeulike.org/user...
Molecular Phylogenetics and Evolution, Vol. 16, No. 3. (September 2000), pp. 317-330. Multiple sequence alignment is discussed in light of homology assessments in phylogenetic research. Pairwise and multiple alignment methods are reviewed as exact and heuristic procedures. Since the object of alignment is to create the most efficient statement of initial homology, methods that minimize nonhomology are to be favored. Therefore, among all possible alignments, the one that satisfies the phylogenetic optimality criterion the best should be considered the best alignment. Since all homology statements are subject to testing and explanation this way, consistency of optimality criteria is desirable. This consistency is based on the treatment of alignment gaps as character information and the consistent use of a cost function (e.g., insertion–deletion, transversion, and transition) through analysis from alignment to phylogeny reconstruction. Cost functions are not subject to testing via... - Paulo Nuin
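To make the idea of one consistent cost function concrete, here is a minimal sketch of a global alignment cost (Needleman-Wunsch-style dynamic programming) that charges separate costs for transitions, transversions and insertion-deletions. The cost values (1, 2, 3) and function names are hypothetical placeholders, not taken from the paper, and the sketch returns only the minimum total cost, not the alignment itself. It assumes uppercase ACGT input.

```python
PURINES = frozenset("AG")


def substitution_cost(a: str, b: str, transition: int, transversion: int) -> int:
    """Cost of aligning base a against base b."""
    if a == b:
        return 0
    # Same chemical class (both purines or both pyrimidines) means a transition.
    return transition if (a in PURINES) == (b in PURINES) else transversion


def alignment_cost(s: str, t: str, indel: int = 3, transition: int = 1,
                   transversion: int = 2) -> int:
    """Minimum global alignment cost of s and t under the given cost function."""
    # prev[j] is the best cost of aligning the processed prefix of s with t[:j].
    prev = [j * indel for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * indel] + [0] * len(t)
        for j in range(1, len(t) + 1):
            cur[j] = min(
                prev[j - 1] + substitution_cost(s[i - 1], t[j - 1], transition, transversion),
                prev[j] + indel,     # gap in t (s[i-1] aligned to a gap)
                cur[j - 1] + indel,  # gap in s (t[j-1] aligned to a gap)
            )
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(alignment_cost("ACGT", "GCGT"), alignment_cost("ACGT", "CCGT"))
```

The point the abstract makes is that the same weights used here (indel, transition, transversion) would then be carried through to the tree search, so alignment and phylogeny are judged by one optimality criterion.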
Paulo Nuin
Rooting with Multiple Outgroups: Consensus Versus Parsimony - http://www.citeulike.org/user...
Cladistics, Vol. 14, No. 2. (1998), pp. 193-200. Using outgroup(s) is the most frequent method to root trees. Rooting through unconstrained simultaneous analysis of several outgroups is a favoured option because it serves as a test of the supposed monophyly of the ingroup. When contradiction occurs among the characters of the outgroups, the branching pattern of basal nodes of the rooted tree is dependent on the order of the outgroups listed in the data matrix, that is, on the prime outgroup (even in the case of exhaustive search). Different equally parsimonious rooted trees (=cladograms) can be obtained by permutation of prime outgroups. An alternative to a common implicit practice (select one outgroup to orientate the tree) is that the accepted cladogram is the strict consensus of the different equally parsimonious rooted trees. The consensus tree is less parsimonious but is not hampered with extra assumption such as the choice of one outgroup (or more) among the initial number of... - Paulo Nuin
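The strict consensus step described above can be sketched by treating each rooted tree as the set of clades (leaf sets below each internal node) it contains, and keeping only the clades shared by every tree. Representing trees as nested Python tuples with string leaves is an assumption made for illustration, and the function names are hypothetical.

```python
def clades(tree) -> set[frozenset[str]]:
    """All clades (leaf sets of internal nodes) of a rooted tree given as nested tuples."""
    found: set[frozenset[str]] = set()

    def walk(node) -> frozenset[str]:
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def strict_consensus(trees: list) -> set[frozenset[str]]:
    """Clades present in every input tree, i.e. the strict consensus."""
    common = clades(trees[0])
    for tree in trees[1:]:
        common &= clades(tree)
    return common


if __name__ == "__main__":
    ingroup = (("A", "B"), "C")
    tree1 = ("O1", ("O2", ingroup))  # O1 listed as the prime outgroup
    tree2 = ("O2", ("O1", ingroup))  # O2 listed as the prime outgroup
    for clade in sorted(strict_consensus([tree1, tree2]), key=len):
        print(sorted(clade))
```

In the usage example the two rootings disagree only on which outgroup branches first, so the consensus keeps the ingroup clades and collapses the basal outgroup node, which is the trade-off the abstract describes: less parsimonious, but free of the arbitrary choice of one outgroup.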
Jan Aerts
Cameron, Mendeley and Google Wave on BBC News http://news.bbc.co.uk/2...
Wot no http://www.citeulike.org ? Which one is betamax, which one VHS? http://en.wikipedia.org/wiki... - Duncan Hull
because they do not have a PR machine ... - Paulo Nuin
Cool article - looks really similar to the Channel 4 one though! - Euan
Chris Lasher
I have Paulo Nuin sightings on FriendFeed. Confirm? http://friendfeed.com/search...
Confirmed! :) - Ricardo Vidal
I'm around ... - Paulo Nuin
Jeff Habig
Sorry, hit enter early. Looking for http://bit.ly/1IM1qX&... and/or http://bit.ly/43o3AV&... if possible.
the links are still bad ... - Paulo Nuin
Neil Saunders
Just struck me that despite installing/configuring skype many times, I have not once used it to speak to someone.
Why install/configure it then? - Benjamin Tseng
I'm b.hooker. - Bill Hooker
call me then! - Paulo Nuin
Skype is the only app that can correctly do video calls on my laptop (maybe Kopete/aMSN have some issues with V4L). I also have found that TeamSpeak is very useful for small groups as you can install a private server for use by your lab, research group, etc. - Marcos de Carvalho
Paulo Nuin
Human genomes as email attachments. - http://www.citeulike.org/user...
Bioinformatics (Oxford, England) (7 November 2008) SUMMARY: The amount of genomic sequence data being generated and made available through public databases continues to increase at an ever-expanding rate. Downloading, copying, sharing and manipulating these large data sets is becoming difficult and time-consuming for researchers. We need to consider using advanced compression techniques as part of a standard data format for genomic data. The inherent structure of genome data allows for more efficient lossless compression than can be obtained through the use of generic compression programs. We present a series of techniques that in combination reduces a single genome to a size small enough to be sent as an email attachment. AVAILABILITY: Our algorithms are implemented in C++ and are freely available from http://www.ics.uci.edu/~xhx.... CONTACT: xhx@ics.uci.edu or chenli@ics.uci.edu. Scott Christley, Yiming Lu, Chen Li, Xiaohui Xie - Paulo Nuin
Andrew Su
Vista or Windows 7? (One year macbook experiment is over -- going back to windows...)
What happened? (Or didn't happen?) - Chris Lasher
95 or Me. - Paulo Nuin
ME, definitely. Best. OS. Ever. - Chris Lasher
Go on, give Ubuntu a go. - Neil Saunders
2nd the Ubuntu recommendation ;) - Mitchell McKenna
thanks for the thoughts. (Well, except for the snarky ME thoughts... ;) ) Ubuntu maybe as a VM. I'm in the corporate world, and playing nice with the rest of the infrastructure is important. - Andrew Su
I have very, very few interoperability issues using Ubuntu. OpenOffice handles pretty much every MS Office document. One exception is connecting to Exchange servers (I believe this is improving in Evolution). And if you really must, you can always run Windows software under Wine. - Neil Saunders
were you able to play nice with the corporate infrastructure on a mac? - Mitchell McKenna
I can play nice with the corporate and academic infrastructure with a Mac. I don't waste hours of my time fighting to make Windows bearable and don't waste days trying to configure my video card to make the UI less crappy. Not to mention fighting with OpenOffice. - Paulo Nuin
@Mitchell, yes, the mac platform is fully supported for me. I just didn't like the OS. Very few things were more efficient (shell scripting) and many things were less efficient (office apps, the lack of an alt key for keyboard shortcuts, exchange). and even with 4GB memory, I found myself fighting apparent memory issues all the time. My Dell XPS 13 with 256GB SSD, 8GB RAM arrived today... - Andrew Su
I'll be running what little Win I need using VirtualBox on my Mac. For reasons I can't go into, I need MS Access for a short time, just until I port over to MySQL. Ick. - Christopher Fields
Windows 7, b/c Vista I think is (unfairly) being euthanized. Also, I heard 7 is having a coming out party, so get your tiara ready! - delagoya
apparently IT wasn't ready to go 7 quite yet. I guess that means I'll just get them to reimage when they do (or maybe VM it)... - Andrew Su
win7 is very similar to vista, with modifications to lower system resource usage. disable aero in vista to achieve something similar - Mike Chelen
Win7, and it's not even close. Lots of UI improvements and back-end stuff. - John Hogenesch
win 7 (+ubuntu). i'm also going back to win next week, after using a mac for almost 3 years :) never got used to the keyboard layout, also memory issues, FF 3.5 crashes all the time, office 4 mac sucks, no total commander, no picasa, etc, etc - Endre Sebestyen
Are you sure it wasn't that you weren't understanding how memory is reported? Zero free memory is the optimal state. - Rich
win7! Been running it since beta came out and now have the final version (out ahead of time by our uni) on all my work machines. Best OS for me so far (and I've tried most of them). - Björn Brembs
Paulo Nuin
I would appreciate anyone sending me this ref: http://www.landesbioscience.com/journal.... Please send to paulo.nuin at gmail. Thanks in advance.
Sent! - Walter Jessen
Thanks a lot! - Paulo Nuin
Paulo Nuin
LEARNING ABOUT MODES OF SPECIATION BY COMPUTATIONAL APPROACHES - http://www.citeulike.org/user...
Evolution, Vol. 9999, No. 9999. (2009) How often do the early stages of speciation occur in the presence of gene flow? To address this enduring question, a number of recent papers have used computational approaches, estimating parameters of simple divergence models from multilocus polymorphism data collected in closely related species. Applications to a variety of species have yielded extensive evidence for migration, with the results interpreted as supporting the widespread occurrence of parapatric speciation. Here, we conduct a simulation study to assess the reliability of such inferences, using a program that we recently developed, MCMC estimation of the isolation-migration model allowing for recombination (MIMAR), as well as the program isolation-migration (IM) of Hey and Nielsen (2004). We find that when one of many assumptions of the isolation-migration model is violated, the methods tend to yield biased estimates of the parameters, potentially lending spurious support for... - Paulo Nuin
Paulo Nuin
integrOmics: an R package to unravel relationships between two omics data sets. - http://www.citeulike.org/user...
Bioinformatics (Oxford, England) (25 August 2009), btp515. MOTIVATION: With the availability of many 'omics' data, such as transcriptomics, proteomics or metabolomics, the integrative or joint analysis of multiple datasets from different technology platforms is becoming crucial to unravel the relationships between different biological functional levels. However, the development of such an analysis is a major computational and technical challenge as most approaches suffer from high data dimensionality. New methodologies need to be developed and validated. RESULTS: integrOmics efficiently performs integrative analyses of two types of 'omics' variables that are measured on the same samples. It includes a regularized version of Canonical Correlation Analysis to enlighten correlations between two data sets, and a sparse version of Partial Least Square regression that includes simultaneous variable selection in both data sets. The usefulness of both approaches has been demonstrated... - Paulo Nuin
Paulo Nuin
Personalized copy number and segmental duplication maps using next-generation sequencing. - http://www.citeulike.org/user...
Nature genetics, Vol. advance online publication (30 August 2009) Despite their importance in gene innovation and phenotypic variation, duplicated regions have remained largely intractable owing to difficulties in accurately resolving their structure, copy number and sequence content. We present an algorithm (mrFAST) to comprehensively map next-generation sequence reads, which allows for the prediction of absolute copy-number variation of duplicated segments and genes. We examine three human genomes and experimentally validate genome-wide copy number differences. We estimate that, on average, 73-87 genes vary in copy number between any two individuals and find that these genic differences overwhelmingly correspond to segmental duplications (odds ratio = 135; P < 2.2 × 10^-16). Our method can distinguish between different copies of highly identical genes, providing a more accurate assessment of gene content and insight into functional constraint without the limitations of array-based... - Paulo Nuin
Paulo Nuin
EDGE3: A web-based solution for management and analysis of Agilent two color microarray experiments - http://www.citeulike.org/user...
BMC Bioinformatics, Vol. 10, No. 1. (2009), 280. BACKGROUND: The ability to generate transcriptional data on the scale of entire genomes has been a boon both in the improvement of biological understanding and in the amount of data generated. The latter, the amount of data generated, has implications when it comes to effective storage, analysis and sharing of these data. A number of software tools have been developed to store, analyze, and share microarray data. However, a majority of these tools do not offer all of these features nor do they specifically target the commonly used two color Agilent DNA microarray platform. Thus, the motivating factor for the development of EDGE3 was to incorporate the storage, analysis and sharing of microarray data in a manner that would provide a means for research groups to collaborate on Agilent-based microarray experiments without a large investment in software-related expenditures or extensive training of end-users. RESULTS: EDGE3 has been developed... - Paulo Nuin
Paulo Nuin
Re: Google Scholar metadata quality and Mendeley hype - http://iphylo.blogspot.com/2009...
"Ideally publishers should help with the extraction, maybe adding better metadata to the PDFs, even creating better filenames. But we know they won't do that if there's no clear commercial benefit, so we have to rely on Google's mammoth structure to actually spend some time working on it, or expect that Mendeley's PR push gets halted so they can spend some time actually coding. As for my harsh criticism of Mendeley, it is not without merit. First, it's PR-only software with horrible usability and even worse performance. Second, the hype. And third, the attempted censorship of my criticism. But that's just me." - Paulo Nuin
Paulo Nuin
Bayesian phylogeography finds its roots. - http://www.citeulike.org/user...
PLoS computational biology, Vol. 5, No. 9. (25 September 2009), e1000520. As a key factor in endemic and epidemic dynamics, the geographical distribution of viruses has been frequently interpreted in the light of their genetic histories. Unfortunately, inference of historical dispersal or migration patterns of viruses has mainly been restricted to model-free heuristic approaches that provide little insight into the temporal setting of the spatial dynamics. The introduction of probabilistic models of evolution, however, offers unique opportunities to engage in this statistical endeavor. Here we introduce a Bayesian framework for inference, visualization and hypothesis testing of phylogeographic history. By implementing character mapping in a Bayesian software that samples time-scaled phylogenies, we enable the reconstruction of timed viral dispersal patterns while accommodating phylogenetic uncertainty. Standard Markov model inference is extended with a stochastic search variable... - Paulo Nuin
Paulo Nuin
The first step in the development of text mining technology for cancer risk assessment: identifying and organizing scientific evidence in risk assessment literature - http://www.citeulike.org/user...
BMC Bioinformatics, Vol. 10, No. 1. (22 September 2009), 303. BACKGROUND: One of the most neglected areas of biomedical Text Mining (TM) is the development of systems based on carefully assessed user needs. We have recently investigated the user needs of an important task yet to be tackled by TM --- Cancer Risk Assessment (CRA). Here we take the first step towards the development of TM technology for the task: identifying and organizing the scientific evidence required for CRA in a taxonomy which is capable of supporting extensive data gathering from biomedical literature. RESULTS: The taxonomy is based on expert annotation of 1297 abstracts downloaded from relevant PubMed journals. It classifies 1742 unique keywords found in the corpus to 48 classes which specify core evidence required for CRA. We report promising results with inter-annotator agreement tests and automatic classification of PubMed abstracts to taxonomy classes. A simple user test is also reported in a near real-world CRA... - Paulo Nuin
Paulo Nuin
TagDust - A program to eliminate artifacts from next generation sequencing data. - http://www.citeulike.org/user...
Bioinformatics (Oxford, England) (7 September 2009), btp527. MOTIVATION: Next-generation parallel sequencing technologies produce large quantities of short sequence reads. Due to experimental procedures various types of artifacts are commonly sequenced alongside the targeted RNA or DNA sequences. Identification of such artifacts is important during the development of novel sequencing assays and for the downstream analysis of the sequenced libraries. RESULTS: Here we present TagDust, a program identifying artifactual sequences in large sequencing runs. Given a user-defined cutoff for the false discovery rate (FDR), TagDust identifies all reads explainable by combinations and partial matches to known sequences used during library preparation. We demonstrate the quality of our method on sequencing runs performed on Illumina's Genome Analyzer platform. AVAILABILITY: Executables and documentation are available from http://genome.gsc.riken.jp/osc.... CONTACT:... - Paulo Nuin
Paulo Nuin
PhyloPattern: regular expressions to identify complex patterns in phylogenetic trees - http://www.citeulike.org/user...
BMC Bioinformatics, Vol. 10, No. 1. (19 September 2009), 298. BACKGROUND: To effectively apply evolutionary concepts in genomic-scale studies, large numbers of phylogenetic trees have to be automatically analysed, at a level approaching human expertise. Complex architectures must be recognized within the trees, so that associated information can be extracted. RESULTS: Here, we present a new software library, PhyloPattern, for automating tree manipulations and analysis. PhyloPattern includes three main modules, that address essential tasks in high-throughput phylogenetic tree analysis: node annotation, pattern matching, and tree comparison. PhyloPattern thus allows the programmer to focus on: i) the use of predefined or user defined annotation functions to perform immediate or deferred evaluation of node properties, ii) the search for user-defined patterns in large phylogenetic trees, iii) the pairwise comparison of trees by dynamically generating patterns from one tree and applying them... - Paulo Nuin
Paulo Nuin
Fast mapping of short sequences with mismatches, insertions and deletions using index structures. - http://www.citeulike.org/user...
PLoS computational biology, Vol. 5, No. 9. (11 September 2009), e1000502. With few exceptions, current methods for short read mapping make use of simple seed heuristics to speed up the search. Most of the underlying matching models neglect the necessity to allow not only mismatches, but also insertions and deletions. Current evaluations indicate, however, that very different error models apply to the novel high-throughput sequencing methods. While mismatches are the most frequent error type in Illumina reads, reads produced by 454's GS FLX predominantly contain insertions and deletions (indels). Even though 454 sequencers are able to produce longer reads, the method is frequently applied to small RNA (miRNA and siRNA) sequencing. Fast and accurate matching, in particular of short reads with diverse errors, is therefore a pressing practical problem. We introduce a matching model for short reads that can, besides mismatches, also cope with indels. It addresses different error models. For... - Paulo Nuin
Paulo Nuin
Real lives and white lies in the funding of scientific research: the granting system turns young scientists into bureaucrats and then betrays them. - http://www.citeulike.org/user...
PLoS biology, Vol. 7, No. 9. (15 September 2009), e1000197. Peter Lawrence - Paulo Nuin
Paulo Nuin
Bayesian statistical methods for genetic association studies. - http://www.citeulike.org/user...
Nature reviews. Genetics, Vol. 10, No. 10. (01 October 2009), pp. 681-690. Bayesian statistical methods have recently made great inroads into many areas of science, and this advance is now extending to the assessment of association between genetic variants and disease or other phenotypes. We review these methods, focusing on single-SNP tests in genome-wide association studies. We discuss the advantages of the Bayesian approach over classical (frequentist) approaches in this setting and provide a tutorial on basic analysis steps, including practical guidelines for appropriate prior specification. We demonstrate the use of Bayesian methods for fine mapping in candidate regions, discuss meta-analyses and provide guidance for refereeing manuscripts that contain Bayesian analyses. Matthew Stephens, David Balding - Paulo Nuin
Paulo Nuin
Social tagging in the life sciences: characterizing a new metadata resource for bioinformatics - http://www.citeulike.org/user...
BMC Bioinformatics, Vol. 10, No. 1. (2009), 313. BACKGROUND: Academic social tagging systems, such as Connotea and CiteULike, provide researchers with a means to organize personal collections of online references with keywords (tags) and to share these collections with others. One of the side-effects of the operation of these systems is the generation of large, publicly accessible metadata repositories describing the resources in the collections. In light of the well-known expansion of information in the life sciences and the need for metadata to enhance its value, these repositories present a potentially valuable new resource. Here we characterize the current and prospective contents of two scientifically relevant metadata repositories created through social tagging. This investigation helps to establish how such socially constructed metadata might be used as it stands currently and to suggest ways that new social tagging systems might be designed that would yield better aggregate... - Paulo Nuin
Paulo Nuin
Re: Google Scholar metadata quality and Mendeley hype - http://iphylo.blogspot.com/2009...
"That Mendeley page is nauseating. And why would they keep Ricardo's My Biotech Life "review" there when he's a paid employee of the company? The problem of extracting metadata from publications is too hard for Mendeley's staff to understand. They are too busy praising themselves and looking for ways to improve their failed PR machine. It's sad that Google is not investing heavily in this type of problem; it would be a godsend." - Paulo Nuin
Paulo Nuin
An enhanced RNA alignment benchmark for sequence alignment programs - http://www.citeulike.org/user...
Algorithms for Molecular Biology, Vol. 1, No. 1. (24 October 2006), 19. BACKGROUND: The performance of alignment programs is traditionally tested on sets of protein sequences, of which a reference alignment is known. Conclusions drawn from such protein benchmarks do not necessarily hold for the RNA alignment problem, as was demonstrated in the first RNA alignment benchmark published so far. For example, the twilight zone (the similarity range where alignment quality drops drastically) starts at 60% for RNAs in comparison to 20% for proteins. In this study we enhance the previous benchmark. RESULTS: The RNA sequence sets in the benchmark database are taken from an increased number of RNA families to avoid unintended impact by using only a few families. The size of sets varies from 2 to 15 sequences to assess the influence of the number of sequences on program performance. Alignment quality is scored by two measures: one takes into account only nucleotide matches, the other measures... - Paulo Nuin