Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
- arek
Can you see different colors for elements, species and chemicals in this Open Access article? Again, it only works in Internet Explorer: http://www.chemspider.com/Article...
I see nothing. Any chance we'll see a FF version?
- Egon Willighagen
What tool are you using on the backend for entity extraction? Out of curiosity (I don't have IE handy) how many terms does it highlight in that text?
- Rajarshi Guha
Works here. What tool is used to extract the terms?
- Rajarshi Guha
Michael, cool! That works for me too! But that's not ChemMantis, or is it? Where does this come from? And a very cool TLD, btw!
- Egon Willighagen
Oh, and does it use semantic markup, like microformats or RDFa? Would love to see our Userscripts work on it!
- Egon Willighagen
I noticed the class="reflect_chemical"... I'll update my userscript to support that (soon)
- Egon Willighagen
Reflect is using the STRING/STITCH text-mining engine, plus lots of Java/browser plug-in magic by Evangelos Pafilis et al. I'm sure that if you contact Evangelos or Sean O'Donoghue with an idea about microformats etc. they might be able to embed something.
- Michael Kuhn
(the approach is dictionary based, so no fancy recognition of IUPAC names etc.)
- Michael Kuhn
Can you give an idea of how many terms it recognizes in that doc? For example, OSCAR (via the Greasemonkey script) gets 150 terms.
- Rajarshi Guha
Short question to databases masters - I have Uniprot XML (it's just an example, the final data will be much more complex and much less complete) which I would like to query in many different ways. What is the most efficient storage? RDBMS? CouchDB? HBase? Something else?
Depends what you are trying to do, but an XML database might be the best option. There are several open-source "free" ones but also some decent commercial ones (some of which are described at http://en.wikipedia.org/wiki...)
- Duncan Hull
Pawel, I don't have any experience with CouchDB and HBase. They are quite fresh projects, so there might be some trouble with support (correct me if I'm wrong). I was using MySQL for some Uniprot data, and with good indexing there was no problem at all efficiently finding and fetching a few thousand rows at once.
- arek
Depends on the structure of the data... if the XML reflects a simple RDBMS design, then RDBMS; if you have, instead, unpredictable structures in the XML, or very many fields in general which are only sparsely filled, you either have to think carefully about normalization (etc.), or just use an XML db. The latter situation is where XML is really paying off: multiple namespaces.
- Egon Willighagen
Thanks a lot guys! The amount of data isn't really an issue, but its structure indeed is, so I'll have a look at XML dbs. Uniprot was just an example of a dataset that doesn't have all fields filled in (probably not the best one). I didn't want to suggest anything, but I was blown away by the new Parallax interface to Freebase - I'm aiming at something partially similar to it.
- Pawel Szczesny
Postgres recently added XML support in the DB, so you can do XPath queries right in the SQL, I think.
- Rajarshi Guha
Pawel: I'm curious to see what solution you come up with and how well it works ... I had to parse Uniprot/Trembl XML last week - I only needed one quick and dirty pass, so I used the Python SAX parser. Silly idea ... it took many hours to run over all 26 Gb.
- Andrew Perry
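For a single quick pass like the one Andrew describes, one alternative to a SAX handler is the standard library's `xml.etree.ElementTree.iterparse`, which streams the file and hands back one complete element at a time. The following is only a minimal sketch: the sample document and the `iter_entries` helper are hypothetical, the element names are simplified, and the real UniProt XML uses a namespace that this toy sample omits. No claim is made about how it would perform on a 26 Gb file.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for a UniProt-style XML file
# (no namespace, only two fields per entry).
SAMPLE = b"""<uniprot>
  <entry><accession>P12345</accession><name>AATM_RABIT</name></entry>
  <entry><accession>Q00001</accession><name>EXAMPLE_HUMAN</name></entry>
</uniprot>"""


def iter_entries(source):
    """Yield (accession, name) pairs, one <entry> at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "entry":
            yield elem.findtext("accession"), elem.findtext("name")
            # Discard the entry's children once it has been processed.
            elem.clear()


entries = list(iter_entries(io.BytesIO(SAMPLE)))
print(entries)
```

On a real file, `source` would be a path or an open (possibly gzip-wrapped) file object instead of the in-memory sample.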
Thanks Rajarshi, I'll have a look at Postgres (it was my favourite RDBMS some time ago). Andrew, I'm going to share whatever I find useful - in theory dumping biodata into Freebase would be the fastest solution, but it's not that simple due to various licensing issues...
- Pawel Szczesny
Yeh, the complexities of dealing with the licensing really put me off Freebase. Many many months ago (when I first got Freebase access) I was disappointed with the protein data in there. I decided rather than complaining, I'd just remedy the situation and import some large, freely available datasets. The definition of "freely available" was the problem. It was enough to make me give up on the idea in the short term ... it seems you must be allowed to re-license any data you put into Freebase as CC-BY.
- Andrew Perry
That means dumping in the whole PDB is probably okay (it's essentially public domain, with a few extra 'requests' like attribution and filenaming to avoid confusion ... http://tinyurl.com/58hcpj ). The situation with Uniprot (IMHO probably the coolest protein database around), is a little grey ... it is CC-BY-ND, so I assume that dumping it into Freebase could be considered a "derivative" which is not allowed by the license (http://www.uniprot.org/help...) ? I'm not an intellectual property lawyer ...
- Andrew Perry
Does anyone know of an automated way to prepare up-to-date cross-references of protein IDs? I was thinking of simple scripts fetching data from biological FTP servers, parsing it and then loading it into a MySQL db server. But I suppose there are better ways :). Maybe someone has already done it?
Wow, what a marvellous response time ;-). Pawel and Duncan, real thanks :)
- arek
Prompt (http://www.geneinfo.eu/prompt...) is also an interesting tool, not only for protein ID mapping but also for retrieval, analysis, and comparison (the software is described in the publication recommended by Duncan).
- arek
It's worth mentioning that PICR lets you automate the resolution of mapping problems (it is accessible via SOAP and REST interfaces).
- arek
Note that PICR maps on 100% sequence identity (as it's based on UniParc) -- even across species -- which may or may not be what you want. The UniProt mapping service limit should be removed in one of the next releases (or so I'm told) and it also has a REST interface (see http://www.uniprot.org/faq...). But it can be slow for large sets of identifiers, and only makes sense when mapping to or from UniProt(KB).
- Eric Jain
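The "fetch, parse, load" script idea from the original question can be sketched roughly as follows. This is an illustration under stated assumptions, not a recommendation over PICR or the UniProt mapping service: the three-column, tab-separated input layout and the sample identifiers are hypothetical, the download step is left out entirely, and SQLite (from the Python standard library) stands in for the MySQL server mentioned above so that the example is self-contained.

```python
import sqlite3

# Hypothetical mapping lines: accession, identifier type, external identifier,
# separated by tabs. Real mapping files may use a different layout.
SAMPLE_LINES = [
    "P12345\tRefSeq\tNP_000001.1",
    "P12345\tPDB\t1ABC",
    "Q00001\tRefSeq\tNP_000002.1",
]


def load_mappings(conn, lines):
    """Parse tab-separated lines and load them into an indexed xref table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS xref "
        "(accession TEXT, id_type TEXT, external_id TEXT)"
    )
    rows = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    conn.executemany("INSERT INTO xref VALUES (?, ?, ?)", rows)
    # Index the lookup columns so queries by external ID stay fast.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS xref_ext ON xref (id_type, external_id)"
    )
    conn.commit()


def lookup(conn, id_type, external_id):
    """Return all accessions cross-referenced to the given external ID."""
    cur = conn.execute(
        "SELECT accession FROM xref WHERE id_type = ? AND external_id = ?",
        (id_type, external_id),
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
load_mappings(conn, SAMPLE_LINES)
print(lookup(conn, "PDB", "1ABC"))
```

Keeping the table as plain (accession, type, external ID) triples means a periodic re-run can simply rebuild it from a fresh download.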
This will get you through to the links for the live video of my talk tomorrow 'Science in the YouTube Age' (see also http://blog.openwetware.org/sc...). I will also be aiming to record a screencast (video card permitting). I'm taking two laptops with me, so I will try to answer any questions (and possibly alter the flow of the talk) either at this item, via twitter (@cameronneylon) or on the liveblog service at the above link. I make no promises that any of this will actually work :)
- Cameron Neylon
just occurs to me - can we do anything cool with a linkup to ISMB? Was going to use the coverage there as one of my examples...
- Cameron Neylon
Cameron - Just to confirm, you'll be on at 12:45 UTC? Not sure whether I'll be able to watch it live (it's a bit early, here in Toronto!), but good luck!
- Michael Nielsen
Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:
* are simpler
* are 3 to 10 times smaller
* are 20 to 100 times faster
* are less ambiguous
* generate data access classes that are easier to use programmatically
- arek
Don't see much benefit for large files that are streamed (gzip does a good job there)? But could be great for storing or transferring large quantities of small bits of data -- e.g. when using Google's App Engine datastore!
- Eric Jain
My first reaction. Not another data interchange format :)
- Deepak Singh
You're an addicted user of FriendFeed, and we understand you! Isn't it just so magic! NoiseRiver is not intended to replace FriendFeed; it's an experimental service, still in an early alpha stage of development, that aims to extend FriendFeed with some cool features like ranking based on your interests and/or your feelings about other FriendFeed users. Give it a try: log in with your nickname and remote key!
- arek
(...) Typical personal computers calculate 64 bits of data at a time. A 64-qubit quantum computer would be about 18 billion billion times faster. (...)
- arek
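The "18 billion billion" in the quote appears to correspond to 2^64, the number of distinct states a 64-qubit register can represent; that reading is an assumption, since the quoted excerpt does not say how the figure was derived. The short check below only verifies the arithmetic, not the speed-up claim itself.

```python
# Number of distinct basis states of 64 qubits (assumed source of the figure).
states = 2 ** 64

# "Billion billion" taken as 10^9 * 10^9 = 10^18.
billion_billion = 10 ** 18

ratio = states / billion_billion
print(states)            # 18446744073709551616
print(round(ratio, 2))   # 18.45
```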
In "The God Particle" Leon Lederman presents some laws and principles connected with mathematics, physics and chemistry. Definitely recommended for everyone impressed with the article about mathematics in the natural sciences. Maybe someone has other recommendations? - http://www.amazon.com/God-Par...
Genome Biology | Full text | Horizontal gene transfer and the evolution of transcriptional regulation in Escherichia coli - http://genomebiology.com/2008...
Couple of Linux boxes plus single machine with Windows. After two years spent with Apple laptops I would rather use something that works ;)
- Pawel Szczesny
admission time - I'll start. Win XP - PC
- Graham Steel
I am a full Mac guy, but I am not a bioinformatician, so all my demands are fulfilled. At home: latest black MacBook, 4 GB RAM; at work: oldest white MacBook, 1.25 GB RAM. Oh, yeah, and there is a new HP Compaq official XP PC on the top of my desk; somebody turned it on/off once.
- Attila Csordas
Vista Box, Linux (virtual machines at home, box at work), iBook, iMac, XP as virtual machine in Vista and iMac.
- Paulo Nuin
Since our data acquisition is all controlled by Windows based software, it's a Windows box for me. But that doesn't bother me; I'm agnostic when it comes to operating systems.
- NatBlair
Win XP PC, thinking of moving to Mac or Ubuntu. Linux on clusters.
- Pedro Beltrao
Macbook Pro (recently acquired) + Linux VMs + Windows at work
- Deepak Singh
Linux ! I can't imagine doing bioinformatics, programming without it !!
- Pierre Lindenbaum
XP PC at work and home, plus an additional Linux Mint PC at home. Had Vista, got rid of it, thank god. Hoping to find a new PC laptop with XP on it, but they're hard to find....
- Nils Reinton
Mac OS X for most things at work, but it's an old box so for some stuff I have to use the new XP PCs. Win/XP at home because the spousal unit used to do Win-based IT for a living. Have a Red Hat install disc somewhere and all kinds of good intentions -- probably be better off with Ubuntu now though? Vista eats babies.
- Bill Hooker
Mac OS X Leopard on iMac and laptop. I have an old Dell on XP Pro but it is almost dead and crawls at the pace of a snail.
- Sally Church
At work, *nix for big machines and clusters; my workstation is an 8-core Mac Pro. At home, the PC I built runs Ubuntu/WinXP and I have an older Macbook. I prefer to develop for Linux, but I'm OS agnostic otherwise.
- Adam Kraut
I've noticed that this is a question many people like to answer, but they don't really "like" it. :) The statistics are interesting.
- Attila Csordas
Now, I would be curious to know who is working at the bench and still analyzes the 'omics' data with Microsoft Excel ....
- Pierre Lindenbaum
That's a question many people would NOT like to answer
- Deepak Singh
No, Deepak, people answer this question; they just don't "like" it. (How many people liked this message?) That's the difference. :)
- Attila Csordas
Mac OS X (Used to be PC Win NT also)
- Mitchell Tsai
Does it really matter? All heavy tasks are done on remote machines which are always *nix based, so even if somebody uses Windows, PuTTY is a must. Personally, I use OS X on a Mac, about 30% of the time in Terminal.
- Piotr Byzia
Vista at home, Ubuntu mostly for development.
- Michael Nielsen
After a few years with OS X, switched to Linux / PC for cheaper hardware... Linux blade servers at work for the heavy stuff... lately I've been checking out the BSDs, amazed by the lightweight-ness yet full power of NetBSD, and contemplating a transition in that direction (who needs multimedia ?!!? ) :-)
- Ntino
After reading this I was curious about how using the old Dell would go, so I cranked it up (zzzzzz) alongside a PB of the same genre and boy, what a difference! I wouldn't go back to a PC and Windoze if you paid me - too slow and clunky and Firefox mysteriously closed twice. Hi hum.
- Sally Church
Linux (Ubuntu these days). I also have a (rarely used) Vista partition on my laptop ... it's hard to buy one with the hardware specs you want without being forced to also absorb the cost of the bundled Windows installation.
- Andrew Perry
OS X on my laptop, Linux on our servers. VMWare for cross browser testing.
- Matt Wood
100% Linux at home and work, mostly Ubuntu. Quit Windows about 6-7 years ago, never looked back. Like the idea of OS X but haven't used it much.
- Neil Saunders
Linux (Ubuntu at work and Debian on laptop). VirtualBox for some nice software on Win XP.
- arek
Windows Vista at work, and I like it with UAC disabled. XP at home. Symbian is my mobile OS of choice. I like to build my own boxes and tinker, so PCs have been the obvious choice. Still waiting for a Linux distro as usable as Windows.
- Mr. Gunn