"Arguably the one producing the best (most accurate) results is Tesseract. It was initially developed by HP Labs between 1985 and 1995 and open-sourced in 2005. Tesseract can recognize text in 7 different languages: English, German, French, Italian, Spanish, Brazilian Portuguese and Dutch. You can install more than one dictionary if you need to. It does not support layout analysis, so multi-column text, images, equations etc. will likely give you garbled text output. Also, it only supports TIFF images as input."
- Mike Chelen
from Bookmarklet
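A minimal sketch of driving Tesseract from Python, assuming the `tesseract` command-line tool is installed and accepts the usual `tesseract <image> <output base> -l <language>` form. The helper names `build_tesseract_command` and `ocr_to_text` and the file names are hypothetical, chosen only for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for: tesseract <image> <output_base> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_to_text(image_path, output_base, lang="eng"):
    """Run Tesseract on one image and return the recognized text.

    Tesseract writes its result to '<output_base>.txt', which is read back here.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Something like `ocr_to_text("page.tif", "page")` would then return the text of a single-column TIFF scan. Given the layout limitation mentioned in the quote, multi-column pages would probably need to be split into columns first.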
"OCRopus(tm) is a state-of-the-art document analysis and Optical Character Recognition (OCR) system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities. The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-90's and deployed by the US Census bureau, and novel high-performance layout analysis methods. OCRopus development is sponsored by Google and is initially intended for high-throughput, high-volume document conversion efforts. We expect that it will also be an excellent OCR system for many other applications."
- Mike Chelen
from Bookmarklet
"The USGS has made available an instantaneous values water web service that may be of interest to many users. You can use this service to retrieve recent values for streamflow (as well as other real-time parameters served by the USGS) rather than the current and somewhat cumbersome method of downloading tab-delimited (RDB) files from the USGS Water Data for the Nation site. This service provides recent real-time USGS water data in Extensible Markup Language (XML), which is a more modern way of acquiring data."
- Mike Chelen
from Bookmarklet
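A rough sketch of what using such a service could look like in Python. The endpoint URL, the query parameter names, the site number `01646500` and the parameter code `00060` (streamflow) are assumptions based on the service as currently published and are not taken from the quoted announcement. The sample XML is a simplified stand-in, not the real response schema.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: present-day endpoint of the instantaneous values service.
BASE_URL = "https://waterservices.usgs.gov/nwis/iv/"


def build_iv_url(site, parameter_cd="00060"):
    """Build a request URL for one site and one parameter code."""
    query = urlencode({"format": "waterml", "sites": site, "parameterCd": parameter_cd})
    return f"{BASE_URL}?{query}"


def parse_values(xml_text):
    """Collect (dateTime, value) pairs from every <value> element, ignoring namespaces."""
    root = ET.fromstring(xml_text)
    readings = []
    for elem in root.iter():
        local_name = elem.tag.rsplit("}", 1)[-1]  # drop any '{namespace}' prefix
        if local_name == "value" and elem.text and "dateTime" in elem.attrib:
            readings.append((elem.attrib["dateTime"], float(elem.text)))
    return readings


# Simplified, made-up sample used only to exercise the parser offline.
SAMPLE_XML = """<timeSeriesResponse>
  <timeSeries>
    <values>
      <value dateTime="2009-08-01T12:00:00">152.0</value>
      <value dateTime="2009-08-01T12:15:00">149.5</value>
    </values>
  </timeSeries>
</timeSeriesResponse>"""

if __name__ == "__main__":
    print(build_iv_url("01646500"))
    print(parse_values(SAMPLE_XML))
```

Fetching the URL (for example with `urllib.request.urlopen`) and passing the response body to `parse_values` would replace the RDB download step, provided the real response uses `value` elements with a `dateTime` attribute as assumed here.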
"SELECT a dataset in the left-hand menu. CREATE and customize your table by clicking on "current data selection". RESHAPE your table using "pivot dimensions" to move rows and columns. TAKE AWAY the data to Excel or CSV, print your query or save it for later use."
- Mike Chelen
from Bookmarklet
"This page allows you to submit a webpage URL and have the taxon names within the page automatically identified and linked up to projects which have information about those names. This demo uses the NameTag API."
- Mike Chelen
from Bookmarklet
"Last weekend, at the Young Rewired State event in London, England, a few dozen hackers aged 15-18 came to Google's office to hack with government data."
- Mike Chelen
from Bookmarklet
"After 47 great entries, we have three finalists in the Apps for America contest, and now it is time for us to figure out the winners."
- Mike Chelen
from Bookmarklet
"We investigate government datasets using semantic web technologies. Currently, we are translating such datasets into RDF, linking them to the linked data cloud, and developing interesting applications and demos on linked government data."
- Mike Chelen
from Bookmarklet
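To give a feel for what "translating such datasets into RDF" can mean, here is a minimal sketch that turns one table row into N-Triples statements. It is not the project's actual pipeline. The function name `row_to_ntriples`, the `example.org` URIs and the sample row are all hypothetical, and the sketch assumes column names are already safe to use inside a URI.

```python
def row_to_ntriples(subject_uri, row, predicate_base):
    """Emit one N-Triples line per column, with every value as a plain string literal."""
    lines = []
    for column, value in row.items():
        # Escape the characters that would break a quoted N-Triples literal.
        escaped = (
            str(value)
            .replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )
        lines.append(f'<{subject_uri}> <{predicate_base}{column}> "{escaped}" .')
    return lines


if __name__ == "__main__":
    # Made-up row standing in for one record of a government dataset.
    sample_row = {"title": "Budget", "year": 2009}
    for line in row_to_ntriples(
        "http://example.org/dataset/1", sample_row, "http://example.org/prop/"
    ):
        print(line)
```

Linking to the linked data cloud would then be a matter of replacing some literals with URIs of existing resources, which this sketch does not attempt.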
"Find the most on-time flight between two airports or check how late your flight is on average, in good weather and bad, before you leave."
- Mike Chelen
from Bookmarklet
"BibApp matches researchers on your campus with their publication data and mines that data to see collaborations and to find experts in research areas."
- Mike Chelen
from Bookmarklet
I am a non-technical lead on the BibApp, and would be happy to answer questions. The devs are in the middle of a serious code push.
- D0r0th34
"Use one of the forms below to search either the contents of the PLoS Journals sites or the PLoS.org site and PLoS.org blogs. Don't feel restricted to searching on only a few words; enter as many as you like, because we're powered by DeepDyve search technology that returns results from the Deep Web."
- Mike Chelen
from Bookmarklet