Data Mining

A room about Data Mining
Mike Chelen
@ginatrapani's twitalytic at GitHub - http://github.com/ginatra...
"Twitter data crawler, replies archiver, and statistics generator" - Mike Chelen from Bookmarklet
Siddharth Mitra
Mike Chelen
Fwd: Trevor Hastie - The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Second Edition) - http://www-stat.stanford.edu/~hastie/pub.htm (via http://ff.im/95Gqi)
Data Mining
DDJ OpenSource: Milepost GCC Now Available: An open-source machine-learning compiler that intelli.. http://tinyurl.com/lreqca - http://twitter.com/schamsc...
Data Mining
SemanticBot: #SemNav : IBM offers open source machine learning compiler: By Paul Krill | InfoWorld IBM is announcing on Tuesday ... http://bit.ly/D0oCh - http://twitter.com/SemNav...
Roger Chen
TechnoCalifornia: "I like it... I like it not" or How miss-behaved users are when giving feedback - http://technocalifornia.blogspot.com/2009...
Mike Chelen
"Arguably the one producing the best (most accurate) results is Tesseract. It is a technology initially developed by HP Labs between 1985 and 1995, then they open-sourced it in 2005. Tesseract can recognize text in 7 different languages: English, German, French, Italian, Spanish, Brazilian Portuguese and Dutch. You can install more than one dictionaries if you need. It does not support layout analysis, so multi-column text, images, equations etc. should give you a garbled text output. Also, it only supports TIFF images as input." - Mike Chelen from Bookmarklet
Mike Chelen
"OCRopus(tm) is a state-of-the-art document analysis and Optical Character Recognition (OCR) system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities. The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-90's and deployed by the US Census bureau, and novel high-performance layout analysis methods. OCRopus development is sponsored by Google and is initially intended for high-throughput, high-volume document conversion efforts. We expect that it will also be an excellent OCR system for many other applications." - Mike Chelen from Bookmarklet
Mike Chelen
"the USGS has made available an instantaneous values water web service that may be of interest to many users. You can use this service to retrieve recent values for streamflow (as well as other real-time parameters served by the USGS) rather than the current and somewhat cumbersome method of downloading tab-delimited (RDB) files from the USGS Water Data for the Nation site. This service provides recent real-time USGS water data in Extensible Markup Language (XML), which is a more modern way of acquiring data." - Mike Chelen from Bookmarklet
Data Mining
Mike Chelen
OECD Statistics (GDP, unemployment, income, population, labour, education, trade, finance, prices...) - http://stats.oecd.org/Index...
"SELECT a dataset in the left-hand menu. CREATE and customize your table by clicking on "current data selection". RESHAPE your table using "pivot dimensions" to move rows and columns. TAKE AWAY the data to Excel or CSV, print your query or save it for later use." - Mike Chelen from Bookmarklet
Mike Chelen
"This page allows you to submit a webpage URL and have the taxon names within the page automatically identified and linked up to projects which have information about those names. This demo uses the NameTag API" - Mike Chelen from Bookmarklet
Mike Chelen
Young Rewired State: Letting young hackers show the government the way (Yahoo! Developer Network Blog) - http://developer.yahoo.net/blog...
"Last weekend the Young Rewired State event in London, England, a few dozen hackers aged 15-18 years came to Google's office to hack with government data." - Mike Chelen from Bookmarklet
Mike Chelen
"After 47 great entries, we have three finalists in the Apps for America contest, and now it is time for us to figure out the winners." - Mike Chelen from Bookmarklet
Mike Chelen
Installing Twitalytic on DreamHost - twitalytic - GitHub - http://wiki.github.com/ginatra...
Mike Chelen
Datawocky: More data usually beats better algorithms, Part 2 - http://anand.typepad.com/datawoc...
Mike Chelen
"We investigate the government datasets using semantic web technologies. Currently, we are translating such datasets into RDF, getting them linked to linked data cloud, and developing interesting applications and demos on linked government data." - Mike Chelen from Bookmarklet
Mike Chelen
"Find the most on-time flight between two airports or check how late your flight is on average, in good weather and bad, before you leave." - Mike Chelen from Bookmarklet
Mike Chelen
BibApp a Campus Research Gateway and Expert Finder - http://bibapp.org/
"BibApp matches researchers on your campus with their publication data and mines that data to see collaborations and to find experts in research areas." - Mike Chelen from Bookmarklet
I am a non-technical lead on the BibApp, and would be happy to answer questions. The devs are in the middle of a serious code push. - D0r0th34
Mike Chelen
New Public Library of Science Search using DeepDyve - http://www.plos.org/search.php
"Use one of the forms below to search either the contents of the PLoS Journals sites or the PLoS.org site and PLoS.org blogs. Don't feel restricted to searching on only a few words, enter as many as you like because we're powered by DeepDyve search technology that returns results from the Deep Web." - Mike Chelen from Bookmarklet
Arturo Servin
Wonder if they are using RL (they say something about a reward). Robot Teaches Itself to Smile | Wired Science - http://www.wired.com/wiredsc...
Arturo Servin
Ranking NFL Teams with Neural Nets | Neural Market Trends - http://www.neuralmarkettrends.com/2009...
Data Mining
Sylvain Carle: RT @nside: lots of good stuff in the everyblock sources; powerful geo data mining code in the ebdata pkg - http://www.everyblock.com/code... - http://twitter.com/afrognt...
Data Mining
Denis Laprise: lots of good stuff in the everyblock sources; powerful geo data mining code in the ebdata package.. - http://twitter.com/nside...
Data Mining
SemanticBot: #SemNav : DDJ HPC: Milepost GCC Now Available: An open-source machine-learning compiler that intelligently optimiz.... http://bit.ly/T0d7o - http://twitter.com/SemNav...
Data Mining
SemanticBot: #SemNav : "Using The Most Advanced Machine Learning Algorithm (R.I.P.P.E.R) To Predict The Market" Richard Stevenso... http://bit.ly/SZN5B - http://twitter.com/SemNav...
Data Mining
ChinaQuake,WouldYouHelp?: Chris: Download Microsoft MyMedia Multimedia Recommender System - http://news.softpedia.com/news...
Data Mining
SemanticBot: #SemNav : DDJ OpenSource: Milepost GCC Now Available: An open-source machine-learning compiler that intelli.. http:... http://bit.ly/G4AB3 - http://twitter.com/SemNav...