OCRopus(tm) is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities.
- Josh Young
"Your interview with Josh Cohen was killer, btw. Thank you for that. If Google's operating margins are really 63%, why isn't there room for MSFT to pay up for content to index? If, as Ryan Chittum argues, search traffic just isn't that valuable for a newspaper's website--on top of the fact that Bing should be interested in grabbing at market share--why is it so clear that GOOG's counteroffer to News Corp with be nothing more than "token"? I want to be clear here. I'm not saying that the possibility of lacking News Corp's content in its index should outright terrify Google. I am suggesting, modestly, that the situation has more than zero plausibility. It's a legitimately interesting business question, in other words."
- Josh Young
"Don't miss Ryan Chittum's post at the Columbia Journalism Review. He reacts to some odd assumptions in Danny Sullivan's post and runs the numbers himself. http://www.cjr.org/the_aud..."
- Josh Young
Prof. Goel is helping Twitter with its reputation system. | Research interests: Methodological: Algorithms, optimization, stochastics, graph theory. Applications: Network and Internet algorithms; molecular algorithms; Internet commerce and social networks.
- Josh Young
Wikipedia turns out to divide into distinct communities, and the articles within a community turn out to be more semantically similar to one another. Within such communities, the highest PageRank score typically goes to the article whose title names the topic of the whole community.
- Josh Young
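To make the PageRank observation above concrete, here is a minimal sketch of power-iteration PageRank run over a tiny link graph. The article titles, the link structure, the damping factor of 0.85, and the function name `pagerank` are all illustrative assumptions; none of them come from the excerpt, and this is not the method used in the study it describes.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of outbound links."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy "community": every article links back to the topic article.
community = {
    "Graph theory": ["Vertex (graph theory)", "Graph coloring"],
    "Vertex (graph theory)": ["Graph theory"],
    "Graph coloring": ["Graph theory"],
    "Planar graph": ["Graph theory", "Graph coloring"],
}

scores = pagerank(community)
top_article = max(scores, key=scores.get)
print(top_article)
```

In this toy graph the article that the others link back to ends up with the highest score, which is the pattern the excerpt reports for real Wikipedia communities.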
Last weekend I wrote about how the big social gaming companies are making hundreds of millions of dollars in revenue on Facebook and MySpace through games like Farmville and Mobsters. Users are tricked into these lead gen scams.
- Josh Young
Michael Arrington posted over the weekend about CPA offers within social games and questioned why Facebook, MySpace, Zynga and others would expose these to our users. He raises good points about ‘scammy’ advertisers and the bad user experience they create. I agree with him and others that some of these offers misrepresent and hurt our industry.
- Josh Young
"The supply of content produced by newspapers will crater soon enough, sure, but I doubt that the supply of sufficiently close substitutes will diminish enough that newspapers regain significant pricing power. Those days are simply gone, no?"
- Josh Young
A feed fetching and parsing library that treats the internet like Godzilla treats Japan: it dominates and eats all. Feedzirra is a feed library that is designed to get and update many feeds as quickly as possible. This includes using libcurl-multi through the taf2-curb gem for faster http gets, and libxml through nokogiri and sax-machine for faster parsing.
- Josh Young
This book describes the important ideas in data mining, machine learning, and bioinformatics in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees, and boosting--the first comprehensive treatment of this topic in any book.
- Josh Young
Django-Supertagging is an automated tagging application. It is based on Django-Tagging and uses Open-Calais to retrieve the data. Visit the wiki for more information.
- Josh Young
The goal of the django-calais project is to help manage the complexity involved in retrieving, storing, and processing Calais results for your Django models. Essentially it lets you submit any Django model to the Calais service for analysis, then automatically parses the results and stores them in a set of semantic models.
- Josh Young
Our new custom RSS tool is intended for all Times readers — not just developers. It provides a simple way to query the Times Article Search API and a standard way to consume the results. The options for creating a feed are intentionally limited — there’s no way to create a feed for one term OR another, for instance, only combinations of terms — in order to keep the application simple and approachable.
- Josh Young
Zoie is a real-time search and indexing system built on Apache Lucene. It's a mature open source project and has been deployed at LinkedIn.com handling millions of searches as well as hundreds of thousands of updates daily.
- Josh Young
Four approaches to playing MUDs are identified and described. These approaches may arise from the inter-relationship of two dimensions of playing style: action versus interaction, and world-oriented versus player-oriented. An account of the dynamics of player populations is given in terms of these dimensions, with particular attention to how to promote balance or equilibrium. This analysis also offers an explanation for the labelling of MUDs as being either "social" or "gamelike".
- Josh Young
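The two dimensions in the abstract above yield a two-by-two grid, which can be sketched as a simple lookup. The four labels (achiever, explorer, socialiser, killer) are the names used in Bartle's paper rather than in the excerpt, and the signed scores passed to `classify` are a hypothetical scoring scheme invented for this illustration.

```python
from enum import Enum


class Style(Enum):
    ACTION = "action"
    INTERACTION = "interaction"


class Orientation(Enum):
    WORLD = "world-oriented"
    PLAYER = "player-oriented"


# One playing approach per quadrant of the two dimensions.
APPROACHES = {
    (Style.ACTION, Orientation.WORLD): "achiever",
    (Style.INTERACTION, Orientation.WORLD): "explorer",
    (Style.INTERACTION, Orientation.PLAYER): "socialiser",
    (Style.ACTION, Orientation.PLAYER): "killer",
}


def classify(action_bias, world_bias):
    """Map two signed scores to a quadrant (hypothetical scoring scheme).

    action_bias >= 0 leans toward acting, < 0 toward interacting.
    world_bias >= 0 leans toward the world, < 0 toward other players.
    """
    style = Style.ACTION if action_bias >= 0 else Style.INTERACTION
    orientation = Orientation.WORLD if world_bias >= 0 else Orientation.PLAYER
    return APPROACHES[(style, orientation)]


print(classify(0.7, 0.4))    # acting on the world
print(classify(-0.5, -0.9))  # interacting with players
```

A real player sits somewhere inside the grid rather than at a corner, so a hard threshold like this is only a rough way to label the dominant tendency.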
Today, at the Semantic Technology Conference, Rob Larson and Evan Sandhaus of the New York Times announced together that the Times will soon be publishing its copious index as Linked Data.
- Josh Young
I run DecentURL.com, and many people have found it pretty useful to date, but the problem is that now most of the URLs submitted are spam. I don't like the fact that 95% of the links in my growing database are spam.
- Josh Young
If you are interested in user interface design for faceted search, then be sure to check out this free book chapter by Moritz Stefaner, Sébastian Ferré, Saverio Perugini, Jonathan Koren, and Yi Zhang.
- Josh Young
Why? Why the macaroni-shaped bowls?! With duck confit mac and cheese, things were looking so bright! And now I'm crestfallen, repelled by the kitsch.
- Josh Young
Trust in news media has reached a new low, with record numbers of Americans saying reporting is inaccurate, biased and shaped by special interests, according to a survey by the Pew Research Center.
- Josh Young