We extracted 14.1 billion HTML tables from Google’s general-purpose web crawl, and used statistical classification techniques to find the estimated 154M that contain high-quality relational data. Because each relational table has its own “schema” of labeled and typed columns, each such table can be considered a small structured database. The resulting corpus of databases is larger than any other corpus we are aware of, by at least five orders of magnitude.
- Pierre
XLWrap is a spreadsheet-to-RDF wrapper which is capable of transforming spreadsheets to arbitrary RDF graphs based on a mapping specification. It supports Microsoft Excel and OpenDocument spreadsheets as well as comma- (and tab-) separated value (CSV) files, and it can load local files or download remote files via HTTP.
- Pierre
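To make the idea of a spreadsheet-to-RDF mapping concrete, here is a minimal Python sketch. It is not XLWrap's mapping language or API: the `MAPPING` table, the `BASE` IRI, and the `rows_to_ntriples` function are hypothetical names chosen for illustration, and every cell is emitted as a plain string literal for simplicity.

```python
import csv
import io

# Hypothetical mapping from column name to predicate IRI.
# This only illustrates the idea; it is not XLWrap's mapping syntax.
MAPPING = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": "http://xmlns.com/foaf/0.1/homepage",
}
BASE = "http://example.org/row/"  # assumed base IRI for row subjects


def rows_to_ntriples(csv_text, mapping=MAPPING, base=BASE):
    """Return one N-Triples line per mapped, non-empty cell."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, row in enumerate(reader, start=1):
        subject = f"<{base}{index}>"
        for column, predicate in mapping.items():
            value = (row.get(column) or "").strip()
            if not value:
                continue  # skip empty cells instead of emitting empty literals
            # Escape backslashes and quotes so the literal stays well-formed.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} <{predicate}> "{escaped}" .')
    return triples


SAMPLE = "name,homepage\nAlice,http://example.org/alice\nBob,\n"
for line in rows_to_ntriples(SAMPLE):
    print(line)
```

A real wrapper would also need typed literals, IRI-valued objects, and cell ranges spanning several sheets, none of which this sketch attempts.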
The Tashi project aims to build a software infrastructure for cloud computing on massive internet-scale datasets (what we call Big Data). The idea is to build a cluster management system that enables the Big Data that are stored in a cluster/data center to be accessed, shared, manipulated, and computed on by remote users in a convenient, efficient, and safe manner.
- Pierre
I learned Esperanto when I was in grad school, just to support what I thought then (and think now) was a wonderful idea. It's easy, and it works -- in a few weeks you can have regular conversations with people who don't speak your native language, nor you theirs. To all the objections I've ever heard raised against the idea or use of Esperanto, some spurious and some valid, I have just this one reply: it's easy, and it works. I still don't really understand why it has never made much headway.
- Bill Hooker
Imperius is a simple, standards-based, object-oriented policy language that allows expression of management policies using condition-action rules. Imperius provides an extensible set of over 100 operations for expressing conditions and actions.
- Pierre
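As a rough illustration of what a condition-action rule looks like, the following Python sketch evaluates a small rule list against a state dictionary. It does not use Imperius or its policy syntax: `Rule`, `evaluate`, and the example rules are assumptions made up for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class Rule:
    """A hypothetical condition-action rule (not an Imperius type)."""
    name: str
    condition: Callable[[State], bool]
    action: Callable[[State], None]


def evaluate(rules: List[Rule], state: State) -> List[str]:
    """Run each rule in order; apply its action when its condition holds."""
    fired = []
    for rule in rules:
        if rule.condition(state):
            rule.action(state)
            fired.append(rule.name)
    return fired


def add_replica(state: State) -> None:
    state["replicas"] += 1


def cap_replicas(state: State) -> None:
    state["replicas"] = 4


RULES = [
    Rule("scale_up", lambda s: s["cpu_load"] > 0.8, add_replica),
    Rule("cap_replicas", lambda s: s["replicas"] > 4, cap_replicas),
]

state = {"cpu_load": 0.9, "replicas": 4}
fired = evaluate(RULES, state)
print(fired, state)
```

Rules here fire in list order and later rules see the effects of earlier ones; a real policy engine may define priorities or conflict resolution differently.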
Apache JSPWiki is the apachified version of JSPWiki, a leading open source wiki engine. It is currently undergoing a transformation from an LGPL project to an Apache project within the protective sheets of Apache Incubator.
- Pierre