"Cascading is a feature rich API for defining and executing complex, scale-free, and fault tolerant data processing workflows on a Hadoop cluster."
- Mike Chelen
Hadoop World NYC is Oct 2: http://www.cloudera.com/hadoop-.... Learn how enterprises like JP Morgan, Visa, eBay, IBM, Booz Allen, and more are using Hadoop.
"I used Yahoo's Hadoop clusters to compute the 1,000,000,000,000,001st bit of π. The 7 hexadecimal digits of π starting at the 10^15+1 bit are: 6216B06"
- Mike Chelen
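For anyone curious how a single hexadecimal digit of π can be computed without the digits before it, here is a minimal sketch using the Bailey-Borwein-Plouffe (BBP) formula. The quote does not say which formula the Yahoo! run used, so treat BBP as an assumption chosen for illustration; the function names are made up, and this floating-point version is only trustworthy for small positions, nowhere near 10^15.

```python
def _bbp_series(j, n):
    """Fractional part of sum over k of 16^(n-k) / (8k + j)."""
    s = 0.0
    # Terms with k <= n: modular exponentiation keeps the numerators small.
    for k in range(n + 1):
        m = 8 * k + j
        s = (s + pow(16, n - k, m) / m) % 1.0
    # Terms with k > n: they shrink quickly, so stop once they are negligible.
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        s += term
        k += 1
    return s % 1.0


def pi_hex_digit(n):
    """Hex digit of pi at 0-based position n after the hexadecimal point."""
    x = (4 * _bbp_series(1, n) - 2 * _bbp_series(4, n)
         - _bbp_series(5, n) - _bbp_series(6, n)) % 1.0
    return "0123456789ABCDEF"[int(x * 16)]


if __name__ == "__main__":
    # The first hex digits of pi after the point.
    print("".join(pi_hex_digit(n) for n in range(8)))
```

Each position is independent of the others, which is what makes this kind of computation easy to split across many machines.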
"The Hadoop project is extremely important to us here at Yahoo!. We run the world's largest Hadoop clusters, work with academic institutions and other large corporations on advanced cloud computing research and our engineers are active participants in the Hadoop community."
- Mike Chelen
Hadoop: The Definitive Guide helps you harness the power of your data. Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems.
- New Tech Books
Handed out at Hadoop Summit. Thorough and articulate. A worthwhile read.
- Alexis Lê-Quôc
Mochi is a visual, log-analysis based debugging tool that correlates Hadoop's behavior in space, time and volume, and extracts a causal, unified control- and data-flow model of Hadoop across the nodes of a cluster.
- Sidharth Shah
IMO they are best used for different kinds of parallelism. Hadoop is very data parallel, and MPI works better (in my limited experience) for task parallel jobs. Also, MPI scales terribly. I'd like to see it replaced by better frameworks for task parallelism.
- Deepak Singh
Hadoop Streaming: a utility which allows users to create and run jobs with any executables (e.g. shell utilities) as the mapper and/or the reducer
- Mike Chelen
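As a rough illustration of what "any executable as the mapper and/or the reducer" can look like, here is a minimal word-count sketch in Python. The script name `wordcount.py` and its `map`/`reduce` argument are hypothetical choices for this example. It assumes the usual streaming convention of tab-separated key/value lines on stdin and stdout, with the reducer receiving its input sorted by key.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated (word, 1) pair per word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts per word; the input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: `wordcount.py map` or `wordcount.py reduce`.
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

The same pipeline can be tried locally without a cluster: `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`. On a cluster, the two commands would be passed to the streaming utility through its `-mapper` and `-reducer` options.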
VideoMosaic is a set of Java classes capable of generating a series of photomosaics from frames of a video. The project was implemented as my final project for Distributed Systems at the University of Washington. The Java implementation is built for rendering on a Hadoop cluster.
- Sidharth Shah
Redpoll is a distributed machine learning library written in Java. It runs on Hadoop, which is an open source implementation of Google's MapReduce computing model. We intend to parallelize some traditional classification and clustering algorithms, such as Naive Bayes, K-Means, and EM, so that they can deal with large-scale data sets. It's Apache 2.0 licensed.
- Sidharth Shah
Can anyone point out some applications of the Hadoop framework (MapReduce programming model + HDFS storage) in computational finance? What kinds of algorithms in computational finance are suited to the MapReduce programming model? (I know MPI does most of the existing work.)
- platformgeek
platformgeek: what problems in computational finance are you trying to solve?
- Amund Tveit
platformgeek: drop me a line (jeff.hammerbacher@gmail.com) and I'll let you know a few. I'm curious to know what is sparking your interest in using Hadoop for computational finance.
- jeff hammerbacher