New Tech Books
Hadoop in Action teaches readers how to use Hadoop and write MapReduce programs. The intended readers are programmers, architects, and project managers who have to process large amounts of data offline. Hadoop in Action will lead the reader from obtaining a copy of Hadoop to setting it up in a cluster and writing data analytic programs. The book begins by making the basic idea of Hadoop and MapReduce easier to grasp by applying the default Hadoop installation to a few easy-to-follow tasks, such as analyzing changes in word frequency across a body of documents. The book continues through the basic concepts of MapReduce applications developed using Hadoop, including a close look at framework components, use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action. Hadoop in Action will explain how to use Hadoop and present design patterns and practices of programming MapReduce. MapReduce is a complex idea both conceptually and in its implementation, and... - New Tech Books
İnanç Gümüş
"Hazelcast and GridGain are the best choice for an easily-parallelized, low-data, CPU-intensive tasks. Moreover, they are even better choice, when some unexpected node failures can happen." - İnanç Gümüş
İnanç Gümüş
"That covers all MapReduce apps I recall hearing about via commercial companies and users, and also includes most of what’s in the two big sources I found online." - İnanç Gümüş
Does anyone know of anybody in Turkey using MapReduce algorithms (with Hadoop or another platform)? - İnanç Gümüş
For an updated list, check here: http://wiki.apache.org/hadoop... - İnanç Gümüş
We set up a test cluster and tested it a bit; we were more interested in HBase. But we haven't moved it to production for now. - denizoktar
@denizoktar I also want to use it to analyze the popular content of some 200 - 300 websites and collect statistics on it - İnanç Gümüş
Is anyone using it at production level? I'm curious about the problems too; the advantages are obvious - İnanç Gümüş
@winterismute talking about the problems that came up shouldn't really be a trade secret :) - İnanç Gümüş
It wasn't hard to understand even at the very start, because I had already begun thinking on my own about solving some problems this way. A few years after that I heard about MapReduce... I can't believe it has no problems at all (in production), as you say. - İnanç Gümüş
@Gökhan I wonder whether #google runs #mapreduce live while fetching search results. Like you said, I think MapReduce is better suited to batch processing. #cassandra is apparently used by quite a few companies, which is interesting. That needs looking into as well. By the way, Twitter is apparently using Cassandra right now... others: Digg, Facebook, Twitter, Reddit, Rackspace, Cloudkick, Cisco, SimpleGeo, Ooyala... - İnanç Gümüş
There are also people who see #mapreduce as a step backwards, by the way. See: http://bit.ly/awhq5f - İnanç Gümüş
Yes, I agree with what you said. There's nothing wrong with doing functional programming where it fits. It has to be this way, in terms of efficiency, to be able to build a share-nothing architecture - İnanç Gümüş
To summarize: raw data >> hbase >> mapreduce >> cassandra << realtime analyze. When loading the data into Cassandra, it would presumably make sense to think about how the data will be queried and build a structure on Cassandra accordingly. I need to research Cassandra a bit more - İnanç Gümüş
@gokhan I've put it on my list and will read it when I get the chance, thanks :) By the way, I had started researching no-sql DBs myself shortly before the Bigtable paper. As I said, learning it isn't much trouble, but I think the benefits and drawbacks of the aspects nobody mentions will only really come out after trying it in practice and gaining experience - İnanç Gümüş
Amund Tveit
Mapreduce & Hadoop Algorithms in Academic Papers (3rd update) - http://atbrox.com/2010...
Amund Tveit
Case Study: Using Hadoop to help people write - http://aws.amazon.com/solutio...
Mike Chelen
Fwd: Analyzing Human Genomes with Hadoop » Cloudera Hadoop & Big Data Blog - http://www.cloudera.com/blog... (via http://friendfeed.com/the-lif...)
Deepak Singh
Clojure+Hadoop Slides - Digital Digressions by Stuart Sierra - http://stuartsierra.com/2009...
Stuart Sierra's clojure + Hadoop slides - Deepak Singh from Bookmarklet
Hadi Kalantari
I am a beginner in Hadoop; which Linux distribution is better for Hadoop clients?
Mike Chelen
"Cascading is a feature rich API for defining and executing complex, scale-free, and fault tolerant data processing workflows on a Hadoop cluster." - Mike Chelen from Bookmarklet
jeff hammerbacher
Hadoop World NYC is Oct 2: http://www.cloudera.com/hadoop-.... Learn how enterprises like JP Morgan, Visa, eBay, IBM, Booz Allen, and more are using Hadoop.
Mike Chelen
Hadoop computes the 10^15+1st bit of π (Hadoop and Distributed Computing at Yahoo!) - http://developer.yahoo.net/blogs...
"I used Yahoo's Hadoop clusters to compute the 1,000,000,000,000,001st bit of π. The 7 hexadecimal digits of π starting at the 10^15+1 bit are: 6216B06" - Mike Chelen from Bookmarklet
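The post above does not say which formula the Yahoo! computation used. As a rough illustration of how a digit of π can be extracted without computing all the earlier ones, here is a minimal sketch that assumes the Bailey-Borwein-Plouffe (BBP) formula. The function names `_series` and `pi_hex_digits` are made up for this example, and the loop is linear in the position, so it is nowhere near fast enough for positions around 10^15.

```python
def _series(j, n):
    """Fractional part of sum over k >= 0 of 16^(n-k) / (8k + j)."""
    s = 0.0
    # Terms with k <= n: modular exponentiation keeps the numbers small.
    for k in range(n + 1):
        denom = 8 * k + j
        s = (s + pow(16, n - k, denom) / denom) % 1.0
    # Terms with k > n: they shrink by 16x each step, so a few suffice.
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        s += term
        k += 1
    return s % 1.0


def pi_hex_digits(n, count=8):
    """Hex digits of pi after skipping n digits past the hexadecimal point.

    Uses double precision, so only about 8 digits per call are trustworthy.
    """
    # BBP: pi = sum 16^-k * (4/(8k+1) - 2/(8k+4) - 1/(8k+5) - 1/(8k+6))
    x = (4 * _series(1, n) - 2 * _series(4, n)
         - _series(5, n) - _series(6, n)) % 1.0
    digits = []
    for _ in range(count):
        x *= 16
        d = int(x)
        digits.append("0123456789ABCDEF"[d])
        x -= d
    return "".join(digits)
```

For example, `pi_hex_digits(0, 8)` returns `"243F6A88"`, which matches π = 3.243F6A88... in hexadecimal. Each starting position is independent of the others, which is what makes this kind of computation easy to spread across a cluster.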
Mike Chelen
"The Hadoop project is extremely important to us here at Yahoo!. We run the world's largest Hadoop clusters, work with academic institutions and other large corporations on advanced cloud computing research and our engineers are active participants in the Hadoop community." - Mike Chelen from Bookmarklet
jeff hammerbacher
Fwd: How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook - http://www.slideshare.net/awadall... (via http://friendfeed.com/amund...)
New Tech Books
Hadoop - The Definitive Guide #hadoop http://www.amazon.com/dp...
Hadoop: The Definitive Guide helps you harness the power of your data. Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems. - New Tech Books
Deepak Singh
Notes from the 2009 Hadoop Summit West « Scale or die - http://scaleordie.com/2009...
Sidharth Shah
Mochi is a visual, log-analysis based debugging tool that correlates Hadoop's behavior in space, time and volume, and extracts a causal, unified control- and data-flow model of Hadoop across the nodes of a cluster. - Sidharth Shah
Yingfeng Zhang
Has anyone compared the performance of MPI and MapReduce (such as Hadoop) on scientific computing tasks such as clustering?
IMO best used for different kinds of parallelism. Hadoop is very data parallel and MPI works better (in my limited experience) for task parallel jobs. Also MPI scales terribly. I'd like to see it being replaced by better frameworks for task parallelism - Deepak Singh
Sidharth Shah
Hadoop should target C++/LLVM, not Java (because of watts). http://www.trendcaller.com/2009...
Interesting question, but again no data to support it. - Sidharth Shah
Mike Chelen
HadoopStreaming - Hadoop Wiki - http://wiki.apache.org/hadoop...
utility which allows users to create and run jobs with any executables (e.g. shell utilities) as the mapper and/or the reducer - Mike Chelen from Bookmarklet
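To make the description above concrete, here is a minimal word-count sketch in the style of a streaming job: the mapper turns input lines into tab-separated `key<TAB>value` lines, and the reducer sums the values for each key from sorted input. The function names and the `run_locally` helper are assumptions for illustration; on a real cluster each stage would be a separate executable reading stdin and writing stdout, with the framework doing the sort in between.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Hypothetical helper: sorted() stands in for the framework's shuffle/sort."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for out in run_locally(["a b a", "b c"]):
        print(out)
```

Because the stages only exchange lines of text, the same logic could be written as shell utilities or in any other language, which is the point of the streaming utility.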
Sidharth Shah
Improving MapReduce Performance in Heterogeneous Environments - http://www.usenix.org/events...
Sidharth Shah
VideoMosaic is a set of Java classes capable of generating a series of photomosaics from frames of a video. The project was implemented as my final project for Distributed Systems at the University of Washington. The Java implementation is built for rendering on a Hadoop cluster. - Sidharth Shah
Sidharth Shah
Graceful shutdown, Hadoop, and black magic - http://blog.rapleaf.com/dev...
Sidharth Shah
Redpoll is a distributed machine learning library written in Java. It is powered by Hadoop, an open source implementation of Google's MapReduce computing model. We intend to parallelize some traditional classification and clustering algorithms, such as Naive Bayes, K-Means, and EM, so that they can deal with large-scale data sets. It's Apache 2.0 licensed - Sidharth Shah
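As a rough idea of what parallelizing K-Means in the MapReduce style can look like, here is a minimal single-machine sketch of one iteration. This is not Redpoll's code; the function names are made up for illustration. The map step assigns each point to its nearest centroid, and the reduce step averages the points in each group.

```python
from collections import defaultdict


def assign(point, centroids):
    """Map step: return (index of nearest centroid, point)."""
    distances = [
        sum((p - c) ** 2 for p, c in zip(point, centroid))
        for centroid in centroids
    ]
    return distances.index(min(distances)), point


def recompute(points):
    """Reduce step: mean of all points assigned to one centroid."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """One K-Means iteration; the dict stands in for the shuffle phase."""
    groups = defaultdict(list)
    for point in points:
        idx, p = assign(point, centroids)
        groups[idx].append(p)
    # A centroid with no assigned points is left where it was.
    return [
        recompute(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]
```

The map step touches each point independently and the reduce step touches each cluster independently, which is why the algorithm fits a data-parallel framework; a full run would repeat the iteration until the centroids stop moving.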
Elias Torres
[#HADOOP-4012] Providing splitting support for bzip2 compressed files - ASF JIRA - https://issues.apache.org/jira...
Will Cloudera do this? ;-) - Elias Torres from Bookmarklet
Hmm, we should discuss... - jeff hammerbacher
Sidharth Shah
This isn't exactly about Hadoop, but nice interview with creators of MapReduce. - http://research.google.com/roundta...