
anand › Comments

anand
hop - Project Hosting on Google Code - http://code.google.com/p/hop/
The Hadoop Online Prototype (HOP) is a modified version of Hadoop MapReduce that allows data to be pipelined between tasks and between jobs. This can enable better cluster utilization and increased parallelism, and allows new functionality: online aggregation (approximate answers as a job runs), and stream processing (MapReduce jobs that run continuously, processing new data as it arrives). - anand
anand
Data in Flight | January 2010 | Communications of the ACM - http://cacm.acm.org/magazin...
How streaming SQL technology can help solve the Web 2.0 data crunch. - anand
anand
Serge 's blog » InnoDB : Making Buffer Cache Scan Resistant - http://serge.frezefond.free.fr/...
What is more frustrating than a full table scan wiping out the buffer pool? You have a nicely tuned, busy OLTP server with a warm buffer pool containing the current working set. Then someone submits a report that needs to access a table through a full table scan. The normal and current MySQL behavior is to wipe out the contents of the cache. If the table is never reused, this is pure loss. The server will have to go through a whole new warm-up phase, which can last quite long with a big buffer pool. This issue is now solved with MySQL 5.1.41. - anand
anand
mysqlreport :: Make easy-to-read MySQL status reports - http://hackmysql.com/mysqlre...
mysqlreport makes a friendly report of important MySQL status values. mysqlreport transforms the values from SHOW STATUS into an easy-to-read report that provides an in-depth understanding of how well MySQL is running. mysqlreport is a better alternative (and practically the only alternative) to manually interpreting SHOW STATUS. - anand
anand
Hadoop World: Sqoop - Database Import for Hadoop » Cloudera Hadoop & Big Data Blog - http://www.cloudera.com/blog...
Database import to HDFS - anand
anand
A Survey of Collaborative Filtering Techniques - http://www.hindawi.com/journal...
In this paper, we first introduce CF tasks and their main challenges, such as data sparsity, scalability, synonymy, gray sheep, shilling attacks, privacy protection, etc., and their possible solutions. We then present three main categories of CF techniques: memory-based, model-based, and hybrid CF algorithms (that combine CF with other recommendation techniques), with examples for representative algorithms of each category, and analysis of their predictive performance and their ability to address the challenges. From basic techniques to the state-of-the-art, we attempt to present a comprehensive survey for CF techniques, which can serve as a roadmap for research and practice in this area. - anand
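To make the "memory-based" category a little more concrete, here is a minimal sketch of user-based nearest-neighbor CF. It is not taken from the survey: the toy ratings, the function names, and the choice of cosine similarity over co-rated items are all assumptions made for illustration.

```python
from math import sqrt

# Toy user -> {item: rating} data, invented purely for this example.
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 5, "b": 3, "c": 4, "d": 5},
    "carol": {"a": 1, "b": 5, "d": 2},
    "dave":  {"a": 4, "b": 3, "c": 5, "d": 4},
}


def cosine_similarity(a, b):
    """Cosine similarity between two rating dicts, using only co-rated items."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(ratings, user, item, k=2):
    """Similarity-weighted mean rating of `item` among the k users most
    similar to `user` who have rated it. Returns None if nothing usable."""
    neighbors = [
        (cosine_similarity(ratings[user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    # Keep the k neighbors with the highest similarity.
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    top = neighbors[:k]
    total = sum(sim for sim, _ in top)
    if total == 0:
        return None
    return sum(sim * rating for sim, rating in top) / total
```

Calling `predict(ratings, "alice", "d")` leans on bob and dave, whose ratings agree most closely with alice's on the items they share. The sparsity and scalability problems the abstract lists show up quickly in this style: every prediction scans all users, and similarities computed over few co-rated items are noisy.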
anand
Ivory: A Hadoop toolkit for web-scale information retrieval - http://www.umiacs.umd.edu/~jimmyl...
Ivory is a Hadoop toolkit for web-scale information retrieval research that features a retrieval engine based on Markov Random Fields, appropriately named SMRF (Searching with Markov Random Fields). This open-source project began in Spring 2009 and represents a collaboration between the University of Maryland and Yahoo! Research. Ivory takes full advantage of the Hadoop distributed environment (the MapReduce programming model and the underlying distributed file system) for both indexing and retrieval. The current release of Ivory (release 0.2) - anand
anand
what search can predict - http://www.cam.cornell.edu/~sharad...
using search query logs to predict the future - anand
anand
Instead of running CF on user-rating data from a user's nearest neighbors, this paper suggests running CF on a neighborhood of experts, and argues that this works well, especially during a cold start. - anand
anand
The P2 algorithm for dynamic calculation of quantiles and histograms without storing observations - http://portal.acm.org/citatio...
A heuristic algorithm is proposed for dynamic calculation of the median and other quantiles. The estimates are produced dynamically as the observations are generated. The observations are not stored; therefore, the algorithm has a very small and fixed storage requirement regardless of the number of observations. This makes it ideal for implementing in a quantile chip that can be used in industrial controllers and recorders. The algorithm is further extended to histogram plotting. The accuracy of the algorithm is analyzed. - anand
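A rough sketch of how such an estimator can look, following the five-marker scheme of the P² paper: five marker heights and positions are kept, and each new observation nudges the middle markers toward their desired positions by parabolic (or, failing that, linear) interpolation. The class and method names are invented here, and the fallback used while only the first five observations have been seen is a simplification made for this sketch; treat it as an illustration rather than a reference implementation.

```python
class P2Quantile:
    """Streaming estimate of the p-quantile using five markers (P2 scheme)."""

    def __init__(self, p=0.5):
        self.p = p
        self.q = []                      # marker heights (sorted)
        self.n = [1, 2, 3, 4, 5]         # actual marker positions (ranks)
        self.desired = [1, 1 + 2 * p, 1 + 4 * p, 3 + 2 * p, 5]
        self.step = [0, p / 2, p, (1 + p) / 2, 1]

    def add(self, x):
        # The first five observations simply initialise the markers.
        if len(self.q) < 5:
            self.q.append(x)
            self.q.sort()
            return
        q, n = self.q, self.n
        # Find the cell k that x falls into, extending the extremes if needed.
        if x < q[0]:
            q[0] = x
            k = 0
        elif x >= q[4]:
            q[4] = x
            k = 3
        else:
            k = max(i for i in range(4) if q[i] <= x)
        # Markers above the cell move one rank up; desired positions advance.
        for i in range(k + 1, 5):
            n[i] += 1
        for i in range(5):
            self.desired[i] += self.step[i]
        # Adjust the three interior markers if they drifted by a rank or more.
        for i in (1, 2, 3):
            d = self.desired[i] - n[i]
            if (d >= 1 and n[i + 1] - n[i] > 1) or (d <= -1 and n[i - 1] - n[i] < -1):
                d = 1 if d > 0 else -1
                candidate = self._parabolic(i, d)
                if not q[i - 1] < candidate < q[i + 1]:
                    candidate = self._linear(i, d)
                q[i] = candidate
                n[i] += d

    def _parabolic(self, i, d):
        q, n = self.q, self.n
        return q[i] + d / (n[i + 1] - n[i - 1]) * (
            (n[i] - n[i - 1] + d) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
            + (n[i + 1] - n[i] - d) * (q[i] - q[i - 1]) / (n[i] - n[i - 1])
        )

    def _linear(self, i, d):
        q, n = self.q, self.n
        return q[i] + d * (q[i + d] - q[i]) / (n[i + d] - n[i])

    def value(self):
        if not self.q:
            raise ValueError("no observations yet")
        # Up to five observations: read the quantile off the sorted sample
        # (a simplification for this sketch, not part of the P2 scheme).
        if self.n[4] == 5:
            return self.q[round(self.p * (len(self.q) - 1))]
        return self.q[2]                 # the middle marker tracks the quantile
```

Feeding values one at a time with `add(x)` and reading `value()` at any point matches the abstract's claim of fixed storage: the state is ten marker numbers plus the desired positions, however long the stream runs.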
anand
Microsoft exec: Quitting Google as tough as stopping smoking - http://www.techflash.com/seattle...
interesting notes on search and its future from a panel of Google and Bing execs - anand
anand
What Can Search Predict? - http://www.cam.cornell.edu/~sharad...
predicting phenomena using search query logs - anand
anand
YouTube - Modeling Human Sentence Processing - http://www.youtube.com/watch...
Modeling human sentence-processing can help us both better understand how the brain processes language, and also help improve user interfaces. For example, our systems could compare different (computer-generated) sentences and produce ones that are easiest to understand. I will talk about my work on evaluating theories about syntactic processing difficulty on a large eye-tracking corpus, and present a model of sentence processing which uses an incremental, fully connected parsing strategy - anand
anand
first digit law - anand
anand
Hal Daumé's reading list on old-school NLP - anand
anand
Graph Clustering With Network Structure Indices - http://videolectures.net/icml07_...
We examine two simple algorithms: a new graphical adaptation of the k-medoids algorithm and the Girvan-Newman method based on edge betweenness centrality. Network structure indices (NSIs) are a proven technique for indexing network structure and efficiently finding short paths. We show how incorporating NSIs into these graph clustering algorithms can overcome these complexity limitations. - anand
anand
Web Search Intent Induction via Automatic Query Reformulation - http://www.aclweb.org/antholo...
We present a computationally efficient method for automatic grouping of web search results based on reformulating the original query to alternative queries the user may have intended. The method requires no data other than query logs and the standard inverted indices used by most search engines. Our method outperforms standard web search in the task of enabling users to quickly find relevant documents for informational queries. - anand
anand
An Industrial-Strength Audio Search Algorithm - http://www.ee.columbia.edu/~dpwe...
The algorithm behind Shazam - anand
anand
Experimental Data for Question Classification - http://l2r.cs.uiuc.edu/~cogcom...
This data collection contains all the data used in our learning question classification experiments (see [1]), which has question class definitions, the training and testing question sets, examples of preprocessing the questions, feature definition scripts and examples of semantically related word features. This work has been done by Xin Li and Dan Roth and supported by [2]. - anand
anand
Elements of Statistical Learning: data mining, inference, and ... - http://www-stat.stanford.edu/~tibs...
Data Mining, Inference, and Prediction. from Stanford - anand
anand
Fermat's Last Theorem - http://video.google.com/videopl...
A quiet English mathematician, he was drawn into maths by Fermat's puzzle, but at Cambridge in the '70s, FLT was considered a joke, so he set it aside. Then, in 1986, an extraordinary idea linked this irritating problem with one of the most profound ideas of modern mathematics: the Taniyama-Shimura Conjecture, named after a young Japanese mathematician who tragically committed suicide. The link meant that if Taniyama was true then so must be FLT. When he heard, Wiles went after his childhood dream again. "I knew that the course of my life was changing." For seven years, he worked in his attic study at Princeton, telling no one but his family. "My wife has only known me while I was working on Fermat", says Andrew. In June 1993 he reached his goal. At a three-day lecture at Cambridge, he outlined a proof of Taniyama - and with it Fermat's Last Theorem. - anand
anand
the science of juggling - http://www2.bc.edu/~lewbel...
Studying the ability to toss and catch balls and rings provides insight into human coordination, robotics and mathematics - anand
anand
Jamais Cascio: Cascio's Laws of Robotics on Vimeo - http://www.vimeo.com/4724087
Presentation by Jamais Cascio at Bay Area AI MeetUp in Menlo Park, CA on March 22, 2009. See ai-meetup.org for details about these MeetUps. - anand
anand
Empirical Studies in Discourse - http://www.cogsci.ed.ac.uk/~jmoore...
Special issue of Computational Linguistics on empirical discourse analysis - anand
anand
Thinking Through Language* - http://www.wjh.harvard.edu/~lds...
Paul Bloom's article on language and thought - anand
anand
Networks, Crowds, and Markets - http://www.cs.cornell.edu/home...
Reasoning About a Highly Connected World By David Easley and Jon Kleinberg - anand
anand
Of vulcan ears, human ears and ‘earprints’ - http://www.cns.nyu.edu/events...
Our outer ears are critical for localizing sound elevation. Van Opstal and colleagues show that humans adapt to new ear shapes. This process resembles learning a second language because after adaptation people can localize equally well with their own or modified ears. - anand
anand
What is the probability your vote will make a difference? - http://www.stat.columbia.edu/~cook...
The chances that your vote will ever make a difference - anand
anand
YouTube - Authors@Google: Donald Knuth - http://www.youtube.com/watch...
Donald Knuth on science and faith - anand