The Google Technology Stack

A room to discuss the amazing stack of technologies that power Google, including PageRank, MapReduce, Bigtable, and the Google File System, as well as other similar technologies, such as Hadoop, CouchDB, and so on. See also http://michaelnielsen.org/blog...
Ivan Zuzak
"Google’s Holodeck isn’t quite as cool as the Star Trek Holodeck, but give them a few years, I’m sure they’ll figure out how to do that as well." -- MG Siegler - Ivan Zuzak via Bookmarklet
Ivan Zuzak
Vote for implementing 'delete application' functionality in AppEngine - http://code.google.com/p...
just star the issue to cast the vote. more stars = more chance that the issue will be resolved earlier. - Ivan Zuzak via Bookmarklet
Ivan Zuzak
"Today we're introducing Google Fusion Tables on Labs, an experimental system for data management in the cloud. It draws on the expertise of folks within Google Research who have been studying collaboration, data integration, and user requirements from a variety of domains. Fusion Tables is not a traditional database system focusing on complicated SQL queries and transaction processing. Instead, the focus is on fusing data management and collaboration: merging multiple data sources, discussion of the data, querying, visualization, and Web publishing. We plan to iteratively add new features to the systems as we get feedback from users." - Ivan Zuzak via Bookmarklet
it's exciting! - Sets Turan
Ivan Zuzak
"Google Translator Toolkit is a new tool being launched today to help translators organize their work and benefit from shared translations, glossaries and translation memories." - Ivan Zuzak via Bookmarklet
I am in love. <3 - Vedrana Jankovic
This is very useful. Not only for translators, but also for the rest of us who will benefit from all of the translation data that Google gathers from this. By having real translators modify machine translations, Google can gather data to improve the machine translation. Smart. - Justin Doub
This is going to be *big*. A collaborative, social, globally accessible and open translation system has countless applications. I know how long they've been working on this, and am *very* excited! - Ivan Zuzak
I tried a test with an English document. Translated it to Dutch. It needs some work, but it's a great start. Very impressive. - Peter van Teeseling
Peter - yeah, the initial translation is done using machine translation which is a statistical approach (the same one used here - http://translate.google.com/transla...). Machine translation will get better as the set of parallel corpora for each language pair gets bigger. And increasing this set is one of the goals of the GTT application. - Ivan Zuzak
Turkish translation is really poor. :/ - Bahriye Sarıkaya
I think the real test for Google Translator is in Translator Boomerang. Some of what it spits out is quite amusing, and very far from the original input...true definition of "lost in translation". http://www.donationcoder.com/Softwar... A conversation with the developer, related to some hilarious results I got:... - April Russo
Haha, that sure is one (fun) way to measure the quality of translations. And I agree, it's a bit quirky right now, but I think we will see drastic increase in quality as more and more people join in and "crowdsource" the engine. - Ivan Zuzak
Ivan Zuzak
"...we're now ready to show off -- and get feedback on -- the gadgets.realtime set of APIs. These APIs will let Google gadgets hosted in different user's browsers communicate with each other. The first API, gadgets.sharedstate, is available on the new Talk Developer Sandbox. With this API, you can share an object between instances of a gadget, and be notified in realtime when the other instance modifies it. More APIs and UI improvements to allow gadgets.realtime gadgets to be used on orkut and iGoogle are in the works and coming soon." - Ivan Zuzak via Bookmarklet
This reminds me, Google gets extra points in my book for using a sample chess gadget in Wave :) - Ahsan Ali
Chess FTW! : ) - vijay
I don't think it's available for me to use yet, am I right? - vijay
hehe; dunno, I'll stick to FICS until Google Wave lets me 'wave' moves ;) - Ahsan Ali
this is not google wave, it's a new api for google gadgets. but since gadgets will be a part of wave, it's all good :) - Ivan Zuzak
Of course, we were digressing ;) - Ahsan Ali
Vijay, you can use the chess app now but both parties need to be running on the Talk sandbox. See the blog post for more info. And, yeah, the same gadgets that run in the Talk sandbox will run in Wave. - Moishe Lettvin
Thanks Lettvin! How about a game then? =P - vijay
Yeah, me too. I wanna play too ! :D - Ahsan Ali
I'm dying to try Google Wave. Gosh, it's *months* away still, isn't it? :-( - Kol Tregaskes
Hold on, Kol ;-) - Stanislas Jourdan
Hehe, trying my best. :-) - Kol Tregaskes
Vijay, my chess skills are weak, I'm sure you'd make short work of me :) I'm happy just giving people another way to play! - Moishe Lettvin
Ivan Zuzak
"With release 1.2.3 of the Python SDK, we are psyched to present an exciting new feature - the Task Queue API. You can now perform offline processing on App Engine by scheduling bundles of work (tasks) for automatic execution in the background. You don't need to worry about managing threads or polling - just write the task processing code, queue up some input data, and App Engine handles the rest. If desired, you can even organize and control task execution by defining custom queues... Last but not least, the 1.2.3 release is full of other new stuff as well! Stay tuned to the blog for more updates or check the release notes for exciting info on: Asynchronous urlfetch support, Django 1.0 support." - Ivan Zuzak via Bookmarklet
Ivan Zuzak
Google Docs: Creating and giving presentations has gotten easier - http://googledocs.blogspot.com/2009...
"If you haven't created a presentation in Google Docs in a while, you should consider trying out these recent enhancements: 1) Multi-shape formatting allows you select multiple shapes and/or text boxes, and format them all at once. 2) Manipulation of text boxes got easier in some more ways, too. Text boxes now grow in size as you type and you can now vertically align the text within multiple boxes using a new Text Vertical Alignment button in the toolbar. 3) When you are giving live presentations, you now can better navigate to specific slides within your presentation." - Ivan Zuzak via Bookmarklet
Christof TD
The "Pragmatic Chaos" team (made up of 4 teams - AT&T BellKor, Commendo, Pragmatic Theory, Yahoo! Research Israel) hit 10% improvement mark for Netflix Prize - http://www.research.att.com/~volins...
Christof TD
Joe Armstrong's Dissertation Thesis (Erlang) - "Making reliable distributed systems in the presence of software errors" - http://www.sics.se/~joe...
Michael Nielsen
Official Google Blog: A new landmark in computer vision - http://googleblog.blogspot.com/2009... (via http://friendfeed.com/michael...)
"And today, a Google team is presenting a paper on landmark recognition (think: Statue of Liberty, Eiffel Tower) at the Computer Vision and Pattern Recognition (CVPR) conference in Miami, Florida. In the paper, we present a new technology that enables computers to quickly and efficiently identify images of more than 50,000 landmarks from all over the world with 80% accuracy." - Michael Nielsen
Michael Nielsen
Forthcoming book edited by Toby Segaran and Jeff Hammerbacher. - Michael Nielsen
wow... want that book. - Christof TD
Michael Nielsen
What are your favourite papers about machine learning / data mining?
I'd like to compile a list of classics - say a top 30 that could be read as a way of getting a good overview of the key ideas. If you have two or three favourite papers that don't obviously already appear here, could you please add them below? What I'm most looking for is the gems - papers that really have a high payoff per unit time spent. Learning of underappreciated gems would be especially helpful! - Michael Nielsen
A few to start: Page and Brin's classic paper on PageRank; Jon Kleinberg's paper on ranking webpages (http://www.eecs.harvard.edu/~michae... ); the IBM group's early classic on statistical machine translation (http://portal.acm.org/citatio... ) - Michael Nielsen
Some related classes, with good reading lists: Michael Mitzenmacher's course (http://www.eecs.harvard.edu/~michae... ); Jon Kleinberg's course (http://www.cs.cornell.edu/courses... ); Andrew Ng's class (http://www.stanford.edu/class... ) - Michael Nielsen
Rudi Cilibrasi has some remarkable papers here: http://cilibrar.com/ In particular, his thesis develops a beautiful general purpose method for finding similar objects, based on information theoretic ideas. Roughly speaking, the idea is that the way to compute the similarity of two items, A and B, is to compute (zip(A)+zip(B)-zip(A,B))/max[zip(A), zip(B)], where zip(A) is the length of... - Michael Nielsen
More on the clustering side, but some really good papers from a clustering course I took a few years back: http://www.cs.uwaterloo.ca/~shai... - Ilya Grigorik
Machine learning? Sutton & Barto 1990: http://www.cs.ualberta.ca/%7Esutt... Temporal difference learning - Björn Brembs
Braitenberg, V. (1984). Vehicles: Experiments in synthetic psychology. Cambridge, MA: MIT Press. - Daniel Mietchen
My personal classics are 1. Epicurus' principle as explained in [Trigg]; 2. Occam's razor as explained in [Domingos]; and 3. Risk reduction as explained in [Netlab]. -- All those principles are much more helpful for explaining what machine learning is about than yet another few percent of accuracy -- References [Trigg] L. Trigg. ‘Designing Similarity Functions’. Dissertation, University of... - joergkurtwegner
Thank you, everyone, it's good stuff! Keep them coming in. - Michael Nielsen
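For anyone who wants to play with the compression-based similarity score mentioned earlier in this thread, here is a minimal sketch. It assumes zlib as a stand-in for "zip" and assumes zip(A,B) means the compressed length of A and B concatenated (the comment that describes it is cut off, so both are guesses). The names zip_len and compression_similarity are made up for illustration and are not taken from Cilibrasi's work.

```python
import hashlib
import zlib


def zip_len(data: bytes) -> int:
    """Compressed length in bytes; zlib is an assumed stand-in for zip()."""
    return len(zlib.compress(data, 9))


def compression_similarity(a: bytes, b: bytes) -> float:
    """(zip(A) + zip(B) - zip(A,B)) / max(zip(A), zip(B)).

    Assumption: zip(A,B) is the compressed length of A followed by B.
    """
    za, zb = zip_len(a), zip_len(b)
    zab = zip_len(a + b)
    return (za + zb - zab) / max(za, zb)


# Two near-identical texts and one unrelated, incompressible byte string.
doc_a = b"the quick brown fox jumps over the lazy dog. " * 20
doc_b = b"the quick brown fox leaps over the lazy cat. " * 20
doc_c = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(30))

print("a vs b:", compression_similarity(doc_a, doc_b))
print("a vs c:", compression_similarity(doc_a, doc_c))
```

The score goes up when compressing the two items together saves space over compressing them separately, so related texts should score noticeably higher than unrelated ones.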
Marc Chung
Mine a wide range of graphs with Pregel. Implementing PageRank, for example, takes only about 15 lines of code. http://googleresearch.blogspot.com/2009...
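To give a feel for why PageRank fits in so few lines in a vertex-centric model, here is a rough single-machine sketch of the idea: in each round ("superstep") every vertex sends its rank, split evenly, along its outgoing edges, and then recomputes its rank from the messages it received. This is not Pregel's API, which is not public here; the function name, the damping value of 0.85, the fixed number of supersteps, and the handling of vertices with no outgoing edges are all assumptions for illustration.

```python
def pagerank(graph, damping=0.85, supersteps=30):
    """Vertex-centric PageRank on a dict mapping vertex -> list of out-neighbours.

    Every vertex must appear as a key, even if it has no outgoing edges.
    """
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(supersteps):
        # Message phase: each vertex sends rank / out_degree to its neighbours.
        inbox = {v: 0.0 for v in graph}
        dangling = 0.0
        for v, out_edges in graph.items():
            if out_edges:
                share = rank[v] / len(out_edges)
                for w in out_edges:
                    inbox[w] += share
            else:
                # Assumption: rank of dead-end vertices is spread over all vertices.
                dangling += rank[v]
        # Compute phase: each vertex updates its rank from its incoming messages.
        rank = {
            v: (1.0 - damping) / n + damping * (inbox[v] + dangling / n)
            for v in graph
        }
    return rank


example_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(example_graph)
print(ranks)
```

In a real distributed system the two phases would run in parallel across machines, with the message passing handled by the framework; the per-vertex logic is the part that stays this small.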
Ilya Grigorik
Ivan Zuzak
"At Google, we consider translation a key part of making information universally accessible to everyone around the world. While we think Google Translate, our automatic translation system, is pretty neat, sometimes machine translation could use a human touch. Yesterday, we launched Google Translator Toolkit, a powerful but easy-to-use editor that enables translators to bring that human touch to machine translation." - Ivan Zuzak via Bookmarklet
why didn't FF import the youtube video i linked to in the subject? hm.. - Ivan Zuzak
Ilya Grigorik
Wukong: Ruby + Hadoop Streaming library - http://github.com/mrflip...
Ilya Grigorik
Most Hadoop Jobs Are In California - http://radar.oreilly.com/2009...
Ilya Grigorik
Michael Nielsen
Michael Nielsen
"Project Hosting on Google Code is a web-based platform for open source development, providing mailing lists, an issue tracker, a source code repository, download areas, and so on. This talk will focus on a new version-control component of Project Hosting on Google Code: Mercurial backed by Bigtable. Mercurial/Bigtable is designed to scale over thousands of machines and use Bigtable's replication to run over multiple datacenters. It is built to be able to host hundreds of thousands of open source projects. Come learn about Mercurial's architecture, and how we've extended it to grow to "Google size"." - Michael Nielsen
Michael Nielsen
A number-theoretic approach to consistent hashing - http://michaelnielsen.org/blog... (via http://friendfeed.com/michael...)
Michael Nielsen
"This is a 120 page document describing the design of state of the art, large scale computing facilities, such as those run by the big Internet companies. It discusses everything from facilities issues through the computing hardware through to the software infrastructure. This is an excellent design guide about how everyone should be designing data centers of all sizes, not just huge facilities. Don't be intimidated by its length: it is very easy to read. Just browse the table of contents and pick and choose the sections that interest you. I particularly enjoyed Chapter 5: Energy and Power Efficiency." - Michael Nielsen
Michael Nielsen
MIT Database Systems (6.830) TA Course Notes - http://blog.marcua.net/post... (via http://friendfeed.com/michael...)
Ilya Grigorik
The Smart Grid and Big Data: Hadoop at the Tennessee Valley Authority (TVA) - http://www.cloudera.com/blog...
Ilya Grigorik
Easy Map-Reduce With Hadoop Streaming - http://www.igvita.com/2009...
Ivan Zuzak
Google Apps Script: Expanding the Google Office With Your Own JavaScript - http://blogoscoped.com/archive...
"...with scripts you can: Create your own custom spreadsheet functions. Automate repetitive tasks (e.g. process responses to Google Docs forms). Link multiple Google products together (e.g. send emails or schedule Calendar events from a list of addresses in a Spreadsheet). Customize existing Google products (e.g. add custom buttons or menus to run your own scripts)." - Ivan Zuzak via Bookmarklet
Michael Nielsen
Michael Nielsen
Michael Mitzenmacher: Algorithms at the End of the Wire - http://www.eecs.harvard.edu/%7Emich... (via http://friendfeed.com/michael...)
Michael Mitzenmacher's graduate course at Harvard. Links to lots of web-relevant stuff there, including PageRank and MapReduce. - Michael Nielsen
Michael Nielsen
Michael Nielsen
Books, papers, sites, and software for learning about Web search and related areas - Quinn Slack - http://qslack.com/post... (via http://friendfeed.com/michael...)