A room to discuss the amazing stack of technologies that power Google, including PageRank, MapReduce, Bigtable, and the Google File System, as well as other similar technologies, such as Hadoop, CouchDB, and so on. See also http://michaelnielsen.org/blog...
"Google’s Holodeck isn’t quite as cool as the Star Trek Holodeck, but give them a few years, I’m sure they’ll figure out how to do that as well." -- MG Siegler
- Ivan Zuzak
"Today we're introducing Google Fusion Tables on Labs, an experimental system for data management in the cloud. It draws on the expertise of folks within Google Research who have been studying collaboration, data integration, and user requirements from a variety of domains. Fusion Tables is not a traditional database system focusing on complicated SQL queries and transaction processing. Instead, the focus is on fusing data management and collaboration: merging multiple data sources, discussion of the data, querying, visualization, and Web publishing. We plan to iteratively add new features to the systems as we get feedback from users."
- Ivan Zuzak
"Google Translator Toolkit is a new tool being launched today to help translators organize their work and benefit from shared translations, glossaries and translation memories."
- Ivan Zuzak
This is very useful. Not only for translators, but also for the rest of us who will benefit from all of the translation data that Google gathers from this. By having real translators modify machine translations, Google can gather data to improve the machine translation. Smart.
- Justin Doub
This is going to be *big*. A collaborative, social, globally accessible and open translation system has countless applications. I know how long they've been working on this, and am *very* excited!
- Ivan Zuzak
I tried a test with an English document. Translated it to Dutch. It needs some work, but it's a great start. Very impressive.
- Peter van Teeseling
Peter - yeah, the initial translation is done using machine translation, which is a statistical approach (the same one used here - http://translate.google.com/transla...). Machine translation will get better as the set of parallel corpora for each language pair gets bigger, and increasing this set is one of the goals of the GTT application.
- Ivan Zuzak
I think the real test for Google Translator is in Translator Boomerang. Some of what it spits out is quite amusing, and very far from the original input...true definition of "lost in translation". http://www.donationcoder.com/Softwar... A conversation with the developer, related to some hilarious results I got:...
- April Russo
Haha, that sure is one (fun) way to measure the quality of translations. And I agree, it's a bit quirky right now, but I think we will see a drastic increase in quality as more and more people join in and "crowdsource" the engine.
- Ivan Zuzak
"...we're now ready to show off -- and get feedback on -- the gadgets.realtime set of APIs. These APIs will let Google gadgets hosted in different users' browsers communicate with each other. The first API, gadgets.sharedstate, is available on the new Talk Developer Sandbox. With this API, you can share an object between instances of a gadget, and be notified in realtime when the other instance modifies it. More APIs and UI improvements to allow gadgets.realtime gadgets to be used on orkut and iGoogle are in the works and coming soon."
- Ivan Zuzak
Vijay, you can use the chess app now but both parties need to be running on the Talk sandbox. See the blog post for more info. And, yeah, the same gadgets that run in the Talk sandbox will run in Wave.
- Moishe Lettvin
"With release 1.2.3 of the Python SDK, we are psyched to present an exciting new feature - the Task Queue API. You can now perform offline processing on App Engine by scheduling bundles of work (tasks) for automatic execution in the background. You don't need to worry about managing threads or polling - just write the task processing code, queue up some input data, and App Engine handles the rest. If desired, you can even organize and control task execution by defining custom queues... Last but not least, the 1.2.3 release is full of other new stuff as well! Stay tuned to the blog for more updates or check the release notes for exciting info on: Asynchronous urlfetch support, Django 1.0 support."
- Ivan Zuzak
"If you haven't created a presentation in Google Docs in a while, you should consider trying out these recent enhancements: 1) Multi-shape formatting allows you to select multiple shapes and/or text boxes, and format them all at once. 2) Manipulation of text boxes got easier in some more ways, too. Text boxes now grow in size as you type and you can now vertically align the text within multiple boxes using a new Text Vertical Alignment button in the toolbar. 3) When you are giving live presentations, you now can better navigate to specific slides within your presentation."
- Ivan Zuzak
The "Pragmatic Chaos" team (made up of 4 teams - AT&T BellKor, Commendo, Pragmatic Theory, Yahoo! Research Israel) has hit the 10% improvement mark for the Netflix Prize - http://www.research.att.com/~volins...
"And today, a Google team is presenting a paper on landmark recognition (think: Statue of Liberty, Eiffel Tower) at the Computer Vision and Pattern Recognition (CVPR) conference in Miami, Florida. In the paper, we present a new technology that enables computers to quickly and efficiently identify images of more than 50,000 landmarks from all over the world with 80% accuracy."
- Michael Nielsen
I'd like to compile a list of classics - say a top 30 that could be read as a way of getting a good overview of the key ideas. If you have two or three favourite papers that don't obviously already appear here, could you please add them below? What I'm most looking for is the gems - papers that really have a high payoff per unit time spent. Learning of underappreciated gems would be especially helpful!
- Michael Nielsen
Rudi Cilibrasi has some remarkable papers here: http://cilibrar.com/ In particular, his thesis develops a beautiful general purpose method for finding similar objects, based on information theoretic ideas. Roughly speaking, the idea is that the way to compute the similarity of two items, A and B, is to compute (zip(A)+zip(B)-zip(A,B))/max[zip(A), zip(B)], where zip(A) is the length of...
- Michael Nielsen
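For anyone who wants to play with the similarity formula quoted above, here is a minimal Python sketch. It rests on a few assumptions that are not in the original comment: zip(A) is taken to be the byte length of the zlib-compressed input, zip(A,B) is taken to be the compressed length of A followed by B, and the names zip_len and similarity are made up for illustration. It is not meant to reproduce the method in the thesis, only the formula as written here.

```python
import zlib


def zip_len(data: bytes) -> int:
    """Byte length of the zlib-compressed input (assumed stand-in for zip(A))."""
    return len(zlib.compress(data, 9))


def similarity(a: bytes, b: bytes) -> float:
    """Compute (zip(A) + zip(B) - zip(A,B)) / max[zip(A), zip(B)].

    Assumption: zip(A,B) is the compressed length of A concatenated with B.
    """
    za, zb = zip_len(a), zip_len(b)
    zab = zip_len(a + b)
    # zlib output is never empty, so the denominator is always positive.
    return (za + zb - zab) / max(za, zb)


if __name__ == "__main__":
    english = b"the quick brown fox jumps over the lazy dog " * 20
    variant = b"the quick brown fox leaps over the lazy cat " * 20
    unrelated = bytes(range(256)) * 4
    print("english vs variant:  ", similarity(english, variant))
    print("english vs unrelated:", similarity(english, unrelated))
```

The intuition is that if B shares a lot of structure with A, compressing them together costs little more than compressing A alone, so the numerator is large. Scores near 1 suggest very similar inputs and scores near 0 suggest unrelated ones; a general-purpose compressor only approximates the ideal, so small negative values can occur.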
Braitenberg, V. (1984). Vehicles: Experiments in synthetic psychology. Cambridge, MA: MIT Press.
- Daniel Mietchen
My personal classics are 1. Epicurus' principle as explained in [Trigg]; 2. Occam's razor as explained in [Domingos]; and 3. Risk reduction as explained in [Netlab]. -- All those principles are much more helpful in explaining what machine learning is about than yet another few percent of accuracy -- References [Trigg] L. Trigg. ‘Designing Similarity Functions’. Dissertation, University of...
- joergkurtwegner
Thank you, everyone, it's good stuff! Keep them coming in.
- Michael Nielsen
"At Google, we consider translation a key part of making information universally accessible to everyone around the world. While we think Google Translate, our automatic translation system, is pretty neat, sometimes machine translation could use a human touch. Yesterday, we launched Google Translator Toolkit, a powerful but easy-to-use editor that enables translators to bring that human touch to machine translation."
- Ivan Zuzak
why didn't FF import the youtube video i linked to in the subject? hm..
- Ivan Zuzak
"Project Hosting on Google Code is a web-based platform for open source development, providing mailing lists, an issue tracker, a source code repository, download areas, and so on. This talk will focus on a new version-control component of Project Hosting on Google Code: Mercurial backed by Bigtable. Mercurial/Bigtable is designed to scale over thousands of machines and use Bigtable's replication to run over multiple datacenters. It is built to be able to host hundreds of thousands of open source projects. Come learn about Mercurial's architecture, and how we've extended it to grow to "Google size"."
- Michael Nielsen
"This is a 120 page document describing the design of state of the art, large scale computing facilities, such as those run by the big Internet companies. It discusses everything from facilities issues through the computing hardware through to the software infrastructure. This is an excellent design guide about how everyone should be designing data centers of all sizes, not just huge facilities. Don't be intimidated by its length: it is very easy to read. Just browse the table of contents and pick and choose the sections that interest you. I particularly enjoyed Chapter 5: Energy and Power Efficiency."
- Michael Nielsen
"...with scripts you can: Create your own custom spreadsheet functions. Automate repetitive tasks (e.g. process responses to Google Docs forms). Link multiple Google products together (e.g. send emails or schedule Calendar events from a list of addresses in a Spreadsheet). Customize existing Google products (e.g. add custom buttons or menus to run your own scripts)."
- Ivan Zuzak