Tuesday, November 25, 2008

Google Sorts 1PB of Data in Six Hours & Two Minutes

It's been more than seven weeks since I published anything on my blog. I was so busy implementing the Axis2/C AMQP transport that I could hardly find any spare time to write. Now I am almost done with the transport implementation and hence free to write about the things that have been waiting in the pipeline. Of course, there are a lot of things to write about.

I thought the tremendous sort experiment done at Google would be the best thing to start with. I happened to read about it on the official Google blog. They did this using MapReduce, and it seems to be the first and only sorting experiment at this scale.

MapReduce is a programming model and an associated implementation for processing and generating large data sets. It is used at Google, Facebook, Yahoo!, LinkedIn, and elsewhere.
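
To give a feel for the model, here is a minimal single-machine sketch in Python of how a sort can be phrased as map, shuffle, and reduce steps. It is only an illustration and not Google's implementation: the function names (`map_phase`, `shuffle`, `reduce_phase`, `mapreduce_sort`) and the hand-picked `boundaries` list are my own assumptions. The idea is that each record is mapped to a (key, value) pair, pairs are routed to reducers by key range, each reducer sorts its own share, and concatenating the reducer outputs in order gives a globally sorted result.

```python
from bisect import bisect_right
from collections import defaultdict


def map_phase(records):
    # Map step: emit (key, value) pairs. Here the record is its own sort key.
    for record in records:
        yield record, record


def shuffle(pairs, boundaries):
    # Shuffle step: range-partition pairs so that reducer i only receives
    # keys smaller than every key sent to reducer i + 1.
    # `boundaries` is a hypothetical, hand-picked list of split points.
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[bisect_right(boundaries, key)].append((key, value))
    return buckets


def reduce_phase(bucket):
    # Reduce step: each reducer sorts only its own partition.
    return [value for _key, value in sorted(bucket)]


def mapreduce_sort(records, boundaries):
    buckets = shuffle(map_phase(records), boundaries)
    output = []
    # Concatenate reducer outputs in partition order.
    for reducer_id in range(len(boundaries) + 1):
        output.extend(reduce_phase(buckets.get(reducer_id, [])))
    return output


if __name__ == "__main__":
    print(mapreduce_sort([5, 3, 9, 1, 7], boundaries=[4, 8]))
```

In a real cluster the map and reduce steps would run on many machines in parallel and the shuffle would move data over the network; everything runs in one process here just to show the shape of the computation.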

I will be writing a couple of posts on MapReduce over the next few days, as it is such an interesting and powerful mechanism for handling large data sets.
