Data deduplication is a special form of data compression in which redundant data is eliminated to improve storage utilization. In a nutshell, the deduplication process removes duplicate data, leaving only one copy of each unique chunk, while an index of all the data is kept for restoring purposes. Obviously, data deduplication helps reduce storage consumption, since only unique data is written to disk.
Let's have a look at an example.
A fairly large image of 3 MB taken during your summer holiday might contain 100 instances of the same block of pixels. Let's say the size of each block is 10 KB (to be realistic, let's imagine it's an image of a lake with a scenic pasture in the background … fair enough ;-)). With data deduplication, the size of your image comes down to something like 2 MB, because only one copy of the duplicate block needs to be stored and the other 99 copies can be eliminated (i.e. 99 x 10 KB, roughly 1 MB saved). Just imagine the storage savings you would gain in an enterprise storage implementation! It's quite a lot.
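To make the idea concrete, here is a minimal sketch of fixed-size block deduplication in Python. It is an illustration only, not any particular product's implementation: data is split into fixed-size blocks, each block is fingerprinted with a hash, one copy of each unique block is stored, and an ordered index of fingerprints is kept so the original data can be restored. The function names and the 10 KB block size are just assumptions for the example.

```python
import hashlib

BLOCK_SIZE = 10 * 1024  # 10 KB blocks, matching the example above


def deduplicate(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks; store one copy per unique block.

    Returns (store, index): store maps a block's hash to the block itself,
    and index is the ordered list of hashes needed to rebuild the data.
    """
    store = {}   # hash -> unique block (the deduplicated storage)
    index = []   # ordered hashes (kept for restoring purposes)
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy
        index.append(digest)
    return store, index


def restore(store, index):
    """Rebuild the original data from the index of block hashes."""
    return b"".join(store[digest] for digest in index)
```

For instance, 100 identical 10 KB blocks followed by one different block occupy 101 blocks logically but only 2 blocks in the deduplicated store; the index still records all 101 entries so `restore` reproduces the original byte-for-byte.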
Also, data is money at the end of the day. If you look at most of the storage services available at the moment, you will see that they are comparatively expensive. Therefore, data deduplication comes in handy when it comes to storing large dumps of data.