Does Primary Storage De-Dupe Save You Money?

A few months ago I had a discussion with a customer that I thought was worth sharing. It centered on primary storage de-duplication vs. automated storage tiering. Beyond its established role in backup, de-dupe has been attracting a lot of attention as a feature for primary SAN storage. NetApp focuses heavily on this feature in their storage products (for both NAS and SAN). EMC has been a bit behind in this regard, originally offering de-dupe/compression only for NAS and more recently adding compression to their SAN block storage offerings. Media outlets such as The Register have published articles indicating that EMC executives have hinted block de-duplication is on the roadmap for their storage arrays. Outside of NetApp and EMC, no other major storage vendor has introduced similar functionality into its product line, though several are rumored to have it in the works.

What De-Dupe Options Exist Today?

Currently, EMC does single-instance storage and compression on file system (NAS) data, and compression only on block (SAN) data. On NAS, de-dupe and compression can be applied to the entire datastore or to individual VMs. You cannot de-dupe or compress individual VMs on a Clariion or block-only VNX using Fibre Channel/iSCSI storage. The NetApp world is similar. If you run VMware on NFS, de-dupe can be done per VM. With FC or iSCSI, de-dupe applies to the entire datastore.

In a block environment using VMFS file systems, where you have a variety of VMs that need different performance levels, you would probably want to consider multiple datastores set up on different storage tiers. One might be for higher-performing production VMs and would not have de-dupe/compression enabled. You would create another Tier 2 datastore, with de-dupe/compression enabled, for non-prod VMs or VMs that are often idle. This method of tiering is a manual process that requires administrator involvement to re-balance resources.

Regardless of what any manufacturer will claim, de-dupe is generally not a technology that you want to enable on ALL of your primary storage. Compression and de-duplication are still best suited for infrequently accessed data or a low-performance environment. That said, many small to medium environments fall under this classification, so de-duping everything could be a possibility for them.

De-dupe vs. Automated Storage Tiering (AST)

At the end of the day, features are nice, but the more important question is whether they truly save you money and help you be more efficient. The customer and I were discussing strategies to save space and improve the economics of storage. Naturally, this led us to de-duplication, and then to a comparison of the savings from de-dupe with the savings from AST. Compellent introduced sub-LUN AST a few years ago and EMC introduced it in their product line in Q3 of 2010. Generally speaking, the typical performance profile of a customer array shows that the vast majority of LUNs are accessed infrequently. Zooming in on a busy LUN, the profile usually shows that only a minority of the blocks comprising that LUN are actually busy, while the rest are infrequently accessed. With sub-LUN AST, the hot blocks get moved up to SSD or 15K drives, while the majority of data that is infrequently accessed resides on SATA.
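
To make the hot-block idea concrete, here is a minimal Python sketch of the placement decision. It is not how any particular array implements AST. The function name, the block IDs, the I/O counts, and the fast-tier capacity are all made up for illustration, and a real array would also weigh things like recency and scheduled relocation windows.

```python
def place_blocks(access_counts, fast_tier_capacity):
    """Assign the busiest blocks to the fast tier and the rest to SATA.

    access_counts: dict of block id -> I/O count over a sampling window
    fast_tier_capacity: how many blocks the SSD/15K tier can hold
    (Both inputs are hypothetical; no vendor API is being modeled here.)
    """
    # Rank block ids from busiest to least busy.
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    hot = set(ranked[:fast_tier_capacity])
    return {block: ("fast" if block in hot else "sata") for block in ranked}


# Hypothetical I/O counts for one LUN carved into eight blocks:
# two blocks do nearly all of the work, the rest are close to idle.
io_counts = {0: 5, 1: 900, 2: 3, 3: 0, 4: 750, 5: 2, 6: 1, 7: 4}

placement = place_blocks(io_counts, fast_tier_capacity=2)
for block, tier in sorted(placement.items()):
    print(f"block {block}: {tier}")
```

In this toy profile only blocks 1 and 4 land on the fast tier, which is the pattern described above: a small slice of the LUN earns the expensive disk and everything else sits on SATA.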

What I’ve found is that using AST at the sub-LUN level with large amounts of SATA gives you better savings, without the overhead risk that can reduce performance when you introduce de-dupe or compression. The caveat is that the array must be properly designed so that there are enough SSD or 15K spindles to accommodate the hot blocks. If the data has I/O characteristics high enough that it’s not a good fit for SATA, then it’s probably not a good fit for de-dupe or compression either, because of the performance penalty you’re going to incur. Hence there’s no savings either way.

As a theoretical example, let’s say we’re dealing with an environment that has VMs living on 1TB SATA drives configured in a 5+1 R5 setup. That yields 4500GB of usable space. A ballpark street price for an upgrade of six 1TB drives equates to a cost of $1.77/GB. If there are 2TB worth of VMs sitting out there, and they can be crunched down by 20% to save 400GB, that’s a savings of just $708. Some vendors might complain that’s too low an estimate, so let’s double it to 40%. Now you’re saving $1416. On enterprise-class storage arrays that is not a significant savings. It’s more like a rounding error. Are you saving money? Sure. But I would have to wonder why you would even create the additional overhead for a savings like that.
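
The arithmetic behind those numbers is simple enough to check in a few lines of Python. This is only a sketch: the $1.77/GB price and 4500GB usable figure come from the example above, 1TB is treated as 1000GB, and the function and constant names are just illustrative.

```python
SATA_COST_PER_GB = 1.77   # ballpark street price from the example
USABLE_GB = 4500          # usable space of the 5+1 R5 group
VM_FOOTPRINT_GB = 2000    # 2TB of VMs, treating 1TB as 1000GB


def dedupe_savings(footprint_gb, reduction, cost_per_gb):
    """Dollar value of the capacity that de-dupe/compression frees up."""
    return footprint_gb * reduction * cost_per_gb


# Upgrade price implied by the per-GB figure (not quoted in the text).
print(f"Implied price of the drive upgrade: ${USABLE_GB * SATA_COST_PER_GB:,.0f}")

for reduction in (0.20, 0.40):
    saved = dedupe_savings(VM_FOOTPRINT_GB, reduction, SATA_COST_PER_GB)
    print(f"{reduction:.0%} reduction saves ${saved:,.0f}")
```

Swapping in your own footprint and price per GB is the quickest way to see whether de-dupe on a cheap tier is worth the overhead in your environment.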

Let’s look at a use-case for 2TB of VMs residing on more expensive FC storage. I’ll use a cost of $6/GB. With 2TB of VMs stored on that tier, de-duping to save 20% yields a cost savings of $2400. That’s certainly better than $708 but nothing to get overly excited about. Now, if I can move 85% of what’s on the $6/GB storage to $1.77/GB storage by using auto-tiering, then 1700GB moves at a difference of $4.23/GB and my savings are about $7,200, roughly three times what de-dupe delivers.
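
Here is the same comparison as a small Python sketch, so the assumptions are explicit. It uses the $6/GB and $1.77/GB prices from the example, treats 1TB as 1000GB, and counts the 15% of data that stays on FC at the full FC price. Leaving that FC remainder out of the calculation would give a larger tiering figure. All names are illustrative.

```python
FC_COST_PER_GB = 6.00     # assumed cost of the FC tier from the example
SATA_COST_PER_GB = 1.77   # SATA cost from the earlier example
FOOTPRINT_GB = 2000       # 2TB of VMs, treating 1TB as 1000GB


def cost_with_dedupe(footprint_gb, reduction, cost_per_gb):
    """Cost of the data after de-dupe shrinks it on its current tier."""
    return footprint_gb * (1 - reduction) * cost_per_gb


def cost_with_tiering(footprint_gb, cold_fraction, fast_cost, slow_cost):
    """Cost when the cold fraction moves down and the rest stays put."""
    cold_gb = footprint_gb * cold_fraction
    hot_gb = footprint_gb - cold_gb
    return hot_gb * fast_cost + cold_gb * slow_cost


baseline = FOOTPRINT_GB * FC_COST_PER_GB
dedupe_saved = baseline - cost_with_dedupe(FOOTPRINT_GB, 0.20, FC_COST_PER_GB)
tiering_saved = baseline - cost_with_tiering(
    FOOTPRINT_GB, 0.85, FC_COST_PER_GB, SATA_COST_PER_GB
)

print(f"All on FC:       ${baseline:,.0f}")
print(f"De-dupe saves:   ${dedupe_saved:,.0f}")
print(f"Tiering saves:   ${tiering_saved:,.0f}")
```

Under these assumptions the tiering savings come out to roughly $7,200 against $2,400 for de-dupe. The gap narrows as the cold fraction shrinks, which is why the sizing of the fast tier matters so much.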

What’s the bottom line?

In a large environment that has mostly expensive FC disk holding data that doesn’t require the performance characteristics of that tier, there may be a compelling cost savings in implementing de-dupe or compression, especially if you’re not able to implement AST because you’re on an older-generation platform that doesn’t support it. However, those older-generation platforms typically won’t support de-dupe either, because naturally vendors want to put all the flashy new features in the latest-generation boxes. There are gateway devices you could put in front of your old storage to give you new features, but that approach has a variety of pros and cons that are best left for another discussion. Certainly, de-duplication on primary storage can save you money. However, given all the hype that surrounds this topic, the savings aren’t necessarily what you might think they would be.

