Archive for the ‘Uncategorized’ Category

Another real example of snaps vs. backups

June 22, 2011

StorageZilla, who normally has a heavy EMC bias, has a post up today that gives a great example of why you need a backup solution in addition to snapshots. No vendor bashing or obsessive promotion here, simply a good example of what can happen if you don’t have a purpose-built backup system.

Check it out here:

For more examples of why most folks continue to look at a purpose-built backup solution, see my original article on the topic here:

Categories: Uncategorized

Does Primary Storage De-Dupe Save You Money?

May 31, 2011

A few months ago I was in a discussion with a customer that I thought was worth sharing. It centered on primary storage de-duplication vs. automated storage tiering. Setting backup de-duplication aside, de-dupe has been attracting a lot of attention as a feature for primary SAN storage. NetApp focuses heavily on this feature in their storage products (for both NAS and SAN). EMC has been a bit behind in this regard, originally offering de-dupe/compression only for NAS and more recently adding compression to their SAN block storage offerings. Media outlets such as The Register have published articles indicating that EMC executives have hinted block de-duplication is on the roadmap for their storage arrays. Outside of NetApp and EMC, no other major storage vendor has introduced similar functionality into their product lines, though several are rumored to have it in the works.

What De-Dupe Options Exist Today?

Currently, EMC does single-instance storage and compression on file system (NAS) data, and compression on block (SAN) data. The de-dupe and compression can be applied to an entire datastore or to individual VMs; however, you cannot de-dupe or compress individual VMs on a Clariion or block-only VNX using Fibre Channel/iSCSI storage. It's similar in the NetApp world: if you run VMware on NFS, de-dupe can be done per VM, while with FC or iSCSI, de-dupe applies to the entire datastore.

In a block environment using VMFS file systems where you have a variety of VMs that need different performance levels, you would probably want to consider multiple datastores set up on different storage tiers. One might be for higher-performing production VMs and would not have de-dupe/compression enabled; another Tier 2 datastore would hold non-prod VMs or VMs that are often idle. This method of tiering is obviously a manual process that requires administrator involvement to re-balance resources.

Regardless of what any manufacturer claims, de-dupe is generally not a technology you want to enable on ALL of your primary storage. Compression and de-duplication are still best suited for infrequently accessed data or low-performance environments. That said, many small-to-medium environments fall under that classification, so in those cases de-duping everything could be a possibility.

De-dupe vs. Automated Storage Tiering (AST)

At the end of the day, features are nice, but the more important question is whether they truly save you money and help you be more efficient. My discussion with the customer centered on strategies to save space and improve the economics of storage, which naturally led to comparing the savings from de-dupe with the savings from AST. Compellent introduced sub-LUN AST a few years ago, and EMC introduced it in their product line in Q3 of 2010. Generally speaking, the typical performance profile of a customer array shows that the vast majority of LUNs are accessed infrequently. Zooming in on a busy LUN, the profile usually shows that only a minority of the blocks comprising that LUN are actually busy, while the rest are infrequently accessed. With sub-LUN AST, the hot blocks get moved up to SSD or 15K drives, while the majority of data that is infrequently accessed resides on SATA.
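To make the idea concrete, here's a toy sketch of a tiering pass. The block counts, tier capacity, and skewed access distribution are all invented for illustration; real arrays maintain heat maps per extent internally and the exact mechanics vary by vendor.

```python
# Toy sketch of sub-LUN automated storage tiering (AST).
# All numbers here are hypothetical, chosen only to illustrate
# the "minority of blocks get most of the I/O" skew.
import random

random.seed(42)

NUM_BLOCKS = 10_000          # extents making up a LUN
FAST_TIER_CAPACITY = 500     # extents that fit on SSD/15K (5%)

# Skewed workload: a small minority of blocks receive most of the I/O.
heat = [random.paretovariate(1.5) for _ in range(NUM_BLOCKS)]

# Tiering pass: promote the hottest extents, leave the rest on SATA.
ranked = sorted(range(NUM_BLOCKS), key=lambda b: heat[b], reverse=True)
fast_tier = set(ranked[:FAST_TIER_CAPACITY])

io_from_fast = sum(heat[b] for b in fast_tier)
total_io = sum(heat)
print(f"{len(fast_tier) / NUM_BLOCKS:.0%} of extents on the fast tier "
      f"serve {io_from_fast / total_io:.0%} of the I/O")
```

Even this crude model shows the payoff: promoting a small fraction of extents captures a disproportionate share of the I/O, which is what lets the bulk of the capacity sit on cheap SATA.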

What I've found is that using AST at the sub-LUN level with large amounts of SATA gives you better savings, without the risk of overhead that can reduce performance when you introduce de-dupe or compression. The caveat is that the array must be properly designed, with enough SSD or 15K spindles to accommodate the hot blocks. If the data's I/O characteristics are too high for it to be a good fit for SATA, then it's probably not a good fit for de-dupe or compression either, because of the performance penalty you'd incur; hence there's no savings either way.

As a theoretical example, let's say we're dealing with an environment that has VMs living on 1TB SATA drives configured in a 5+1 RAID 5 set, which yields 4,500GB of usable space. A ballpark street price for an upgrade of six 1TB drives works out to a cost per GB of about $1.77/GB. If there are 2TB worth of VMs sitting out there and they can be crunched down by 20% to save 400GB, it's a savings of just $708. Some vendors might complain that's too low an estimate, so let's double it to 40%; now you're saving $1,416. On enterprise-class storage arrays that is not a significant savings, it's more like a rounding error. Are you saving money? Sure. But I would have to wonder why even create the additional overhead for a savings like that.

Let's look at a use case for 2TB of VMs residing on more expensive FC storage. I'll use a cost per GB of $6. With 2TB of VMs stored on that tier, de-duping to save 20% yields a cost savings of $2,400. That's certainly better than $708, but nothing to get overly excited about. Now, if I can instead move 85% of what's on the $6/GB storage down to $1.77/GB storage by using auto-tiering, my savings are about $7,200 (1,700GB at the $4.23/GB price difference).
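The arithmetic behind these scenarios is simple enough to put in a quick script, using the same ballpark prices as above (nothing vendor-specific here):

```python
# Back-of-the-envelope comparison: de-dupe savings vs. moving data
# down a tier. Prices are the ballpark $/GB figures from the text.
SATA_PER_GB = 1.77   # $/GB, 1TB SATA in a 5+1 RAID 5 set
FC_PER_GB = 6.00     # $/GB, FC tier

VM_DATA_GB = 2000    # 2TB of VMs

# De-dupe/compression on the SATA tier at 20% and 40% reduction
for reduction in (0.20, 0.40):
    saved = VM_DATA_GB * reduction * SATA_PER_GB
    print(f"De-dupe {reduction:.0%} on SATA saves ${saved:,.0f}")

# De-dupe at 20% on the FC tier
print(f"De-dupe 20% on FC saves ${VM_DATA_GB * 0.20 * FC_PER_GB:,.0f}")

# Auto-tiering: move 85% of the FC-resident data down to SATA
moved_gb = VM_DATA_GB * 0.85
tiering = moved_gb * (FC_PER_GB - SATA_PER_GB)
print(f"Tiering 85% down to SATA saves ${tiering:,.0f}")
```

Running it shows de-dupe on SATA saving $708 (20%) or $1,416 (40%), de-dupe on FC saving $2,400, and auto-tiering saving $7,191, which is where the tiering advantage comes from.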

What’s the bottom line?

In the case of a large environment with mostly expensive FC disk holding data that doesn't require the performance characteristics of that tier, there may be a compelling cost savings in implementing de-dupe or compression, especially if you can't implement AST because you're on an older-generation platform that doesn't support it. However, those older-generation platforms typically won't support de-dupe either, because naturally vendors put all the flashy new features in the latest-generation boxes. There are gateway devices you could put in front of your old storage to add new features, but the pros and cons of that approach are best suited for another discussion. Certainly, de-duplication on primary storage can save you money. However, given all the hype that surrounds this topic, the savings aren't necessarily what you might think they would be.


Categories: Uncategorized

Real-world example of snaps vs. backup

April 4, 2011

Following up from my last post comparing snapshots with replication to backup, one of my peers sent me the following info:

"You may have heard about the issues Google experienced with Gmail a while ago. Data loss. No problem, because they replicate all data to a second data center, right? But sometimes you're replicating bad data or a software bug, which is what Google seems to be saying here. But they back up their data."

Pretty interesting stuff here, particularly that Gmail is backed up (kudos to Google). It also raises a very real use case: the snapshots or data on disk don't have to be corrupted at all; instead, the OS of the system storing the snapshots could have a bug that renders the snapshots invalid.

Typically, storage arrays that are replicating require both sides to run the same code level. When doing an upgrade, you usually upgrade the remote site first, then the source side. Running the two sides at different code versions for an extended period isn't an option, as it causes issues getting support from the manufacturer, yet some code bugs related to the upgrade may not pop up for 30, 60, or even 90 days.

Categories: Backup, Uncategorized