Archive for the ‘De-dupe’ Category

VMware vDP and Avamar – Blown out of Proportion

October 8, 2012 2 comments

The dust has settled a bit since the announcement of vSphere 5.1, including the new vSphere Data Protection (vDP) functionality based on EMC Avamar code. Immediately following the announcement there were:

  • EMC folks reporting this proves Avamar is the greatest thing since sliced bread because VMware chose Avamar Virtual Edition (AVE) as the basis for vDP
  • VMware folks stating vDP only leverages Avamar technology – it is a new product co-developed by VMware and EMC rather than AVE with a new GUI.
  • Critics and competitors saying either that they are two completely different products and the announcement means nothing, or that it means the world will be running Hyper-V in 12-18 months as EMC takes over VMware and fails miserably.

What’s my opinion?  Being a middle-of-the-road guy, naturally I think both the far left and right are blowing things out of proportion and VMware employees were generally the most accurate in their assessments.    

We can hold these things to be self-evident:

  • vDP is a virtual appliance.  AVE is a virtual appliance.   One would find it highly unlikely that VMware would completely re-write the virtual appliance used for vDP, but we don’t know for sure.
  • The vDP GUI is a heck of a lot simpler to manage for the average SMB shop than AVE.  EMC needs to learn a lesson here and quickly – not just for SMB customers but also Enterprise customers running full-blown Avamar. 
  • vDR was getting a little bit better, but a scan of the VMware Community Forums quickly showed it was a poor product.  Even the smallest of SMB shops did not like it and usually ended up going the Veeam route after struggling to get vDR working.
  • Avamar does have best-in-class de-duplication algorithms, so it’s not hard to accept the argument that VMware evaluated different de-dupe technologies and picked Avamar’s to be the nuts and bolts under vDP.
  • I wouldn’t try to outsmart Joe Tucci.  We might see some pushing of the envelope with regards to the EMC-VMware relationship, but he’s not going to screw this thing up. 


Questions in my mind…

  • AVE was very performance hungry.  In fact, before install it required a benchmark test be run for 24-48 hours that was very disk intensive.  If certain specs were not met, EMC would not support the AVE configuration.    This is why EMC almost always sells Avamar as a HW/SW appliance.   In my mind, the typical vDP user is probably going to use some very low-cost storage as the backup repository.  I wonder how this product is going to perform unless some significant performance enhancements were made to the vDP product relative to AVE. 
  • Even the smallest of SMB’s typically want their backups to be stored off-site, and vDP doesn’t offer any replication capability, nor does it offer any sort of tape-out mechanism.    Is this really a practical solution for anybody nowadays?
  • Is there an upgrade path from vDP to full Avamar? I’ve seen EMC employees post in their blogs that there is a clear upgrade path if you outgrow vDP, but every other post I’ve seen says there is no upgrade path. I’ve not been able to find any official documentation about the upgrade path. Which is it, and is there an expensive PS engagement involved?


All in all, the providers of SMB-oriented VMware backup solutions such as Veeam don’t have much to be worried about yet.    It’s a strange world of “coopetition” that we live in today.   EMC and VMware cooperating on vDP.  VMware partnering with all storage vendors, yet being majority owned by EMC.    EMC partnering closely with Microsoft and beefing up Hyper-V support in all their products.   All storage vendors partnering closely with Oracle, but Oracle getting into the storage business.   Cisco partnering with NetApp on FlexPod and also with VCE on vBlock.  EMC pushing Cisco servers to their clients but also working with Lenovo for some server OEM business.      The list goes on and all indications are this is the new reality we will be living with for some time.  

What would I do if I were Veeam or another provider of SMB backup for virtual machines? Keep innovating like crazy, as Veeam has done. It’s no different than what VMware needs to keep doing to ensure they stay ahead of Microsoft. Might I suggest for Veeam specifically: amp up the “coopetition” and build DD BOOST support into your product. DataDomain is the best-in-class target de-dupe appliance with the most market share. Unfortunately, the way Veeam and DD work together today is kludgey at best. Although Veeam can write to NFS storage, it does not work well with an NFS connection directly to the DD appliance. Rather, it is recommended to set up an intermediary Linux server to re-export the NFS export from the DD box. A combination of Veeam with DD BOOST and something like a DD160 for the average SMB shop would be a home run and crush vDP as a solution any day of the week. I have heard that Quest vRanger recently built support for DD BOOST into their product, and it will be interesting to see if that remains now that Quest was purchased by Dell.



A look at Block Compression and De-duplication with Veeam and EMC VNX

March 26, 2012 4 comments

Before I proceed any further, I want to state clearly that the testing I performed was not to pit one alternative vs. another.   Rather, I was curious to do some testing to see what type of Block LUN Compression rates I could get for backup data written to a CX4/VNX, including previously de-duped data.   At the same time, I had a need to do some quick testing in the lab comparing Veeam VSS vs. VMware Tools VSS snapshot quiescing.    Since Veeam does de-duplication of data, I ended up just using the backup data that Veeam wrote to disk for my Block LUN Compression tests.

Lab Environment

My lab consists of a VNX5300, a Veeam v6 server, and vSphere 5 running on Cisco UCS. The VMs I backed up with Veeam included a mix of app, file, and database VMs. App/File constituted about 50% of the data and DB was the other 50%. By no means will I declare this to be a scientific test, but these were fairly typical VMs that you might find in a small customer environment and I didn’t modify the data sets in any way to try and enhance results.

Veeam VSS Provider Results

For those not aware, most VADP backup products will quiesce the VM by leveraging MS VSS. Some backup applications provide their own VSS provider (including Veeam), and others like vDR rely on the VMware VSS provider that gets installed along with VMware Tools. With Veeam, it’s possible to configure a job that quiesces the VM with or without their own provider. My results showed the Veeam VSS provider was much faster than VMware’s native VSS. On average Veeam created the backup snapshot in 3 seconds with their provider, and 20 seconds without it. I also ran some continuous ping tests to the VMs while this process was occurring, and 1/3 of the time I noticed a dropped ping or two when the snapshot was being created with VMware’s VSS provider. A dropped ping is not necessarily a huge issue in itself, but certainly the longer the quiescing and snapshot process takes, the bigger your window for a “hiccup” to occur, which may be noticed by the applications running on that server.

De-dupe and Compression Results

I ran two tests leveraging Veeam and a 200GB Thin LUN on the VNX5300.

Test 1

The settings used for this test were:

  • Veeam De-dupe = ON
  • Veeam In-line compression = ON
  • EMC Block LUN Compression = Off

  Backup Job      Size
  Backup Job 1    6GB
  Backup Job 2    1.2GB
  Backup Job 3    12.3GB


The final space usage on the LUN was 42GB.   I then turned on Block LUN Compression and no additional savings were obtained, which was to be expected since the data had already been compressed.

Test 2

The settings used for this test were:

  • Veeam De-dupe = ON
  • Veeam In-line compression = Off
  • EMC Block LUN Compression = ON

  Backup Job      Size
  Backup Job 1    13.6GB
  Backup Job 2    3.4GB
  Backup Job 3    51.3GB


The final space usage on the LUN was 135GB. I then turned on VNX Block LUN Compression and the consumed space was reduced to 60GB – a 2.3:1 compression ratio or a 56% space savings. Not too shabby for compression. More details on how EMC’s Block LUN Compression works are available at this link:

In short, it looks at 64KB segments of data and tries to compress data within each segment. 
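As a rough sketch of the arithmetic behind those numbers, and of what compressing data one 64KB segment at a time looks like, here is a small Python example. The function names are made up for illustration, and zlib is only a stand-in: I am not claiming it is the algorithm the array uses, and keeping a segment uncompressed when compression doesn't help is my own assumption.

```python
import zlib

SEGMENT_SIZE = 64 * 1024  # 64KB, the segment size mentioned above


def compression_stats(before_gb, after_gb):
    """Return (ratio, percent_saved) for space consumed before and after compression."""
    ratio = before_gb / after_gb
    percent_saved = (1 - after_gb / before_gb) * 100
    return ratio, percent_saved


def compress_in_segments(data, segment_size=SEGMENT_SIZE):
    """Compress each fixed-size segment independently and return the total stored size.

    zlib is a stand-in here; it is not the algorithm the array uses.
    """
    total = 0
    for offset in range(0, len(data), segment_size):
        segment = data[offset:offset + segment_size]
        compressed = zlib.compress(segment)
        # Assumption: a segment that does not shrink is stored uncompressed.
        total += min(len(segment), len(compressed))
    return total


# The Test 2 numbers from this post: 135GB consumed before, 60GB after.
ratio, saved = compression_stats(135, 60)
print(f"{ratio:.2f}:1 ratio, {saved:.0f}% saved")  # 2.25:1 ratio, 56% saved
```

Calling `compress_in_segments()` on a byte string returns how many bytes would be stored under these assumptions, which you can feed back into `compression_stats()`.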

Again, this post isn’t about comparing de-dupe or compression rates between Veeam’s software approach within the backup job, or letting the storage hardware do the work.   There are going to be pros and cons to both methods.   For longer retentions (30 days and beyond), I tend to recommend a Purpose-built Backup Appliance (PBBA) that does variable-length block de-duplication.  Rather, for these tests I was out to confirm:

a)      Does Block LUN Compression work well for backup data (whether it has been de-duped or not)? The conclusion here was Block LUN Compression worked quite well. I really didn’t know what to expect, so the results were a pleasant surprise. In hindsight, it does make sense that the data could still compress fairly well. Although de-dupe has eliminated redundant patterns of blocks, if the remaining post-dedupe blocks still contain data that is compressible, you should be able to squeeze more out of it. This could come in handy for situations where B2D is leveraged and your backup software doesn’t offer compression, or shorter retentions that don’t warrant a PBBA that does variable-length block de-duplication.
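To illustrate why de-duped data can still compress, here is a toy Python sketch. It is not how Veeam or the VNX work internally: it uses a simple fixed-block de-dupe with a hypothetical 4KB block size, synthetic text-like data, and zlib as a stand-in compressor. De-dupe removes whole blocks that repeat; compression then removes the redundancy that remains inside each unique block.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # hypothetical fixed block size for the toy de-dupe


def dedupe_blocks(data, block_size=BLOCK_SIZE):
    """Return the unique fixed-size blocks of data, in first-seen order."""
    seen = set()
    unique = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).digest()
        if digest not in seen:
            seen.add(digest)
            unique.append(block)
    return unique


def make_sample():
    """Build synthetic 'backup' data: text-like blocks, each one repeated."""
    blocks = []
    for i in range(50):
        line = f"record {i:04d} status=OK owner=finance\n".encode()
        block = (line * (BLOCK_SIZE // len(line) + 1))[:BLOCK_SIZE]
        blocks.append(block)
    return b"".join(blocks * 4)  # every block appears four times


data = make_sample()
unique = dedupe_blocks(data)
deduped_size = sum(len(b) for b in unique)
compressed_size = sum(len(zlib.compress(b)) for b in unique)

print(f"raw: {len(data)} bytes")
print(f"after de-dupe: {deduped_size} bytes")
print(f"after de-dupe + compression: {compressed_size} bytes")
```

The synthetic data is far more compressible than real backups, so the sizes printed say nothing about what to expect in production; the point is only that the two savings stack.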


b)      The latest version of Veeam is quite impressive. They’ve done some nice things to enhance the architecture so it can scale out as larger enterprise backup software does. The level of de-dupe and compression achieved within the software was impressive as well. I can certainly understand why a large number of mid-market customers I speak with have little interest in using vDR for VM image backups, as Veeam is still light-years ahead. If you’re looking at these two products and you have highly-transactional systems in your environment such as busy SQL or Exchange boxes, you’ll be better off with Veeam and its enhanced VSS capabilities.

Categories: Backup, De-dupe, EMC, Veeam, VMware

De-duplication just got cheaper

October 11, 2011 Leave a comment

A week ago I wrote that the high cost of data de-duplication and the lack of downward movement on prices was a potential concern for the market players. I have seen two recent cases (and heard first-hand of more) where customers were seriously considering B2D on 2 or 3TB NL-SAS drives without de-dupe as they were finding it to be a cheaper acquisition.

Coincidentally, EMC responded this week by announcing a refresh of the low-end DD platforms, giving quite a lot more capacity with a much lower price point. The DD160 replaces the 140, 620 replaces 610, and 640 replaces 630. The cost of the existing DD670 was also reduced. I think this is a smart move by EMC and will make them much more competitive at the low-end, where customers would often choose an inferior technology simply because of price point and meeting “good enough” criteria.

A common scenario I found was most small-medium sized companies have at least 10TB of backup data. In the DD product line, this would put them above the previous low-end models and into the higher-end 670, making it out of reach for them financially.

What I’m seeing is the new DD640 with expansion shelf capabilities nicely solves this problem. It can scale to 32TB usable pre-deduped capacity. I just ran a sample config comparing a 670 with a 32TB shelf to a 640 with a 30TB shelf, and the 640 is cheaper by tens of thousands of dollars. Kudos to EMC on this one. Now, if they could only do something for Avamar cost of entry at the low-end of the market…

Categories: Backup, De-dupe, EMC

The Cost of Data De-duplication

October 6, 2011 Leave a comment

Backup data-deduplication has been one of the hottest technologies of the last few years.   We’ve seen major players like EMC spend 2 billion to purchase DataDomain, and the industry in general is a 2 billion dollar market annually.   Why all the focus here?  Two reasons I believe:

1) These technologies solve an important business need to back up an ever-increasing volume of data (much of it redundant). 

2) Storage manufacturers have to find a way to maintain their growth rates to satisfy Wall St. and backup de-duplication is still one of the fastest areas of growth.

I was shocked to hear the other day when a very large client reported they were moving away from backup de-duplication and simply going to backup on SATA/NL-SAS 3TB drives.   What was the reasoning behind that decision?   The cost of SATA/NL-SAS drives is coming down faster than the cost of data de-duplication.  

That is certainly an interesting theory, and one that deserves some further consideration.  If there is a common challenge I’ve seen with customers, it’s dealing with the cost associated with next-generation backup, and prices have only come down minimally in the past 2 years.   Backup is still an oft-forgotten step-child within IT infrastructure and it’s hard to explain to corporate management why money is needed to fix backups.   Often, when I’m designing a storage and backup solution for a customer, the storage is no longer the most expensive piece of the solution.   Thanks to storage arrays being built on industry-standard x86 hardware, iSCSI taking away market share from Fibre Channel, and advancements in SAS making it the preferred back-end architecture instead of FC, the storage has become downright cheap and backup is the most costly part of the solution.   This issue affects the cost-conscious low-end of the market more than anywhere else, but nonetheless it can be a challenge across all segments.  

In the case of this particular customer who decided it was more economical to move away from de-dupe, there is certainly more to the story. Namely, they were backing up a large amount of images and only seeing 3:1 de-dupe ratios. However, I have recently seen another use case for a customer who only needed to keep backups for 1 week where it was more economical to do straight B2D on a fat SATA/NL-SAS solution. By layering in some software that does compression yielding 2:1 savings, it becomes even more economical.

From the manufacturer perspective, I’m sure it’s not easy to come up with pricing for de-dupe Purpose Built Backup Appliances (PBBAs). The box can’t be priced based on the actual amount of SATA/NL-SAS in the box as that would be too cheap based on the amount of data you can truly store on it, but it can’t be priced for the full logical capacity either, as there would be less incentive to use a de-dupe PBBA vs. straight disk. Generally speaking, to make a de-dupe PBBA a good value, you need to have a data retention schedule that can yield at least 4:1 or 5:1 de-dupe in my experience.
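The break-even reasoning can be sketched in a few lines of Python. The prices below are hypothetical placeholders, not quotes from any vendor, and the function names are made up for illustration. The idea is simply that cost per TB of backup data equals price per usable TB divided by the data-reduction ratio, so the PBBA matches straight disk when its de-dupe ratio equals its price premium (times any reduction the disk option gets from compression software).

```python
def cost_per_logical_tb(price_per_usable_tb, reduction_ratio):
    """Cost to store 1 TB of backup data given a data-reduction ratio (e.g. 5 for 5:1)."""
    return price_per_usable_tb / reduction_ratio


def break_even_ratio(pbba_price_per_tb, disk_price_per_tb, disk_reduction_ratio=1.0):
    """De-dupe ratio at which the PBBA matches straight disk on cost per logical TB."""
    return pbba_price_per_tb * disk_reduction_ratio / disk_price_per_tb


# Hypothetical prices, for illustration only.
PBBA_PRICE = 4000.0  # dollars per usable TB on a de-dupe appliance
DISK_PRICE = 1000.0  # dollars per usable TB on straight SATA/NL-SAS

for ratio in (3, 4, 5):
    pbba = cost_per_logical_tb(PBBA_PRICE, ratio)
    disk = cost_per_logical_tb(DISK_PRICE, 1)
    print(f"{ratio}:1 de-dupe -> PBBA ${pbba:.0f}/TB vs disk ${disk:.0f}/TB")

print("break-even vs plain disk:", break_even_ratio(PBBA_PRICE, DISK_PRICE))
print("break-even vs disk with 2:1 compression:",
      break_even_ratio(PBBA_PRICE, DISK_PRICE, 2.0))
```

With these made-up prices the appliance needs 4:1 just to match plain disk, and 8:1 once the disk option gets 2:1 from compression software. Plug in real quotes to see where your own break-even lands.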

Even if you can’t obtain 5:1 or greater de-dupe, there are a few additional things worth considering that may still make a PBBA the right choice instead of straight disk. First, a PBBA with de-dupe can still offer a lot of benefits for bandwidth-friendly replication to a remote site. Second, a PBBA with de-dupe can offer significantly better environmental savings in terms of space, power, and cooling than straight disk.

Categories: Backup, De-dupe