Archive
VMFS-5 Heap Size
Nick and I troubleshot this issue for a client back in December. He beat me to the punch on blogging about it 🙂 This was the first time I had heard of heap size, and I hope it’s the last!
VMware vDP and Avamar – Blown out of Proportion
The dust has settled a bit since the announcement of vSphere 5.1, including the new vSphere Data Protection (vDP) functionality based on EMC Avamar code. Immediately following the announcement there were:
- EMC folks reporting this proves Avamar is the greatest thing since sliced bread because VMware chose Avamar Virtual Edition (AVE) as the basis for vDP
- VMware folks stating vDP only leverages Avamar technology – it is a new product co-developed by VMware and EMC rather than AVE with a new GUI.
- Critics/competitors saying either that they are two completely different products and this announcement doesn’t mean anything, or that the world will be running Hyper-V in 12-18 months as EMC takes over VMware and fails miserably.
What’s my opinion? Being a middle-of-the-road guy, naturally I think both the far left and right are blowing things out of proportion and VMware employees were generally the most accurate in their assessments.
We can hold these things to be self-evident:
- vDP is a virtual appliance. AVE is a virtual appliance. It seems highly unlikely that VMware completely re-wrote the virtual appliance used for vDP, but we don’t know for sure.
- The vDP GUI is a heck of a lot simpler to manage for the average SMB shop than AVE. EMC needs to learn a lesson here and quickly – not just for SMB customers but also Enterprise customers running full-blown Avamar.
- vDR was getting a little bit better, but a scan of the VMware Community Forums quickly showed it was a poor product. Even the smallest of SMB shops did not like it and usually ended up going the Veeam route after struggling to get vDR working.
- Avamar does have best-in-class de-duplication algorithms, so it’s not hard to accept the argument that VMware evaluated different de-dupe technologies and picked Avamar’s to be the nuts and bolts under vDP.
- I wouldn’t try to outsmart Joe Tucci. We might see some pushing of the envelope with regards to the EMC-VMware relationship, but he’s not going to screw this thing up.
Questions in my mind…
- AVE was very performance hungry. In fact, before install it required a benchmark test be run for 24-48 hours that was very disk intensive. If certain specs were not met, EMC would not support the AVE configuration. This is why EMC almost always sells Avamar as a HW/SW appliance. In my mind, the typical vDP user is probably going to use some very low-cost storage as the backup repository. I wonder how this product is going to perform unless some significant performance enhancements were made to the vDP product relative to AVE.
- Even the smallest of SMBs typically want their backups to be stored off-site, and vDP doesn’t offer any replication capability, nor does it offer any sort of tape-out mechanism. Is this really a practical solution for anybody nowadays?
- Is there an upgrade path from vDP to full Avamar? I’ve seen EMC employees post in their blogs that there is a clear upgrade path if you outgrow vDP, while every other post I’ve seen says there is no upgrade path. I’ve not been able to find any official documentation about the upgrade path. Which is it, and is there an expensive PS engagement involved?
All in all, the providers of SMB-oriented VMware backup solutions such as Veeam don’t have much to be worried about yet. It’s a strange world of “coopetition” that we live in today. EMC and VMware cooperating on vDP. VMware partnering with all storage vendors, yet being majority owned by EMC. EMC partnering closely with Microsoft and beefing up Hyper-V support in all their products. All storage vendors partnering closely with Oracle, but Oracle getting into the storage business. Cisco partnering with NetApp on FlexPod and also with VCE on vBlock. EMC pushing Cisco servers to their clients but also working with Lenovo for some server OEM business. The list goes on, and all indications are this is the new reality we will be living with for some time.
What would I do if I were Veeam or another provider of SMB backup for virtual machines? Keep innovating like crazy, as Veeam has done. It’s no different than what VMware needs to keep doing to ensure they stay ahead of Microsoft. Might I suggest for Veeam specifically: amp up the “coopetition” and build DD BOOST support into your product. Data Domain is the best-in-class target de-dupe appliance with the most market share. Unfortunately, the way Veeam and DD work together today is kludgey at best. Although Veeam can write to NFS storage, it does not work well with an NFS connection directly to the DD appliance. Rather, it is recommended to set up an intermediary Linux server to re-export the NFS export from the DD box. A combination of Veeam with DD BOOST and something like a DD160 for the average SMB shop would be a home run and crush vDP as a solution any day of the week. I have heard that Quest vRanger recently built support for DD BOOST into their product, and it will be interesting to see if that remains now that Quest has been purchased by Dell.
Strategies for SRM with a VNXe
Give credit where credit is due, EMC does a lot of things well. VMware Site Recovery Manager (SRM) support for the VNXe is definitely not one of those. EMC has done such a great job turning the ship around when it comes to VMware integration with their products, thanks to guys like Chad Sakac (@sakacc), that it is beyond mind-boggling to me why it is taking so long to get this straightened out on the VNXe.
Originally, it was stated that the VNXe would support SRM when SRM 5.0 came out (Q3 2011), at least with NFS, with iSCSI support coming later down the road. Then the date slipped to Q4 2011, and again to Q1 2012, and again to Q3 2012, and I just saw an update on the EMC community forums where it’s now stated as Q4 2012 (https://community.emc.com/thread/127434). Let me be clear to EMC and their engineering group: this is not acceptable. Customers who have bought this product with the intent to move to fully replicated vSphere environments have a right to be pissed. Partners who are responsible for designing best-in-class high-availability solutions for their SMB customers have a right to be pissed. We don’t have unreasonable expectations or unrealistically high demands. EMC just screwed this one up badly.
What I find most incomprehensible of all is the fact that the VNXe software is largely based on the underpinnings of the previous Celerra (NAS) product. Celerra had SRM support for both NFS and iSCSI previously! For Pete’s sake, how hard can it be to modify this?!?! In a recent explanation, it was stated that the APIs were changing between SRM 4.x and 5.x. Well, somehow every other major storage array from EMC and other manufacturers didn’t seem to have a hiccup from this in their support of SRM. Obviously, EMC is going to focus on the high-dollar VMAX and VNX platforms first, but that’s no excuse to let your SMB product lag this far behind.
OK, now that the rant is out of the way, what options do you have to achieve a fully replicated solution for your vSphere environment?   It really boils down to two market-proven options, though you may come across some other fringe players:
1) SRM w/ vSphere Replication
- Seamless Disaster Recovery failover and testing
- Tightly integrated into vSphere and vCenter
- Easy per-VM replication management within vCenter
- Storage agnostic – no vendor lock-in with array replication

2) Veeam
- Leverages backup snapshot functionality to also replicate to a remote Veeam server
- Storage agnostic
- Offers ability to do a file-level restore from remote replicas
- Included as part of the Veeam Backup and Replication product
Here’s a table I put together showing a comparison between the two options:

| Feature | Veeam Replication | SRM w/ vSphere Replication |
|---|---|---|
| vSphere version required | 4.0 and higher | 5.0 (HW Version 7 or higher required on VMs) |
| Replication methodology | VM snapshots | vSCSI block tracking |
| Realistic best-case RPO | 15 min | 15 min |
| Includes backup | Yes | No |
| Licensing | Per socket | Per VM |
| VSS quiescing | Yes (custom VSS driver) | Yes (VM Tools VSS) |
| Replicate powered-off VMs | Yes | No |
| File-level restore from replica | Yes | No |
| Orchestrated failover based on defined DR plan | No | Yes |
| Easy non-disruptive DR testing capabilities | No | Yes |
| Multiple restore points from replica | Yes | No |
| Re-IP VM during failover | Yes | Yes |
So, how do you choose between the two? Well, that’s where the proverbial “it depends” answer comes in. When I’m speaking with SMB market customers, I’ll ask questions about their backup to get a sense as to whether or not they could benefit from Veeam. If so, then it’s certainly advantageous to try and knock out backup and replication with one product. However, that’s not to say there can’t be advantages to running Veeam for backup while also using SRM with vSphere Replication, if you truly need that extra level of automation that SRM offers.
UPDATE 10/2/2012
I recently got notified about an update to the original post on the EMC community forums: https://community.emc.com/thread/127434. An EMC representative has just confirmed that the target GA date is now Q1 2013, which marks another slip.
Also, with the announcement of vSphere 5.1 came a few improvements to vSphere Replication with SRM.  Most notably, SRM now supports auto-failback with vSphere Replication, which previously was a function only supported with array-based replication.
A look at Block Compression and De-duplication with Veeam and EMC VNX
Before I proceed any further, I want to state clearly that the testing I performed was not to pit one alternative vs. another.  Rather, I was curious to do some testing to see what type of Block LUN Compression rates I could get for backup data written to a CX4/VNX, including previously de-duped data.  At the same time, I had a need to do some quick testing in the lab comparing Veeam VSS vs. VMware Tools VSS snapshot quiescing.   Since Veeam does de-duplication of data, I ended up just using the backup data that Veeam wrote to disk for my Block LUN Compression tests.
Lab Environment
My lab consists of a VNX5300, a Veeam v6 server, and vSphere 5 running on Cisco UCS. The VMs I backed up with Veeam included a mix of app, file, and database VMs. App/file constituted about 50% of the data and DB was the other 50%. By no means will I declare this to be a scientific test, but these were fairly typical VMs that you might find in a small customer environment, and I didn’t modify the data sets in any way to try and enhance results.
Veeam VSS Provider Results
For those not aware, most VADP backup products will quiesce the VM by leveraging MS VSS. Some backup applications provide their own VSS provider (including Veeam), and others like vDR rely on the VMware VSS provider that gets installed along with VMware Tools. With Veeam, it’s possible to configure a job that quiesces the VM with or without their own provider. My results showed the Veeam VSS provider was much faster than VMware’s native VSS. On average, Veeam created the backup snapshot in 3 seconds with their provider, and 20 seconds without it. I also ran some continuous ping tests to the VMs while this process was occurring, and 1/3 of the time I noticed a dropped ping or two when the snapshot was being created with VMware’s VSS provider. A dropped ping is not necessarily a huge issue in itself, but certainly the longer the quiescing and snapshot process takes, the bigger your window for a “hiccup” to occur, which may be noticed by the applications running on that server.
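As an aside, if you want to reproduce this kind of snapshot-timing test in your own lab, here’s a rough sketch using pyVmomi (the Python SDK for the vSphere API). The vCenter hostname, credentials, and VM name below are placeholders, and it only times the quiesced snapshot path – treat it as a starting point rather than a polished tool:

```python
import ssl
import time

from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

# Placeholder connection details -- substitute your own vCenter and credentials.
si = SmartConnect(host="vcenter.example.local",
                  user="administrator@vsphere.local",
                  pwd="********",
                  sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

# Locate the test VM by name (hypothetical name "test-vm-01").
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "test-vm-01")

# Time a quiesced snapshot -- the same operation a VADP backup job triggers.
start = time.time()
WaitForTask(vm.CreateSnapshot_Task(name="quiesce-timing-test",
                                   description="temporary snapshot for timing",
                                   memory=False, quiesce=True))
print(f"Quiesced snapshot created in {time.time() - start:.1f} seconds")

# Clean up the snapshot we just created.
WaitForTask(vm.snapshot.currentSnapshot.RemoveSnapshot_Task(removeChildren=False))

Disconnect(si)
```

Running the same script with quiesce=False would give you a baseline for how much of the time is the quiescing itself versus the snapshot operation.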
De-dupe and Compression Results
I ran two tests leveraging Veeam and a 200GB Thin LUN on the VNX5300.
Test 1
The settings used for this test were:
- Veeam De-dupe = ON
- Veeam In-line compression = ON
- EMC Block LUN Compression = OFF

| | Backup Job Size |
|---|---|
| Backup Job 1 | 6GB |
| Backup Job 2 | 1.2GB |
| Backup Job 3 | 12.3GB |
The final space usage on the LUN was 42GB.  I then turned on Block LUN Compression and no additional savings were obtained, which was to be expected since the data had already been compressed.
Test 2
The settings used for this test were:
- Veeam De-dupe = ON
- Veeam In-line compression = OFF
- EMC Block LUN Compression = ON

| | Backup Job Size |
|---|---|
| Backup Job 1 | 13.6GB |
| Backup Job 2 | 3.4GB |
| Backup Job 3 | 51.3GB |
The final space usage on the LUN was 135GB. I then turned on VNX Block LUN Compression and the consumed space was reduced to 60GB – a 2.3:1 compression ratio, or a 56% space savings. Not too shabby for compression. More details on how EMC’s Block LUN Compression works are available at this link: http://www.emc.com/collateral/hardware/white-papers/h8045-data-compression-wp.pdf
In short, it looks at 64KB segments of data and tries to compress data within each segment.
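For those who want to double-check the math, here’s a minimal Python sketch showing how the ratio and savings figures above fall out of the before/after LUN usage numbers:

```python
# LUN space usage from Test 2, before and after VNX Block LUN Compression
before_gb = 135
after_gb = 60

compression_ratio = before_gb / after_gb              # ~2.25, i.e. roughly 2.3:1
space_savings_pct = (1 - after_gb / before_gb) * 100  # ~55.6%, i.e. roughly 56%

print(f"Compression ratio: {compression_ratio:.2f}:1")
print(f"Space savings: {space_savings_pct:.1f}%")
```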
Again, this post isn’t about comparing de-dupe or compression rates between Veeam’s software approach within the backup job and letting the storage hardware do the work. There are going to be pros and cons to both methods, and for longer retentions (30 days and beyond) I tend to recommend a Purpose-built Backup Appliance (PBBA) that does variable-length block de-duplication. Rather, for these tests I was out to confirm:
a) Does Block LUN Compression work well for backup data (whether it has been de-duped or not)? The conclusion here was that Block LUN Compression worked quite well. I really didn’t know what to expect, so the results were a pleasant surprise. In hindsight, it does make sense that the data could still compress fairly well. Although de-dupe has eliminated redundant patterns of blocks, if the remaining post-dedupe blocks still contain data that is compressible, you should be able to squeeze more out of it (see the short sketch at the end of this post). This could come in handy for situations where B2D is leveraged and your backup software doesn’t offer compression, or for shorter retentions that don’t warrant a PBBA that does variable-length block de-duplication.
b) The latest version of Veeam is quite impressive; they’ve done some nice things to enhance the architecture so it can scale out the way larger enterprise backup software does. The level of de-dupe and compression achieved within the software was impressive as well. I can certainly understand why a large number of mid-market customers I speak with have little interest in using vDR for VM image backups, as Veeam is still light-years ahead. If you’re looking at these two products and you have highly transactional systems in your environment, such as busy SQL or Exchange boxes, you’ll be better off with Veeam and its enhanced VSS capabilities.
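To illustrate the point in (a), here’s a small, purely illustrative Python sketch. It builds a block of data that is unique as a whole (so de-dupe wouldn’t eliminate it) but internally repetitive, and shows that a generic compressor can still shrink it substantially – the same reason the post-dedupe backup data still compressed well on the VNX:

```python
import os
import zlib

# Build a ~64KB "post-dedupe" block: unique as a whole (random header),
# but internally repetitive, the way real application data often is.
unique_header = os.urandom(256)
repetitive_payload = b"customer_record;status=active;region=us-east;" * 1500
block = (unique_header + repetitive_payload)[:64 * 1024]

compressed = zlib.compress(block, 6)

print(f"Original block:   {len(block)} bytes")
print(f"Compressed block: {len(compressed)} bytes")
print(f"Compression ratio: {len(block) / len(compressed):.1f}:1")
```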
A new crop of storage start-ups has arrived
About two years ago I was working at EMC and the company had just completed the acquisition of Data Domain, which was one of the last “hot” storage-related start-ups around. There were certainly other storage start-up companies around, but nobody really had a story that screamed “come here, get some shares, and get rich when we get bought.” A prime example is Xiotech (now Xio). Xio’s value-prop and future are quite fuzzy from my perspective, but somehow they keep hanging in there. At the time, everybody wondered who the next hot startup would be, or even if there would be another hot startup. Compellent was the closest thing one could find, and they were soon snatched up by Dell.
Fortunately for technology, innovation is constant. New ideas are always being generated, particularly within the realm of data storage. Anyone who analyzes the balance sheets of EMC, NetApp, and others realizes that data storage is a profitable business, much more so than servers.  I do believe this partly explains why we see so many startups in the data storage arena (because venture capitalists see the $$$), and we also see large conglomerates accustomed to skinny margins like Dell beefing up their storage and services portfolio.
If you follow social media, then you’re already well-aware of Tintri, Whiptail, PureStorage, Violin, Atlantis, Oxygen, Nirvanix, and more.  Today, I’ll give my thoughts on some of the most-discussed startups.
1) Tintri – my thoughts on Tintri were already published in an earlier post here: https://hoosierstorage.wordpress.com/2011/04/19/tintri-whats-the-big-deal/. I heard from a handful of Tintri folks after posting that, none too happy with my post. Some of them are now gone from Tintri. Ultimately, my thoughts are largely still the same. It’s my understanding Tintri now has HA controllers, which is a big plus, but I still question the entire market of dedicated VMware storage appliances. EMC, the parent of VMware and the largest storage vendor, is as focused as I’ve ever seen them on increasing their integration with Microsoft technologies, particularly Hyper-V. Joe Tucci knows he can’t tie his cart to just one horse, just like he knew he had to let VMware remain independent back in 2006. Similarly, Veeam has been putting tons of effort into increasing their functionality with Hyper-V. These companies are both leaders in their respective market segments; do you think they are doing this in anticipation of receiving no value from it? Most people buy storage arrays with the intent of using them for at least 5 years. 5 years is a lifetime in the technology world. There’s no guarantee that VMware will be the dominant hypervisor in 2-3 years. I certainly hope that they are, and if they continue to out-innovate everyone else they should be. However, if you buy a dedicated storage appliance for VMware, and in 2 years the world is moving to Hyper-V 4.0, what then? Microsoft is unlikely to make Hyper-V work with NFS anytime soon. Would you buy a dedicated storage device for SharePoint and nothing else? There also remain use-cases for physical servers, and physical servers that need SAN storage. A dedicated VMware storage box can’t help here. Why run two storage devices when you can do it all with one?
2) Whiptail and other dedicated Flash arrays: Dedicated Flash arrays seem to be generating quite a lot of buzz these days. They all share a lot of similarities; in most cases the claim is made that by leveraging cheaper consumer-grade MLC flash drives and adding some fancy magic on top, they can get a much bigger bang for the buck from these drives and make them “enterprise-class.” They also make crazy claims like “200,000 IOPS” – a number that you simply won’t see in the real world. Real-world numbers for enterprise-class SLC flash drives are 3,500 IOPS per drive. Anybody who tells you more than that is just blowing smoke.
I know of at least one customer who tested out one of these all-flash appliances. It was nothing more than a rack-mountable server stuffed with Intel consumer-grade MLC drives (he took a pic and showed me). He saw a 25% increase in DB performance compared to the 50 15K drives the DB is currently spread across. I’m sorry, but I’m not impressed. These devices also tend to be single points of failure, unless you buy a second box and connect them together to form a cluster. I have said it before and I’ll say it again: never buy a SPOF storage solution unless your data is disposable!
As with VMware-dedicated storage appliances, I really have to question the value of the all-flash appliances, except for very niche use cases. Flash storage for an existing array isn’t that expensive. The real value in flash comes from leveraging small amounts to increase performance where it’s needed, then fulfilling capacity requirements with cheaper high-capacity SATA or NL-SAS drives. This works and it’s in use today in many environments. Plus, it’s really not that expensive. Why buy two devices when you can do it all with one?
3) Oxygen: Now we’re getting into some start-ups that I see having good value propositions. I first became aware of Oxygen about 6-9 months ago and have been testing the technology out personally. I also have at least one client, looking for a secure, Dropbox-like technology for their enterprise, that is testing it out. I posted some previous thoughts on Oxygen here: https://hoosierstorage.wordpress.com/?s=oxygen
As technologies like Oxygen become more robust, I truly do see this becoming the next-generation file server within the enterprise. There is no doubt that we are witnessing the consumerization of IT, with tablets, smartphones, etc. Users have a need to access their business files on these devices, and if you don’t provide them with the technology to do it, they will find a way using consumer technologies that you don’t want them to be using. Oxygen in particular offers a great alternative, providing sync-and-share capabilities between your PC and mobile devices while retaining the safety and security of keeping data inside the corporate firewall.
4) Atlantis: When I first saw the Atlantis ILIO appliance in use, I couldn’t help but be impressed. Storage performance with VDI is a problem many shops encounter, and when a company can cut that down by 90%, well, it definitely turns heads. Plus, unlike the dedicated physical appliances I mentioned above, Atlantis runs as a vApp and can leverage your existing SAN environment (or local storage in some cases). Rather than me do the talking, I would recommend taking a look at this article for a deep-dive on Atlantis: http://myvirtualcloud.net/?p=2604. I’m currently evaluating Atlantis in my employer’s demo lab – so far, so good. I’m also working on a model to see just how (or if) it ends up being more cost-effective than a traditional SAN leveraging some SSDs.
That’s it for now.  Other technologies I hope to be discussing soon include Actifio and Nirvanix.
Tintri – What’s the big deal?
You may have seen several news articles a couple weeks back about the hottest new thing in VMware storage – Tintri. Their marketing department created quite a buzz with most major IT news outlets picking up the story and proclaiming that the Tintri appliance was the future of VMware storage.
Instead of re-hashing what’s been said already, here’s a brief description from CNET:
Tintri VMstore is a hardware appliance that is purpose-built for VMs. It uses virtual machine abstractions–VMs and virtual disks–in place of conventional storage abstractions such as volumes, LUNs, or files. By operating at the virtual machine and disk level, administrators get the same level of insight, control, and automation of CPU, memory, and networking resources as general-purpose shared-storage solutions.
A few more technical details from The Register:
The VMstore T440 is a 4U, rackmount, multi-cored, multi-processor, X86 server with gigabit or 10gigE ports to a VMware server host. It appears as a single datastore instance in the VMware vSphere Client – connecting to vCenter Server. Multiple appliances – nodes – can be connected to one vCenter Server to enable sharing by ESX hosts. Virtual machines (VMs) can be copied or moved between nodes using storage vMotion.
The T440 is a hybrid storage facility with 15 directly-attached 3.5-inch, 7,200rpm, 1TB, SATA disk drives, and 9 x 160GB SATA, 2-bit, multi-level cell (MLC) solid state drives (SSD), delivering 8.5TB of usable capacity across the two storage tiers. There is a RAID6 redundancy scheme with hot spares for both the flash and disk drives.
I was a bit skeptical as to how this could be much different from other storage options on the market today. Tintri claims that you don’t manage the storage; everything is managed by VM. The only logical way I could see this happening is if you’re managing files (with every VM being a file) instead of LUNs. How do you accomplish this? Use a native file system as your datastore instead of creating a VMFS file system on top of a block-based datastore. In other words, NFS.
So, after doing a little research, it appears this box isn’t much more than a simple NAS, with a slick GUI, doing some neat things under the covers with auto-tiering (akin to Compellent’s Data Progression or EMC sub-LUN FAST) and de-duplication. Instead of adding drives to a tray to expand, you expand by adding nodes. This makes for a nice story in that you scale performance as you scale capacity, but in the SMB market where this product is focused, I typically find the performance offered in the base unit with multi-core processors is 10X more than the typical SMB customer needs. In that scenario, scaling by nodes starts to become expensive as you are re-buying the processors each time instead of just buying disks, it takes up more space in the rack, and it increases power/cooling costs over just expanding by adding drives.
Today, it appears the box does not offer dual controllers, replication, or iSCSI. iSCSI is something most SMB folks can probably go without and rely solely on NFS, which performs very similarly to iSCSI at comparable Ethernet speeds and can offer additional functionality. Replication is probably something most SMBs can also go without. I don’t see too many SMBs going down the VMware SRM path. Most either don’t need that level of DR, or a solution like Veeam Backup and Replication fits their needs well (host-based SRM is also rumored to be coming later this year from VMware). The dual-controller issue is one I believe no customer should ever compromise on for production data, even SMB customers. I’ve seen enough situations over the years where storage processors, switches, or HBAs just die or go into a spontaneous reboot, and that’s with products that have been established in the marketplace for some time and are known to be reliable. In this scenario with a single-controller system on Gen1 equipment, you’re risking too much. With consolidated storage you’re putting all your eggs in one basket, and when you do that, it better be a pretty darn good basket. The Register reported that a future release of the product will support dual controllers, which I would make a priority if I were running Tintri.
Tintri managed to create quite a splash, but of course only time will tell how successful this box is going to be. Evostor launched a similar VMware-centric storage appliance at VMworld a couple of years ago, but now their official domain name has expired. Tintri will certainly have an uphill battle to fight. When I look at the competition Tintri is going to face, many of their claimed advantages have already been released in recent product refreshes by their competition. The VNXe is probably the box that competes the best. The VNXe GUI is incredibly easy to use and makes no mention of LUNs or RAID groups, just like Tintri. It’s extremely cheap and EMC has deep pockets, which will be tough for Tintri to compete with. VNXe is built on proven technology that’s very mature, while Tintri is Gen 1. It supports NFS with advanced functionality like de-dupe. Tintri has a small advantage here in that EMC’s de-dupe for VMs is post-process, while Tintri claims to support inline de-dupe (but only for the portion of VM data that resides on SSD drives). This is probably using some of the intellectual property that the ex-Data Domain employees at Tintri provided. The VNXe also supports iSCSI and will support FCoE. The NetApp FAS2020 is also a competitor in this space, supporting many of the same things the VNXe has, although the GUI is nowhere near as simple. Tintri’s big advantages are that it supports SSD today and does sub-LUN auto-tiering. These are two things that EMC put in the VNX but left out of the VNXe. It’s been stated the VNXe was supposed to get Flash drive support later this year, but there’s been no mention of auto-tiering support. Competition is good for end users, and my hope is that with competitors putting sub-LUN tiering in their products at the low end, it will force EMC’s hand to include FAST in the VNXe, because I think it will ultimately need it within 12-18 months to remain competitive in the market. Whether or not the typical SMB even needs auto-tiering with Flash drives is another story, but once the feature is there and customers start getting hyped about it, it’ll need to be there.
Further reading:
http://www.theregister.co.uk/2011/03/24/tintri_vmware_storage_appliance/
http://news.cnet.com/8301-13846_3-20045989-62.html
http://www.yellow-bricks.com/2011/03/24/tintri-virtual-machine-aware-storage/