Archive for the ‘NAS’ Category

Why enterprise-class traditional NAS products will remain

January 7, 2013

I’ve commented before that “Unified Storage” is no longer the differentiator it once was, given that virtually all major storage vendors now offer a “Unified” product.  Previously, only EMC and NetApp offered a true Unified storage solution.  Now, IBM has built SONAS into the V7000, HP is building IBRIX into its products, HDS has released the HUS platform that leverages BlueArc NAS, and Dell is integrating Exanet into its products.


However, it’s important to note that not all Unified storage products are the same.  Just because a manufacturer can “check the box” on a spec sheet saying it has NAS doesn’t mean all NAS products work the same.  On a related note, now that EMC has acquired Isilon, which many perceive to be a superior product to Celerra, rumors persist about when VNX File will be replaced with Isilon code on the VNX series.

I’m here to tell you that:

  • EMC and NetApp are still best equipped to fulfill the needs of traditional enterprise NAS use cases compared to any other vendor.
  • I don’t believe Isilon will replace VNX File (Celerra) anytime soon.
  • While Isilon, SONAS, IBRIX, etc. are superior for scale-out use cases, that’s not the case for traditional enterprise NAS requirements.

Why is this the case?  First, let me clarify: when I say traditional enterprise NAS requirements, I’m talking about large enterprises, as in tens of thousands of users.  For a smaller shop, these don’t apply.  Here are some sample requirements:

  • Support for hundreds of file systems and/or mountpoints (much different than the big-data use case people talk of today involving a single file system that scales to petabytes)
    • Large enterprises have dozens if not hundreds of legacy file servers.  Wouldn’t it be great to consolidate these or virtualize them behind some file gateway?  Sure!  Is it realistic in a huge environment with thousands of custom applications that have hard-coded UNC paths to these locations, given the immense user disruption and re-education involved?  Not really.
  • Robust NDMP support
    • Large enterprises may be using advanced features of NDMP such as volume-based backups and checkpoint/snapshot-based NDMP backups.  Do all scale-out NAS offerings support these?  I don’t know, to be honest, but I’d be surprised.
  • Number of CIFS sessions
    • Handling 20,000 users logging in each morning, authenticating against AD, downloading user/group SIDs for each account, and handling drive-map creation for each user as part of the login script is a unique requirement in its own right.  It’s very intensive, but not in the “scale-out” processing sense.  Being able to open all these CIFS user sessions, maintain them, and potentially fail them over is not what scale-out NAS was designed for.
  • Multiple CIFS servers
    • Same point as above under multiple file systems.  It’s not necessarily so simple for an organization to consolidate tens or hundreds of file servers down to one name.
  • Multi-protocol support
    • Scale-out NAS was not designed for corporations that have invested a lot in making their traditional NAS boxes work with advanced multi-protocol functionality, with complex mapping setup between Windows AD and Unix NIS/LDAP to allow users to access the same data from both sides with security remaining intact.
  • Snapshots
    • Most scale-out NAS boxes offer snapshots, but make sure they are Shadow-Copy client integrated, as most large organizations let their users/helpdesk perform their own file restores. 
  • Advanced CIFS functions
    • Access Based Enumeration – hides shares from users who don’t have ACL rights.
    • Branch Cache – increases performance at remote offices
    • Robust AD integration and multi-domain support (including legacy domains)
  • Migration from legacy file servers with lots of permission/SID issues.
    • If you’re migrating a large file server that dates back to the stone age (NT) to a NAS, it most likely has a lot of unresolvable SIDs hidden deep in its ACLs for one reason or another.  This can be a complex migration to an EMC or NetApp box.  I know from experience that Celerra had multiple low-level parameters that could be tweaked, as well as custom migration scripts, all designed to handle issues that can occur when you encounter these problem SIDs during the migration.  A lot of knowledge has been gained here by EMC and NetApp over the past 10 years and built into their traditional NAS products.  How are scale-out NAS products designed to handle these issues?  I am hard-pressed to believe that they can.
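To make the SID problem concrete, here is a minimal sketch of the kind of pre-migration check involved: scan exported ACL entries and flag SIDs that no longer resolve to a live account. The (path, SID) export format and the lookup set are hypothetical stand-ins for a real directory query; the vendor migration scripts mentioned above do considerably more.

```python
# Hypothetical pre-migration check: flag ACL entries whose SID no longer
# resolves to a live account. The export format and resolvable_sids set
# are illustrative; real tools would query AD or the local SAM.

def find_orphaned_sids(acl_entries, resolvable_sids):
    """Return the ACL entries whose SID cannot be resolved."""
    return [(path, sid) for path, sid in acl_entries if sid not in resolvable_sids]

acl_entries = [
    (r"\\oldserver\finance", "S-1-5-21-1004336348-1177238915-682003330-1001"),
    (r"\\oldserver\hr",      "S-1-5-21-1004336348-1177238915-682003330-9999"),
]
resolvable_sids = {"S-1-5-21-1004336348-1177238915-682003330-1001"}

for path, sid in find_orphaned_sids(acl_entries, resolvable_sids):
    print(f"orphaned SID {sid} on {path}")
```

In a real migration the resolvable set would come from the directory, and each orphan would need a policy decision (strip it, map it, or preserve it), which is exactly where the low-level tuning knobs come in.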


The reality is that EMC’s Celerra codebase and NetApp’s ONTAP were purpose-built NAS operating systems designed to deal with these traditional enterprise requirements.  SONAS, IBRIX, BlueArc, Exanet, and Isilon were not.  These scale-out products (which I evaluated many years ago at a former employer, where I even had the opportunity to watch SONAS be developed and productized) were designed for newer scale-out use cases, often involving High Performance Computing (HPC).  In fact, HPC was the sole reason my former employer looked at all of these except Exanet.  Many of these products use Samba to provide their CIFS support.  Isilon only recently switched to a more enterprise-class custom CIFS stack.  SONAS definitely uses Samba, because it was built upon clustered Samba.  HPC has completely different requirements for NAS than traditional corporate file sharing, and so companies that built products focused on the HPC market were not concerned about meeting the needs of corporate file shares.


Now this is slowly changing, as we see more traditional enterprise features being built into the latest Isilon “Mavericks” code release, particularly around security.  I’m sure the other vendors are rapidly making code modifications as well, now that they’ve all picked the NAS technology they will use to make their SANs “unified.”  But it will take time to catch up on the 10 years of complex Windows permission and domain-integration development that Celerra/VNX and NetApp have on their side.  From a quick search, it appears Isilon does not support Microsoft Access Based Enumeration, so the idea that EMC is going to dump Celerra/VNX code and plop Isilon code onto its Unified storage arrays is silly, when there are probably thousands of customers using this functionality.


Categories: EMC, IBM, NAS, NetApp

Is the end of the File Server finally in sight?

December 28, 2011

A year ago I wrote an article detailing my thoughts on how greatly exaggerated the predictions of the imminent death of the file server truly were. A few years back, many thought the file server would be gone by now, replaced by SharePoint or similar content portals. Today, file servers (herein referred to as NAS) are alive and well, storing more unstructured content than ever before. You can read the original article here:

In summary, the main reasons why NAS has not disappeared are:

  • Much of the content stored on NAS is simply not suitable for a database, and middleware technologies that allow the data to stay on NAS but be presented as if it were in the database add complexity.
  • Legacy environments are often too big to accommodate a migration of all user and department shared files into a new repository in a cost effective manner.
  • Legacy environments often have legacy apps that were hard-coded to use UNC paths or mapped drive letters.
  • Many businesses in various industries have instruments or machinery that write data to a network share using the commonly accepted CIFS and NFS protocols.
  • The bulk of file growth today is in larger rich media formats, which are not well-suited for SharePoint.
  • NAS is a great option for VMware using NFS

The other day I found myself in a presentation where the “file server is dead” claim was made once again, and the very thought crossed my mind as well after seeing some examples of impressive technology hitting the street. What’s driving the new claims? Not just cloud storage (internal or external), but more specifically cloud storage with CIFS/NFS gateways and sync-and-share capabilities for mobile devices.

EMC’s Atmos is certainly one technology playing in this space; another is Nirvanix. I’ve also had some exposure to Oxygen Cloud and am really impressed with their corporate-IT-friendly DropBox-like offering. So how do these solutions replace NAS? Most would agree that the consumerization of corporate IT is a trend in the workplace right now. Many companies are considering “bring your own device” deployments instead of supplying desktops and laptops to everyone. Many users (such as doctors) are adopting tablet technology on their own to make themselves more productive at work. Additionally, many users are using consumer-oriented websites like DropBox to collaborate at work. The cloud storage solutions augment or replace the file server by providing functionality similar to these public cloud services, but the data resides inside the corporate firewall. Instead of a home drive or department share, a user gets a “space” with a private folder and shared folders. New technologies allow that shared space to be accessed by traditional NFS or CIFS protocols, as a local drive letter, via mobile devices, or via a web-based interface. Users can also generate links that expire within X number of hours or days, allowing an external user to access one of their files without needing to email a copy of the document or put it out on DropBox, FTP, etc.
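The expiring-link mechanic these products describe can be sketched generically with an HMAC-signed URL: embed an expiry timestamp in the link and sign it with a server-side secret, so the gateway can verify the link without storing any state. This is an illustration of the principle only, not any vendor’s actual implementation; the hostname and secret below are made up.

```python
import hashlib
import hmac
import time

SECRET = b"server-side-secret"  # hypothetical secret held only by the gateway

def make_expiring_link(file_id, ttl_seconds, now=None):
    """Build a link embedding an expiry time and an HMAC over (file_id, expiry)."""
    base = now if now is not None else time.time()
    expires = int(base + ttl_seconds)
    sig = hmac.new(SECRET, f"{file_id}:{expires}".encode(), hashlib.sha256).hexdigest()
    return f"https://files.example.com/{file_id}?expires={expires}&sig={sig}"

def verify_link(file_id, expires, sig, now=None):
    """Reject the link if the signature is wrong or the expiry has passed."""
    expected = hmac.new(SECRET, f"{file_id}:{expires}".encode(), hashlib.sha256).hexdigest()
    current = now if now is not None else time.time()
    return hmac.compare_digest(sig, expected) and current < expires
```

Because the expiry is inside the signed payload, an external recipient can’t extend the window by editing the URL, which is the property that makes “share for 24 hours” safe to hand outside the firewall.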

The one challenge I see is that no single solution does everything yet, meaning CIFS/NFS, web-based, and mobile sync and share. Atmos can do CIFS/NFS, but mobile device access requires something like Oxygen. Nirvanix also does just CIFS/NFS. Oxygen by itself isn’t really set up to be an internal CIFS/NFS access gateway; it’s primarily intended for web/mobile sync-and-share use cases. Panzura, Nasuni, etc. offer CIFS/NFS or iSCSI gateway access to the cloud, but they don’t offer sync and share to mobile devices. You could certainly cobble together something that does everything by putting appliances in front of gateways that sit in front of a storage platform, but then it starts to become difficult to justify the effort. You’d also have to consider that you’ll need to re-architect within 12-18 months when more streamlined solutions are available. Either way, file sharing is still an exciting place to be, with lots of change occurring in the industry. I can definitely see the possibility of home drives and department/workgroup shares giving way to a private cloud offering, but the concept of file sharing is certainly still alive and well, and CIFS/NFS isn’t going anywhere anytime soon. I don’t like to make predictions, but at this point my best guess is that the technology that does the best job of integrating legacy NFS/CIFS not just with “cloud storage,” but with the web-friendly and mobile device access that accelerates the consumerization trend, will be the winner in this race.

Categories: Cloud, EMC, NAS, SAN

A new true Unified storage contender in the market

October 13, 2011

Most folks have heard of Unified storage by now and are well aware of the basic capability: NAS and SAN in a single box.  NetApp and EMC have been the primary players in this market for some time and, to date, have been the only vendors to offer a true Unified solution in the enterprise arena.  In using the term “true Unified,” I’m looking at the technology and determining whether it leverages a purpose-built storage OS to handle SAN and NAS data delivery to hosts.  There are other vendors out there claiming Unified capabilities because it is a compelling feature for customers, but by my definition, taking a SAN and throwing a Windows Storage Server on top to do CIFS does not count as a true Unified solution.  I’m less concerned about the semantics of whether there are truly two code bases in the box, one serving SAN and the other serving NAS, as long as they operate from a common storage pool and have a single point of management.


I figured the next vendor with a true Unified solution would be Dell, as multiple signs have been pointing to them integrating acquired NAS technology into their existing SAN platforms (Compellent and EqualLogic), but surprisingly, yesterday’s announcement came from IBM.  IBM took the V7000 array released last year, based on SVC technology, and added Unified functionality by leveraging their SONAS (Scale-out NAS) product.  I consider this a pretty major announcement, as NetApp and EMC can no longer claim superiority as the only Unified storage vendors with native solutions.  IBM could sell OEM’d NetApp arrays (N-Series) in the past if the situation warranted, and it will be interesting to see if this announcement is the beginning of the end for the IBM-NetApp OEM relationship.


In the case of the V7000, IBM has integrated the SONAS code into the solution and made one GUI to manage it.  Because the V7000 runs SVC-based code and the NAS is handled by SONAS components, it does not appear to be a unified code base like NetApp’s, but two code bases tied together with a single GUI, like the VNX.  From a picture I saw on Tony Pearson’s blog, they are including two IBM servers in the stack (called “File Modules,” akin to Data Movers or filers) that run active-active in front of the V7000 controllers.


I had some exposure to SONAS when I worked at a large pharma and saw its development first-hand for a project we undertook but never bought.  IBM hired the creator of Samba (Andrew Tridgell) to architect an active-active clustered Samba architecture running on top of IBM’s General Parallel File System (GPFS).  It was a very interesting system, and Andrew Tridgell still ranks as one of the smartest people I have ever met, but back in 2007-2008 it was just a little too new.  Fast forward three years, and I’m sure the system is much more robust and fully baked, though I’m not 100% sold on using Samba for CIFS access in the enterprise.


Because SONAS/GPFS is a scale-out system, the NAS functionality in the V7000 does have an advantage over EMC and NetApp in that the same file system can be served out of both File Modules simultaneously.  However, it appears the V7000 may be limited to just two File Modules, unlike a full SONAS/GPFS solution or something like Isilon.


Only time will tell whether the V7000 Unified will be successful and whether IBM will keep development of the product a high priority.  Some folks would point to the legacy DS boxes as an example of a technology that was good when first released but then sat for years without major updates while the technology around it continued to evolve.  But at least for the immediate future, the V7000 is certainly worthy competition in the Unified space and an example of how competition is good for the industry overall, as it forces the big gorillas to stay on their toes and continue to find new ways to innovate.


Further reading:

Categories: IBM, NAS, SAN

The Importance of Storage Performance Benchmarks

September 15, 2011

As one who scans the daily flood of storage news, I’ve noticed an uptick over the past year in the number of articles and press releases highlighting various vendors who have “blown away” a benchmark score of some sort and claim ultimate superiority in the storage world. Two weeks later, another vendor is trumpeting that they’ve beaten the score that was just posted. With the numbers being touted, I’m sure 1 bazillion IOPS must be right around the corner.

Most vendors who utilize benchmarks tend to be storage startups looking to get some publicity for themselves, and there’s nothing wrong with that. You’ve got to get your name out there somehow. For the longest time, the dominant player in the storage world, EMC, refused to participate in benchmarks, saying they were not representative of real-world performance. I don’t disagree; in many cases benchmarks are not indicative of real-world performance. Nevertheless, even EMC has now jumped into the fray. Perhaps they decided that not participating cost more in negative press than it did good.

What does it all mean for you? Here are a couple things to consider:

  1. Most benchmark tests are not indicative of real-world results. If you want to use a benchmark stat to get a better sense of what the max system limits are, that’s fine. But don’t forget what your requirements truly are, and measure each system against that. In most cases, customers use a storage array for mixed workloads from a variety of business apps and use cases (database, email, file, VMware, etc.).  These different applications all have different I/O patterns, and benchmark tests don’t simulate this “real-world” mixed I/O pattern.  Benchmarks are heavily tilted in favor of niche use cases with very specific workloads. I’m sure there are niche cases out there where the benchmarks do matter, but for 95% of storage buyers, they don’t. The bottom line is to be sure the system has enough bandwidth and spindles to handle your real MB/sec and IOPS requirements. Designing that properly will be much more beneficial to you than getting an array that recently did 1 million IOPS in a benchmark test.
  2. Every vendor will reference a benchmark that works in their favor. Pretty much every vendor can pull a benchmark stat out of their hat that favors their systems above all others.
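The “design for your real requirements” advice boils down to a back-of-envelope calculation: translate front-end IOPS and read/write mix into back-end disk IOPS using the usual RAID write penalties, then divide by a per-drive figure. The penalty and per-drive numbers below are common rules of thumb, not vendor specifications.

```python
import math

# Rule-of-thumb spindle sizing from real workload requirements rather than
# benchmark scores. Write penalties: RAID10 = 2, RAID5 = 4, RAID6 = 6.
RAID_WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}

def spindles_needed(frontend_iops, read_pct, raid, iops_per_drive):
    """Return the drive count needed to service the back-end IOPS load."""
    reads = frontend_iops * read_pct
    writes = frontend_iops * (1 - read_pct)
    backend_iops = reads + writes * RAID_WRITE_PENALTY[raid]
    return math.ceil(backend_iops / iops_per_drive)

# e.g. 10,000 front-end IOPS, 70% reads, RAID5, ~180 IOPS per 15K drive
print(spindles_needed(10_000, 0.70, "raid5", 180))
```

A sizing exercise like this, run against your own measured workload, says far more about whether an array will perform for you than any published benchmark score.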

Here’s an example I saw last year while working with a customer who evaluated three different vendors (IBM, NetApp, and EMC), and how I helped the customer get clear on what was real. In this case, both non-EMC vendors were referencing SPC numbers that showed how badly an EMC CX3-40 performed relative to their platforms. A couple of alarms went off for me immediately:

  1. The CX3-40 was a generation older than the other platforms. The CX4 was the current platform on the market (since replaced by the VNX). In other words, it was not an apples-to-apples comparison.
  2. At the time the CX3-40 existed, EMC did not participate in SAN benchmarks for its mid-range or high-end arrays.

I took a look at the V7000 SPC-1 benchmark and came to some interesting conclusions.  Here is a chart showing how the V7000 performed on the benchmark alongside its competitors:

The V7000 scored 56,500.  Interestingly, since the box only supported 120 drives at the time, IBM had to use its SVC code to attach a DS5020, which allowed them to add more drives (200 total) to the configuration.  They put 80 15K RPM drives in the DS5020, higher-speed drives the V7000 didn’t support natively at the time.  What’s important to note about the CX3-40 results in the SPC-1 listing is that this was a CX3-40 that NetApp purchased, ran the test on, and submitted the results for without EMC’s permission.  I don’t care what your vendor affiliation is, that’s not a fair fight. EMC had no input into how the array was configured and tuned.  Even though the array could hold 240 drives, it was only configured with 155.  The CX3-40 scored 25,000.  Let’s make a realistic assumption that if EMC had configured and tuned the array for the benchmark as other vendors did, it could have done at least 25% better.  That would give it a score of roughly 31,000.  The CX3-40 was the predecessor of the CX4-240, and both hold 240 drives.  Performance and spec limits pretty much doubled across the board from CX3 to CX4, because EMC implemented a 64-bit architecture with the CX4’s release in 2008. So, again making a realistic assumption, let’s take the 31,000 result for the CX3-40 and double it to create a theoretical CX4-240 score of 62,000.
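Writing that adjustment out explicitly (the 25% tuning uplift and the CX3-to-CX4 doubling are assumptions from the argument above, not published results):

```python
# Back-of-envelope adjustment of the competitor-submitted CX3-40 score.
cx3_40_score = 25_000                    # SPC-1 result submitted by NetApp
tuned = round(cx3_40_score * 1.25, -3)   # assume ~25% uplift with proper tuning,
                                         # rounded to the nearest thousand
cx4_240_estimate = tuned * 2             # CX4 roughly doubled CX3 performance

print(int(tuned), int(cx4_240_estimate))  # 31000 62000
```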

If I look at arrays comparable to the CX4-240 in the results list, such as the DS5300 or the FAS3000 series, this theoretical score is right in the ballpark.  I would hope most would agree this shows all the arrays in scope were within striking distance of each other.  What exactly do these numbers mean relative to your business? Not much. You can’t design a system for your business needs using these numbers. When most customers analyze their performance requirements, they have figures for IOPS, throughput, and latency that they need to meet to ensure good business application performance, not a theoretical benchmark score target.

Benchmarks can certainly be interesting, and I admit sometimes I think it’s cool to see a system do X GB/sec of throughput or X million IOPS, but my recommendation is don’t get too spun up on them in your search for a storage platform.  Every vendor has a benchmark that makes them look best.  Instead, use your own metrics, or work with a trusted business partner who can help you gather the data specific to your environment and evaluate each technology against how well it meets your business needs.

Categories: EMC, NAS, NetApp, SAN

Does Archiving to Centera or CAS Still Matter?

May 4, 2011

Over the past two years, I’ve noticed a rather drastic reduction in the number of archiving conversations I have with customers. Email archiving still pops up, but most of the folks who need to do it are already doing it. File system archiving seems even less common these days, though it still pops up occasionally. There is certainly still a market in healthcare and financial services, but even that seems less prevalent than it once was. Archiving did come up in a recent conversation, which got me thinking about this topic again, and I thought it’d make a good blog post.

Without a doubt, the archive market seems to have shrunk. I’m reminded of my time at EMC a year and a half ago, when I had to go through some training about “recapturing the archive market.” From the early-to-mid 2000s until the late 2000s, the “archive first” story was the hottest thing going. EMC built an entire business on the Backup, Recovery, and Archive (BURA) story, which encompassed the idea of archiving your static and stale data first to save money by shrinking the amount of data you need to back up and store on more expensive Tier 1 storage. As a result, they made the term Content Addressable Storage (CAS) go mainstream and be copied by others.  The Centera platform was a product EMC purchased rather than developed in-house, but they created a successful business out of it nonetheless. The predecessor of the Centera was a product called FilePool, whose founders are now actively involved in another CAS startup called Caringo.

How CAS Works

The Content Address is a digital fingerprint of the content. Created mathematically from the content itself, the Content Address represents the object: change the binary representation of the object (e.g., edit the file in any way) and the Content Address changes. This feature guarantees authenticity: either the original document remains unchanged, or the content has been modified and a new Content Address is created.

Step 1 An object (file, BLOB) is created by a user or application.
Step 2 The application sends the object to the CAS system for storage.
Step 3 The CAS system calculates the object’s Content Address, or “fingerprint,” a globally unique identifier.
Step 4 The CAS system then sends the Content Address back to the application.
Step 5 The application stores the Content Address, not the object, for future reference. When the application wants to recall the object, it sends the Content Address to the CAS system, which retrieves the object. There is no filesystem or logical unit for the application to manage.
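The steps above can be sketched in a few lines of code. This is a generic illustration of content addressing using SHA-256; real CAS platforms such as Centera use their own addressing schemes, metadata layers, and retention machinery on top of this core idea.

```python
import hashlib

# Minimal content-addressed store: the object's address *is* a hash of its
# bytes. Any change to the content produces a different address, which is
# what gives CAS its authenticity guarantee.

class CASStore:
    def __init__(self):
        self._objects = {}

    def put(self, data):
        """Store the object and return its Content Address."""
        address = hashlib.sha256(data).hexdigest()
        self._objects[address] = data  # idempotent: same content, same address
        return address

    def get(self, address):
        """Retrieve an object by its Content Address."""
        return self._objects[address]

store = CASStore()
addr = store.put(b"quarterly report v1")
assert store.get(addr) == b"quarterly report v1"
# Editing the content yields a brand-new address; the original is untouched.
assert store.put(b"quarterly report v2") != addr
```

Note that storing identical content twice yields the same address, which is also why CAS systems get single-instance storage of duplicate objects essentially for free.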

CAS systems also had another compelling advantage back in the day: there was very little storage management involved. No RAID groups, LUNs, or storage groups to build or allocate. No traditional file system to manage. Per IDC, a full-time employee could effectively manage considerably more CAS storage than any other type (320TB vs. 25TB for NAS/SAN).

I have to admit, the CAS story was compelling. Thousands of customers signed up and bought hundreds of PBs of CAS from multiple vendors. The Fortune 150 company I worked for implemented hundreds of TBs of Centera CAS as part of an archiving strategy. We archived file system, database, and email data to the system using a variety of ISV packages. Given that this market used to be so hot, I’ve often thought about the possible reasons for it cooling off, and why many people now choose a Unified storage platform for archiving rather than a purpose-built CAS system. Here are a few of my thoughts so far (comments welcome and appreciated):

  1. CAS wasn’t as simple as claimed. Despite the claims of zero storage management, in reality I think several of the admin tasks that were eliminated by CAS were replaced by new management activities that were required for CAS. Designing archive processes with your internal business customers, evaluating various archiving software packages, configuring those software packages to work with your CAS system, and troubleshooting those software packages can be cumbersome and time-consuming.
  2. Storage management has gotten considerably easier in the last five years.  Most vendors have moved from RAID groups to pools, LUN/volume creation is handled via GUI instead of CLI, and the GUIs have been streamlined and made easy for the IT generalist to use.  Although I would say a CAS appliance can still be easier to manage at scale, the difference is not nearly as great as it was in 2005.
  3. NetApp created a great story with their one-size-fits-all approach when they built WORM functionality into their Unified storage platform, which EMC soon copied in the Celerra product and enhanced to include compliance.
  4. Many customers didn’t need the guaranteed content authenticity that CAS offers; they simply needed basic archiving. Before the NetApp and EMC Unified platforms offered this capability, Centera and other CAS platforms were the only choice for a dedicated archive storage box. Once NetApp and then EMC built archiving into their cost-effective mid-range Unified platforms, my opinion is that it cut Centera and other CAS systems off at the knees.
  5. CAS systems were not cheap, even if they could have a better TCO than Tier 1 SAN storage. It was primarily larger enterprises that could afford CAS, while the lower end of the market quickly gravitated to a Unified box with archive functionality built in.
  6. Backup windows were not always reduced by archiving. Certainly there were cases where it helped, but also areas where it did not. As an example, many customers wanted to do file system archiving on file systems with millions and millions of files. When you archive, the data is copied to the archive and a stub is left in the original file system. Using traditional backup, these stubs still need to be backed up, and the backup application sees each one as a file. This means that even if a stub is only 1KB, it still slows the backup application way down during catalog indexing. There are some workarounds, like doing a volume-based backup, which backs up the file system as an image, but there are caveats here as well. As an example, if you do file-system de-dupe on an EMC platform in conjunction with archiving, you can no longer do granular file-level recoveries from a volume-based backup; only a full destructive restore is allowed.
  7. Many customers didn’t really need to archive for compliance purposes; rather, they simply wanted to save money by moving stale data from Tier 1 storage to Tier 2/3 storage. This required adding the cost and complexity of a file migration appliance or ISV software package to perform the file movement between tiers, which ate away at the cost savings. Now that many storage arrays have auto-tiering built in, the system automatically sends less frequently accessed blocks of data to a lower tier of storage, completely transparently to the admin and end user, with no file stubbing required.

To sum it up, what would I recommend to a customer today? CAS is still a very important storage product, and although it’s not a rapidly growing area, it still has a significant install base that will remain for some time. There still are some things that a CAS system can do that the Unified boxes cannot. Guaranteed content authenticity with an object-based storage model is certainly one of those, and probably the most important. If you require as strong a guarantee as you can possibly get that your archive data is safe, CAS is the way to go. As I alluded to before, this still matters in the healthcare and financial verticals, though I see smaller institutions in those verticals often choose a Unified platform for cost-effectiveness. Outside of those verticals, if your archive storage needs are under 100TB, I’m of the opinion that a Unified platform is most likely the way to go, keeping in mind every environment can be unique. There may also be exceptions for applications that offer CAS API integration through the XAM protocol. If you’re using one of those applications, it may also make sense to investigate a true CAS platform.

Further reading on CAS:

Categories: Archive, Backup, EMC, NAS, NetApp

Tintri – What’s the big deal?

April 19, 2011

You may have seen several news articles a couple weeks back about the hottest new thing in VMware storage – Tintri. Their marketing department created quite a buzz with most major IT news outlets picking up the story and proclaiming that the Tintri appliance was the future of VMware storage.

Instead of re-hashing what’s been said already, here’s a brief description from CNET:

Tintri VMstore is a hardware appliance that is purpose-built for VMs. It uses virtual machine abstractions–VMs and virtual disks–in place of conventional storage abstractions such as volumes, LUNs, or files. By operating at the virtual machine and disk level, administrators get the same level of insight, control, and automation of CPU, memory, and networking resources as general-purpose shared-storage solutions.

A few more technical details from The Register:

The VMstore T440 is a 4U, rackmount, multi-cored, multi-processor, X86 server with gigabit or 10gigE ports to a VMware server host. It appears as a single datastore instance in the VMware vSphere Client – connecting to vCenter Server. Multiple appliances – nodes – can be connected to one vCenter Server to enable sharing by ESX hosts. Virtual machines (VMs) can be copied or moved between nodes using storage vMotion.

The T440 is a hybrid storage facility with 15 directly-attached 3.5-inch, 7,200rpm, 1TB, SATA disk drives, and 9 x 160GB SATA, 2-bit, multi-level cell (MLC) solid state drives (SSD), delivering 8.5TB of usable capacity across the two storage tiers. There is a RAID6 redundancy scheme with hot spares for both the flash and disk drives.


I was a bit skeptical as to how this could be much different from other storage options on the market today. Tintri claims that you don’t manage the storage; everything is managed per VM. The only logical way I could see this happening is if you’re managing files (with every VM being a file) instead of LUNs. How do you accomplish this? Use a native file system as your datastore instead of creating a VMFS file system on top of a block-based datastore. In other words, NFS.

So, after doing a little research, it appears this box isn’t much more than a simple NAS with a slick GUI, doing some neat things under the covers with auto-tiering (akin to Compellent’s Data Progression or EMC’s sub-LUN FAST) and de-duplication. Instead of adding drives to a tray to expand, you expand by adding nodes. This makes for a nice story in that you scale performance as you scale capacity, but in the SMB market where this product is focused, I typically find the performance offered in a base unit with multi-core processors is 10X more than the typical SMB customer needs. In that scenario, scaling by nodes becomes expensive, as you are re-buying the processors each time instead of just buying disks; it takes up more space in the rack; and it increases power/cooling costs over simply adding drives.
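The sub-LUN auto-tiering idea referenced here (Data Progression, FAST) boils down to periodically promoting the hottest extents into the small flash tier and demoting the rest to disk. A toy sketch, ignoring migration cost, extent sizes, and every real-world policy detail these products implement:

```python
# Toy sub-LUN auto-tiering: rank blocks by access count and keep only the
# hottest ones on the (small) SSD tier. Real implementations track access
# at fixed extent sizes with far more sophisticated, history-aware policies.

def retier(access_counts, ssd_capacity_blocks):
    """Split block IDs into (ssd_blocks, hdd_blocks) by access frequency."""
    by_heat = sorted(access_counts, key=access_counts.get, reverse=True)
    ssd = set(by_heat[:ssd_capacity_blocks])
    hdd = set(by_heat[ssd_capacity_blocks:])
    return ssd, hdd

# Access counts per block since the last re-tiering pass; SSD holds 2 blocks.
counts = {0: 500, 1: 3, 2: 120, 3: 1, 4: 980}
ssd, hdd = retier(counts, ssd_capacity_blocks=2)
print(sorted(ssd))  # blocks 0 and 4 are the hottest
```

The payoff of doing this below the LUN/file level is that a mostly cold VM with one hot database file still gets its hot blocks onto flash, which is exactly the behavior being marketed here.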

Today, it appears the box does not offer dual controllers, replication, or iSCSI. iSCSI is something most SMBs can probably go without, relying solely on NFS, which performs very similarly to iSCSI at comparable Ethernet speeds and can offer additional functionality. Replication is probably something most SMBs can also go without; I don't see too many SMBs going down the VMware SRM path. Most either don't need that level of DR, or a solution like Veeam Backup and Replication fits their needs well (host-based SRM is also rumored to be coming later this year from VMware). The dual-controller issue is one I believe no customer should ever compromise on for production data, even SMB customers. I've seen enough situations over the years where storage processors, switches, or HBAs just die or go into a spontaneous reboot, and that's with products that have been established in the marketplace for some time and are known to be reliable. With a single-controller system on Gen 1 equipment, you're risking too much. Consolidated storage puts all your eggs in one basket, and when you do that, it had better be a pretty darn good basket. The Register reported that a future release of the product will support dual controllers, which I would make a priority if I were running Tintri.

Tintri managed to create quite a splash, but only time will tell how successful this box will be. Evostor launched a similar VMware-centric storage product at VMworld a couple of years ago, and its official domain name has since expired. Tintri will certainly have an uphill battle to fight: many of its claimed advantages have already been released in recent product refreshes by the competition. The VNXe is probably the box that competes best. The VNXe GUI is incredibly easy to use and makes no mention of LUNs or RAID groups, just like Tintri. It's extremely cheap, and EMC has deep pockets, which will be tough for Tintri to compete with. VNXe is built on proven, mature technology, while Tintri is Gen 1. It supports NFS with advanced functionality like de-dupe. Tintri has a small advantage here in that EMC's de-dupe for VMs is post-process, while Tintri claims to support inline de-dupe (but only for the portion of VM data that resides on SSDs). This is probably using some of the intellectual property that the ex-Data Domain employees at Tintri provided. The VNXe also supports iSCSI and will support FCoE. The NetApp FAS2020 is also a competitor in this space, supporting many of the same things as the VNXe, although its GUI is nowhere near as simple. Tintri's big advantages are that it supports SSDs today and does sub-LUN auto-tiering, two things that EMC put in the VNX but left out of the VNXe. It's been stated that the VNXe was supposed to get Flash drive support later this year, but there's been no mention of auto-tiering support. Competition is good for end users, and my hope is that with competitors putting sub-LUN tiering in their low-end products, it will force EMC's hand to include FAST in the VNXe, because I think it will ultimately need it within 12-18 months to remain competitive in the market.
Whether or not the typical SMB even needs auto-tiering with Flash drives is another story, but once the feature is there and customers start getting hyped about it, it'll need to be there.

Categories: NAS, SAN, Virtualization, VMware

Why NAS? The Benefits of Moving Traditional File Servers to NAS vs. Virtualizing Them

April 14, 2011

Customers are often presented with the dilemma of moving file servers to NAS (CIFS shares) or virtualizing them. The latter keeps the environment largely “as is” while adding the flexibility benefits of virtualization, and on the surface it can have a lot of appeal because you get to keep everything within the construct of your virtualization hypervisor.

Advantages of Virtualizing Windows File Servers

  1. Maintains existing way of doing things.
  2. Allows you to leverage advanced virtualization functionality, such as VMware VMotion and VMware SRM for DR for your file servers.

It’s important, though, to understand all of the benefits that NAS truly offers. The advantages of leveraging NAS instead of traditional file servers (physical or virtual) are still numerous. The rest of this article lists the advantages that specifically exist with the EMC Celerra NAS platform. Some points carry over to other NAS platforms as well, but not all.

Advantages of Moving Windows File Servers to EMC Celerra NAS

  1. Microsoft-compatibility without the bloat: Celerra uses a purpose-built NAS OS that is only about 25MB in size, compared to a default Windows Server install. This makes Celerra much more efficient at the job of serving file data. Since it does not run Windows, it is not susceptible to Microsoft vulnerabilities, and virus code cannot be executed on it directly. When you virtualize a Windows file server, you still have a Windows server that is susceptible to infection or worms. Removing these servers from the environment reduces the number of servers the administrator has to worry about.
  2. Microsoft Support: EMC is an official licensee of the CIFS/SMB protocol (no reverse engineering), so it is guaranteed to be fully compatible with Active Directory and all supported Windows clients.  EMC also maintains a joint support center with Microsoft engineers in Atlanta, GA.
  3. Checkpoints/Snapshots: File system snapshots enable instant restore for users. You can do this now with Volume Shadow Copies on your Windows server, but it’s not nearly as scalable as a Celerra. Currently, you are allowed up to 96 per file system with Celerra.
  4. De-dupe: With EMC Celerra, you can de-duplicate CIFS and NFS data, including NFS-based VMware datastores. With typical user data on CIFS shares, you can expect to see a 30-40% reduction in the amount of space used. Celerra also has functions to prevent re-hydration of the data when doing an NDMP-protocol backup. In my testing with de-dupe on NFS-based VMware datastores, I saw between 20-40% reduction in the amount of space used for virtual machines.
  5. Failover and DR: All file server data can be easily replicated with Celerra Replicator. Failover can still be accomplished with the click of a button from the GUI.
  6. Scalability: You typically don’t see Windows file systems with much more than 2TB of data on them due to scalability issues. Celerra can have up to 32TB on a single server and it truly scales to that amount.
  7. Virtual Desktops: NAS can make perfect sense for VDI environments, as you can gain efficiencies by centralizing user data to a CIFS share. Granted, you can do that on a traditional Windows file server, but you cannot take advantage of advanced Celerra features. One of these features is de-duplication. You can crunch down typical user data by 30-40% with no performance impact that end users are going to notice.
  8. NDMP backup: NDMP is a backup protocol that all NAS and backup vendors standardized on many years ago. It is needed because true purpose-built NAS operating systems are closed, with no ability for users to directly interact with the OS, so you cannot install a backup agent. Because NDMP support is built into the OS, NDMP backups are traditionally much more reliable than traditional backup agents. The data also traverses directly from the storage to the backup medium, reducing network traffic.
  9. Multi-protocol: Should you ever need to enable NFS access for Linux servers in your environment, this capability exists natively within the Celerra. On a Windows file server, you must enable Microsoft's Unix file sharing services, which have a poor reputation for both performance and reliability.
  10. Built-in HA: Every Celerra is configured as an active/passive or n+1 cluster, set up automatically right out of the box. The failover code is simply part of the operating system, so it is much cleaner than a traditional Windows cluster, both from a management standpoint and a failover standpoint.
  11. Easy Provisioning: File servers can be provisioned on the Celerra, from start to sharing files, in under 5 minutes. Even a VM would take more time, not just to spin up the VM but to actually create the shares. The Celerra comes with an easy GUI, but you can also use traditional Windows CIFS management tools to create shares.
  12. Virtual Provisioning and Auto-Expansion: Celerra has for many years supported virtual provisioning (aka thin provisioning). This allows you to provision a share that appears to be 1TB in size even though physically it might only be 200GB. This is useful if you have business customers that regularly ask for more space than they need. In the past, you would allocate everything they asked for, only to find out 1-2 years down the road that they had used 25% of what they said they needed, with no easy way to reclaim and re-purpose that space. Now you can use virtual provisioning to alleviate this issue and rely on auto-expansion, which grows the physical file system as needed, so you as the administrator are not constantly bothered to expand the file system.
  13. File Retention: Many businesses today have policies and procedures governing the retention of business documents. Some businesses may even fall under government regulation that adds an extra layer of scrutiny. Celerra natively supports a technology called File-level Retention (FLR), which allows you to use a GUI or custom scripts to set retention on a group of files. This prevents even an administrator from deleting the protected files.
  14. Tiered Storage: Celerra natively supports tiering files to other types of storage within the same box, or to a completely different box, whether it is a different NAS box or a Windows server.
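Point 12 above, thin provisioning with auto-expansion, can be modeled in a few lines. This is a toy sketch of the general concept only, not Celerra's actual implementation; all the sizes and the expansion step are invented:

```python
class ThinFileSystem:
    """Toy model of a thin-provisioned file system with auto-expansion.

    Clients see `virtual_gb` of capacity, but physical space is only
    grown (in `step_gb` increments) as writes actually land.
    """
    def __init__(self, virtual_gb, initial_gb, step_gb):
        self.virtual_gb = virtual_gb
        self.physical_gb = initial_gb
        self.step_gb = step_gb
        self.used_gb = 0

    def write(self, gb):
        if self.used_gb + gb > self.virtual_gb:
            raise IOError("share is full from the client's point of view")
        self.used_gb += gb
        # Auto-expand the physical file system just enough to hold the data.
        while self.used_gb > self.physical_gb:
            self.physical_gb += self.step_gb

# A share advertised as 1TB, physically backed by 200GB to start.
fs = ThinFileSystem(virtual_gb=1024, initial_gb=200, step_gb=100)
fs.write(350)
print(fs.used_gb, fs.physical_gb)  # 350 400
```

The payoff is visible in the demo: clients were promised 1TB, but only 400GB of physical capacity has been committed, and the administrator never had to intervene for the growth from 200GB to 400GB.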
Categories: EMC, NAS, NetApp

What Ever Happened to “The Death of the File Server”?

March 16, 2011

Roughly 3 years ago the hype was at an all-time high. Consultants were proclaiming that the file server was dead and that we'd all be using collaborative web-based portals for file sharing in the future (à la SharePoint). So 3 years later, where do we stand? The file server (aka Network Attached Storage, or NAS) is more popular than ever.

That’s not to say that SharePoint hasn’t increased dramatically in popularity as well—certainly it has. My intent is not to bash SharePoint, because I think it’s a great product when you use it for what it is best at doing. Rather, I want to investigate why the NAS market has only grown, and why the need for NAS in organizations is increasing, particularly with the use of VMware.

I used to be the NAS administrator for a 200+ TB EMC Celerra NAS environment. The Fortune 200 company I was working at started a significant project to implement SharePoint and eliminate traditional file shares. I had to sit through many meetings with consultants and project managers where my role was basically to give details of the file share environment and discuss plans for how it would “go away”. I would chuckle to myself internally every time, because I knew the file shares weren’t going anywhere, and the hype I kept hearing about how SharePoint would manage everything wasn’t going to play out as planned.

So why was I confident that NAS wasn’t going away from the environment I managed or any environment?

First, out of the 200+ TB at the company I worked at, the data suitable for SharePoint comprised only about a quarter of the total. The rest was highly specialized data, atypical formats, or binaries simply not suitable for storage in a SQL database. I would certainly agree that SharePoint is the best medium for storing office documents so that they can be shared between colleagues or departments, tracked with versioning, and made searchable through an easy web-based portal. With the exception of 50MB PPT files that the CEO makes by cutting and pasting images in BMP format, the reality is that these office documents usually don’t hog the majority of space on a file server. While office files often constitute the majority of files by count, the majority of space is usually consumed by a smaller number of non-office files. Typically, when I run a free File System Assessment (FSA) for a customer, I find that most of the space is used by ZIP files, ISO images, or other binary formats. None of these formats is well served out of a SharePoint environment; they should be stored on a NAS device.
The second reason why NAS won’t go away is that legacy environments are often too big to accommodate a cost-effective migration of all user and department shared files into a new repository. Obviously, new companies or very small companies may not be hampered by these legacy problems, but the bigger the organization, the more difficult it is to migrate the data without the cost blowing up to extreme proportions. At that point, the business is going to weigh the cost of the migration against the value of the data. Typical user data is seen as having low value, and while department files may be very critical, the executives who control the purse strings rarely have visibility down into this layer and don’t recognize how much of their business runs on Excel.

If you have a file server that dates back to the NT days, imagine all the outdated files that live out there that nobody is willing to delete. On file servers this old, there are usually a large number of files with invalid SIDs or corrupted ACLs that become a real nightmare to migrate. In my former environment, the company sponsored multiple “clean up” days where it was “mandatory” for employees and departments to clean up their home drives and other file shares to comply with an aggressive records retention policy. How much space was reclaimed from these efforts? Less than 1 percent of the total amount of file server data. What may be junk is often treasured by the person who owns it; they simply aren’t going to delete the data, or they’ll find another place on the network to store it, so you just move the problem from one place to another.

Lastly, let’s discuss why the use of NAS is increasing. First, as we all know, the digital age has caused a massive increase in the amount of digital information that needs to be stored. Whether or not you buy into analyst research reports claiming we’ll see 45X growth in the next decade, we will certainly see data growth of some multiple. Again, most of this growth is in the form of rich media, which is not suitable for a SharePoint farm and should be stored on NAS or perhaps a Cloud Storage offering of some kind.

More specifically, I’m seeing a significant increase in the popularity of NAS for VMware environments. First, NFS continues to become more popular as the shared storage sitting behind VMware. Chad Sakac of EMC has a great explanation of some of the more recent benefits you get by using VMware on an EMC Celerra serving out NFS. Additionally, as VMware View increases in popularity, the consolidated storage requirements increase because all of the user data files get centralized. It’s not a good idea to have this lower-value data residing in datastores located on high-performing premium storage. Rather, it makes more sense to centralize this data via user-folder redirection to a centralized CIFS share on a NAS. Once there, it becomes easier to back up and manage. You can also enable additional functionality built into platforms like the EMC Celerra to de-dupe this data on primary storage. Typical user files can be crunched down 30-40%, thereby delaying your next purchase of storage. Perhaps it is a bit ironic, but the hottest trend in technology, specifically virtualization, is only helping to ensure that one of the oldest technologies of the server/client era will not just stick around, but have a bright future.
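The “delaying your next purchase” claim is simple arithmetic: space reclaimed by de-dupe buys you months of growth headroom. The numbers below are illustrative assumptions, not measurements from any real environment:

```python
def months_deferred(current_tb, monthly_growth_tb, dedupe_ratio):
    """Months of storage purchases deferred by reclaiming space via de-dupe.

    dedupe_ratio is the fraction of current capacity reclaimed (e.g. 0.35
    for the 30-40% typical of user file data).
    """
    reclaimed_tb = current_tb * dedupe_ratio
    return reclaimed_tb / monthly_growth_tb

# 100 TB of user data, growing 2 TB/month, assuming 35% de-dupe savings.
print(months_deferred(100, 2, 0.35))  # 17.5
```

Under those assumptions, a mid-30s de-dupe rate pushes the next capacity purchase out by roughly a year and a half, which is why primary-storage de-dupe matters even when the percentage sounds modest.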


Categories: NAS