Archive


Centera Update – 12TB nodes now available

December 24, 2012

Here’s one that slipped in under the radar. With EMC’s archiving focus being on Isilon and Atmos these days, it wasn’t well publicized that a new Gen4LP (LP = Low Power = 5400RPM drives) node is now available. I didn’t realize it myself until I was filling out a Centera change control form and noticed a new option. The 12TB G4LP nodes use 3TB drives internally. Other than that, I doubt there is much change to the hardware. One thing to note: you cannot add these to an existing cube that is built on previous G4LP nodes. 12TB nodes can only be in a cube by themselves.

I’ve commented before that I believe Centera is still a very legitimate platform: https://hoosierstorage.wordpress.com/2011/05/04/does-archiving-to-centera-or-cas-still-matter-2/. Many Centera customers are struggling with the decision of whether to move to Isilon or Atmos. While either one may make sense in some cases, in other cases there’s nothing wrong with sticking with Centera. It’s a pretty painless migration using something like a CICM or C2C, and certainly less effort and cost will be involved than in migrating to an alternative technology. Yes, Centera is somewhat “proprietary”, especially now that EMC has ended XAM development, but if EMC is a preferred manufacturer then there isn’t much to worry about. You can rest assured that EMC is going to support this platform for a minimum of 5 years once the node hardware goes end of sale, like they do with all hardware (except Symm, which is longer). Even in an unfathomable scenario where EMC went out of business, there are multiple 3rd parties that can migrate data off Centera now. The 12TB node will offer a pretty attractive refresh TCO, based on hardware refreshes I’ve seen going from G3/G4 to Gen4LP 8TB nodes. Then, in 5-7 years when it’s time for another tech refresh, hopefully the debate between Isilon and Atmos as the preferred archiving platform will be over 🙂

Release announcement:

Product: Centera CentraStar v4.2 SP2 (4.2.2) and 12TB Node Hardware

General Availability Date: Nov 19, 2012

Product Overview

Centera is a networked storage system specifically designed to store and provide fast, easy access to fixed content (information in its final form). It is the first solution to offer online availability with long-term retention and assured integrity for this fastest-growing category of information. CentraStar is the operating environment that runs the Centera cluster.

New Feature Summary

The following new features and functionality are available within CentraStar v4.2.2:

  • The introduction and support for GEN 4LP 12TB nodes utilizing 3TB drives.
  • Improved security through an updated SUSE Linux platform (SLES 11 SP2) and Java updates.
  • Consolidation of the CentraStar and Centera Virtual Archive software into one package for improved installation and maintenance.

 

Categories: Archive, EMC

Does Archiving to Centera or CAS Still Matter?

May 4, 2011

Over the past 2 years, I’ve noticed a rather drastic reduction in the number of archiving conversations I have with customers. Email archiving still pops up, but most of the folks who need to do it are already doing it. File system archiving seems to be even less common these days, though it still pops up occasionally. There is certainly still a market in healthcare and financials, but even that seems less prevalent than it was at one time. Archiving did come up in a recent conversation, which got me thinking about this topic again and I thought it’d make a good blog post.

Without a doubt, the archive market seems to have shrunk. I’m reminded of my time at EMC a year and a half ago when I had to go through some training about “recapturing the archive market”. From the early-to-mid 2000s until the late 2000s, the “archive first” story was the hottest thing going. EMC built an entire business on the Backup, Recovery, and Archive story (BURA), which encompassed the idea of archiving your static and stale data first, to save money by shrinking the amount of data you need to back up and store on more expensive Tier 1 storage. As a result, they made the term Content Addressable Storage (CAS) go mainstream and be copied by others. The Centera platform was a product EMC purchased rather than developed in-house, but they created a successful business out of it nonetheless. The predecessor of the Centera was a product called FilePool. The founders of FilePool are now actively involved in another CAS startup called Caringo.

How CAS Works

The Content Address is a digital fingerprint of the content. Created mathematically from the content itself, the Content Address represents the object: change the binary representation of the object (e.g. edit the file in any way) and the Content Address changes. This feature guarantees authenticity. Either the original document remains unchanged, or the content has been modified and a new Content Address is created.

Step 1 An object (file, BLOB) is created by a user or application.
Step 2 The application sends the object to the CAS system for storing.
Step 3 The CAS system calculates the object’s Content Address or “fingerprint,” a globally unique identifier.
Step 4 The CAS system then sends the Content Address back to the application.
Step 5 The application stores the Content Address, not the object, for future reference. When an application wants to recall the object, it sends the Content Address to the CAS system, which retrieves the object. There is no filesystem or logical unit for the application to manage.
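
To make the five steps concrete, here is a minimal sketch of the idea in Python. It is a toy in-memory store, not Centera's API or its actual addressing scheme: the class name ContentAddressedStore, the put/get methods, and the choice of SHA-256 as the fingerprint are all assumptions made for illustration.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory CAS: objects are keyed by a hash of their bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # Step 3: derive the Content Address from the content itself.
        # SHA-256 is an assumption here, chosen only for illustration.
        address = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same address, so it is stored once.
        self._objects.setdefault(address, data)
        # Step 4: hand the address back to the application.
        return address

    def get(self, address: str) -> bytes:
        data = self._objects[address]
        # Re-hash on read to confirm the content still matches its address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored object does not match its Content Address")
        return data


store = ContentAddressedStore()
original = b"Q4 2012 invoice, final"               # Step 1: the object
addr = store.put(original)                         # Steps 2-4
edited_addr = store.put(original + b" (edited)")   # any edit yields a new address

retrieved = store.get(addr)                        # Step 5: recall by address
```

The application only ever holds addr. Because the edited bytes hash to a different address, the original object cannot be silently overwritten, which is the authenticity property described above.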

CAS systems also had another compelling advantage back in the day: there was very little storage management involved. No RAID groups, LUNs, or Storage Groups to ever build or allocate. No traditional file system to ever manage. Per IDC, a full-time employee could effectively manage considerably more CAS storage than any other type (320TB vs. 25TB for NAS/SAN).

I have to admit, the CAS story was compelling. Thousands of customers signed up and bought hundreds of PBs of CAS from multiple vendors. The Fortune 150 company I worked for in the past implemented hundreds of TBs of Centera CAS as part of an archiving strategy. We archived file system, database, and email data to the system using a variety of ISV packages. Given that this market used to be so hot, I’ve often thought about the possible reasons for it cooling off, and why many people now choose to use a Unified storage platform for archiving rather than a purpose-built CAS system. Here are a few of the thoughts I’ve had so far (comments welcome and appreciated):

  1. CAS wasn’t as simple as claimed. Despite the claims of zero storage management, in reality I think several of the admin tasks that were eliminated by CAS were replaced by new management activities that were required for CAS. Designing archive processes with your internal business customers, evaluating various archiving software packages, configuring those software packages to work with your CAS system, and troubleshooting those software packages can be cumbersome and time-consuming.
  2. Storage management has gotten considerably easier in the last 5 years. Most vendors have moved from RAID groups to pools, LUN/Volume creation is handled via GUI instead of CLI, and the GUIs have been streamlined and made easy for the IT generalist to use. Although I would say a CAS appliance can still be easier to manage at scale, the difference is not nearly as great as it was in 2005.
  3. NetApp created a great story with their one-size-fits-all approach when they built WORM functionality into their Unified storage platform, which was soon copied by EMC in the Celerra product and enhanced to include compliance.
  4. Many customers didn’t need the guaranteed content authenticity that CAS offers; they simply needed basic archiving. Before NetApp and EMC Unified platforms offered this capability, Centera and other CAS platforms were the only choice for a dedicated archive storage box. Once NetApp and then EMC built archiving into the cost-effective mid-range Unified platform, my opinion is it cut Centera and other CAS systems off at the knees.
  5. CAS systems were not cheap, even if they could have a better TCO than Tier 1 SAN storage. It was primarily larger enterprises that were typically able to afford CAS, while the lower-end of the market quickly gravitated to a Unified box that had archive functionality built in.
  6. Backup windows were not always reduced by archiving. Certainly there were some cases where it could help, but also areas where it did not. As an example, many customers wanted to do file system archiving on file systems with millions and millions of files. When you archive, the data is copied to the archive and a stub is left in the original file system. Using traditional backup, these stubs still need to be backed up, and the backup application sees them as a file. This means even if the stub is only 1KB, it still causes the backup application to slow way down as part of the catalog indexing process. There are some workarounds like doing a volume-based backup, which backs up the file system as an image. However, there are caveats here as well. As an example, if you do file-system de-dupe on an EMC platform in conjunction with archiving, you can no longer do granular file-level recoveries from a volume-based backup. Only a full-destructive restore is allowed.
  7. Many customers didn’t really need to archive for compliance purposes; rather, they simply wanted to save money by moving stale data from Tier 1 storage to Tier 2/3 storage. This required adding cost and complexity for a file migration appliance or ISV software package to perform the file movement between tiers, which ate away at the cost savings. Now that many storage arrays have auto-tiering functionality built in, the system will automatically send less frequently accessed blocks of data to a lower tier of storage, completely transparent to the admin and end-user, with no file stubbing required.
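
The stub problem from point 6 is easy to see in a small simulation. The sketch below is not how any particular archiving product works: the stub format, the helper names archive_and_stub and scan, and the file counts and sizes are all hypothetical. It only shows that stubbing shrinks the bytes on the file system while leaving the number of files a file-level backup has to walk and catalog unchanged.

```python
import os
import tempfile


def archive_and_stub(root: str, archive: dict) -> None:
    """Copy each file's content into `archive` and leave a small stub behind."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                archive[path] = f.read()
            # Hypothetical stub format: a short pointer to the archived copy.
            with open(path, "w") as f:
                f.write("STUB -> archive:" + name)


def scan(root: str):
    """Return (file_count, total_bytes) as a file-level backup would see them."""
    count = total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            count += 1
            total += os.path.getsize(os.path.join(dirpath, name))
    return count, total


with tempfile.TemporaryDirectory() as root:
    # Build a small file system: 200 files of 10,000 bytes each.
    for i in range(200):
        with open(os.path.join(root, f"file{i}.dat"), "wb") as f:
            f.write(b"x" * 10_000)

    files_before, bytes_before = scan(root)
    archive = {}
    archive_and_stub(root, archive)
    files_after, bytes_after = scan(root)
```

After stubbing, bytes_after is a small fraction of bytes_before, but files_after still equals files_before. A backup application that indexes per file has just as many catalog entries to process as it did before the archive run.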

To sum it up, what would I recommend to a customer today? CAS is still a very important storage product, and although it’s not a rapidly growing area, it still has a significant install base that will remain for some time. There are still some things that a CAS system can do that the Unified boxes cannot. Guaranteed content authenticity with an object-based storage model is certainly one of those, and probably the most important. If you require as good a guarantee as you can possibly get that your archive data is safe, CAS is the way to go. As I alluded to before, this still has importance in the healthcare and financial verticals, though I see smaller institutions in those verticals often choose a Unified platform for cost-effectiveness. Outside of those verticals, if your archive storage needs are <100TB, I’m of the opinion that a Unified platform is most likely the way to go, keeping in mind every environment can be unique. There may also be exceptions for applications that offer CAS API integration through the XAM protocol. If you’re using one of those applications, then it may also make sense to investigate a true CAS platform.

Further reading on CAS:

http://en.wikipedia.org/wiki/Content-addressable_storage

Categories: Archive, Backup, EMC, NAS, NetApp