Hitachi’s VSP vs. VMAX

Today’s announcement of Hitachi’s VSP brings another round to the competition between EMC and Hitachi/HDS in the enterprise. VSP’s recent introduction which is GA and orderable today, takes the rivalry to a whole new level.

I was on SiliconANGLEs live TV feed earlier today discussing the merits of the two architectures with David Floyer and Dave Vellante from Wikibon. In essence, there seems to be a religious war going on between the two.

Examining VMAX, it’s obviously built around a concept of standalone nodes which all have cache, frontend, backend and processing components built in. Scaling the VMAX, aside from storage and perhaps cache, involves adding more VMAX nodes to the system. VMAX nodes talk to one another via an external switching fabric (RapidIO currently). The hardware although sophisticated packaging, IO connection technology and other internal stuff looks very much like a 2U server one could purchase from any number of vendors.

On the other hand, Hitachi’s VSP is a special built storage engine (or storage computer as Hu Yoshida says). While the architecture is not a radical revision of USP-V, it’s a major upleveling of all component technology from the 5th generation cross bar switch, the new ASIC driven Front-end and Back-end directors, the shared control L2 cache memory and the use of quad core Xenon Intel processors. Much of this hardware is unique, sophistication abounds and looks very much like a blade system for the storage controller community.

The VSP and VMAX comparison is sort of like a open source vs. closed source discussion. VMAX plays the role of open source champion that largely depends on commodity hardware, sophisticated packaging but with minimal ASICs technology. As evidence of the commodity hardware VPLEX EMC’s storage virtualization engine reportedly runs on VMAX hardware. Commodity hardware lets EMC ride the technology curve as it advances for other applications.

Hitachi VSP plays the role of closed source champion. Its functionality is locked inside proprietary hardware architecture, ASICS and interfaces. The functionality it provides is tightly coupled with their internal architecture and Hitachi probably believes that by doing so they can provide better performance and more tightly integrated functionality to the enterprise.

Perhaps this doesn’t do justice to either development team. There is plenty of unique proprietary hardware and sophisticated packaging in VMAX but they have taken the approach of separate but equal nodes. Whereas Hitachi has distributed this functionality out to various components like Front-end directors (FEDs), backend directors (BEDs), cache adaptors (CAs) and virtual storage directors (VSDs), each of which can scale independently, i.e., doesn’t require more BEDs to add FEDs or CAs. Ditto for VSDs. Each can be scaled separately up to the maximum that can fit inside a controller chasis and then if needed, you can add a whole another controller chasis.

One has an internal switching infrastructure (the VSP cross bar switch) and the other uses external switching infrastructure (the VMAX RapidIO). The promise of external switching like commodity hardware, is that you can share the R&D funding to enhance this technology with other users. But the disadvantage is that architecturally you may have more latency to propagate an IO to other nodes for handling.

With VSP’s cross bar switch, you may still need to move IO activity between VSDs but this can be done much faster and any VSD can access any CA, BED, FED resource required to perform the IO so the need to move IO is reduced considerably. Thus, providing a global pool of resources that any IO can take advantage of.

In the end, blade systems like VSP or separate server systems like VMAX, can all work their magic. Both systems have their place today and in the foreseeable future. Where blades servers shine is in dense packaging, high power cooling efficiency and bringing a lot of horse power to a small package. On the other hand, server systems are simple to deploy and connect together with minimal limitations on the number of servers that can be brought together.

In a small space blade systems probably can bring more compute (storage IO) power to bear within the same volume than multiple server systems but the hardware is much more proprietary and costs lots of R&D $s to maintain leading edge capabilities.

Typed this out after the show, hopefully I characterized the two products properly. If I am missing anything please let me know.

[Edited for readability, grammar and numerous misspellings – last time I do this on an iPhone. Thanks to Jay Livens (@SEPATONJay) and others for catching my errors.]

Data storage features for virtual desktop infrastructure (VDI) deployments

The Planet Data Center by The Planet (cc) (from Flickr)
The Planet Data Center by The Planet (cc) (from Flickr)

Was talking with someone yesterday about one of my favorite topics, data storage for virtual desktop infrastructure (VDI) deployments.  In my mind there are a few advanced storage features that help considerably with VDI implemetations:

  • Deduplication – almost every one of your virtual desktops will share 75-90% of their O/S disk data with every other virtual desktop.  Having sub-file/sub-block deduplication can be a godsend for all this replicated data and reduce O/S storage requirements considerably.
  • 0 storage snapshots/clones – another solution to the duplication of O/S data is to use some sort of space conserving snapshots.  For example, one creates a master (gold) disk image and makes 100s if not 1000s of snapshots of it, taking almost no additional space.
  • Highly available/highly reliable storage – when you have a lone desktop dependent on DAS for it’s O/S, it doesn’t impact a lot of users if that device fails. However, when you have 100s to 1000s of users dependent on DAS device(s) for their O/S software, any DAS failure could impact all of them at the same time.  As such, one needs to move off DAS and invest in highly reliable and available external storage of some kind to sustain reasonable uptime for your user community.

Those seem to me to be the most important attributes for VDI storage but there are a couple more features/facilities which can also:

  • NAS systems with NFS – VDI deployments will generate lots of VMDKs for all the user desktop C: drives.  Although this can be managed with block level storage as separate LUNs or multi-VMDK LUNs, who want’s to configure a 100 to 1000 LUNs.  NFS files can perform just as well and are much easier to create on the fly and thus, for VDI it’s hard to beat NFS storage.
  • Boot storm enhancements – Another problem with VDI is that everyone gets to work 8am Monday and proceeds to boot up their (virtual) machines, which drives an awful lot of IO to their virtual C: drives.  Deduplication and 0 storage snapshots can help manage the boot storm as long as these characteristics are retained throughout system cache, i.e, deduplication exists in cache as well as on backend disk.  But there are other approaches to the problem as well, available from various vendors to better manage boot storms.
  • Anti-Virus scan enhancements – Similar to boot storms, A-V scans also typically happen around the same time for many desktop users and can be just as bad for virtual C: drive performance.  Again, deduplication or 0 storage snapshots can help (with above caveats) but some vendor storage can offload these activities from the desktop alltogether.  Also last weeks VMworld release of VMware’s vShield Edge (see VMworld 2010 review) also supports some A-V scan enhancements. Any of these approaches should be able to help.

Regular “dumb” block storage will always work but it will require a lot more raw storage, performance will suffer just when everybody gets back to work, and the administrative burden will be much higher.

I may seem biased but enterprise class reliability&availability with some of the advanced storage features described above can help make your deployment of VDI that much better for you and all your knowledge workers.

Anything I missed?

Enterprise data storage defined and why 3PAR?

More SNW hall servers and storage
More SNW hall servers and storage

Recent press reports about a bidding war for 3PAR bring into focus the expanding need for enterprise class data storage subsystems.  What exactly is enterprise storage?

Defining enterprise storage is frought with problems but I will take a shot.  Enterprise class data storage has:

  • Enhanced reliability, high availability and serviceability – meaning it hardly ever fails, it keeps operating (on redundant components) when it does fail, and repairing the storage when the rare failure occurs can be accomplished without disrupting ongoing storage services
  • Extreme data integrity – goes beyond just RAID storage, meaning that these systems lose data very infrequently, provide the latest data written to a location when read and will tell you when data cannot be accessed.
  • Automated I/O performance – meaning sophisticated caching algorithms that try to keep ahead of sequential I/O streams, buffer actively read data, and buffer write data in non-volatile cache before destaging to disk or other media.
  • Multiple types of storage – meaning the system supports SATA, SAS and/or FC disk drives and SSDs or Flash storage
  • PBs of storage – meaning behind one enterprise class storage (sub-)system one can support over 1PB of storage
  • Sophisticated functionality – meaning the system supports multiple forms of offsite replication, thin provisioning, storage tiering, point-in-time copies, data cloning, administration GUIs/CLIs, etc.
  • Compatibility with all enterprise O/Ss – meaning the storage has been tested and is on hardware compatibility lists for every major operating system in use by the enterprise today.

As for storage protocol, it seems best to leave this off the list.  I wanted to just add block storage, but enterprises today probably have as much if not more external file storage (CIFS or NFS) as they have block storage (FC or iSCSI).  And the proportion in file systems seems to be growing (see IDC report referenced below).

In addition, while I don’t like the non-determinism of iSCSI or file access protocols, this doesn’t seem to stop such storage from putting up pretty impressive performance numbers (see our performance dispatches).  Anything that can crack 100K I/O or file operations per second probably deserves to call themselves enterprise storage as long as they meet the other requirements.  So, maybe I should add high-performance storage to the list above.

Why the sudden interest in enterprise storage?

Enterprise storage has been around arguably since the 2nd half of last century (for mainframe systems) but lately has become even more interesting as applications deploy to the cloud and server virtualization (from VMware, Microsoft Hyper-V and others) takes over the data center.

Cloud storage and cloud computing services are lowering the entry points for storage and processing, enabling application deployments which were heretofore unaffordable.  These new cloud applications consume storage at increasing rates and don’t seem to be slowing down any time soon.  Arguably, some cloud storage is not enterprise storage but as service levels go up for these applications, providers must ultimately turn to enterprise storage.

In addition, server virtualization transforms the enterprise data center from a single application per server to easily 5 or more applications per physical server.  This trend is raising server utilization, driving more I/O, and requiring higher capacity.  Such “multi-application” storage almost always requires high availability, reliability and performance to work well, generating even more demand for enterprise data storage systems.

Despite all the demand, world wide external storage revenues dropped 12% last year according to IDC.  Now the economy had a lot to do with this decline but another factor reducing external storage revenue is the ongoing drop in the price of storage on a $/GB basis.  To this point, that same IDC report stated that external storage capacity increased 33% last year.

Why Dell & HP wants 3PAR storage?

Margins on enterprise storage are good, some would say very good.  While raw disk storage can be had at under $0.50/GB, enterprise class storage is often 10 or more times that price.  Now that has to cover redundant hardware, software/firmware engineering and other characteristics, but this still leaves pretty good margins.

In my mind, Dell would see enterprise storage as a natural extension of their current enterprise server business.  They already sell and support these customers, including enterprise class storage just adds another product to the mix.  Developing enterprise storage from scratch is probably a 4-7 year journey with the right people, buying 3PAR puts them in the market today with a competitive product.

HP is already in the enterprise storage market today, with their XP and EVA storage subsystems.  However, having their own 3PAR enterprise class storage may get them better margins than their current XP storage OEMed from HDS.  But I think Chuck Hollis’s post on HP’s counter bid for 3PAR may have revealed another side to this discussion – sometime M&A is as much about constraining your competition as it is about adding new capabilities to a company.

——

What do you think?

Primary storage compression can work

Dans la nuit des images (Grand Palais) by dalbera (cc) (from flickr)
Dans la nuit des images (Grand Palais) by dalbera (cc) (from flickr)

Since IBM’s announced their intent to purchase StorWize there has been much discussion on whether primary storage data compression can be made to work.  As far as I know StorWize only offered primary storage compression for file data but there is nothing that prohibits doing something similar for block storage as long as you have some control over how blocks are laid down on disk.

Although secondary block  data compression has been around for years in enterprise tape and more recently with some deduplication appliances, primary storage compression pre-dates secondary storage compression.  STK delivered primary storage data compression with Iceberg in the early 90’s but it wasn’t until a couple of years later that they introduced compression on tape.

In both primary and secondary storage, data compression works to reduce the space needed to store data.  Of course, not all data compresses well, most notably image data (as it’s already compressed) but compression ratios of 2:1 were common for primary storage of that time and are normal for today’s secondary storage.  I see no reason why such ratios couldn’t be achieved for current primary storage block data.

Implementing primary block storage data compression

There is significant interest in implementing deduplication for primary storage as NetApp has done but supporting data compression is not much harder.  I believe much of the effort to deduplicate primary storage lies in creating a method to address partial blocks out of order, which I would call data block virtual addressing which requires some sort of storage pool.  The remaining effort to deduplicate data involves implementing the chosen (dedupe) algorithm, indexing/hashing, and other administrative activities.  These later activities aren’t readily transferable to data compression but the virtual addressing and space pooling should be usable by data compression.

Furthermore, block storage thin provisioning requires some sort of virtual addressing as does automated storage tiering.  So in my view, once you have implemented some of these advanced capabilities, implementing data compression is not that big a deal.

The one question that remains is does one implement compression with hardware or software (see Better storage through hardware for more). Considering that most deduplication is done via software today it seems that data compression in software should be doable.  The compression phase could run in the background sometime after the data has been stored.  Real time decompression using software might take some work, but would cost considerably less than any hardware solution.  Although the intensive bit fiddling required to perform data compression/decompression may argue for some sort of hardware assist.

Data compression complements deduplication

The problem with deduplication is that it needs duplicate data.  This is why it works so well for secondary storage (backing up the same data over and over) and for VDI/VMware primary storage (with duplicated O/S data).

But data compression is an orthogonal or complementary technique which uses the inherent redundancy in information to reduce storage requirements.  For instance, something like LZ compression takes advantage of the fact that in text some letters occur more often than others (see letter frequency). For instance, in English, ‘e’, ‘t’, ‘a’, ‘o’, ‘i’, and ‘n, represent over 50% of the characters in most text documents.  By using shorter bit combinations to encode these letters one can reduce the bit-length of any (English) text string substantially.  Another example is run length encoding which takes any repeated character and substitutes a trigger character, the character itself, and a count of the number of repetitions for the repeated string.

Moreover, the nice thing about data compression is that all these techniques can be readily combined to generate even better compression rates.  And of course compression could be applied after deduplication to reduce storage footprint even more.

Why would any vendor compress data?

For a couple of reasons:

  • Compression not only reduces storage footprint but with hardware assist it can also increase storage throughput. For example, if 10GB of data compresses down to 5GB, it should take ~1/2 the time to read.
  • Compression reduces the time it would take time to clone, mirror or replicate.
  • Compression increases the amount of data that could be stored which should incentivise them to pay more for your storage.

In contrast, with data compression vendors might may sell less storage.  But the advantages of enterprise storage is in the advanced functionality/features and higher reliability/availability/performance that are available.  I see data compression as just another advantages to enterprise class storage and as a feature, the user could enable or disable it and see how well it works for there data.

What do you think?

FAST, Cache & Boost – Day2@EMCWorld 2010

View from the front @EMCWorld 2010
Between sessions, view from the front @EMCWorld 2010

Well EMCWorld 2010’s a wrap for me.  Some interesting Day 2 announcements included:

  • Advanced FAST and FAST cache for CLARiiON were announced which pertain to how SSD storage is used in the subsystem. Advanced FAST provides for sub-lun storage tiering between SATA disk, performance disk, and SSD storage.  Such automated data movement between tiers is managed by policy automation.  FAST cache converts SSD storage into a cache expanded memory for the storage subsystem.  It is not quite a NetApp PAM equivalent because of it’s location at the end of internal FC link, but it’s close.
  • Unisphere for CLARiiON and Celerra was announced.  This is a combined management administration interface that replaces CLARiiON Navisphere and Celerra Manager and supplies a unified view (single pain of glass) to administer these two products.
  • Boost for Data Domain was announced.  Last month EMC’s Data Domain rolled out their Global Deduplication Appliance (GDA) which depends on Symantec’s OST API to support a shared or split deduplication process. This capability now supports all other Data Domain appliances and increases the ingestion rate for these systems. Essentially, EMC’s Boost moves some deduplication processing over to Symantec’s Media Server and by doing so reduces the media server processing and bandwidth requirements (as only unique data need be shipped to the DD appliance) and also increases ingest speed by sharing the deduplication load and reducing input data.

We discussed GDA and it’s OST split deduplication processing at length in an EMC announcement summary in last month’s newsletter   We will be also place this on our our website later this month if you’re interested in more information.

There was plenty of other announcements while I was there and I heard the technical sessions were great too.  We already discussed Day 1 activities in our post on VPLEX surfacing.  But I had to leave so I missed all of Wednesday and Thursday.

This year EMC showed many video vignettes on-stage to illustrate IT problems and some of their solutions.  In past EMCWorld’s this was done mostly by customers talking about technology problems and there was some of this as well.  However the addition of the video segments seemed to help get their point across.  How well this succeeded is anyone’s guess but I would say most of it was very entertaining.

Social media seemed even more present this year than last.  Twitter was very active and blogging was too. I read one post that summarized Mark Lewis’s keynote session that was just finished – not quite real time but close.  Video was being taken on the exhibit floor, live at the blogger’s lounge via “the Cube” and other places as well, most of which is probably still being edited for release   I found the bloggers lounge crowded but serviceable.  WiFi on the floor could have been better but this was a nit.

Overall the show was put on very well and I look forward to EMCWorld 2011 in Las Vegas next year.  See you there.

VPLEX surfaces at EMCWorld

Pat Gelsinger introducting VPLEXes on stage at EMCWorld
Pat Gelsinger introducting VPLEXes on stage at EMCWorld

At EMCWorld today Pat Gelsinger  had a pair of VPLEXes flanking him on stage and actively moving VMs from “Boston” to “Hopkinton” data centers.  They showed a demo of moving a bunch of VMs from one to the other with all of them actively performing transaction processing.  I have written about EMC’s vision in a prior blog post called Caching DaaD for Federated Data Centers.

I talked to an vSpecialist at the Blogging lounge afterwards and asked him where the data actually resided for the VMs that were moved.  He said the data was synchronously replicated and actively being updated  at both locations. They proceeded to long-distance teleport (Vmotion) 500 VMs from Boston to Hopkinton.  After that completed, Chad Sakac powered down the ‘Boston’ VPLEX and everything in ‘Hopkinton’ continued to operate.  All this was done on stage so Boston and Hopkinton data centers were possibly both located in the  convention center but interesting nonetheless.

I asked the vSpecialist how they moved the IP address between the sites and he said they shared the same IP domain.  I am no networking expert but I felt that moving the network addresses seemed the last problem to solve for long distance Vmotion.  But, he said Cisco had solved this with their OTV (Open Transport Virtualization) for  Nexus 7000 which could move IP addresses from one data center to another.

1 Engine VPLEX back view
1 Engine VPLEX back view

Later at the Expo, I talked with a Cisco rep who said they do this by encapsulating Layer 2 protocol messages into a Layer 3 packet. Once encapsulated it can be routed over anyone’s gear to the other site and as long as there was another Nexus 7K switch at the other site within the proper IP domain shared with the server targets for Vmotion then all works fine.  Didn’t ask what happens if the primary Nexus 7K switch/site goes down but my guess is that the IP address movement would cease to work. But for active VM migration between two operational data centers it all seems to hang together.  I asked Cisco if OTV was a formal standard TCP/IP protocol extension and he said he didn’t know.  Which probably means that other switch vendors won’t support OTV.

4 Engine VPLEX back view
4 Engine VPLEX back view

There was a lot of other stuff at EMCWorld today and at the Expo.

  • EMC’s Content Management & Archiving group was renamed Information Intelligence.
  • EMC’s Backup Recovery Systems group was in force on the Expo floor with a big pavilion with Avamar, Networker and Data Domain present.
  • EMC keynotes were mostly about the journey to the private cloud.  VPLEX seemed to be crucial to this journey as EMC sees it.
  • EMCWorld’s show floor was impressive. Lots of  major partners were there RSA, VMware, IOmega, Atmos, VCE, Cisco, Microsoft, Brocade, Dell, CSC, STEC, Forsythe, Qlogic, Emulex and many others.  Talked at length with Microsoft about SharePoint 2010. Still trying to figure that one out.
One table at bloggers lounge StorageNerve & BasRaayman hard at work
One table at bloggers lounge StorageNerve & BasRaayman in the foreground hard at work

I would say the bloggers lounge was pretty busy for most of the day.  Met a lot of bloggers there including StorageNerve (Devang Panchigar), BasRaaymon (Bas Raaymon), Kiwi_Si (Simon Seagrave), DeepStorage (Howard Marks), Wikibon (Dave Valente), and a whole bunch of others.

Well not sure what EMC has in store for day 2, but from my perspective it will be hard to beat day 1.

Full disclosure, I have written a white paper discussing VPLEX for EMC and work with EMC on a number of other projects as well.

Describing Dedupe

Hard Disk 4 by Alpha six (cc) (from flickr)
Hard Disk 4 by Alpha six (cc) (from flickr)

Deduplication is a mechanism to reduce the amount of data stored on disk for backup, archive or even primary storage.  For any storage, data is often duplicated and any system that eliminates storing duplicate data will be more utilize storage more efficiently.

Essentially, deduplication systems identify duplicate data and only store one copy of such data.  It uses pointers to incorporate the duplicate data at the right point in the data stream. Such services can be provided at the source, at the target, or even at the storage subsystem/NAS system level.

The easiest way to understand deduplication is to view a data stream as a book and as such, it can consist of two parts a table of contents and actual chapters of text (or data).  The stream’s table of contents provides chapter titles but more importantly (to us), identifies a page number for the chapter.  A deduplicated data stream looks like a book where chapters can be duplicated within the same book or even across books, and the table of contents can point to any book’s chapter when duplicated. A deduplication service inputs the data stream, searches for duplicate chapters and deletes them, and updates the table of contents accordingly.

There’s more to this of course.  For example, chapters or duplicate data segments must be tagged with how often they are duplicated  so that such data is not lost when modified.  Also, one way to determine if data is duplicated is to take one or more hashes and compare this to other data hashes, but to work quickly, data hashes must be kept in a searchable index.

Types of deduplication

  • Source deduplication involves a repository, a client application, and an operation which copies client data to the repository.  Client software chunks the data, hashes the data chunks, and sends these hashes over to the repository.  On the receiving end, the repository determines which hashes are duplicates and then tells the client to send only the unique data.  The repository stores the unique data chunks and the data stream’s table of contents.
  • Target deduplication involves performing deduplication inline, in-parallel, or post-processing by chunking the data stream as it’s recieved, hashing the chunks, determining which chunks are unique, and storing only the unique data.  Inline refers to doing such processing while receiving data at the target system, before the data is stored on disk.  In-parallel refers to doing a portion of this processing while receiving data, i.e., portions of the data stream will be deduplicated while other portions are being received.  Post-processing refers to data that is completely staged to disk before being deduplicated later.
  • Storage subsystem/NAS system deduplication looks a lot like post-processing, target deduplication.  For NAS systems, deduplicaiot looks at a file of data after it is closed. For general storage subsystems the process looks at blocks of data after they are written.  Whether either system detects duplicate data below these levels is implementation dependent.

Deduplication overhead

Deduplication processes generate most overhead while deduplicating the data stream, essentially during or after the data is written, which is the reason that target deduplication has so many options, some optimize ingestion while others optimize storage use. There is very little additonal overhead for re-constituting (or un-deduplicating) the data for read back as retrieving the unique and/or duplicated data segments can be done quickly.  There may be some minor performance loss because of lack of  sequentiality but that only impacts data throughput and not that much.

Where dedupe makes sense

Deduplication was first implemented for backup data streams.  Because any backup that takes full backups on a monthly or even weekly basis will duplicate lot’s of data.  For example, if one takes a full backup of 100TBs every week and lets say new unique data created each week is ~15%, then at week 0, 100TB of data is stored both for the deduplicated and undeduplicated data versions; at week 1 it takes 115TB to store the deduplicated data but 200TB for the non-deduplicated data; at week 2 it takes ~132TB to store deduplicated data but 300TB for the non-deduplicated data, etc.  As each full backup completes it takes another 100TB of un-deduplicated storage but significantly less deduplicated storage.  After 8 full backups the un-deduplicated storage would require 8ooTB but only ~265TB for deduplicated storage.

Deduplication can also work for secondary or even primary storage.  Most IT shops with 1000’s of users, duplicate lot’s of data.  For example, interim files are sent from one employee to another for review, reports are sent out en-mass to teams, emails are blasted to all employees, etc.  Consequently, any storage (sub)system that can deduplicate data would more efficiently utilize backend storage.

Full disclosure, I have worked for many deduplication vendors in the past.

Cleversafe’s new hardware

Cleversafe new dsNet(tm) Rack (from Cleversafe.com)
Cleversafe new dsNet(tm) Rack (from Cleversafe.com)

Yesterday, Cleversafe announced new Slicestor(r) 2100 and 2200 hardware using 2TB SATA drives. The standard 2100 1U package supports 8TB of raw data and the 2200 new 2U package supports 24TB of data. In addition, a new Accesser(r) 2100 supports 8GB of ECC RAM, and 2 GigE or 10GbE ports for data access.

In addition to the new server hardware, Cleversafe also announced an integrated rack with up to 18 Slicestor 2200s, 2 Accessors 2100s, 1 Omnience (management node), 48-port ethernet switch, and PDUs. This new rack configuration comes pre-cabled and can easily be installed to support an immediate 432TB raw capacity. It’s expected that customers with multiple sites could order 1 or more racks to support a quick installation of Cleversafe storage services.

Cleversafe currently offers iSCSI block services, direct object storage interface and file services interfaces (over iSCSI).  They are finding some success in the media and entertainment space as well as federal and state government data centers.

The federal and state government agencies seem especially interested in Cleversafe for its data security capabilities.  They offer cloud data security via their SecureSlice(tm) technology which encrypts data slices and uses key masking to obscure the key.  With SecureSlice, the only way to decrypt the data is to have enough slices to reconstitute the data.

Also the new Accesser and Slicestor server hardware now uses a drive on motherboard flash unit to hold operating system/Cleversafe software. This allows data drives to only hold customer data and reduces Accesser power requirements while also improving both Slicestor and Accesser reliability.

In a previous post we discussed EMC’s Atmos’s GeoProtect capabilities and although they are not quite at the sophistication of Cleversafe, EMC does offer a sort of data dispersion across sites/racks.  However, it appears that GeoProtect is currently limited to two distinct configurations.  In contrast, Cleversafe allows the user to select the number of Slicestor’s to store data and the threshold required to reconstitute the data.  Doing this allows the user to almost dial up or down the availability and reliability they want for their data.

Cleversafe performs well enough to saturate a single Accesser GigE iSCSI link.  Accessers maintain a sort of preferred routing table which indicates which Slicestors currently have the best performance. By accessing the quickest Slicestors first to reconstitute data, performance can be optimized.  Specifically, for the typical multi-site Cleversafe implementation, knowing current Slicestor to Accesser performance can improve data reconstitution performance considerably.

Full disclosure, I have done work for Cleversafe in the past.