Object Storage Summit wrap up

Attended ExecEvent’s first Next-Gen Object Storage Summit in Miami this past week.  Learned a lot there and met many of the players and movers in this space.  Here is a summary of what happened during the summit.

Janae starting a debate on Object Storage

Spent most of the morning of the first day discussing some parameters of object storage in general. Janae got up and talked about four major adopter groups for object storage:

  1. Rapid Responders – these customers have data in long-term storage that just keeps building and needs to be kept on scalable storage. They believe they will someday need access to it but have no idea when; and when they do want it, they want it fast. Rapid-responder adoption is based on this unpredictability of access, which makes keeping the data on scalable disk object storage sensible. Examples include black-operations sites with massive surveillance feeds that may be needed quickly sometime after initial analysis, and medical archives.
  2. Distributed (content) Enterprises – geographically distributed enterprises, often with 100 or so users dispersed around the globe, that need shared access to data. Object storage can disperse the data to provide local caching across the world for better data and metadata latency. Media and entertainment companies are key customers in this space, but design shops that follow the sun have the same problem.
  3. Private Cloud(y) – data centers adopt the cloud for a number of reasons, but sometimes it’s simply mandated. In these cases, direct control over cloud storage with the economics of the major web service providers can be an alluring proposition. Some object storage solutions combine cloud-like economics with on-premises control and responsiveness, the best of both worlds. Enterprise IT shops forced to move to the cloud fall into this category.
  4. Big Hadoop(ers) – lots of data to analyze, but no understanding of when it will be analyzed. Some Hadoopers can schedule their analytics, but most don’t know what they will want until they finish the last analysis. In these cases, having direct access to all the data on an object store can cut setup time considerably.

There were other aspects of Janae’s session, but these seemed of most interest. We spent the rest of the morning getting an overview of Scality’s customers and then debating aspects of object storage. I thought Jean-Luc from Data Direct Networks had the best view of this when he said that object storage is, at its core, data storage that provides scalability, resilience, performance and distribution.

The afternoon sessions were deep dives with the sponsors of the Object Summit.

  • Nexsan talked about their Assureon product line (EverTrust acquisition).  SHA-1 and MD5 hashes are computed for every object; as objects are replicated to other sites the hashes are checked to ensure the data hasn’t been corrupted, and they are periodically re-checked (every 90 days) to verify the data is still correct. If an object is corrupted, another replica is obtained and reinstated (a rough sketch of this kind of hash-check-and-repair loop appears after this list).  In addition, Assureon has some unique immutable access logs that provide an almost “chain of custody” for objects in the system.  Finally, Assureon uses a Microsoft Windows agent that is Windows Certified and installs without disruption, allowing any user (or administrator) to identify files, directories, or file systems to be migrated to the object store.
  • Cleversafe was up next and talked about their market success with their distributed dsNet® object store and provided some proof points. [Full disclosure: I have recently been under contract with Cleversafe]. For instance, today they have over 15 billion objects under management and deployments with over 70PB in production, and they have shipped over 170PB of dsNet storage to customers around the world. Cleversafe has many patents covering their information dispersal algorithms and performance optimization.  Some of their sites are Federal government installations, with a few web-intensive clients as well, the most notable being Shutterfly, the photo sharing site.  Although dsNet is inherently geographically distributed, all these “sites” could easily be configured across 1 to 3 locations or more for simpler DR-like support.
  • Quantum talked about their Lattus product, built on top of Amplidata’s technology. Lattus uses 36TB storage nodes, controller nodes that provide erasure coding for geographical data integrity, and NAS gateway nodes.  The NAS gateway provides CIFS and NFS access to objects. The Lattus-C deployment is a forever disk archive for cloud-like deployments; it erasure codes objects in the system, which are then dispersed across up to 3 sites (today, with 4-site dispersal under test).  On their roadmap, Lattus-M is a managed file system offering that operates in conjunction with their StorNext product and ILM-like policy management. Farther out on the roadmap is Lattus-H, which offers an object repository for Hadoop clusters so they can gain rapid access to data for analysis.
  • Scality talked about their success in major multi-tenant environments that need rock-solid reliability and great performance. Their big customers are major web providers that supply email services. Scality is a software product that builds a ring of object storage nodes supplying the backend storage where the email data is held.  Scality is priced on per-end-user capacity stored. Today the product supports RESTful interfaces, CDMI (think email storage interface) and the Scality File System (based on FUSE, a POSIX-compliant Linux file system); an NFS interface is coming early next year.  With the Scality Ring, nodes can go down but the data is still available with rapid response times.  Nodes can be replicated or spread across multiple locations.
  • Data Direct Networks (DDN) is coming at the problem from the High Performance Computing market and has a very interesting, scalable solution with extreme performance. DDN products are featured in many academic labs and large web 2.0 environments.  The WOS object storage supports just about any interface you want: Java, PHP, Python, RESTful, NFS/CIFS, S3 and others. They claim very high performance, something on the order of 350MB/sec read and 250MB/sec write (I think per node) of object data transfers.  Nodes come in 240TB units and one can have up to 256 nodes in a WOS system.   One customer uses a WOS node to land local sensor streams and then ships the data to other locations for analysis.
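
Nexsan didn’t go into implementation detail, but ingest-time fingerprinting plus a periodic verify-and-repair cycle is a general technique that’s easy to sketch. Below is a minimal, hypothetical Python sketch; the `sites` store interface, the names and the handling of the 90-day interval are my own assumptions, not Assureon code:

```python
# Hypothetical sketch of ingest-time hashing plus periodic verify-and-repair.
# The object/site interfaces are invented for illustration; this is not
# Nexsan's (or anyone's) actual implementation.
import hashlib
import time

VERIFY_INTERVAL = 90 * 24 * 3600  # seconds; mirrors the 90-day check described above

def fingerprints(data: bytes) -> dict:
    """Compute the two hashes kept with every object at ingest."""
    return {"sha1": hashlib.sha1(data).hexdigest(),
            "md5": hashlib.md5(data).hexdigest()}

class ObjectRecord:
    """Metadata retained per object: its key, hashes and last verification time."""
    def __init__(self, key: str, data: bytes):
        self.key = key
        self.hashes = fingerprints(data)
        self.last_verified = time.time()

def verify_and_repair(record: ObjectRecord, sites) -> bool:
    """Check each site's replica against the stored hashes and overwrite any
    corrupted copy with a known-good one. `sites` is any list of stores with
    get(key) -> bytes and put(key, data) methods (an assumption for the sketch)."""
    good_copy, bad_sites = None, []
    for site in sites:
        data = site.get(record.key)
        if fingerprints(data) == record.hashes:
            good_copy = data
        else:
            bad_sites.append(site)
    for site in bad_sites:
        if good_copy is not None:
            site.put(record.key, good_copy)   # reinstate from an intact replica
    record.last_verified = time.time()
    return good_copy is not None              # False means no intact replica was found
```
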
View from the Summit balcony, 2nd day

The next day was spent with Nexsan and DDN talking about their customer base and some of their success stories. We spent the remainder of the morning talking about the startup world that surrounds some object storage technology and the inhibitors to broader adoption of the technology.

In the end, there’s a lot of education needed to jump-start this marketplace: education about both the customer problems that can be solved with object stores and the product differences that exist today.  I argued (forcefully) that what’s needed to accelerate adoption is a standard interface protocol that all object storage systems could utilize. Such a standard protocol would enable a more rapid ecosystem build-out and ultimately more enterprise adoption.

One key surprise to me was that the problems these vendors’ customers are seeing are something all IT customers will have someday. Jean-Luc called it the democratization of HPC problems. Big Data is driving object storage requirements into the enterprise in a big way…

Comments?

Tape vs. Disk, the saga continues

Inside a (Spectra Logic) T950 library by ChrisDag (cc) (from Flickr)

Was on a call late last month where Oracle introduced their latest generation T10000C tape system (media and drive), holding 5TB native (uncompressed) capacity. In the last 6 months I have been hearing about the coming of a 3TB SATA disk drive from Hitachi GST and others. And last month, EMC announced a new Data Domain Archiver, a disk-only archive appliance (see my post on EMC Data Domain products enter the archive market).

Oracle assures me that tape density is keeping up with, if not gaining on, disk density trends and capacity. But density and capacity are not the only issues causing data to move off of tape in today’s enterprise data centers.

“Dedupe Rulz”

A problem with the data density trends discussion is that it’s one dimensional (well, literally it’s two dimensional). With data compression, disk or tape systems can easily double the density on a piece of media. But with data deduplication, the multiples start becoming more like 5X to 30X, depending on the frequency of full backups or the amount of duplicated data. Numbers like those dwarf any discussion of density ratios and, as such, get everyone’s attention.
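
To put rough numbers on that, here is trivial arithmetic showing why dedupe ratios dominate the density discussion (the 5TB media figure is just an illustrative assumption, not a benchmark):

```python
# Illustrative arithmetic only: how much logical data fits on a piece of media
# at a given data-reduction ratio. The 5TB native figure is an assumption.
def effective_capacity_tb(native_tb: float, reduction_ratio: float) -> float:
    return native_tb * reduction_ratio

native = 5.0                                   # TB of raw media
print(effective_capacity_tb(native, 2))        # 10.0  -> compression alone (~2X)
print(effective_capacity_tb(native, 5))        # 25.0  -> modest 5X dedupe
print(effective_capacity_tb(native, 30))       # 150.0 -> 30X dedupe for repetitive backups
```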

I can remember talking to an avowed tape engineer years ago, and he described deduplication technology at the VTL level as architecturally impure and inefficient. From his perspective, it needed to be done much earlier in the data flow. But what he failed to see was the ability of VTL deduplication to be plug-compatible with the tape systems of that time. Such ease of adoption allowed deduplication systems to build a beachhead and economies of scale. From there such systems have now been able to move upstream, into earlier stages of the backup data flow.

Nowadays, what with Avamar, Symantec PureDisk and others, source-level deduplication, or near-source deduplication, is a reality. But all this came about because they were able to offer 30X the density on a piece of backup storage.

Tape’s next step

Tape could easily fight back. All that would be needed is some system in front of a tape library that provided deduplication capabilities not just for the disk media but for the tape media as well. This way the 30X density advantage over non-deduplicated storage could follow through all the way to the tape media.

In the past, this made little sense because a deduplicated tape could require multiple volumes in order to restore a particular set of data. However, with today’s 5TB of data on a tape, maybe this doesn’t have to be the case anymore. In addition, a deduplication system in front of the tape library could support most of the immediate data restore activity, while data restored from tape would be more like pulling something out of an archive and, as such, might take longer to retrieve. In any event, with LTO’s multi-partitioning and the other enterprise-class tapes having multiple domains, creating a structure with a metadata partition and a data partition is easier than ever.
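
To make the idea concrete, here is a hypothetical sketch of what such a deduplicating front-end would have to track: a chunk index that could live in a tape’s metadata partition, and the unique chunk data that would go to the data partition. The chunk size, hashing and layout are my assumptions, not any vendor’s design:

```python
# Hypothetical dedupe front-end for a tape library: the chunk index would be
# written to the metadata partition, unique chunk data to the data partition.
# Fixed-size chunking and SHA-256 are simplifying assumptions for the sketch.
import hashlib

CHUNK_SIZE = 64 * 1024

class DedupeFrontEnd:
    def __init__(self):
        self.index = {}                  # chunk hash -> (offset, length) in data partition
        self.data_partition = bytearray()

    def store(self, stream: bytes) -> list:
        """Deduplicate one backup stream; return its 'recipe' (ordered chunk hashes)."""
        recipe = []
        for i in range(0, len(stream), CHUNK_SIZE):
            chunk = stream[i:i + CHUNK_SIZE]
            h = hashlib.sha256(chunk).hexdigest()
            if h not in self.index:                       # only new chunks consume media
                self.index[h] = (len(self.data_partition), len(chunk))
                self.data_partition.extend(chunk)
            recipe.append(h)
        return recipe

    def restore(self, recipe: list) -> bytes:
        """Rebuild a stream from its recipe using the chunk index."""
        out = bytearray()
        for h in recipe:
            offset, length = self.index[h]
            out.extend(self.data_partition[offset:offset + length])
        return bytes(out)
```

In a sketch like this, a second full backup of mostly unchanged data adds almost nothing to the data partition, which is exactly the 5X to 30X effect described above.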

“Got Dedupe”

There are plenty of places where today’s tape vendors can obtain deduplication capabilities. Permabit offers dedupe code for OEM applications for those that have no dedupe systems today. FalconStor, Sepaton and others offer deduplication systems that can be OEMed. IBM, HP, and Quantum already have tape libraries and their own dedupe systems available today, all of which could readily support a deduplicating front-end to their tape libraries, if they don’t already.

Where “Tape Rulz”

There are places where data deduplication doesn’t work very well today, mainly rich media, physics, biopharma and other non-compressible big-data applications. For these situations, tape still has a home, but for the rest of the data center world, deduplication is taking over, if it hasn’t already. The sooner tape gets on the deduplication bandwagon, the better for the IT industry.

—-

Of course there are other problems hurting tape today. I know of at least one large conglomerate that has moved all backup off tape altogether, even data which doesn’t deduplicate well (see my previous Oracle RMAN posts). And at least another rich media conglomerate that is considering the very same move. For now, tape has a safe harbor in big science, but it won’t last long.

Comments?

Repositioning of tape

HP LTO 4 Tape Media
In my past life, I worked for a dominant tape vendor. Over the years, we had heard a number of times that tape was dead. But it never happened. BTW, it’s also not happening today.

Just a couple of weeks ago, I was at SNW and a vendor friend of mine asked if I knew anyone with tape library expertise because they were bidding on more and more tape archive opportunities. Tape seems alive and kicking from what I can see.

However, the fact is that tape use is being repositioned. Tape is no longer the direct target for backups that it once was. Most backup packages nowadays backup to disk and then later, if at all, migrate this data to tape (D2D2T). Tape is being relegated to a third tier of storage, a long-term archive and/or a long term backup repository.

The economics of tape are not hard to understand. You pay for robotics, media and drives. Tape, just like any removable media, requires no additional power once it’s removed from the transport/drive used to write it. Removable media can be transported to an offsite repository or across the continent. There it can await recall with nary an ounce (volt) of power consumed.
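
A rough cost model makes the structure of that argument clear. Every figure below is a placeholder assumption purely for illustration, not a quoted price:

```python
# Placeholder cost model: all numbers are illustrative assumptions, not quotes.
def tape_archive_cost(tb_stored, library_cost, drive_cost, drives, media_cost_per_tb):
    # Offline cartridges draw no power, so there is no per-TB recurring energy term.
    return library_cost + drive_cost * drives + media_cost_per_tb * tb_stored

def disk_archive_cost(tb_stored, disk_cost_per_tb, power_cost_per_tb_year, years):
    # Spinning disk pays an energy/cooling cost for every TB, every year it stays online.
    return disk_cost_per_tb * tb_stored + power_cost_per_tb_year * tb_stored * years

# Example with made-up numbers, just to show where the costs land over 5 years:
print(tape_archive_cost(1000, library_cost=100_000, drive_cost=20_000, drives=4,
                        media_cost_per_tb=30))
print(disk_archive_cost(1000, disk_cost_per_tb=100, power_cost_per_tb_year=10, years=5))
```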

Problems with tape

So what’s wrong with tape? Why aren’t more shops using it? Let me count the problems:

  1. Tape, without robotics, requires manual intervention
  2. Tape, because of its transportability, can be lost or stolen, leading to data security breaches
  3. Tape processing, in general, is more error prone than disk. Tape can have media and drive errors which cause data transfer operations to fail
  4. Tape is accessed sequentially, it cannot be randomly accessed (quickly) and only one stream of data can be accepted per drive
  5. Much of a tape volume is wasted, never-written space
  6. Tape technology doesn’t stay around forever, eventually causing data obsolescence
  7. Tape media doesn’t last forever, causing media loss and potentially data loss

I have likely missed some other issues with tape here, but these seem like the major ones from my perspective.

It’s no surprise that most of these problems are addressed or mitigated in one form or another by the major tape vendors, software suppliers and others interested in continuing tape technology.

Robotics can answer the manual intervention problem, if you can afford it. Tape encryption deals effectively with stolen tapes, but requires key management somewhere. Many applications exist today to help predict when media will go bad or transports need servicing. Tape data is, and always will be, accessed sequentially, but then so is lots of other data in today’s IT shops. Tape transports are most definitely single threaded, but sophisticated applications can intersperse multiple streams of data onto a single tape. Tape volume stacking is old technology, not necessarily easy to deploy outside of some sort of VTL front-end, but it is available. Drive and media technology obsolescence will never go away, but this indicates a healthy tape marketplace.
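
The interleaving point deserves a quick illustration. Backup applications typically multiplex blocks from several client streams onto one drive, tagging each block with a stream id so restores can demultiplex. The toy round-robin version below is only a sketch of the idea, not any product’s actual tape format:

```python
# Toy multiplexer: interleave blocks from several backup streams onto a single
# (single-threaded) tape drive, tagging each block so it can be demultiplexed.
def multiplex(streams: dict):
    """streams maps stream_id -> list of data blocks; yields (stream_id, block)."""
    iterators = {sid: iter(blocks) for sid, blocks in streams.items()}
    active = list(iterators)
    while active:
        for sid in list(active):          # round-robin over the streams still running
            try:
                yield sid, next(iterators[sid])
            except StopIteration:
                active.remove(sid)

def demultiplex(tape_blocks, wanted_stream):
    """Recover one client's blocks, in order, from an interleaved tape image."""
    return [block for sid, block in tape_blocks if sid == wanted_stream]

tape = list(multiplex({"clientA": [b"a1", b"a2"], "clientB": [b"b1"]}))
print(demultiplex(tape, "clientA"))       # [b'a1', b'a2']
```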

Future of tape

Say what you will about Ultrium or the Linear Tape-Open (LTO) technology, made up of the HP, IBM, and Quantum research partners, but it has solidified/consolidated mid-range tape technology. Is it as advanced as it could be, or pushing to open new markets? Probably not. But they are advancing tape technology, providing higher capacity, higher performance and more functionality with each recent generation. And they have not stopped: Ultrium’s roadmap shows LTO-6 right after LTO-5, and delivery of LTO-5, at 1.6TB uncompressed capacity per tape, is right around the corner.

Also, IBM and Sun continue to advance their own proprietary tape technologies. Yes, some groups have moved away from their own tape formats, but that’s alright and reflects the repositioning that’s happening in the tape marketplace.

As for the future, I was at an IEEE magnetics meeting a couple of years back and the leader said that tape technology was always a decade behind disk technology. So the disk recording heads/media in use today will likely see some application to tape technology in about 10 years. As such, as long as disk technology advances, tape will come out with similar capabilities sometime later.

Still, it’s somewhat surprising that tape is able to provide so much volumetric density with decade-old disk technology, but that’s the way tape works. Packing a ribbon of media around a hub can provide a lot more volumetric storage density than a platter of media using similar recording technology.

In the end, tape has a future to exploit if vendors continue to push its technology. As a long term archive storage, it’s hard to beat its economics. As a backup target it may be less viable. Nonetheless, it still has a significant install base which turns over very slowly, given the sunk costs in media, drives and robotics.

Full disclosure: I have no active contracts with LTO or any of the other tape groups mentioned in this post.

Quantum OEMs esXpress VM Backup SW

Quantum announced today that they are OEMing esXpress software (from PHD Virtual) to better support VMware VM backups (see press release). This software schedules VMware snapshots of VMs and can then transfer the VM snapshot (backup) data directly to a Quantum DXi storage device.

One free “Professional” esXpress license will ship with each DXi appliance, which allows up to 4 esXpress virtual backup appliance (VBA) virtual machines to run on a single VMware physical server. An “Enterprise” license can be purchased for $1850, which allows up to 16 esXpress VBA virtual machines to run on a single VMware physical server. More Professional licenses can be purchased for $950 each. The free Professional license also comes with free installation services from Quantum.

Additional esXpress VBAs can be used to support more backup data throughput from a single physical server. VBA backup activity is a scheduled process and, as such, when it completes the VBA can be “powered” down to save VMware server resources. Also, as VBAs are just VMs, they fully support the VMotion, DRS, and HA capabilities available from VMware. However, using any of these facilities to move a VBA to another physical server may require additional licensing.

The esXpress software eliminates the need for a separate VCB (VMware Consolidated Backup) proxy server and provides a direct interface to Quantum DXi deduplicated storage for VM backups. This should simplify backup processing for VMware VMs using DXi archive storage.

Quantum also announced today a new key manager, the Scalar Key Manager for Quantum LTO tape encryption, which has a GUI integrated with Quantum’s tape automation products. This gives a tape automation manager a single user interface to support tape automation and tape security/encryption. A single point of management should simplify the use of Quantum LTO tape encryption.

Data Domain bidding war

It’s unclear to me what EMC would want with Data Domain (DD) other than to lock up deduplication technology across the enterprise. EMC has Avamar for source dedupe, DL for target dedupe, and Celerra dedupe; the only ones missing are V-Max, Symm and Clariion dedupe.

My guess is that EMC sees Data Domain’s market share as the primary target. It doesn’t take a lot of imagination to figure that once Data Domain is a part of EMC, EMC’s Disk Library (DL) offerings will move over to DD technology, which probably leaves the FalconStor/Quantum technology used in DL today on the outside.

EMC’s $100M loan to Quantum last month was probably just insurance to keep a business partner afloat until something better came along or they could make it on their own. The DD deal would leave the Quantum partnership supporting EMC with just Quantum’s tape offerings.

Quantum’s deduplication technology doesn’t have nearly the market share that DD has in the enterprise, but they have won a number of OEM deals, not the least of which is EMC’s, and they were looking to expand. But if EMC buys DD, this OEM agreement will end soon.

I wonder, if DD is worth $1.8B in cash, what could Sepaton be worth? They seem to be the only pure-play dedupe appliance vendor left standing out there.

Not sure whether NetApp will up their bid, but they always seem to enjoy competing with EMC. It’s also unclear how much of this bid is EMC wanting DD versus EMC just wanting to hurt NetApp; either way, DD stockholders win in the end.