Object Storage Summit wrap up

Attended ExecEvent’s first Next-Gen Object Storage Summit in Miami this past week. Learned a lot there and met many of the players and movers in this space. Here is a summary of what happened during the summit.

Janae starting a debate on Object Storage

Spent most of the morning of the first day discussing the parameters of object storage in general. Janae got up and talked about four major adopter categories for object storage:

  1. Rapid Responders – these customers have data in long-term storage that just keeps building and needs to be kept on scalable storage. They believe they will someday need access to it but have no idea when; when they do want it, they want it fast. Rapid-responder adoption is based on this unpredictability of access, so keeping the data on scalable disk-based object storage makes sense. Examples include black-operations sites with massive surveillance feeds that may be needed quickly sometime after initial analysis, and medical archives.
  2. Distributed (content) Enterprises – geographically distributed enterprises with users around the globe that need shared access to data. These enterprises often have 100 or so users dispersed around the world who all need to work on the same data. Object storage can disperse the data to provide local caching across the globe for better data and metadata latency. Media and entertainment companies are key customers in this space, but design shops that follow the sun have the same problem.
  3. Private Cloud(y) – data centers adopt the cloud for a number of reasons, but sometimes it’s simply mandated. In these cases, direct control over cloud storage combined with the economics of the major web service providers can be an alluring proposition. Some object storage solutions offer cloud-like economics together with on-premises control and responsiveness, the best of both worlds. Enterprise IT shops forced to move to the cloud fall into this category.
  4. Big Hadoop(ers) – lots of data to analyze but no idea of when it will be analyzed. Some Hadoopers can schedule analytics, but most don’t know what they will want until they finish with the last analysis. In these cases, having direct access to all the data on an object store can cut setup time considerably.

There were other aspects of Janae’s session but these seemed of most interest. We spent the rest of the morning getting an overview of Scality’s customers, and at the end of the morning we debated various aspects of object storage. I thought Jean-Luc from Data Direct Networks had the best view of this when he said object storage is, at its core, data storage that has scalability, resilience, performance and distribution.

The afternoon sessions were deep dives with the sponsors of the Object Summit.

  • Nexsan talked about their Assureon product line (from the EverTrust acquisition). SHA1 and MD5 hashes are computed for every object; as objects are replicated to other sites, the hashes are checked to ensure the data hasn’t been corrupted, and they are re-checked periodically (every 90 days) to verify the data is still correct. If an object is found corrupted, another replica is obtained and reinstated (see the hash-verification sketch after this list). In addition, Assureon has some unique immutable access logs that provide an almost “chain of custody” for objects in the system. Finally, Assureon uses a Windows-certified Microsoft Windows agent that installs without disruption and allows any user (or administrator) to identify files, directories, or file systems to be migrated to the object store.
  • Cleversafe was up next and talked about their market success with their distributed dsNet® object store and provided some proof points. [Full disclosure: I have recently been under contract with Cleversafe]. For instance, today they have over 15 billion objects under management and deployments with over 70PB in production, and they have shipped over 170PB of dsNet storage to customers around the world. Cleversafe has many patents covering their information dispersal algorithms and performance optimization. Some of their sites are in Federal government installations, with a few web-intensive clients as well, the most notable being Shutterfly, the photo sharing site. Although dsNet is inherently geographically distributed, all these “sites” could easily be configured across 1 to 3 locations or more for simpler DR-like support.
  • Quantum talked about their Lattus product, built on top of Amplidata’s technology. Lattus uses 36TB storage nodes, controller nodes that provide erasure coding for geographical data integrity, and NAS gateway nodes. The NAS gateway provides CIFS and NFS access to objects. The Lattus-C deployment is a forever disk archive for cloud-like deployments; it erasure codes the objects in the system, which are then dispersed across up to 3 sites (today, with 4-site dispersal under test). On their roadmap, Lattus-M will be a managed file system offering that operates in conjunction with their StorNext product with ILM-like policy management. Farther out on the roadmap is Lattus-H, which offers an object repository for Hadoop clusters that need rapid access to data for analysis.
  • Scality talked about their success in major multi-tenant environments that need rock-solid reliability and great performance. Their big customers are major web providers that supply email services. Scality is a software product that builds a ring of object storage nodes supplying the backend storage where the email data is held. Scality is priced per end-user capacity stored. Today the product supports RESTful interfaces, CDMI (think email storage interface), and the Scality File System (based on FUSE, a POSIX-compliant Linux file system); an NFS interface is coming early next year. With the Scality Ring, nodes can go down but the data is still available with rapid response time. Nodes can be replicated or spread across multiple locations.
  • Data Direct Networks (DDN) is coming at the problem from the High Performance Computing market and has a very interesting, scalable solution with extreme performance. DDN products are featured in many academic labs and large web 2.0 environments. The WOS object storage supports just about any interface you want: Java, PHP, Python, RESTful, NFS/CIFS, S3 and others. They claim very high performance, something on the order of 350MB/sec read and 250MB/sec write (I think per node) for object data transfers. Nodes come in 240TB units and one can have up to 256 nodes in a WOS system. One customer uses a WOS node to land local sensor streams and then ships the data to other locations for analysis.
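To make the Nexsan-style integrity checking above a bit more concrete, here is a minimal Python sketch of the dual-hash idea: fingerprint every object with SHA1 and MD5, then re-check replicas on replication and on a periodic scrub. This is only an illustration of the general technique; the function names are hypothetical and this is not Nexsan’s actual implementation.

```python
import hashlib

def object_fingerprint(data: bytes) -> dict:
    """Compute SHA1 and MD5 fingerprints for an object when it is first stored."""
    return {
        "sha1": hashlib.sha1(data).hexdigest(),
        "md5": hashlib.md5(data).hexdigest(),
    }

def verify_replica(replica: bytes, recorded: dict) -> bool:
    """Re-check a replica against the recorded hashes (on replication, or every 90 days)."""
    return object_fingerprint(replica) == recorded

# Usage: if a replica fails verification, another replica is fetched and reinstated.
original = b"some object payload"
recorded = object_fingerprint(original)
assert verify_replica(original, recorded)
```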
View from the Summit balcony, 2nd day

The next day was spent with Nexsan and DDN talking about their customer bases and some of their success stories. We spent the remainder of the morning talking about the startup world that surrounds object storage technology and the inhibitors to broader adoption of the technology.

In the end there’s a lot of education needed to jump-start this marketplace: education about both the customer problems that can be solved with object stores and the product differences that exist today. I argued (forcefully) that what’s needed to accelerate adoption is some standard interface protocol that all object storage systems could utilize. Such a standard protocol would enable a more rapid ecosystem build-out and ultimately more enterprise adoption.

One key surprise to me was that the problems their customers are seeing are problems all IT customers will have some day. Jean-Luc called it the democratization of HPC problems. Big Data is driving object storage requirements into the enterprise in a big way…

Comments?

Cleversafe’s new hardware

Cleversafe new dsNet™ Rack (from Cleversafe.com)

Yesterday, Cleversafe announced new Slicestor® 2100 and 2200 hardware using 2TB SATA drives. The standard 2100 1U package supports 8TB of raw data and the new 2200 2U package supports 24TB of data. In addition, a new Accesser® 2100 supports 8GB of ECC RAM and 2 GigE or 10GbE ports for data access.

In addition to the new server hardware, Cleversafe also announced an integrated rack with up to 18 Slicestor 2200s, 2 Accesser 2100s, 1 Omnience (management node), a 48-port Ethernet switch, and PDUs. This new rack configuration comes pre-cabled and can easily be installed to support an immediate 432TB raw capacity. It’s expected that customers with multiple sites could order one or more racks to support a quick installation of Cleversafe storage services.

Cleversafe currently offers iSCSI block services, a direct object storage interface, and file services interfaces (over iSCSI). They are finding some success in the media and entertainment space as well as in federal and state government data centers.

Federal and state government agencies seem especially interested in Cleversafe for its data security capabilities. Cleversafe offers cloud data security via their SecureSlice™ technology, which encrypts data slices and uses key masking to obscure the key. With SecureSlice, the only way to decrypt the data is to have enough slices to reconstitute the data.
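For intuition only, here is a toy Python sketch of a key-masking scheme along the lines described above: encrypt the data with a random key, then store that key XORed with a digest of the ciphertext, so the key can only be recovered once enough slices have been gathered to rebuild the ciphertext. This is my own illustrative construction, not Cleversafe’s actual SecureSlice algorithm, and it is not production-grade cryptography.

```python
import hashlib
import os

def mask_key(key: bytes, ciphertext: bytes) -> bytes:
    """Mask a 32-byte encryption key with a digest of the ciphertext (toy illustration)."""
    digest = hashlib.sha256(ciphertext).digest()
    return bytes(k ^ d for k, d in zip(key, digest))

def unmask_key(masked_key: bytes, ciphertext: bytes) -> bytes:
    """Recover the key; only possible once the full ciphertext has been reconstituted."""
    digest = hashlib.sha256(ciphertext).digest()
    return bytes(m ^ d for m, d in zip(masked_key, digest))

key = os.urandom(32)                         # random data-encryption key
ciphertext = b"...encrypted object data..."  # stand-in for the real encrypted object
masked = mask_key(key, ciphertext)           # stored alongside the dispersed slices
assert unmask_key(masked, ciphertext) == key
```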

Also, the new Accesser and Slicestor server hardware now uses a drive-on-motherboard flash unit to hold the operating system and Cleversafe software. This allows the data drives to hold only customer data, reduces Accesser power requirements, and improves both Slicestor and Accesser reliability.

In a previous post we discussed EMC Atmos’s GeoProtect capabilities; although it is not quite at the sophistication of Cleversafe, EMC does offer a sort of data dispersion across sites/racks. However, it appears that GeoProtect is currently limited to two distinct configurations. In contrast, Cleversafe allows the user to select both the number of Slicestors used to store data and the threshold required to reconstitute the data. Doing this lets the user almost dial the availability and reliability of their data up or down.
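As a rough illustration of what that dial means, here is a small Python sketch that estimates data availability for a hypothetical threshold configuration: data remains readable as long as at least the threshold number of Slicestors is up. It assumes independent node failures, and the node-availability figure is made up for the example, not a Cleversafe number.

```python
from math import comb

def data_availability(n_slicestors: int, threshold: int, node_avail: float) -> float:
    """Probability that at least `threshold` of `n_slicestors` nodes are up,
    assuming independent node failures (a simplifying assumption)."""
    return sum(
        comb(n_slicestors, up) * node_avail**up * (1 - node_avail)**(n_slicestors - up)
        for up in range(threshold, n_slicestors + 1)
    )

# Example: 16 Slicestors with a read threshold of 10, each node 99% available.
print(f"{data_availability(16, 10, 0.99):.9f}")
```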

Cleversafe performs well enough to saturate a single Accesser GigE iSCSI link. Accessers maintain a sort of preferred routing table that indicates which Slicestors currently have the best performance. By accessing the quickest Slicestors first to reconstitute data, performance can be optimized. Specifically, for the typical multi-site Cleversafe implementation, knowing current Slicestor-to-Accesser performance can improve data reconstitution performance considerably.
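A minimal sketch of that preferred-routing idea, as I understand it: track a recent latency estimate per Slicestor and read from the quickest ones first until the reconstitution threshold is met. The data structures and names here are hypothetical, not Cleversafe’s.

```python
def pick_fastest_slicestors(latencies_ms: dict, threshold: int) -> list:
    """Return the `threshold` Slicestors with the lowest recent latency,
    i.e. the preferred set to contact first when reconstituting data."""
    ranked = sorted(latencies_ms, key=latencies_ms.get)
    return ranked[:threshold]

# Hypothetical per-node latency measurements (ms), e.g. from recent reads.
latencies = {"slicestor-a": 4.2, "slicestor-b": 18.7, "slicestor-c": 3.1, "slicestor-d": 9.5}
print(pick_fastest_slicestors(latencies, threshold=3))  # ['slicestor-c', 'slicestor-a', 'slicestor-d']
```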

Full disclosure, I have done work for Cleversafe in the past.

Securing data in the cloud

Who says there are no clouds today by akakumo (cc) (from Flickr)

We have posted previously about the need for backup in cloud storage. So today I would like to start a discussion on securing “data-at-rest” within the cloud.

Depositing data into the cloud seems a little like a Chinese laundry to me – you deposit data in the cloud and receive a ticket or token used to retrieve the data. Today’s cloud data security depends entirely on this token.

Threats to the token

If one looks only at external security threats, two issues with token use seem apparent:

  • Brute force cracking of any token is possible. I envision a set of cloud storage users using their current storage tokens as seeds to identify other, alternate tokens. Such an attack could easily generate token synonyms which may or may not be valid. Detecting a brute force attack from a single source could be accomplished easily enough, but an attack distributed across 1000s of compromised PCs would be much harder to detect.
  • Tokens could be intercepted in the clear. Cloud data often needs to be accessed in locations outside the data center of origin, which requires sending tokens to others. These data tokens could inadvertently be sent in the clear and, as such, intercepted.

Probably other external exposures exist beyond these two, but these will suffice.

Securing cloud data-at-rest

Given the potential external and internal threats to data tokens, securing such data can eliminate any data loss from token exposure. I see at least three approaches to securing data in the cloud.

  • The data dispersal approach – Cleversafe’s product splits a data stream into byte segments and disperses these segments across storage locations. Their approach is built around Reed-Solomon logic and results in no one location having any recognizable portion of the data. It requires multiple sites, or multiple systems at one site, but essentially secures the data by segmenting it. The advantage of this approach is that it’s fast and automatic; the disadvantage is that it is only supported by Cleversafe.
  • The software data encryption approach – there are plenty of software packages out there, such as GnuPG (GNU Privacy Guard) or PGP, which can be used to encrypt your data prior to sending it to the cloud. It’s a brute-force, software-only approach, but its advantage is that it can be used with any cloud storage provider (see the sketch after this list). Its disadvantages are that it’s slow, processor intensive, and key management is sporadic.
  • The hardware data encryption approach – there are also plenty of data encryption appliances and/or hardware options out there which can be used to encrypt data. Some of these are available at the FC switch level, some are standalone appliances, and some exist at the storage subsystem level. The problem with most of these is that they only apply to FC storage and are not readily usable by cloud storage (unless the provider uses FC storage as its backing store). The advantages are that it’s fast and key management is generally built into the product.
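As a concrete example of the software approach, here is a minimal Python sketch that shells out to GnuPG to encrypt a file before it ever leaves the data center. The recipient key ID and file names are placeholders; this assumes gpg is installed and the recipient’s public key is already in the local keyring.

```python
import subprocess

def encrypt_for_cloud(path: str, recipient: str) -> str:
    """Encrypt `path` with GnuPG public-key encryption before uploading to the cloud."""
    encrypted_path = path + ".gpg"
    subprocess.run(
        ["gpg", "--batch", "--yes", "--output", encrypted_path,
         "--encrypt", "--recipient", recipient, path],
        check=True,
    )
    return encrypted_path

# Placeholder recipient and file; only the .gpg output would be sent to the cloud provider.
encrypted = encrypt_for_cloud("backup.tar", "storage-admin@example.com")
```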

One disadvantage of any of the encryption approaches is that one now needs both the encryption keys and the token to access the data. Yet one more thing to protect.

Nothing says that hardware data encryption couldn’t also work for data flowing to the cloud, but the appliances would have to support IP plus the cloud-specific REST interface. Such support would depend on cloud storage provider market share, but perhaps some cloud vendor could fund a security appliance vendor to support their interface directly, providing a cloud data security option.

The software approach suffers from performance problems but supports anybody’s cloud storage. It might be useful if cloud storage providers started offering hooks into GnuPG or PGP to directly encrypt cloud data. In any case, most REST interfaces require some programming to use, and it’s not too much of a stretch to program encryption into that code.
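For instance, here is a hedged sketch of what “programming encryption into the REST path” might look like, using Python’s requests library and the cryptography package’s Fernet recipe as a stand-in for GnuPG/PGP. The endpoint URL and token header are hypothetical, not any particular provider’s API.

```python
import requests
from cryptography.fernet import Fernet

def encrypt_and_put(data: bytes, url: str, access_token: str, key: bytes) -> requests.Response:
    """Encrypt an object client-side, then PUT the ciphertext to a hypothetical REST endpoint."""
    ciphertext = Fernet(key).encrypt(data)
    return requests.put(
        url,
        data=ciphertext,
        headers={"Authorization": f"Bearer {access_token}"},  # the token still guards access
    )

key = Fernet.generate_key()  # must be stored and protected alongside the token
resp = encrypt_and_put(b"payroll records",
                       "https://cloud.example.com/objects/123",  # hypothetical endpoint
                       "example-token", key)
```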

I like the data dispersal approach, but most argue that its security is not guaranteed, since reverse engineering the dispersal algorithm could allow one to reconstruct the data stream. The more serious problem is that it only applies to Cleversafe storage; perhaps the dispersal algorithm should be open sourced (which it already is) and/or standardized.

There are possibly other approaches I have missed here, but any of these can be used to secure cloud data-at-rest. Possibly adding more security around the data token itself could also help alleviate this concern. Thoughts?

Post disclosure: I am not currently working with Cleversafe, any data security appliance provider, or cloud storage provider.