Disaster recovery from VMware to AWS using Dell EMC Avamar & Data Domain

avI was at Dell EMC World2017 last week and although most of the news was on Dell’s new 14th generation server and Dell-EMC integration progress, Wednesday’s keynote was devoted to storage and non-server infrastructure news.

There was plenty of non-server news but one item that caught my attention was new functionality from Dell EMC Data Protection Division that used Avamar and Data Domain to provide disaster recovery for VMware VMs directly to AWS.

Data Domain (AWS) Cloud DR

Dell EMC Data Domain Cloud DR (DDCDR) is  a new capability that enables DD to backup to AWS S3 object storage and when needed restart the virtual machines within AWS.

DDCDR requires that a customer with Avamar backup and Data Domain (DD) storage install an OVA which deploys an “add-on” to their on-prem Avamar/DD system and install a lightweight VM (Cloud DR server) utility in their AWS domain.

Once the OVA is installed, it will read the changed data and will segment, encrypt, and compress the backup data and then send this and the backup metadata to AWS S3 objects. Avamar/DD policies can be established to control how many daily backup copies are to be saved to S3 object storage. There’s no need for Data Domain or Avamar to run in AWS.

When there’s a problem at the primary data center, an admin can click on a Avamar GUI button and have the Cloud DR server, uncompress, decrypt, rehydrate and restore the backup data into EBS volumes, translate the VMware VM image to an AMI image and then restarts the AMI on an AWS virtual server (EC2) with its data on EBS volume storage. The Cloud DR server will use the backup metadata to select the AWS EC2 instance with the proper CPU and RAM needed to run the application. Once this completes, the VM is running standalone, in an AWS EC2 instance. Presumably, you have to have EC2 and EBS storage volumes resources available under your AWS domain to be able to install the application and restore its data.

For simplicity purposes, the user can control almost all of the required functionality for DDCDR from the Avamar GUI alone. But in case of a site outage, the user can initiate the application DR from a portal supplied by the Cloud DR server utility.

There you have it, simplified, easy to use (AWS) Cloud DR for your VM applications all through Dell EMC Avamar, Data Domain storage and DDCDR. At the moment, it only works with AWS cloud but it’s likely to be available for other public clouds in the near future.

~~~~

There was much more infrastructure news at Dell EMC World2017. I’ll discuss more details on their new storage offerings in my upcoming Storage Intelligence newsletter, due out the end of this month. If your interested in receiving your own copy of my newsletter, checkout the signup button in the upper right of this page.

Comments?

[Edits were made for readability and technical accuracy after this post was published. Ed]

There’s a new cluster filesystem on the block, Elastifile

At SFD12 last month we talked with the team from Elastifile. They are a new startup out of Israel working on a better cluster file system.

Elastifile was designed to support 1000s of nodes, 100,000 of users/client and 1000s of data containers (file systems/mount points), together with an infinite (64 bit) number of files and directories and up to Exabytes (10**18) in capacity. They also offer a 100% SSD file store capability. I encourage you to view the videos of their presentations at SFD12 to learn more.

Elastifile features

Elastifile supports data compression and optionally deduplication with NAND/Flash (e. g., low-/high-endurance) storage tiering, cloud storage tiering and multi-site storage. They also provide NFSv3/v4, SMB, AWS S3 and HDFS as native access protocols for their file storage.

They also offer non-disruptive hardware/software upgrades, n-way (2- or 3-way) data and metadata redundancy, self-healing capabilities, snapshots, and synchronous/asynchronous data replication or mirroring. Further, they provide multi-tenancy and QoS support.

Elastifile can be used in hyper converged mode as well as a dedicated storage server mode. For backend storage, they support heterogeneous, physical (block, I think?) storage systems as well as direct access storage in cluster nodes

Internals matter

Elastifile’s architecture supports accessor, owner and data nodes. But these can all be colocated on the same server or segregated across different servers.

Owner nodes, own all the metadata objects for a file or directory and caches the metadata working set in i’s memory. Ownership file or directory metadata may change in the case of hardware failures.

Elastifile supports a dynamic write data path, which means they determine, in real time, where to write file data rather than having the data locations identified before hand. They call this distributed write anywhere semantics.

Notably they don’t do data caching (with NVMe it doesn’t make sense) however, as noted above, they do use metadata caching

Internally, Elastifile uses variable length objects for both file data and metadata.

  • File data is composed of three object types: a file metadata (FileMD) object, mapping data objects, and file data objects. FileMD’s hold the normal file metadata (name, file size, create, access & modify ToDs, etc.) as well as pointing to all the Mapping Object (OIDs). Mapping objects exist for each 0.5MB of file data and consist of a 128 element table, each element mapping 4KB of file address space to a data object (OID). Each  data object holds the 4KB of compressed file data and journal log entries.
  • Director metadata is composed of directory metadata (DirMD) object and Directory listing objects. Directory listing objects maps file/directory names to FileMD or DirMD OIDs. Directory listing objects are accessed via an extensible hash table and contain a list of filenames/directory names within the directory

The Elastifile software architecture consists of three layers:

  • A protocol layer which terminates file system access protocols and translates requests into internal requests. The hashing and data compression of file data occur at this level.
  • A metadata layer which provides file system/directory name mapping to objects for owned files/directories and maintains file/directory metadata updates/journals/checkpoints.
  • A data layer which provides transaction consistency and a n-way redundant persistent data storage for (file or metadata) objects.

Metadata operations are persisted via journaled transactions and which are distributed across the cluster. For instance the journal entries for a mapping data object updates are written to the same file data object (OID) as the actual file data, the 4KB compressed data object.

There’s plenty of discussion on how they manage consistency for their metadata across cluster nodes. Elastifile invented and use Bizur, a key-value consensus based DB. Their chief architect Ezra Hoch (@EzraHoch) did a blog post and paper on Bizur for more information

~~~~

New file systems generally take many years to mature and get out into the market, cluster file systems even longer. Elastifile started in 2013, by some very smart engineers, is already on the market, just 4 years later. That’s impressive enough, but with their list of advanced functionality plus cloud storage tiering and multi-site operations all shipping in the current product is mind-blowing.

One lingering question is, does a market exist for another cluster file system? All flash is interesting but most of the current CFS’s do this and ship this today. Cloud storage tiering is interesting and a long term need but some CFSs already have this and others are no doubt implementing it as we speak. CFS’s use of objects for internal data and metadata management is not new and may make internals cleaner but don’t really provide a lot of customer benefit.

Exascale raw capacity, support for 100K users, 1000s of nodes, 1000s of file systems and an infinite # of files/directories is interesting. But most CFSs claim this level of support already, although this is more aspirational for some. And proving support at this scale is difficult, if not impossible.

On the other hand, Bizur is really neat. Its primary benefit is during recovery from hardware failures. For a CFS with 1000s of nodes, failures likely occur quite often. So Bizur’s advantage here may pay significant customer dividends.

Is that enough to to market a new CFS?

To see what other SFD12 bloggers have written on Elastifile, please see:

Hitachi and the coming IoT gold rush

img_7137Earlier this week I attended Hitachi Summit 2016 along with a number of other analysts and Hitachi executives where Hitachi discussed their current and ongoing focus on the IoT (Internet of Things) business.

We have discussed IoT before (see QoM1608: The coming IoT tsunami or not, Extremely low power transistors … new IoT applications). Analysts and companies predict  ~200B IoT devices by 2020 (my QoM prediction is 72.1B 0.7 probability). But in any case there’s a lot of IoT activity going to come online, very shortly. Hitachi is already active in IoT and if anything, wants it to grow, significantly.

Hitachi’s current IoT business

Hitachi is uniquely positioned to take on the IoT business over the coming decades, having a number of current businesses in industrial processes, transportation, energy production, water management, etc. Over time, all these industries and more are becoming much more data driven and smarter as IoT rolls out.

Some metrics indicating the scale of Hitachi’s current IoT business, include:

  • Hitachi is #79 in the Fortune Global 500;
  • Hitachi’s generated $5.4B (FY15) in IoT revenue;
  • Hitachi IoT R&D investment is $2.3B (over 3 years);
  • Hitachi has 15K customers Worldwide and 1400+ partners; and
  • Hitachi spends ~$3B in R&D annually and has 119K patents

img_7142Hitachi has been in the OT (Operational [industrial] Technology) business for over a century now. Hitachi has also had a very successful and ongoing IT business (Hitachi Data Systems) for decades now.  Their main competitors in this IoT business are GE and Siemans but neither have the extensive history in IT that Hitachi has had. But both are working hard to catchup.

Hitachi Rail-as-a-Service

img_7152For one example of what Hitachi is doing in IoT, they have recently won a 27.5 year Rail-as-a-Service contract to upgrade, ticket, maintain and manage all new trains for UK Rail.  This entails upgrading all train rolling stock, provide upgraded rail signaling, traffic management systems, depot and station equipment and ticketing services for all of UK Rail.

img_7153The success and profitability of this Hitachi service offering hinges on their ability to provide more cost efficient rail transport. A key capability they plan to deliver is predictive maintenance.

Today, in UK and most other major rail systems, train high availability is often supplied by using spare rolling stock, that’s pre-positioned and available to call into service, when needed. With Hitachi’s new predictive maintenance capabilities, the plan is to reduce, if not totally eliminate the need for spare rolling stock inventory and keep the new trains running 7X24.

img_7145Hitachi said their new trains capture 48K data items and generate over ~25GB/train/day. All this data, will be fed into their new Hitachi Insight Group Lumada platform which includes Pentaho, HSDP (Hitachi Streaming Data Platform) and their Content Analytics to analyze train data and determine how best to keep the trains running. Behind all this analytical power will no doubt be HDS HCP object store used to keep track of all the train sensor data and other information, Hitachi UCP servers to process it all, and other Hitachi software and hardware to glue it all together.

The new trains and services will be rolled out over time, but there’s a pretty impressive time table. For instance, Hitachi will add 120 new high speed trains to UK Rail by 2018.  About the only thing that Hitachi is not directly responsible for in this Rail-as-a-Service offering, is the communications network for the trains.

Hitachi other IoT offerings

Hitachi is actively seeking other customers for their Rail-as-a-service IoT service offering. But it doesn’t stop there, they would like to offer smart-water-as-a-service, smart-city-as-a-service, digital-energy-as-a-service, etc.

There’s almost nothing that Hitachi currently supplies as industrial products that they wouldn’t consider offering in an X-as-a-service solution. With HDS Lumada Analytics, HCP and HDS storage systems, Hitachi UCP converged infrastructure, Hitachi industrial products, and Hitachi consulting services, together they are primed to take over the IoT-industrial products/services market.

Welcome to the new Hitachi IoT world.

Comments?

NetApp updates their StorageGRID Webscale solution

grid001NetApp announced a new version of their object storage solution, the StorageGRID WebScale 10.3.

At a former employer, I first talked with StorageGRID (Bycast at the time) a decade or so ago. At that time, they were focused on medical and healthcare verticals and had a RAIN (redundant array of independent nodes) storage solution.  It has come a long way.

StorageGRID Business is booming

On the call, NetApp announced they sold 50PB of StorageGRID in FY’16 with 20PB of that in the last quarter and also reported 270% Y/Y revenue growth, which means they are starting to gain some traction in the marketplace. Are we seeing an acceleration of object storage adoption?

As you may recall, StorageGRID comes in a software only solution that runs on just about any white box server with DAS or as two hardware appliances: the SG5612 (12 drive); and the SG5660 (60 drive) nodes. You can mix and match any appliance with any white box software only solution, they don’t have to have the same capacity or performance. But all nodes need network and controller/admin node(s) access.

StorageGRID past

grid002Somewhere during Bycast’s journey they developed support for tape archives and information lifecycle management (ILM) for objects. The previous generation, StorageGrid 10.2 had a number of features, including:

  • S3 cloud archive support that allowed objects to be migrated to AWS S3 as they were no longer actively accessed
  • NAS bridge support that allowed CIFS/SMB or NFS access to StorageGRID objects, which could also be read as S3 objects for easier migration to/from object storage;
  • Hierarchical erasure coding option that was optimized for efficiently storing large objects;
  • Node level erasure coding support that can be used to rebuild data for node drive failures, without having to go outside the node data retrieval;
  • Object byte-granular range read support that allowed users to read an object at any byte offset without requiring rebuild;
  • Support for OpenStack Swift API that made StorageGRID objects natively available to any OpenStack service; and
  • Software support for running as Docker containers or as a VM under VMware ESX, or OpenStack KVM that allowed StorageGRID software to run just about anywhere.

StorageGRID present and future

grid003But customers complained StorageGRID was too complex to install and update which required too much hand holding by NetApp professional services. StorageGRID Webscale 10.3 was targeted to address these deficiencies. Some of the features in StorageGrid 10.3, include:

  • Radically simplified, more modern UI, new dashboard and policy wizard/editor, so that it’s a lot easier to manage the StorageGRID. All features of the UI are also available via RESTfull API access and the UI is the same for white box, software only implementations as well as appliance configurations.
  • Simplified automated installation scripts, so that installations that used to take multiple steps, separate software installs and required professional services support, now use a full-solution software stack install, take only minutes and can be done by the customers alone;
  • S3 object versioning support, so that objects can have multiple versions, limited via the UI, if needed, but provide a snapshot-like capability for S3 data that protects against object accidental deletion.
  • grid004ILM policy change predictions/modeling, so that admins can now see how changes to ILM policies will impact StorageGRID.
  • Even more flexibility in DAS storage, so that future StorageGRID configurations can support 10TB drives and 6TB FIPS-140 drive encryption support, which adds to the current drive capacity and data security options already available in StorageGRID.

To top it all off, StorageGRID 10.3 improves performance for both small (30KB) and large (300MB) object get/puts.

  • Small S3 Load Data Router (LDR, 1-thread) object performance has improved ~4X for both PUTs and GETs; and
  • Large S3 LDR (1-thread) object performance has improved ~2X for PUTs and ~4X for GETs.

Object storage market heating up

grid005Apparently, service providers are adopting object storage to  provide competition to AWS, Azure and Google cloud storage for backup and storage archives as well as for DR as a service. Also, many media and other customers managing massive data repositories are turning to object storage to support their multi-site, very large file libraries.  And as more solution vendors support S3 object protocols for data access and archive, something like StorageGRID can become their onsite-offsite storage alternative.

And Amazon, Azure and Google are starting to realize that most enterprise customers are not going to leap to the cloud for everything they do. So, some sort of hybrid solution is needed for the long term. Having an on premises and off premises object storage solution that can also archive/migrate data to the cloud is a great hybrid alternative that takes enterprises one step closer to the cloud.

Comments?

#VMworld day 1, Cloud Foundation and Cross-Cloud Services

The main keynote topic for today at VMworld was how to address the coming cloud tsunami. Pat citing his own researchers believes that 50% of all workloads (OS instances) will be running in public and private cloud by 2021 and by 2030, 50% of all workloads will be running in the Public Cloud alone. So today VMware announced two new offerings: VMware Cloud Foundation and VMware Cross-Cloud Services.

Cloud Foundation

Cloud Foundation appears to be a bundling of VMware’s SDDC, NSX®, Virtual SAN™ (VSAN) and vSphere® solutions, into a single, integrated stack/package that can be sold and licensed together. No pricing was provided at the show but essentially VMware want’s to allow customers a simple way to deploy a VMware private cloud.

VMware states that Cloud Foundation offers customers up to 6-8X faster cloud deployment at a TCO savings of >40%.

VMware also announced a joint partnership with IBM to sell Cloud Foundation services residing on the IBM Cloud to their customer base. This broaden’s the availability of VMware cloud service offerings beyond vCloud and on premises Cloud Foundation environments.

Cross-Cloud Services

IMG_6819Everyone wants to minimize cloud vendor lockin but that’s not possible today except in a few special cases (NetApp Private Storage and similar capabilities from other vendors, cloud storage gateway services, cloud archive services, etc.).

VMware Cross-Cloud Services is the next step down this path, attempting to provide easier workload/data migration, consolidated cost and workload management and security deployment across the public and private cloud boundaries.

Cross-Cloud Services was in tech preview at the show but it’s intended to make use of standard public cloud defined APIs to provide specialized targeted services to allow better cross-cloud migration and management.

The tech preview showed VMware Cross-Cloud Services deploying an NSX gateway in AWS which allowed NSX to control public cloud IP addresses and then once that was done, one could apply security templates to deploy network encryption between apps and its services. VMware used a sniffer to show the before plain text traffic and the after with encrypted traffic, all done in a matter of minutes. They also showed cost trending information for workloads running across the private and public cloud.

Next they showed a demo (movie) of VMware migrating/cloning a simple app to other public and private cloud environments. They had a public cloud Unicycle IOT app running in Ireland/AWS (I think) with a three tier (web, app, database) app structure/instances and then migrated/cloned that single site 3-tier app to be deployed across multiple cloud (web and app tiers) sites with a single database instance running in a private cloud.

I started thinking this is getting us down the path towards cloud virtualization but in the end, it’s much more targeted services, which run in instances/gateways in the public and private cloud to do very specific migration or management activities. Nonetheless a great first step towards more flexible cross-cloud deployment and management.

VMworld Day 2 looks to be more on current products and enhancements, stay tuned.

Comments?

Hedvig storage system, Docker support & data protection that spans data centers

Hedvig003We talked with Hedvig (@HedvigInc) at Storage Field Day 10 (SFD10), a month or so ago and had a detailed deep dive into their technology. (Check out the videos of their sessions here.)

Hedvig implements a software defined storage solution that runs on X86 or ARM processors and depends on a storage proxy operating in a hypervisor host (as a VM) and storage service nodes. Their proxy and the storage services can execute as separate VMs on the same host in a hyper-converged fashion or on different nodes as a separate storage cluster with hosts doing IO to the storage cluster.

Hedvig’s management team comes from hyper-scale environments (Amazon Dynamo/Facebook Cassandra) so they have lots of experience implementing distributed software defined storage at (hyper-)scale.
Continue reading “Hedvig storage system, Docker support & data protection that spans data centers”

BlockStack, a Bitcoin secured global name space for distributed storage

At USENIX ATC conference a couple of weeks ago there was a presentation by a number of researchers on their BlockStack global name space and storage system based on the blockchain based Bitcoin network. Their paper was titled “Blockstack: A global naming and storage system secured by blockchain” (see pg. 181-194, in USENIX ATC’16 proceedings).

Bitcoin blockchain simplified

Blockchain’s like Bitcoin have a number of interesting properties including completely distributed understanding of current state, based on hashing and an always appended to log of transactions.

Blockchain nodes all participate in validating the current block of transactions and some nodes (deemed “miners” in Bitcoin) supply new blocks of transactions for validation.

All blockchain transactions are sent to each node and blockchain software in the node timestamps the transaction and accumulates them in an ordered append log (the “block“) which is then hashed, and each new block contains a hash of the previous block (the “chain” in blockchain) in the blockchain.

The miner’s block is then compared against the non-miners node’s block (hashes are compared) and if equal then, everyone reaches consensus (agrees) that the transaction block is valid. Then the next miner supplies a new block of transactions, and the process repeats. (See wikipedia’s article for more info).

All blockchain transactions are owned by a cryptographic address. Each cryptographic address has a public and private key associated with it.
Continue reading “BlockStack, a Bitcoin secured global name space for distributed storage”

Better erasure coding for scale-out & cloud storage

LRcC(6,2,2) example layout
LRcC(6,2,2) example layout

Microsoft Azure uses a different style of erasure coding for their cloud storage than what I have encountered in the past. Their erasure coding technique was documented in a paper presented at USENIX ATC’12 (for more info check out their Erasure coding in Windows Azure Storage paper).

The new erasure coding can be optimized for rebuild read or storage space overhead. can at times correct for more errors than equivalent, more traditional, Reed-Solomon (RS) erasure coding schemes.
Continue reading “Better erasure coding for scale-out & cloud storage”