Object storage – Page 3 – Silverton Consulting

Rubrik has a better idea for VMware backup

Posted on February 16, 2016May 17, 2016 by Ray in Data compression, Data reduction, Distributed computing, NFS, Object storage, Storage Backup

Cluster nodes Rubrik has been around since January 2014 and just GA’d in April of last year. They recently presented at TechFieldDay 10 (TFD10, videos here) with Chris Wahl, Technical Evangelist, Arvin “Nitro” Nithrakashyap, Co-Founder and Bipul Sinha, Co-Founder, in attendance.

I have known Chris Wahl since November of 2013, from our time together on Storage Field Day 4 (SFD4). Howard and I (the “Greybeards”) also interviewed Chris Wahl for Rubrik on a Greybeards on Storage podcast.
Continue reading “Rubrik has a better idea for VMware backup” →

AWS vs. Azure security setup for Linux

Posted on February 1, 2016 by Ray in Cloud services, Cloud storage, Data security, Disk storage, Internet traffic, Object storage

Strange Clouds by michaelroper (cc) (from Flickr)

I have been doing some testing with both Azure and Amazon Web Services (AWS) these last few weeks and have observed a significant difference in the security setups for both of these cloud services, at least when it comes to Linux compute instances and cloud storage.

First, let me state at the outset, all of my security setups for both AWS and Amazon was done through using the AWS console or the Azure (classic) portal. I believe anything that can be done with the portal/console for both AWS and Azure can also be done in the CLI or the REST interface. I only used the portal/console for these services, so can’t speak to the ease of using AWS’s or Azure’s CLI or REST services.

For AWS

EC2 instance security is pretty easy to setup and use, at least for Linux users:

When you set up an (Linux) EC2 instance you are asked to set up a Public Key Infrastructure file (.pem) to be used for SSH/SFTP/SCP connections. You just need to copy this file to your desktop/laptop/? client system. When you invoke SSH/SFTP/SCP, you use the “-i” (identity file) option and specify the path to the (.PEM) certificate file. The server is already authorized for this identity. If you lose it, AWS services will create another one for you as an option when connecting to the machine.
When you configure the AWS instance, one (optional) step is to configure its security settings. And one option for this is to allow connections only from ‘my IP address’, how nice. You don’t even have to know your IP address, AWS just figures it out for itself and configures it.

That’s about it. Unclear to me how well this secures your EC2 instance but it seems pretty secure to me. As I understand it, a cyber criminal would need to know and spoof your IP address to connect to or control remotely the EC2 instance. And if they wanted to use SSH/SFTP/SCP they would either have to access to the identity file. I don’t believe I ever set up a password for the EC2 instance.

As for EBS storage, there’s no specific security associated with EBS volumes. Its security is associated with the EC2 instance it’s attached to. It’s either assigned/attached to an EC2 instance and secured there, or it’s unassigned/unattache. For unattached volumes, you may be able to snapshot it (to an S3 bucket within your administration control) or delete it (if it’s unattached, but for either of these you have to be an admin for the EC2 domain.

As for S3 bucket security, I didn’t see any S3 security setup that mimicked the EC2 instance steps outlined above. But in order to use AWS automated billing report services for S3, you have to allow the service to have write access to your S3 buckets. You do this by supplying an XML-like security policy, and applying this to all S3 buckets you wish to report on (or maybe it’s store reports in). AWS provides a link to the security policy page which just so happens to have the XML-like file you will need to do this. All I did was copy this text and insert it into a window that was opened when I said I wanted to apply a security policy to the bucket.

I did find that S3 bucket security, made me allow public access (I think, can’t really remember exactly) to the S3 bucket to be able to list and download objects from the bucket from the Internet. I didn’t like this, but it was pretty easy to turn on. I left this on. But this PM I tried to find it again (to disable it) but couldn’t seem to locate where it was.

From my perspective all the AWS security setup for EC2 instances, storage, and S3 was straightforward to use and setup, it seemed pretty secure and allowed me to get running with only minimal delay.

For Azure

First, I didn’t find the more modern, new Azure portal that useful but then I am a Mac user, and it’s probably more suitable for Windows Server admins. The classic portal was as close to the AWS console as I could find and once I discovered it, I never went back.

Setting up a Linux compute instance under Azure was pretty easy, but I would say the choices are a bit overwhelming and trying to find which Linux distro to use was a bit of a challenge. I settled on SUSE Enterprise, but may have made a mistake (EXT4 support was limited to RO – sigh). But configuring SUSE Enterprise Linux without any security was even easier than AWS.

However, Azure compute instance security was not nearly as straightforward as in AWS. In fact, I could find nothing similar to securing your compute instance to “My IP” address like I did in AWS. So, from my perspective my Azure instances are not as secure.

I wanted to be able to SSH/SFTP/SCP into my Linux compute instances on Azure just like I did on AWS. But, there was no easy setup of the identity file (.PEM) like AWS supported. So I spent some time, researching how to create a Cert file with the Mac (didn’t seem able to create a .PEM file). Then more time researching how to create a Cert file on my Linux machine. This works but you have to install OpenSSL, and then issue the proper “create” certificate file command, with the proper parameters. The cert file creation process asks you a lot of questions, one for a pass phrase, and then for a network (I think) phrase. Of course, it asks for name, company, and other identification information, and at the end of all this you have created a set of cert files on your linux machine.

But there’s a counterpart to the .pem file that needs to be on the server to authorize access. This counterpart needs to be placed in a special (.ssh/authorized) directory and I believe needs to be signed by the client needing to be authorized. But I didn’t know if the .cert, .csr, .key or .pem file needed to be placed there and I had no idea how to” sign it”. After spending about a day and a half on all this, I decided to abandon the use of an identity file and just use a password. I believe this provides less security than an identity file.

As for BLOB storage, it was pretty easy to configure a PageBlob for use by my compute instances. It’s security seemed to be tied to the compute instance it was attached to.

As for my PageBlob containers, there’s a button on the classic portal to manage access keys to these. But it said once generated, you will need to update all VMs that access these storage containers with the new keys. Not knowing how to do that. I abandoned all security for my container storage on Azure.

So, all in all, I found Azure a much more manual security setup for Linux systems than AWS and in the end, decided to not even have the same level of security for my Linux SSH/SFTP/SCP services that I did on AWS. As for container security, I’m not sure if there’s any controls on the containers at this point. But I will do some more research to find out more.

In all fairness, this was trying to setup a Linux machine on Azure, which appears more tailored for Windows Server environments. Had I been in an Active Directory group, I am sure much of this would have been much easier. And had I been configuring Windows compute instances instead of Linux, all of this would have also been much easier, I believe.

~~~~

All in all, I had fun using AWS and Azure services these last few weeks, and I will be doing more over the next couple of months. So I will let you know what else I find as significant differences between AWS and Azure. So stay tuned.

Comments?

DDN unchains Wolfcreek, unleashes IME and updates WOS

Posted on July 2, 2015 by Ray in Block Storage, Clustered storage, File Storage, IOPS, Object storage, Server virtualization, SPC-1, SSD storage, Storage performance, Visionary leadershp

It’s not every day that we get a vendor claiming 2.5X the top SPC-1 IOPS (currently held by Hitachi G1000 VSP all flash array at ~2M IOPS) as DataDirect Networks (DDN) has claimed for an all-flash version of their new Wolfcreek hyper converged appliance. DDN says their new 4U appliance is capable of 60GB/sec of throughput and over 5M IOPS. (See their press release for more information.) Unclear if these are SPC-1 IOPS or not, but I haven’t seen any SPC-1 report on it yet.

In addition to the new Wolfcreek appliance, DDN announced their new Infinite Memory Engine™ (IME) flash caching software and WOS® 360 V2.0, an enhanced version of their object storage.

DDN if you haven’t heard of them has done well in the Web 2.0 environments and is a leading supplier to high performance computing (HPC) sites. They have object storage system (WOS), all flash block storage (SFA12KXi), hybrid (disk-SSD) block storage (SFA7700X™ & SFA12KX™), Lustre file appliance (EXAScaler), IBM GPFS™ NAS appliance (GRIDScaler), media server appliance (MEDIAScaler™) and software defined storage (Storage Fusion Accelerator [SFX™] flash caching software).

Wolfcreek hyper converged appliance

The converged solution comes in a 4U appliance using dual Haswell Intel microprocessors (with up to 18 cores each), includes a PCIe fabric which supports 48-NVMe flash cards or 72-SFF SSDs. With the NVMe or SSDs, Wolfcreek will be using their new IME software to accelerate IO activity.

Wolfcreek IME software supports either burst mode IO caching cluster or a storage cluster of nodes. I assume burst mode is a storage caching layer for backend file system stoorage. As a storage cluster I assume this would include some of their scale-out file system software on the nodes. Wolfcreek cluster interconnect is 40Gb Infiniband or 10/40Gb Ethernet and also will support Intel’s Omni-Path. The Wolfcreek appliance is compatible with HPC Lustre and IBM GPFS scale out file systems.

Wolfcreek appliance can be a great platform for OpenStack and Hadoop environments. But it also supports virtual machine hypervisors from VMware, Citrix and Microsoft. DDN says that the Wolfcreek appliance can scale up to support 100K VMs. I’ve been told that IME will not be targeted to work with Hypervisors in the first release.

Recall that with a hyper converged appliance, some portion of the system resources (memory and CPU cores) must be devoted to server and VM application activities and the remainder to storage activity. How this is divided up and whether this split is dynamic (changes over time) or static (fixed over time) in the Wolfcreek appliance is not indicated.

The hyper converged field is getting crowded of late what with VMware EVO:RAIL, Nutanix, ScaleComputing, Simplivity and others coming out with solutions. But there aren’t many that support all-flash storage and it seems unusual that hyper converged customers would have need for that much IO performance. But I could be wrong, especially for HPC customers.

There’s much more to hyper convergence than just having storage and compute in the same node. The software that links it all together, manages, monitors and deploys these combined hypervisor, storage and server systems is almost as important as any of the hardware. There wasn’t much talk about the software that DDN is putting together for Wolfcreek but it’s still early yet. With their roots in HPC, it’s likely that any DDN hyper converged solution will target this market first and broaden out from there.

Infinite Memory Engine (IME)

IME is an outgrowth of DDN’s SFX software and seem to act as a caching layer for parallel file system IO. It makes use of NVMe or SSDs for its IO caching. And according to DDN can offer up to 1000X IO acceleration to storage or 100X file system acceleration.

It does this primarily by providing an application aware IO caching layer and supplying more effective IO to the file system layer using PCIe NVMe or SSD flash storage for hardware IO acceleration. According to the information provided by DDN, IME can provide 50 GB/sec bandwidth to a host compute cluster while only doing 4GB/sec of throughput to a backend file system, presumably by better caching of file IO.

WOS 360 V2.0

The new WOS 360 V2.0 object storage system features include

Higher density storage package with 98-8TB SATA drives or 768TB raw capacity in 4U) supporting 8B objects each and over 100B objects in a cluster.
Native SWIFT API support for OpenStack environments which includes gateway or embedded deployments, up to 5000 concurrent users and 5B objects/namespace.
Global ObjectAssure data encoding with lower storage overhead (1.5x or a 20% reduction from their previous encoding option) for highly durable and available object storage usiing a two level hierarchical erasure code for object storage.
Enhanced network security with SSL which provides end-to-end SSL network data transport between clients and WOS and betweenWOS storage nodes.
Simplified cluster installation, deployment and maintenance with can now deploy a WOS cluster in minutes, with a simple point and click GUI for installation and cluster deployment with automated non-disruptive software upgrade.
Performance improvements for better video streaming, content distribution and large file transfers with improved QoS for latency sensitive applications.

~~~~

Probably more going on with DDN than covered here but this hits the highlights. I wish there was more on their Wolfcreek appliance and its various configurations and performance benchmarks but there’s not.

Comments?

Photo Credits: wolf-63503+1920 by _Liquid

EMCWorld2015 Day 2&3 news

Posted on May 6, 2015May 6, 2015 by Ray in Block Storage, Cloud services, Cloud storage, Clustered storage, Information economy, Object storage, PCIe, Storage, Storage architecture, Storage performance, Strategic Inflection Points, System effectiveness

Some additional news from EMCWorld2015 this week:

EMC announced directed availability for DSSD, their Rack scale shared Flash storage solution using a PCIe3 (switched) fabric with 36 dual ported, flash modules, which hold 512 NAND chips for 144TB NAND flash storage. On the stage floor they had a demonstration pitting a 40 node Hadoop cluster with DAS against a 15 node Hadoop cluster using the DSSD, both running HIVE and working on the same Query. By the time the 40node/DAS solution got to about 2% of the query completion the 15node/DSSD based cluster had finished the query without breaking a sweat. They then ran an even more complex query and it took no time at all.

They also simulated a copy of a 4TB file (~32K-128K IOs) from memory to memory and it took literally seconds, then copied it to SSD that took considerably longer (didn’t catch how long but much longer than memory), and then they showed the same file copy to DSSD and it only took seconds, almost looked exactly a smidgen slower than the memory to memory copy.

They said the PCIe fabric (no indication what the driver was) provided much more parallelism to the dual ported flash storage that the system was almost able to complete the 4TB copy at memory to memory speeds. It was all pretty impressive, albeit a simulation of the real thing.

EMC indicated that they designed the flash modules themselves and expect to double capacity of the DSSD to 288TB shortly. They showed the controller board that had a mezzanine board over a part of it, but together had 12 major chips on it which I assume had something to do with the PCIe fabric. They said there were two controllers in the system for high availability and the 144TB DSSD was deployed in 5U of space.

I can see how this would play well for real time analytics, high frequency trading and HPC environments but there’s more to shared storage than just speed. Cost wasn’t mentioned neither was the software driver but with the ease with which it worked on the Hive query, I can only assume at some lever it must look something like a DAS device but with memory access times… NVMe anyone?

Project CoprHD was announced which open sourced EMC’s ViPR Controller software. Many ViPR customers were asking for EMC to open source ViPR controller, apparently their listening. Hopefully this will enable some participation from non-EMC storage vendors to allow their storage to be brought under the management of ViPR Controller. I believe the intent is to have an EMC hardened/supported version of Project CoprHD or ViPR Controller to coexist with the open source project version which anyone can download and modify for themselves.

A Non-production, downloadable version of ScaleIO was also announced. The test-dev version is a free download with unlimited capacity, full functionality and available for an unlimited time but only for non-production use. Another of the demos onstage this morning was Chad configuring storage across a ScaleIO cluster and using its QoS services to limit the impact of a specific workload. There was talk that ScaleIO was available previously as a free download but it took a bunch of effort to find and download. They have removed all these prior hindrances and soon, if not today it’s freely available for anyone. ScaleIO runs on VMware and other hypervisors (maybe bare metal as well). So if you wanted to get your feet wet with software defined storage, this sounds like the perfect opportunity.

ECS is being added to EMC’s Data Lake foundation. Not exactly sure what are all the components in the data lake solution but previously the only Data Lake storage was Isilon based. This week EMC added Elastic Cloud Storage to the picture. Recall that Elastic Cloud Storage comes in either a software only or hardware appliance deployment and provides object storage.

I missed Project Liberty before but it’s a virtual VNX appliance, software only version. I assume this is intended for ROBO deployments or very low end business environments. Presumably it runs on VMware and has some sort of storage limitations. It seems, more and more of EMC products are coming out in virtual appliance versions.

Project Falcon was also announced which is a virtual Data Domain appliance, software only solution, targeted for ROBO environments and other small enterprises. The intent is to have an onramp for DataDomain backup storage. I assume runs under VMware.

Project Caspian – rolling out CloudScaling orchestration/automation for OpenStack deployments. On the big stage today, Chad and Jeremy demonstrated Project Caspian on a VCE VxRACK deploying racks of servers under OpenStack control. They were able within a couple of clicks define and deploy openstack on bare metal hardware and deploy applications to the OpenStack servers. They had a monitoring screen which showed the OpenStack server activity (transactions) in real time and showed an over commit of the rack and how easy it was to add a new rack with more servers. All this seemed to take but a few clicks. The intent is not to create another OpenStack distribution but to provide an orchestration/automation/monitoring layer of software on top of OpenStack to “industrialize OpenStack” for enterprise users. Looked pretty impressive to me.

I would have to say the DSSD box was most impressive. It would have been interesting to get an upclose look at the box with some more specifications but they didn’t have one on the Expo floor.

Object store and hybrid clouds at Cloudian

Posted on April 23, 2015 by Ray in Data growth, Distributed computing, File Storage, NFS, Object storage, SSD storage, Storage resiliency

Out of Japan comes another object storage provider called Cloudian. We talked with Cloudian at Storage Field Day 7 (SFD7) last month in San Jose (see the videos of their presentations here).

Cloudian history

Cloudian has been out on the market since March of 2011 but we haven’t heard much about them, probably because their focus has been East Asia. The same day that the Tōhoku Earthquake and Tsunami hit the company announced Cloudian, an Amazon S3 Compliant Multi-Tenant Cloud Storage solution.

Their timing couldn’t have been better. Japanese IT organizations were beating down their door over the next two years for a useable and (earthquake and tsunami) resilient storage solution.

Cloudian spent the next 2 years, hardening their object storage system, the HyperStore, and now they are ready to take on the rest of the world.

Currently Cloudian has about 20PB of storage under management and are shipping a HyperStore Appliance or a software only distribution of their solution. Cloudian’s solutions support S3 and NFS access protocols.

Their solution uses Cassandra, a highly scaleable, distributed NoSQL database which came out of FaceBook for their meta-data database. This provides a scaleable, non-sharable meta-data data base for object meta-data repository and lookup.

Cloudian creates virtual storage pools on backend storage which can be optimized for small objects, replication or erasure coding and can include automatic tiering to any Amazon S3 and Glacier compatible cloud storage. I would guess this is how they qualify for Hybrid Cloud status.

The HyperStore appliance

Cloudian creates a HyperStore P2P ring structure. Each appliance has Cloudian management console services as well as the HyperStore engine which supports three different data stores: Cassandra, Replicas, and Erasure coding. Unlike Scality, it appears as if one HyperStore Ring must exist in a region. But it can be split across data centers. Unclear what their definition of a “region” is.

HyperStore hardware come in entry level (HSA-2024: 24TB/1U), capacity optimized (HSA-2048: 48TB/1U), performance optimized (HSA-2060: all flash, 60TB/2U

Replication with Dynamic Consistency

The other thing that Cloudian supports is different levels of consistency for replicated data. Most object stores support eventual consistency (see Eventual Data Consistency and Cloud Storage post). HyperStore supports 3 (well maybe 5) different levels of consistency:

One – object written to one replica and committed there before responding to client
Quorum – object written to N/2+1 replicas before responding to client
1. Local Quorum – replicas are written to N/2+1 nodes in same data center before responding to client
2. Each Quorum – replicas are written to N/2+1 nodes in each data center before responding to client.
All – all replicas must have received and committed the object write before responding to client

There are corresponding read consistency levels as well. The objects writes have a “coordinator” node which handles this consistency. The implication is that consistency could be established on an object basis. Unclear to me whether Read and Write dynamic consistency can be different?

Apparently small objects are also stored in the Cassandra datastore. That way HyperStore optimizes for object size. Also, HyperStore nodes can be added to a ring and the system will auto balance the data across the old and new nodes automatically.

Cloudian also support object versioning, ACL, and QoS services as well.

~~~

I was a bit surprised by Cloudian. I thought I knew all the object storage solutions out on the market. But then again they made their major strides in Asia and as an on-premises Amazon S3 solution, rather than a generic object store.

For more on Cloudian from SFD7 see:

Cloudian – Storage Field Day 7 Preview by @VirtualizedGeek (Keith Townsend)

What’s next for Nexenta

Posted on January 23, 2015January 23, 2015 by Ray in Block Storage, CIFS/SMB, Clustered storage, DAS, desktop virtualization, Distributed computing, iSCSI, NFS, Object storage, Storage, Strategic Inflection Points

We talked with Nexenta at Storage Field Day 6 where they discussed their current and future software defined storage solutions. I highly encourage you to see the SFD6 videos of their sessions if you want to learn more about them.

Nexenta was an earlier adopter of software defined storage and have recently signed with Solinea to support Nexenta under OpenStack CINDER block storage. Nexenta is based on ZFS and supports inline deduplication and advanced performance functionality.

NexentaStor™

NexentaStor™ is there base storage software and comes as a download in both an Enterprise edition and Community edition. NexentaStor can run on most industry standard, x86 server platforms.

The Community edition supports up to 18TB and uses DAS and/or SAS connected storage to supply NFS and SMB file services.
The Enterprise edition extends capacity into the PB and supports FC and iSCSI block storage services as well as file services. The Enterprise edition supports plugins for HA solutions and storage replication.

Nexenta mentioned that they had over 6500 customers for NexentaStor of which 1500 are cloud service providers. But they have a whole lot more to offer than just NexentaStor including NexentaConnect™ and coming soon, NexentaEdge™ and NexentaFusion™.

NexentaConnect™

NexentaConnect software works with VMware or Citrix solutions to provide advanced storage services, such as file services, IO acceleration, and storage automation/analytics. There are three products in the NexentaConnect family:

NexentaConnect for VMware Virtual SAN – by combining NexentaConnect together with VMware Virtual SAN software and DAS or SAS storage one can offer NFS and SMB/CIFS file services. Prior to NexentaConnect, VMware Virtual SAN storage only provided VMware dedicated SAN storage, but now that same infrastructure can be used for any NFS or SMB/CIFS file system storage.
NexentaConnect for VMware Horizon – by combining NexentaConnect with VMware Horizon and DAS plus local SSD storage, one can provide accelerated virtual desktop IO with state of the art write logging, inline deduplication, and GUI based storage automation/analytics.
NexentaConnect for Citrix XenDesktop (in Beta now) – by combining NexentaConnect with Citrix XenDesktop software and DAS plus local SSD storage, one can accelerate XenDesktop IO and ease the management of XenDesktop storage.

Nexenta has teamed up with Dell to offer Dell-Nexenta (and VMware) storage solution using NexentaConnect and VMware Virtual SAN software on Dell hardware.

NexentaEdge™

They spent a lot of time on NexentaEdge and what they plan to offer is a software defined object storage solution. Most object storage systems on the market either started as software only or currently support a software only version. But Nexenta is the first to come at it from a file services heritage that I know of.

NexentaEdge will offer iSCSI services as well as standard object storage services such as Amazon S3 and OpenStack SWIFT. Their solution splits up objects into chunks and replicates/distributes the object chunks across their software defined (object) storage cluster.

Cluster communications uses UDP (not TCP) and so has less overhead. NexentaEdge cluster communications uses their own Replicast protocol to send messages and data out across the cluster. .

They designed NexentaEdge to be able to support Shingle Magnetic Recording (SMR) disks which are very dense storage but occasionally have to go “away” while they perform garbage collection/re-organization. I did two posts about SMR disks a while back (see Shingled magnetic recording disks and Sequential-only disk for more information on SMR).

I have to admit I had a BIG problem with support for iSCSI over eventually consistent storage. I don’t see how this can be used to support ACID database requests but I suppose Nexenta would argue that anyone using object storage for ACID database IO needs to have their head examined.

NexentaFusion™

Although this was not discussed as much, NexentaFusion is another future offering supplying software defined storage analytics and orchestration automation. They intent is to use NexentaFusion with NexentaStor, NexentaConnect and/or NexentaEdge. As you scale up your Nexenta storage cluster, automation/orchestration and storage analytics starts to become a more pressing need. According to Nexenta’s website NexentaFusion 1.0 will support multi-tennant storage monitoring and real time storage analytics while NexentaFusion 2.0 will supportstorage provisioning and orchestration.

~~~~

Nexenta provided Converse all-star shoes to all the participants as well as pens and notebooks. I had to admit I liked the look of the new tennis shoes but my wife and kids thought I was crazy.

Different views on Nexenta from the other SFD6 bloggers can be found below:

SFD6 – Day 2 – Nexenta from PenguinPunk (Dan Firth, @PenguinPunk)

Nexenta – Back in da house by Nigel Poulton (@NigelPoulton)

Sorry Nexenta, but I don’t get it … and questions arise by Juku (Enrico Signoretti, @ESignoretti)

Day 2 at SFD6: Nexenta by Absolutely Windows (John Obeto, @JohnObeto)

Data virtualization surfaces

Posted on January 9, 2015 by Ray in Block Storage, Cloud services, Clustered storage, DAS, Data, data access, data protection, Data QoS, Distributed computing, File Storage, Object storage, SSD storage, Storage, Storage virtualization, Strategy, Systems

There’s a new storage startup out of stealth, called Primary Data and it’s implementing data (note, not storage) virtualization.

They already have $60M in funding with some pretty highpowered talent from Fusion IO, namely David Flynn, Rick White and Steve Wozniak (the ‘Woz’) (also of Apple fame).

There have been a number of attempts at creating a virtualization layers for data namely ViPR (See my post ViPR virtues, vexations but no storage virtualization) but Primary Data is taking a different tack to the problem.

Data virtualization explained

Data hypervisor, software defined storage, data plane, control plane — (c) 2012 Silverton Consulting, Inc. All rights reserved

Essentially they want to separate the data plane from the control plane (See my Data Hypervisor post and comments for another view on this).

The data plane consists of those storage system activities that actually perform IO or read and writes.
The control plane is those storage system activities that do everything else that has to be done by a storage system, including provisioning, monitoring, and managing the storage.

Separating the data plane from the control plane offers a number of advantages. EMC ViPR does this but it’s data plane is either standard storage systems like VMAX, VNX, Isilon etc, or software defined storage solutions. Primary Data wants to do it all.

Their meta data or control plane engine is called a Data Director which holds information about the data objects that are stored in the Primary Data system, runs a data policy management engine and handles data migration.

Primary Data relies on purpose-built, Data Hypervisor (client) software that talks to Data Directors to understand where data objects reside and how to go about accessing them. But once the metadata information is transferred to the client SW, then IO activity can go directly between the host and the storage system in a protocol independent fashion.

[The graphic above is from my prior post and I assumed the data hypervisor (DH) would be co-located with the data but Primary Data has rightly implemented this as a separate layer in host software.]

Data Hypervisor protocol independence?

As I understand it this means that customers could use file storage, object storage or block storage to support any application requirement. This also means that file data (objects) could be migrated to block storage and still be accessed as file data. But the converse is also true, i.e., block data (objects) could be migrated to file storage and still be accessed as block data. You need to add object, DAS, PCIe flash and cloud storage to the mix to see where they are headed.

All data in Primary Data’s system are object encapsulated and all data objects are catalogued within a single, global namespace that spans file, block, object and cloud storage repositories

Data objects can reside on Primary storage systems, external non-Primary data aware file or block storage systems, DAS, PCIe Flash, and even cloud storage.

How does Data Virtualization compare to Storage Virtualization?

There are a number of differences:

Most storage virtualization solutions are in the middle of the data path and because of this have to be fairly significant, highly fault-tolerant solutions.
Most storage virtualization solutions don’t have a separate and distinct meta-data engine.
Most storage virtualization systems don’t require any special (data hypervisor) software running on hosts or clients.
Most storage virtualization systems don’t support protocol independent access to data storage.
Most storage virtualization systems don’t support DAS or server based, PCIe flash for permanent storage. (Yes this is not supported in the first release but the intent is to support this soon.)
Most storage virtualization systems support internal storage that resides directly inside the storage virtualization system hardware.
Most storage virtualization systems support an internal DRAM cache layer which is used to speed up IO to internal and external storage and is in addition to any caching done at the external storage system level.
Most storage virtualization systems only support external block storage.

There are a few similarities as well:

They both manage data migration in a non-disruptive fashion.
They both support automated policy management over data placement, data protection, data performance, and other QoS attributes.
They both support multiple vendors of external storage.
They both can support different host access protocols.

Data Virtualization Policy Management

A policy engine runs in the Data Directors and provides SLAs for data objects. This would include performance attributes, protection attributes, security requirements and cost requirements. Presumably, policy specifications for data protection would include RAID level, erasure coding level and geographic dispersion.

In Primary Data, backup becomes nothing more than object snapshots with different protection characteristics, like offsite full copy. Moreover, data object migration can be handled completely outboard and without causing data access disruption and on an automated policy basis.

Primary Data first release

Primary Data will be initially deployed as an integrated data virtualization solution which includes an all flash NAS storage system and a standard NAS system. Over time, Primary Data will add non-Primary Data external storage and internal storage (DAS, SSD, PCIe Flash).

The Data Policy Engine and Data Migrator functionality will be separately charged for software solutions. Data Directors are sold in pairs (active-passive) and can be non-disruptively upgraded. Storage (directors?) are also sold separately.

Data Hypervisor (client) software is available for most styles of Linux, Openstack and coming for ESX. Windows SMB support is not split yet (control plane/data plane) but Primary data does support SMB. I believe the Data Hypervisor software will also be released in an upcoming version of the Linux kernel.

They are currently in testing. No official date for GA but they did say they would announce pricing in 2015.

~~~~

Comments?

Disclosure: We have done work for Primary Data over the past year.

Photo Credits:

Screen shot of beta test system supplied by Primary Data
Graphic created by SCI for prior Data Hypervisor post

EMC acquisitions & other announcements at #EMCSummit last week

Posted on November 3, 2014 by Ray in Cloud services, Cloud storage, data protection, Mobile computing, Object storage

EMC’s recently announced the acquisition of Maginetics and Spanning, which are mainly for pushing data protection out to cloud storage and cloud storage data protection. The other major item to come out of EMC Global Analyst Summit last week was the announcement of EMC’s Hybrid Cloud Solution.

Magnetic for cloud onramp

Maginetics is intended to be yet another tier in the deep archive for Avatar, Data Domain and NetWorker but this one is intended to be in the cloud. MagFS Maginetic’s file system manages the file to object transition, provides their own deduplication and can replicate data to one or more cloud providers, even supporting different cloud services such as, AWS, Google Compute, Azzure, CleverSafe etc. I think ultimately this may be broadened beyond just data protection to be another tier for Unified storage to move to the cloud as well but that’s subject for another post.

How does Maginetics differs from another recent EMC acquisition, TwinStrata? TwinStrata is mainly targeted for primary storage moving a LUN to the cloud and maintaining data avaliability and something like reasonable responsiveness to the data that’s moved to the cloud. So where Maginetics is for data protection storage TwinStrata is for primary storage. Unclear where this leaves other file storage…

Spanning for protection of cloud data

Spanning is intended to be a data protection solution for data that’s born in the cloud. In this case, you can use Spanning to backup your cloud applications to different cloud service providers or even the same cloud service providers. Even if you don’t want to use Spanning to backup your cloud app’s data, with a cloud version of Data Protection Advisor (based somewhat on Spanning), you should be ultimately able to use it to monitor your current cloud provider’s replication/protection activities to insure they are copying and backing up your data properly across data center domains etc. In this way you can better monitor your cloud providers internal data replication/protection services.

EMC seems like it’s got a vision of where it intends to go and the cloud represents a significant new potential data stream and they want to be there to help protect it and use it to help protect other data.

EMC’s Hybrid Cloud announcement

The main announcement at the summit was on EMC’s Hybrid Cloud Offering which was pre-announced at EMC World last spring. With their Hybrid Cloud Offering making it easier for data centers to take advantage of the cloud to burst applications back and forth, EMC’s trying to cover anyway to use the cloud that makes sense.

EMC announced that their Hybrid Cloud solution will support a “federation” hybrid cloud solution based on VCE/VBLOCK or VSPEX and a software defined version based on ViPR storage controller. They also made a statement of direction to have their Hybrid Cloud solution support Microsoft Azure as well as OpenStack at some point in the future…

Well I think that about covers it for EMC cloud announcements from the EMC Global Summit last week.

Comments?