Tag Archives: Deduplication

34: GreyBeards talk Copy Data Management with Ash Ashutosh, CEO Actifio

In this episode, we talk with Ash Ashutosh (@ashashutosh), CEO of Actifio, a copy data virtualization company. Howard met up with Ash at TechFieldDay11 (TFD11) a couple of weeks back and wanted another chance to talk with him. Ash seems to have been around forever. The first time we met, I was at a former employer and he was with AppIQ (later purchased by HP). Actifio is populated by a number of industry veterans and, since being founded in 2009, has been doing really well, with over 1000 customers.

So what’s copy data virtualization (management) anyway? At my former employer, we did an industry study that determined that IT shops (back in the ’90s) were making 9-13 copies of their data. These days, IT is making even more copies of the exact same data.

Data copies proliferate like weeds

Engineers use snapshots for development, QA and validation. Analysts use data copies to better understand what’s going on in their customer-partner interactions, manufacturing activities, industry trends, etc. Finance, marketing, legal, etc. all have similar needs, which just makes the number of data copies grow out of sight. And we haven’t even started to discuss backup.

Ash says things reached a tipping point when server virtualization became the dominant approach to running applications, which led to an ever-increasing need for data copies as apps started being developed and run all over the place. Then along came data deduplication, which displaced tape in IT’s backup process, so that backup data (copies) could now reside on disk. Finally, with the advent of disk deduplication, backups no longer had to be in TAR (backup) formats but could now be left in app-native formats. In native formats, any app/developer/analyst could access the backup data copy.

Actifio Copy Data Virtualization

So what is Actifio? It’s essentially a massively distributed object store with a global-namespace file system on top of it. Application hosts/servers run agents in their environments (VMware, SQL Server, Oracle, etc.) to provide change block tracking and other metadata about what’s going on with the primary data to be backed up. So when a backup is requested, only changed blocks have to be transferred to Actifio and deduped. From that deduplicated change block backup, a full copy can be synthesized, in native format, for any and all purposes.

With change block tracking, backups become very efficient, and deduplication only has to work on changed data, so that also becomes more effective. Data copying can also be done more effectively since they’re only tracking deduplicated data. If necessary, changed blocks can also be applied to data copies to bring them up to date.
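To make the idea concrete, here is a minimal sketch of a changed-block backup into a deduplicated store, with a full copy synthesized afterwards. It is not Actifio’s implementation: the `DedupStore` class, the tiny block size, the SHA-256 fingerprints and the `changed_blocks` helper (a stand-in for an agent’s change block tracking) are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # tiny, hypothetical block size so the example is easy to follow


class DedupStore:
    """Hypothetical content-addressed block store: identical blocks are kept once."""

    def __init__(self):
        self.blocks = {}   # fingerprint -> block bytes
        self.backups = []  # one {block_index: fingerprint} map per backup point

    def _put(self, block):
        fingerprint = hashlib.sha256(block).hexdigest()
        self.blocks.setdefault(fingerprint, block)  # store only if not seen before
        return fingerprint

    def backup(self, changed):
        """Ingest only the changed blocks, given as {block_index: bytes}."""
        previous = self.backups[-1] if self.backups else {}
        current = dict(previous)  # unchanged blocks are inherited by reference
        for index, block in changed.items():
            current[index] = self._put(block)
        self.backups.append(current)
        return len(self.backups) - 1

    def synthesize(self, backup_id):
        """Rebuild a full image for a backup point from its block map."""
        block_map = self.backups[backup_id]
        return b"".join(self.blocks[block_map[i]] for i in sorted(block_map))


def split(data):
    """Cut a volume image into {block_index: bytes}."""
    return {i // BLOCK_SIZE: data[i:i + BLOCK_SIZE]
            for i in range(0, len(data), BLOCK_SIZE)}


def changed_blocks(old, new):
    """Stand-in for an agent's change block tracking (here: a simple compare)."""
    old_blocks = split(old)
    return {i: b for i, b in split(new).items() if old_blocks.get(i) != b}


store = DedupStore()
v1 = b"AAAABBBBCCCCAAAA"
v2 = b"AAAAXXXXCCCCAAAA"
first = store.backup(split(v1))                # initial backup sends everything
second = store.backup(changed_blocks(v1, v2))  # later backup sends one block
```

Both backup points can still be read back as full images via `store.synthesize(first)` and `store.synthesize(second)`, even though the second backup only transferred the single changed block.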

With Actifio, one can apply SLAs to copy data. These SLAs can take the form of data governance, such that some copies can’t be viewed outside the country, or by certain users. And they can also provide analytics on data copies. Both of these capabilities take copy data to a whole new level.

We didn’t get into all of Actifio’s offerings on the podcast, but Actifio CDS is a high-availability appliance which runs their object/file system and contains data storage. Actifio also comes as a virtual appliance, Actifio SKY, which runs as a VM under VMware, using anyone’s storage. Actifio supports NFS, SMB/CIFS, FC, and iSCSI access to data copies, depending on the solution chosen. There’s a lot more information on their website.

It sounds a little bit like PrimaryData but focused on data copies rather than data migration and mostly tier 2 data access.

The podcast runs ~46 minutes and covers a lot of ground. I spent most of the time asking Ash to explain Actifio (for Howard, TFD11 filled this in). Howard had some technical difficulties during the call, which caused him to go offline, but he then came back on. Ash and I never missed him :). Listen to the podcast to learn more.

Ash Ashutosh, CEO Actifio

Ash Ashutosh brings more than 25 years of storage industry and entrepreneurship experience to his role of CEO at Actifio. Ashutosh is a recognized leader and architect in the storage industry where he has spearheaded several major industry initiatives, including iSCSI and storage virtualization, and led the authoring of numerous storage industry standards. Ashutosh was most recently a Partner with Greylock Partners where he focused on making investments in enterprise IT companies. Prior to Greylock, he was Vice President and Chief Technologist for HP Storage.

Ashutosh founded and led AppIQ, a market leader of Storage Resource Management (SRM) solutions, which was acquired by HP in 2005. He was also the founder of Serano Systems, a Fibre Channel controller solutions provider, acquired by Vitesse Semiconductor in 1999. Prior to Serano, Ashutosh was Senior Vice President at StorageNetworks, the industry’s first Storage Service Provider. He previously worked as an architect and engineer at LSI and Intergraph.

GreyBeards deconstruct storage with Brian Biles and Hugo Patterson, CEO and CTO, Datrium

In this, our 32nd episode, we talk with Brian Biles (@BrianBiles), CEO & Co-founder, and Hugo Patterson, CTO & Co-founder, of Datrium, a new storage startup. We like to call it storage deconstructed, a new view of what storage could be, based on today’s and future storage technologies. If I had to describe it succinctly, I would say it’s a hybrid between software defined storage, server side flash and external disk storage. We have discussed server side flash before, but this takes it to a whole other level.

Their product, the DVX, consists of Hyperdriver host software and a NetShelf, an external disk storage unit. The DVX was designed from the ground up with host/server side flash or non-volatile memory as a given, and everything else was built around that. I hesitate to say this, but the DVX NetShelf backend storage is pretty unintelligent, just dual controller disk storage with a multi-task coordinator. In contrast, the DVX Hyperdriver host software used to access their storage system is pretty smart and is installed as a VIB in vSphere. Customers can assign up to 8TB of host-based, server side flash/non-volatile memory to the storage system per server. The Datrium DVX does the rest.

The Hyperdriver leverages host flash, DRAM and compute cores to act as a caching layer for read and write IO and as a data management engine. Writes are write-through: data goes straight from the server side flash to the NetShelf storage system, which has non-volatile DRAM (NVRAM) caching. Once write data is in the NetShelf cache, it’s in two places: the host server side flash and the storage NVRAM. Reads are easier to handle, just being cached from the NetShelf storage in the server side flash. There’s no unique data residing in the hosts.
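A rough way to picture this write-through arrangement is sketched below. The `NetShelf` and `HostCache` classes here are hypothetical stand-ins, not Datrium code, and plain dictionaries play the role of host flash and backend NVRAM. The point is only that a write is not complete until the backend has it, so a host’s cache never holds the sole copy of anything.

```python
class NetShelf:
    """Hypothetical stand-in for the shared backend holding the durable copy."""

    def __init__(self):
        self.nvram = {}

    def write(self, key, data):
        self.nvram[key] = data

    def read(self, key):
        return self.nvram[key]


class HostCache:
    """Hypothetical write-through host cache in front of the backend."""

    def __init__(self, backend):
        self.backend = backend
        self.flash = {}  # host-side cache contents
        self.hits = 0
        self.misses = 0

    def write(self, key, data):
        self.flash[key] = data         # keep a local copy for later reads
        self.backend.write(key, data)  # write-through: backend gets it too

    def read(self, key):
        if key in self.flash:
            self.hits += 1
            return self.flash[key]
        self.misses += 1
        data = self.backend.read(key)  # miss: fetch from backend, then cache
        self.flash[key] = data
        return data


shelf = NetShelf()
host_a = HostCache(shelf)
host_b = HostCache(shelf)

host_a.write("blk0", b"hello")
host_a.flash.clear()  # simulate host A losing its flash cache entirely
```

After the simulated cache loss, both `host_a.read("blk0")` and `host_b.read("blk0")` still return the data, because the cache was never the only place it lived.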

The Hyperdriver looks like an NFS mount to vSphere and the DVX uses a proprietary protocol to talk with the backend DVX NetShelf. Datrium supports up to 32 hosts and you can define the amount of Flash, DRAM and host compute allocated to the DVX Hyperdriver activity.

But the other interesting part about DVX is that much of the storage management functionality and storage control logic is partitioned between the host Hyperdriver and NetShelf, with both participating to do what they do best.

For example, disk rebuilds are done in combination with the host Hyperdriver. A DVX RAID rebuild brings data from the backend into host cache, computes the rebuild data and writes the reconstructed data back out to the NetShelf backend. This way rebuild performance can scale up with the number of hosts active in a cluster.
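As an illustration of why host-assisted rebuilds can scale, here is a small sketch. The podcast notes don’t say which RAID scheme DVX uses, so this assumes simple single-parity XOR stripes, and the `make_stripe` and `rebuild` helpers are hypothetical. Each stripe is independent, so the work can be dealt out across however many hosts are available.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus one XOR parity block (assumed layout)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripes, failed_disk, num_hosts):
    """Rebuild the failed disk's block in every stripe.

    Stripes are assigned to hosts round-robin; the return value is the
    number of stripes each host handled.
    """
    work_per_host = [0] * num_hosts
    for n, stripe in enumerate(stripes):
        host = n % num_hosts
        # "Read" the surviving blocks from the backend into the host...
        survivors = [b for d, b in enumerate(stripe) if d != failed_disk]
        # ...compute the missing block on the host and "write" it back.
        stripe[failed_disk] = xor_blocks(survivors)
        work_per_host[host] += 1
    return work_per_host


# Eight stripes of three 2-byte data blocks plus parity.
stripes = [
    make_stripe([bytes([i, i + 1]), bytes([i + 2, i + 3]), bytes([i + 4, i + 5])])
    for i in range(8)
]
original = [list(s) for s in stripes]

for s in stripes:
    s[1] = None  # simulate losing disk 1

work = rebuild(stripes, failed_disk=1, num_hosts=4)
```

With four hosts, each one handles two of the eight stripes, so adding hosts shrinks the per-host share of the rebuild. In a real system the hosts would run in parallel; the loop here only models how the work is divided.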

DVX data are compressed and deduplicated at the host before being sent to the NetShelf. The NetShelf backend also does a global deduplication on the host data. Hashing computations and data compression activities are all done on the host and passed on to the NetShelf.  Brian and Hugo were formerly with EMC Data Domain, and know all about data deduplication.
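The split described above, with hashing and compression on the host and global deduplication at the backend, might look something like the following sketch. The fixed-size chunks, SHA-256 fingerprints, zlib compression and the `Backend`, `host_write` and `host_read` names are all assumptions for illustration, not details of the DVX design.

```python
import hashlib
import zlib


class Backend:
    """Hypothetical backend keeping one compressed copy per unique chunk."""

    def __init__(self):
        self.chunks = {}         # fingerprint -> compressed chunk
        self.bytes_received = 0  # compressed payload bytes sent by hosts

    def has(self, fingerprint):
        return fingerprint in self.chunks

    def put(self, fingerprint, compressed):
        self.bytes_received += len(compressed)
        self.chunks.setdefault(fingerprint, compressed)

    def get(self, fingerprint):
        return zlib.decompress(self.chunks[fingerprint])


def host_write(backend, data, chunk_size=8):
    """Fingerprint and compress on the host; send only chunks the backend lacks."""
    recipe = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()  # host CPU does the hashing
        if not backend.has(fingerprint):
            backend.put(fingerprint, zlib.compress(chunk))  # host CPU compresses
        recipe.append(fingerprint)
    return recipe  # ordered fingerprints needed to reassemble the data


def host_read(backend, recipe):
    """Reassemble data from its list of chunk fingerprints."""
    return b"".join(backend.get(fp) for fp in recipe)


backend = Backend()
data_1 = b"ABCDEFGH" * 3 + b"12345678"
data_2 = b"12345678ABCDEFGH"  # written by a different host, same chunks

recipe_1 = host_write(backend, data_1)
sent_before = backend.bytes_received
recipe_2 = host_write(backend, data_2)
sent_after = backend.bytes_received
```

Here the second write sends no chunk data at all (`sent_after == sent_before`), because every chunk it contains is already known to the backend, regardless of which host wrote it first.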

At the moment DVX is missing some storage functionality, but they have an extensive roadmap with engineering resources to match and are plugging away at all of it. On the other hand, very few disk storage devices offer deduped/compressed data storage and warm server side caches during vMotion. They also support QoS functionality to limit the amount of host resources consumed by the DVX Hyperdriver software.

The podcast runs ~41 minutes and covers a lot of ground about how the new DVX product came about, how they separated storage functionality between host and backend, and other aspects of DVX storage. Listen to the podcast to learn more.

Brian Biles, Datrium CEO & Co-founder

Prior to Datrium, Brian was Founder and VP of Product Mgmt. at EMC Backup Recovery Systems Division. Prior to that he was Founder, VP of Product Mgmt. and Business Development for Data Domain (acquired by EMC in 2009).

Hugo Patterson, Datrium CTO & Co-founder

Prior to Datrium, Hugo was an EMC Fellow serving as CTO of the EMC Backup Recovery Systems Division, and the Chief Architect and CTO of Data Domain (acquired by EMC in 2009), where he built the first deduplication storage system. Prior to that he was the engineering lead at NetApp, developing SnapVault, the first snap-and-replicate disk-based backup product. Hugo has a Ph.D. from Carnegie Mellon.