A couple of weeks ago we met with Primary Data's Lance Smith (CEO), David Flynn (CTO) and Kaycee Lai (SVP Product & Sales), who were presenting at Storage Field Day 8 (SFD8, videos of their sessions available here). Primary Data emerged from stealth late last year and has ~$60M in funding. They also have Steve Wozniak (of Apple fame) as Chief Scientist, but he wasn’t at the SFD8 session 🙁
Primary Data seems out to change the world. At first I thought this was just another form of storage virtualization, but they are laser focused on data virtualization, or what they call data mobility. It differs from pure storage virtualization by being outside the data path. (I have written about data virtualization before, as well as about the data hypervisor a long time ago.) Nowadays they seem to be using the tag line of data in motion.
Why move data?
David has a theory about the proliferation of startup storage companies. The spectrum between capacity and performance has grown immense over time, which has provided an opening for a number of companies to address these widening needs.
David believes that caching at the storage system or in the servers is an attempt to address this issue by “loaning” the data from the storage silo to the cache. The aim is to supply a lower $/IOP for the data. Similar considerations apply at the other end of the spectrum, where customers use archive or backup services to take advantage of much cheaper $/GB storage.
However, given the difficulty of moving data around in present day storage environments, customer data has become essentially immobile. Primary Data is trying to bring about a data mobility revolution and allow data to move across this spectrum of storage performance and capacity with ease. Doing so easily will provide significant benefits, as customers can take fuller advantage of the various levels of performance and capacity in their data center storage environments.
Primary Data architecture
Primary Data is providing data mobility by using their meta-data service called the DataSphere appliance and their client software running on host servers called the Data Portal. Their offering can be best explained in three layers:
- Data virtualization layer – provides continuity of identity and continuity of access across multiple physical storage systems. That is, the same data (identity continuity) can be accessed wherever it resides (access continuity) by server applications. Such access and identity must transcend access protocols and interfaces. The Data Portal client software intercepts the server data activity, performs control plane activity using the DataSphere appliance and performs IO directly against the physical storage.
- Objective based data management – supplies a data affinity service. That is, data can have a temporary location relationship with physical storage depending on the current performance (R:W, IOPS, bandwidth, latency) and protection (durability, availability, disaster recoverability, security, copy-ability, version-ability) characteristics of the data. These data objectives are matched to the capabilities or service catalog of the storage infrastructure, and data objectives can change over time.
- Analytics in the loop – detects the performance and other characteristics of the storage and data in real-time. That is, by monitoring the storage IO activity Primary Data can determine the actual performance attributes of the storage. Similarly, by monitoring the application's IO characteristics over time the system can determine the performance objectives of its data. The system also takes advantage of SMI-S to define some of the other characteristics of the storage systems.
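To make the objective-matching idea in the second layer more concrete, here is a minimal sketch of how data objectives might be matched against a storage service catalog. This is not Primary Data's implementation: the names StorageTier, DataObjective and place, the attributes chosen, the sample numbers and the "cheapest tier that satisfies the objective" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StorageTier:
    """One entry in a hypothetical storage service catalog."""
    name: str
    max_iops: int        # IOPS the tier can sustain
    latency_ms: float    # typical latency of the tier
    cost_per_gb: float   # $/GB
    durable: bool        # whether the tier meets the durability objective


@dataclass(frozen=True)
class DataObjective:
    """Hypothetical performance/protection objective for a piece of data."""
    min_iops: int
    max_latency_ms: float
    needs_durability: bool


def place(objective: DataObjective, catalog: List[StorageTier]) -> Optional[StorageTier]:
    """Return the cheapest tier that satisfies the objective, or None."""
    candidates = [
        tier for tier in catalog
        if tier.max_iops >= objective.min_iops
        and tier.latency_ms <= objective.max_latency_ms
        and (tier.durable or not objective.needs_durability)
    ]
    # Assumed policy: among tiers that qualify, prefer the lowest $/GB.
    return min(candidates, key=lambda tier: tier.cost_per_gb, default=None)


# Made-up catalog and objectives, for illustration only.
catalog = [
    StorageTier("flash", max_iops=100_000, latency_ms=0.5, cost_per_gb=2.00, durable=True),
    StorageTier("disk", max_iops=2_000, latency_ms=8.0, cost_per_gb=0.30, durable=True),
    StorageTier("archive", max_iops=50, latency_ms=50.0, cost_per_gb=0.02, durable=True),
]
hot = DataObjective(min_iops=50_000, max_latency_ms=1.0, needs_durability=True)
cold = DataObjective(min_iops=10, max_latency_ms=100.0, needs_durability=True)

hot_tier = place(hot, catalog)
cold_tier = place(cold, catalog)
```

Because objectives can change over time, re-running place with an updated DataObjective is what would trigger a move in this toy model. In the real product the objectives and tier capabilities would presumably come from the analytics layer rather than hand-entered numbers.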
How does Primary Data work?
Primary Data has taken advantage of parallel NFS extensions (pNFS) in NFSv4 to externalize and separate the storage control plane from the IO data plane. This works well for native Linux where the main developer of the Linux file system stack is on their payroll.
In Windows they put a filter driver in front of SMB to split the control plane off from the data IO plane. Something similar is done for VMware ESX servers to supply the control-data plane split, but in this case there is a software defined Data Portal that goes along with the DataSphere Service client and can do it all within the same ESX server. Another alternative is to use the Data Portal appliance as a storage virtualization service, but then the IO data path goes through the portal.
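The control plane versus data plane split described above can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not pNFS and not Primary Data's code. MetadataService stands in for the DataSphere appliance, Client stands in for the Data Portal, the backends are plain dictionaries, and migrate is a hypothetical helper.

```python
from typing import Dict


class MetadataService:
    """Control plane: knows where each file lives, never touches file data."""

    def __init__(self) -> None:
        self._layout: Dict[str, str] = {}  # file id -> backend name

    def set_location(self, file_id: str, backend: str) -> None:
        self._layout[file_id] = backend

    def get_layout(self, file_id: str) -> str:
        return self._layout[file_id]


class Client:
    """Asks the control plane for a layout, then does IO directly."""

    def __init__(self, metadata: MetadataService, backends: Dict[str, Dict[str, bytes]]) -> None:
        self.metadata = metadata
        self.backends = backends

    def read(self, file_id: str) -> bytes:
        backend_name = self.metadata.get_layout(file_id)  # control plane lookup
        return self.backends[backend_name][file_id]       # data plane IO, direct to storage


def migrate(metadata: MetadataService, backends: Dict[str, Dict[str, bytes]],
            file_id: str, target: str) -> None:
    """Move a file to another backend without changing its identity."""
    source = metadata.get_layout(file_id)
    if source == target:
        return
    backends[target][file_id] = backends[source][file_id]  # copy the data
    metadata.set_location(file_id, target)                  # repoint the layout
    del backends[source][file_id]                           # free the old copy


# Made-up backends and file, for illustration only.
backends = {"flash": {}, "disk": {"report.doc": b"quarterly numbers"}}
metadata = MetadataService()
metadata.set_location("report.doc", "disk")
client = Client(metadata, backends)

before = client.read("report.doc")
migrate(metadata, backends, "report.doc", "flash")
after = client.read("report.doc")
```

The client uses the same file id before and after the move, and the file's bytes never pass through the metadata service. That is the out-of-the-data-path property that separates this approach from the appliance mode, where IO flows through the portal.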
According to their datasheet they currently support data virtualization services for NetApp cDOT and 7-mode, EMC Isilon OneFS 7.2, and Nexenta 4.x & 5.0, but plan on more.
They are not quite GA yet, but are close.