Data QoS – Silverton Consulting

Storageless data!?

Posted on March 10, 2021March 18, 2021 by Ray in data access, data mobility, Data QoS, Distributed computing, File Storage, Strategic Inflection Points

I (virtually) attended SFD21 earlier this year and a company called Hammerspace presented discussing their vision for storageless data (see videos of their session at SFD21).

We’ve talked them before but now they have something to offer the enterprise – data mobility or storageless data.

The white board after David Flynn’s session at SFD8

In essence, customers want to be able to run their workloads wherever it makes the most sense, on prem, in private cloud, and in the public cloud among other places. Historically, it’s been relatively painless to transfer an application’s binary from one to another data center, to a managed service provider or to the public cloud.

And with VMware Cloud Foundation, Kubernetes, Docker and Linux operating everywhere, the runtime environment and other OS services that applications depend on are pretty much available in any of those locations. So now customers have 2 out of 3, what’s left?

It’s all about the Data

Data can take a very long time to move around a data center, let alone across the web between locations. MBs and even GBs of data may be relatively painless to move, but TBs of data can be take days, and moving PBs of data is suicidal.

For instance, when we signed up for a globally accessible file synch and share storage service, I probably had 75GB or so of data I wanted managed. It took literally several days of time to upload this. Yes, I didn’t have data center class internet access, but even that might have only sped this up 2-5X. Ok, now try this with 1TB or more and it’s pretty much going to take days, and you can easily multiple that by 10 to do a PB or more. And that’s if it happens to continue to perform the transfer without disruption.

So what’s Hammerspace storageless data got to do with any of this.

Hammerspace’s idea

It’s been sort of a ground truth of storage, since I’ve been in the industry (40+ years now), that not all random IO data is accessed at the same frequency. That is, some data is accessed a lot and other data accessed hardly at all. That’s why DRAM caching of data can be so important to a host or storage system.

Similarly for sequential access, if you can get the first blocks of data to the host and then stream the rest in time, a storage system can appear to read fast.

Now I won’t go into all the tricks of doing good data caching, (the secret sauce to every vendor’s enterprise storage), but if you can appear to cache data well, you don’t actually need to transfer all the data associated with an application to a location it’s running in, you can appear as if all the data is there, when actually only some of it is present.

Essentially, Hammerspace creates a global file system for your data, across any locations you wish to use it, with great caching, optimized data transfer and with real storage behind it. Servers running your applications mount a Hammerspace file system/share that stitches together all the file storage behind it, across all the locations it’s operating in.

An application request goes to Hammerspace and if the data is not present there, Hammerspace goes and fetches and caches blocks of data as fast as it can. This will let the application start performing IO while the rest of the data is being cached and if allowed, moved to the new location.

Storage can be not managed by Hammerspace, read-write managed by Hammerspace or read-only managed by Hammerspace. For customers who want the whole Hammerspace storageless data functionality they would use read-write mode. For those who just want to access data elsewhere read-only would suffice. Customers who want to continue to access data directly but want read access globally, would use the read-only mode.

Once read-write storage is assigned to Hammerspace grabs all the file metadata information on the storage system. Once this process completes, customers no longer access this file data directly, but rather must access it through Hammerspace. At that point, this data is essentially storageless and can be accessed wherever Hammerspace services are available.

How does Hammerspace do it

Behind the scenes is a lot of technology. Some of which is discussed in the SFD21 sessions (see video’s above). Hammerspace is not in the data path but rather in the control path of data access. But it does orchestrate data movement, and it does route data IO requests from an application to where the data (currently) resides.

Hammerspace also supports Service Level Objectives (SLOs) for performance, geolocation, security, data protection options,, etc. These can be used to keep data in particular regions, to encrypt data (using KMIP), ensure high performance, high data availability, etc.

Hammerspace can manage data across 32 separate sites. It takes a couple of hours to deploy. per site. Each site has a Hammerspace metadata service with standalone access to all data within that site. For example, standalone access could be used, in the event of a network loss.

At the moment, they support eventual consistency and don’t support a global lock service. Rather, Hammerspace uses a conflict resolution service in the event data is overwritten by two or more applications. For any file that was being updated in two or more locations, that file would be flagged as in conflict, Hammerspace would provide snapshots of the various versions of the file(s) and it would require some sort of manual intervention to resolve the conflict. Each location would have (temporary) access to the data it had written directly, but at some point the conflict would need resolution.

They also support NFS and SMB file access for the front end and use object storage services for backend data. Data is copied on demand to the local site’s storage when accessed based on the SLO policies in effect for it. During data movement it is copied up, temporarily into objects on AWS, Microsoft Azure, or GCP, and then copied down to the location it’s being moved to. I believe this temporary object data is encrypted and compressed. Hammerspace support KMIP key providers.

Pricing for Hammerspace is on a managed capacity basis. But anyone can use Hammerspace for up to 10TB for free. Hammerspace is available in AWS marketplace for configuration there.

~~~~

Well it’s been a long time coming, but it appears to be here. Any customers wanting hybrid-cloud operations or global access to their data would be remiss to not check out Hammerspace.

[Edited after posting, The Eds.]

Is hardware innovation accelerating – hardware vs. software innovation (round 6)

Posted on December 17, 2020April 8, 2021 by Ray in Artificial Intelligence, Brain emulation, Cognitive computing, Data compression, Data QoS, Data transmission, Deep Learning, Ethernet, Infiniband, Market dynamics, Neuromorphic, NVMe, Processing performance, Strategic Inflection Points

There’s something happening to the IT industry, that maybe has not happened in a couple of decades or so but hardware innovation is back. We’ve been covering bits and pieces of it in our hardware vs software innovation series (see Open source ASiCs – HW vs. SW innovation [round 5] post).

But first please take our new poll:

Hardware innovation never really went away, Intel, AMD, Apple and others had always worked on new compute chips. DRAM and NAND also have taken giant leaps over the last two decades. These were all major hardware suppliers. But special purpose chips, non CPU compute engines, and hardware accelerators had been relegated to the dustbins of history as the CPU giants kept assimilating their functionality into the next round of CPU chips.

And then something happened. It kind of made sense for GPUs to be their own electronics as these were SIMD architectures intrinsically different than SISD, standard von Neumann X86 and ARM CPUs architectures

But for some reason it didn’t stop there. We first started seeing some inklings of new hardware innovation in the AI space with a number of special purpose DL NN accelerators coming online over the last 5 years or so (see Google TPU, SC20-Cerebras, GraphCore GC2 IPU chip, AI at the Edge Mythic and Syntiants IPU chips, and neuromorphic chips from BrainChip, Intel, IBM , others). Again, one could look at these as taking the SIMD model of GPUs into a slightly different direction. It’s probably one reason that GPUs were so useful for AI-ML-DL but further accelerations were now possible.

But it hasn’t stopped there either. In the last year or so we have seen SPUs (Nebulon Storage), DPUs (Fungible, NVIDIA Networking, others), and computational storage (NGD Systems, ScaleFlux Storage, others) all come online and become available to the enterprise. And most of these are for more normal workload environments, i.e., not AI-ML-DL workloads,

I thought at first these were just FPGAs implementing different logic but now I understand that many of these include ASICs as well. Most of these incorporate a standard von Neumann CPU (mostly ARM) along with special purpose hardware to speed up certain types of processing (such as low latency data transfer, encryption, compression, etc.).

What happened?

It’s pretty easy to understand why non-von Neumann computing architectures should come about. Witness all those new AI-ML-DL chips that have become available. And why these would be implemented outside the normal X86-ARM CPU environment.

But SPU, DPUs and computational storage, all have typical von Neumann CPUs (mostly ARM) as well as other special purpose logic on them.

Why?

I believe there are a few reasons, but the main two are that Moore’s law (every 2 years halving the size of transistors, effectively doubling transistor counts in same area) is slowing down and Dennard scaling (as you reduce the size of transistors their power consumption goes down and speed goes up) has stopped almost. Both of these have caused major CPU chip manufacturers to focus on adding cores to boost performance rather than just adding more transistors to the same core to increase functionality.

This hasn’t stopped adding instruction functionality to each CPU, but it has slowed considerably. And single (core) processor speeds (GHz) have reached a plateau.

But what it has stopped is having the real estate available on a CPU chip to absorb lots of additional hardware functionality. Which had been the case since the 1980’s.

I was talking with a friend who used to work on math co-processors, like the 8087, 80287, & 80387 that performed floating point arithmetic. But after the 486, floating point logic was completely integrated into the CPU chip itself, killing off the co-processors business.

Hardware design is getting easier & chip fabrication is becoming a commodity

We wrote a post a couple of weeks back talking about an open foundry (see HW vs. SW innovation round 5 noted above)that would take a hardware design and manufacture the ASICs for you for free (or at little cost). This says that the tool chain to perform chip design is becoming more standardized and much less complex. Does this mean that it takes less than 18 months to create an ASIC. I don’t know but it seems so.

But the real interesting aspect of this is that world class foundries are now available outside the major CPU developers. And these foundries, for a fair but high price, would be glad to fabricate a 1000 or million chips for you.

Yes your basic state of the art fab probably costs $12B plus these days. But all that has meant is that A) they will take any chip design and manufacture it, B) they need to keep the factory volume up by manufacturing chips in order to amortize the FAB’s high price and C) they have to keep their technology competitive or chip manufacturing will go elsewhere.

So chip fabrication is not quite a commodity. But there’s enough state of the art FABs in existence to make it seem so.

But it’s also physics

The extremely low latencies that are available with NVMe storage and, higher speed networking (100GbE & above) are demanding a lot more processing power to keep up with. And just the physics of how long it takes to transfer data across a distance (aka racks) is starting to consume too much overhead and impacting other work that could be done.

When we start measuring IO latencies in under 50 microseconds, there’s just not a lot of CPU instructions and task switching that can go on anymore. Yes, you could devote a whole core or two to this process and keep up with it. But wouldn’t the data center be better served keeping that core busy with normal work and offloading that low-latency, realtime (like) work to a hardware accelerator that could be executing on the network rather than behind a NIC.

So real time processing has become faster, or rather the amount of time to execute CPU instructions to switch tasks and to process data that needs to be done in realtime to keep up with faster line speed is becoming shorter.

So that explains DPUs, smart NICS, DPUs, & SPUs. What about the other hardware accelerator cards.

AI-ML-DL is becoming such an important and data AND compute intensive workload that just like GPUs before them, TPUs & IPUs are becoming a necessary evil if we want to service those workloads effectively and expeditiously.
Computational storage is becoming more wide spread because although data compression can be easily done at the CPU, it can be done faster (less data needs to be transferred back and forth) at the smart Drive.

My guess we haven’t seen the end of this at all. When you open up the possibility of having a long term business model, focused on hardware accelerators there would seem to be a lot of stuff that needs to be done and could be done faster and more effectively outside the core CPU.

There was a point over the last decade where software was destined to “eat the world”. I get a lot of flack for saying that was BS and that hardware innovation is really eating the world. Now that hardtware innovation’s back, it seems to be a little of both.

Comments?

Photo Credits:

Cerebras chip, Cerebras (see SC20 post)
Mythic architecture, Mythic computing (see AI at the edge post)
TPU2-iot, Google (see TPU post)
130nm layouts (see Open source ASICs post)
Moore’s law chart – wikipedia, By Max Roser – https://ourworldindata.org/uploads/2019/05/Transistor-Count-over-time-to-2018.png, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=79751151

Data virtualization surfaces

Posted on January 9, 2015 by Ray in Block Storage, Cloud services, Clustered storage, DAS, Data, data access, data protection, Data QoS, Distributed computing, File Storage, Object storage, SSD storage, Storage, Storage virtualization, Strategy, Systems

There’s a new storage startup out of stealth, called Primary Data and it’s implementing data (note, not storage) virtualization.

They already have $60M in funding with some pretty highpowered talent from Fusion IO, namely David Flynn, Rick White and Steve Wozniak (the ‘Woz’) (also of Apple fame).

There have been a number of attempts at creating a virtualization layers for data namely ViPR (See my post ViPR virtues, vexations but no storage virtualization) but Primary Data is taking a different tack to the problem.

Data virtualization explained

Essentially they want to separate the data plane from the control plane (See my Data Hypervisor post and comments for another view on this).

The data plane consists of those storage system activities that actually perform IO or read and writes.
The control plane is those storage system activities that do everything else that has to be done by a storage system, including provisioning, monitoring, and managing the storage.

Separating the data plane from the control plane offers a number of advantages. EMC ViPR does this but it’s data plane is either standard storage systems like VMAX, VNX, Isilon etc, or software defined storage solutions. Primary Data wants to do it all.

Their meta data or control plane engine is called a Data Director which holds information about the data objects that are stored in the Primary Data system, runs a data policy management engine and handles data migration.

Primary Data relies on purpose-built, Data Hypervisor (client) software that talks to Data Directors to understand where data objects reside and how to go about accessing them. But once the metadata information is transferred to the client SW, then IO activity can go directly between the host and the storage system in a protocol independent fashion.

[The graphic above is from my prior post and I assumed the data hypervisor (DH) would be co-located with the data but Primary Data has rightly implemented this as a separate layer in host software.]

Data Hypervisor protocol independence?

As I understand it this means that customers could use file storage, object storage or block storage to support any application requirement. This also means that file data (objects) could be migrated to block storage and still be accessed as file data. But the converse is also true, i.e., block data (objects) could be migrated to file storage and still be accessed as block data. You need to add object, DAS, PCIe flash and cloud storage to the mix to see where they are headed.

All data in Primary Data’s system are object encapsulated and all data objects are catalogued within a single, global namespace that spans file, block, object and cloud storage repositories

Data objects can reside on Primary storage systems, external non-Primary data aware file or block storage systems, DAS, PCIe Flash, and even cloud storage.

How does Data Virtualization compare to Storage Virtualization?

There are a number of differences:

Most storage virtualization solutions are in the middle of the data path and because of this have to be fairly significant, highly fault-tolerant solutions.
Most storage virtualization solutions don’t have a separate and distinct meta-data engine.
Most storage virtualization systems don’t require any special (data hypervisor) software running on hosts or clients.
Most storage virtualization systems don’t support protocol independent access to data storage.
Most storage virtualization systems don’t support DAS or server based, PCIe flash for permanent storage. (Yes this is not supported in the first release but the intent is to support this soon.)
Most storage virtualization systems support internal storage that resides directly inside the storage virtualization system hardware.
Most storage virtualization systems support an internal DRAM cache layer which is used to speed up IO to internal and external storage and is in addition to any caching done at the external storage system level.
Most storage virtualization systems only support external block storage.

There are a few similarities as well:

They both manage data migration in a non-disruptive fashion.
They both support automated policy management over data placement, data protection, data performance, and other QoS attributes.
They both support multiple vendors of external storage.
They both can support different host access protocols.

Data Virtualization Policy Management

A policy engine runs in the Data Directors and provides SLAs for data objects. This would include performance attributes, protection attributes, security requirements and cost requirements. Presumably, policy specifications for data protection would include RAID level, erasure coding level and geographic dispersion.

In Primary Data, backup becomes nothing more than object snapshots with different protection characteristics, like offsite full copy. Moreover, data object migration can be handled completely outboard and without causing data access disruption and on an automated policy basis.

Primary Data first release

Primary Data will be initially deployed as an integrated data virtualization solution which includes an all flash NAS storage system and a standard NAS system. Over time, Primary Data will add non-Primary Data external storage and internal storage (DAS, SSD, PCIe Flash).

The Data Policy Engine and Data Migrator functionality will be separately charged for software solutions. Data Directors are sold in pairs (active-passive) and can be non-disruptively upgraded. Storage (directors?) are also sold separately.

Data Hypervisor (client) software is available for most styles of Linux, Openstack and coming for ESX. Windows SMB support is not split yet (control plane/data plane) but Primary data does support SMB. I believe the Data Hypervisor software will also be released in an upcoming version of the Linux kernel.

They are currently in testing. No official date for GA but they did say they would announce pricing in 2015.

~~~~

Comments?

Disclosure: We have done work for Primary Data over the past year.

Photo Credits:

Screen shot of beta test system supplied by Primary Data
Graphic created by SCI for prior Data Hypervisor post

Fall SNWUSA wrap-up

Posted on October 19, 2012January 21, 2013 by Ray in Block Storage, Cloud storage, Data QoS, Disk storage, Ethernet, FC, File Storage, iSCSI, SSD storage, Storage, Storage performance, System effectiveness

Attended SNWUSA this week in San Jose, It’s hard to see the show gradually change when you attend each one but it does seem that the end-user content and attendance is increasing proportionally. This should bode well for future SNWs. Although, there was always a good number of end users at the show but the bulk of the attendees in the past were from storage vendors.

Another large storage vendor dropped their sponsorship. HDS no longer sponsors the show and the last large vendor still standing at the show is HP. Some of this is cyclical, perhaps the large vendors will come back for the spring show, next year in Orlando, Fl. But EMC, NetApp and IBM seemed to have pretty much dropped sponsorship for the last couple of shows at least.

SSD startup of the show

Skyhawk hardware (c) 2012 Skyera, all rights reserved (from their website)

The best, new SSD startup had to be Skyera. A 48TB raw flash dual controller system supporting iSCSI block protocol and using real commercial grade MLC. The team at Skyera seem to be all ex-SandForce executives and technical people.

Skyera’s team have designed a 1U box called the Skyhawk, with a phalanx of NAND chips, there own controller(s) and other logic as well. They support software compression and deduplication as well as a special designed RAID logic that claims to reduce extraneous write’s to something just over 1 for RAID 6, dual drive failure equivalent protection.

Skyera’s underlying belief is that just as consumer HDAs took over from the big monster 14″ and 11″ disk drives in the 90’s sooner or later commercial NAND will take over from eMLC and SLC. And if one elects to stay with the eMLC and SLC technology you are destined to be one to two technology nodes behind. That is, commercial MLC (in USB sticks, SD cards etc) is currently manufactured with 19nm technology. The EMLC and SLC NAND technology is back at 24 or 25nm technology. But 80-90% of the NAND market is being driven by commercial MLC NAND. Skyera came out this past August.

Coming in second place was Arkologic an all flash NAS box using SSD drives from multiple vendors. In their case a fully populated rack holds about 192TB (raw?) with an active-passive controller configuration. The main concern I have with this product is that all their metadata is held in UPS backed DRAM (??) and they have up to 128GB of DRAM in the controller.

Arkologic’s main differentiation is supporting QOS on a file system basis and having some connection with a NIC vendor that can provide end to end QOS. The other thing they have is a new RAID-AS which is special designed for Flash.

I just hope their USP is pretty hefty and they don’t sell it someplace where power is very flaky, because when that UPS gives out, kiss your data goodbye as your metadata is held nowhere else – at least that’s what they told me.

Cloud storage startup of the show

There was more cloud stuff going on at the show. Talked to at least three or four cloud gateway providers. But the cloud startup of the show had to be Egnyte. They supply storage services that span cloud storage and on premises storage with an in band or out-of-band solution and provide file synchronization services for file sharing across multiple locations. They have some hooks into NetApp and other major storage vendor products that allows them to be out-of-band for these environments but would need to be inband for other storage systems. Seems an interesting solution that if succesful may help accelerate the adoption of cloud storage in the enterprise, as it makes transparent whether storage you access is local or in the cloud. How they deal with the response time differences is another question.

Different idea startup of the show

The new technology showplace had a bunch of vendors some I had never heard of before but one that caught my eye was Actifio. They were at VMworld but I never got time to stop by. They seem to be taking another shot at storage virtualization. Only in this case rather than focusing on non-disruptive file migration they are taking on the task of doing a better job of point in time copies for iSCSI and FC attached storage.

I assume they are in the middle of the data path in order to do this and they seem to be using copy-on-write technology for point-in-time snapshots. Not sure where this fits, but I suspect SME and maybe up to mid-range.

Most enterprise vendors have solved these problems a long time ago but at the low end, it’s a little more variable. I wish them luck but although most customers use snapshots if their storage has it, those that don’t, seem unable to understand what they are missing. And then there’s the matter of being in the data path?!

~~~~

If there was a hybrid startup at the show I must have missed them. Did talk with Nimble Storage and they seem to be firing on all cylinders. Maybe someday we can do a deep dive on their technology. Tintri was there as well in the new technology showcase and we talked with them earlier this year at Storage Tech Field Day.

The big news at the show was Microsoft purchasing StorSimple a cloud storage gateway/cache. Apparently StorSimple did a majority of their business with Microsoft’s Azure cloud storage and it seemed to make sense to everyone.

The SNIA suite was hopping as usual and the venue seemed to work well. Although I would say the exhibit floor and lab area was a bit to big. But everything else seemed to work out fine.

On Wednesday, the CIO from Dish talked about what it took to completely transform their IT environment from a management and leadership perspective. Seemed like an awful big risk but they were able to pull it off.

All in all, SNW is still a great show to learn about storage technology at least from an end-user perspective. I just wish some more large vendors would return once again, but alas that seems to be a dream for now.

Data hypervisor

Posted on August 14, 2012 by Ray in Block Storage, Clustered storage, DAS, Data grid, data logistics, Data QoS, Data transmission, Disk storage, File Storage, Server virtualization, Storage architecture, Storage Features, Storage virtualization, Strategic Inflection Points, System effectiveness

(c) 2012 Silverton Consulting, Inc. All rights reserved

With all this talk of software defined networking and server virtualization where does storage virtualization stand. I blogged about some problems with storage virtualization a week or so ago in my post on Storage Utilization is broke and this post takes it to the next level. Also I was at a financial analyst conference this week in Vail where I heard Mark Lewis of Tekrocket but formerly of EMC discuss the need for a data hypervisor to provide software defined storage.

I now believe what we really need for true storage virtualization is a renewed focus on data hypervisor functionality. The data hypervisor would need both a control plane and a data plane in order to function properly. Ideally the control plane would set up the interface and routing for the data plane hardware and the server and/or backend storage would be none the wiser.

DMs everywhere

I envision a scenario where a customer’s application data is packaged with a data hypervisor which runs on a commodity data switch hardware with data plane and control plane software running on it. Sort of creating (virtual) data machines or DMs.

All enterprise and nowadays most midrange storage provide most of the functionality of a storage control plane such as defining units of storage, setting up physical to logical storage mapping, incorporating monitoring, and management of the physical storage layer, etc. So control planes are pervasive in today’s storage but proprietary.

In addition most storage systems have data plane functionality which operates to connect a host IO request to the actual data which resides in backend storage or internal cache. But again although data planes are everywhere in storage today they are all proprietary to a specific vendor’s storage system.

Data switch needed

But in order to utilize a data hypervisor and create a more general purpose control plane layer, we need a more generic data plane layer that operates on commodity hardware. This is different from today’s SAN storage switches or DCB switches but similar in a some ways.

The functions of the data switch/data plane layer would be to take routing instructions from the control plane layer and direct the server IO request to the proper storage unit using the data plane layer. Somewhere in this world view, probably at the data plane level it would introduce data protection services like RAID or other erasure coding schemes, point in time copy/clone services and replication services and other advanced storage features needed by enterprise storage today.

Also it would need to provide some automated storage movement across and within tiers of physical storage and it would connect server storage interfaces at the front end to storage interfaces at the backend. Not unlike SAN or DCB switches but with much more advanced functionality.

Ideally data switch storage interfaces could attach to dedicated JBOD, Flash arrays as well as systems using DAS storage. In addition, it would be nice if the data switch could talk to real storage arrays on SAN, IP/SANs or NFS&CIFS/SMB storage systems.

The other thing one would like out of a data switch is support for a universal translator that would map one protocol to another, such as iSCSI to SAS, NFS to FC, or FC to NFS and any other combination, depending on the needs of the server and the storage in the configuration.

Now if the data switch were built on top of commodity x86 hardware and software with the data switch as just a specialized application that would create the underpinnings for a true data hypervisor with a control and data plane that could be independent and use anybody’s storage.

Data hypervisor

Assuming all this were available then we would have true storage virtualization. With these capabilities, storage could be repurposed on the fly, added to, subtracted from, and in general be a fungible commodity not unlike server processing MIPs under VMware or Hyper-V.

Application data would then needed to be packaged into a data machine which would offer all the host services required to support host data access. The data hypervisor would handle the linkages required to interface with the control and data layers.

Applications could be configured to utilize available storage at ease and storage could grow, shrink or move to accommodate the required workload just as easily as VMs can be deployed today.

How we get there

Aside from the VMware, Citrix, Microsoft thrusts towards virtual storage there are plenty of storage virtualization solutions that can control most backend enterprise SAN storage. However, the problem with these solutions is that in general the execute only on a specific vendors hardware and don’t necessarily talk to DAS or JBOD storage.

In addition, not all of the current generation storage virtualization solutions are unified. That is most of these today only talk FC, FCoE or iSCSI and don’t support NFS or CIFS/SMB.

These don’t appear to be insurmountable obstacles and with proper allocation of R&D funding, could all be solved.

However the more problematic is that none of these solutions operate on commodity hardware or commodity software.

The hardware is probably the easiest to deal with. Today many enterprise storage systems are built ontop of x86 processor storage controllers. Albeit sometimes they incorporate specialized packaging for redundancy and high availability.

The harder problem may be commodity software. Although the genesis for a few storage virtualization systems might come from BSD or other “commodity” software operating systems. They have been modified over the years to no longer represent anything that can run on standard off the shelf operating systems.

Then again some storage virtualization systems started out with special home grown hardware and software. As such, converting these over to something more commodity oriented would be a major transition.

But the challenge is how to get there from here and would anyone want to take this on. The other problem is that the value add that storage vendors supply currently would be somewhat eroded. Not unlike what happened to proprietary Unix systems with the advent of VMware.

But this will not take place overnight and the company that takes this on and makes a go at it can have a significant software monopoly that would be hard to crack.

Perhaps it will take a startup to do this but I believe the main enterprise storage vendors are best positioned to take this on.

Comments?

SSD news roundup

Posted on November 8, 2011April 10, 2012 by Ray in Block Storage, Cloud services, Cloud storage, Data QoS, Data reduction, Ethernet, SSD storage, Storage Features, Storage performance, Strategic Inflection Points, System effectiveness

The NexGen n5 Storage System (c) 2011 NexGen Storage, All Rights Reserved

NexGen comes out of stealth

NexGen Storage a local storage company came out of stealth today and is also generally available. Their storage system has been in beta since April 2011 and is in use by a number of customers today.

Their product uses DRAM caching, PCIe NAND flash, and nearline SAS drives to provide guaranteed QoS for LUN I/O. The system can provision IOP rate, bandwidth and (possibly) latency over a set of configured LUNs. Such provisioning can change using policy management on a time basis to support time-based tiering. Also, one can prioritize how important the QoS is for a LUN so that it could be guaranteed or could be sacrificed to support performance for other storage system LUNs.

The NexGen storage provides a multi-tiered hybrid storage system that supports 10GBE iSCSI, and uses MLC NAND PCIe card to boost performance for SAS nearline drives. NexGen also supports data deduplication which is done during off-peak times to reduce data footprint.

DRAM replacing Disk!?

In a report by ARS Technica, a research group out of Stanford is attempting to gang together server DRAM to create a networked storage system. There have been a number of attempts to use DRAM as a storage system in the past but the Stanford group is going after it in a different way by aggregating together DRAM across a gaggle of servers. They are using standard disks or SSDs for backup purposes because DRAM is, of course, a volatile storage device but the intent is to keep all in memory to speed up performance.

I was at SNW USA a couple of weeks ago talking to a Taiwanese company that was offering a DRAM storage accelerator device which also used DRAM as a storage service. Of course, Texas Memory Systems and others have had DRAM based storage for a while now. The cost for such devices was always pretty high but the performance was commensurate.

In contrast, the Stanford group is trying to use commodity hardware (servers) with copious amounts of DRAM, to create a storage system. The article seems to imply that the system could take advantage of unused DRAM, sitting around your server farm. But, I find it hard to believe that. Most virtualized server environments today are running lean on memory and there shouldn’t be a lot of excess DRAM capacity hanging around.

The other achilles heel of the Stanford DRAM storage is that it is highly dependent on low latency networking. Although Infiniband probably qualifies as low latency, it’s not low latency enough to support this systems IO workloads. As such, they believe they need even lower latency networking than Infiniband to make it work well.

OCZ ups the IOP rate on their RevoDrive3 Max series PCIe NAND storage

Speaking of PCIe NAND flash, OCZ just announced speedier storage, upping the random read IO rates up to 245K from the 230K IOPS offered in their previous PCIe NAND storage. Unclear what they did to boost this but, it’s entirely possible that they have optimized their NAND controller to support more random reads.

OCZ announces they will ship TLC SSD storage in 2012

OCZ’s been busy. Now that the enterprise is moving to adopt MLC and eMLC SSD storage, it seems time to introduce TLC (3-bits/cell) SSDs. With TLC, the price should come down a bit more (see chart in article), but the endurance should also suffer significantly. I suppose with the capacities available with TLC and enough over provisioning OCZ can make a storage device that would be reliable enough for certain applications at a more reasonable cost.

I never thought I would see MLC in enterprise storage so, I suppose at some point even TLC makes sense, but I would be even more hesitant to jump on this bandwagon for awhile yet.

Solid Fire obtains more funding

Early last week Solid Fire, another local SSD startup obtained $25M in additional funding. Solid Fire, an all SSD storage system company, is still technically in beta but expect general availability near the end of the year. We haven’t talked about them before in RayOnStorage but they are focusing on cloud service providers with an all SSD solution which includes deduplication. I promise to talk about them some more when they reach GA.

LaCIE introduces a Little Big Disk, a Thunderbolt SSD

Finally, in the highend consumer space, LaCie just released a new SSD which attaches to servers/desktops using the new Apple-Intel Thunderbolt IO interface. Given the expense (~$900) for 128GB SSD, it seems a bit much but if you absolutely have to have the performance this may be the only way to go.

—-

Well that’s about all I could find on SSD and DRAM storage announcements. However, I am sure I missed a couple so if you know one I should have mentioned please comment.

NewSQL and the curse of Old SQL database systems

Posted on July 8, 2011July 8, 2011 by Ray in Cloud services, data access, Data analytics, Data availability, Data QoS, Distributed computing, Information economy, System effectiveness

database 2 by Tim Morgan (cc) (from Flickr)

There was some twitter traffic yesterday on how Facebook was locked into using MySQL (see article here) and as such, was having to shard their MySQL database across 1000s of database partitions and memcached servers in order to keep up with the processing load.

The article indicated that this was painful, costly and time consuming. Also they said Facebook would be better served moving to something else. One answer was to replace MySQL with recently emerging, NewSQL database technology.

One problem with old SQL database systems is they were never architected to scale beyond a single server. As such, multi-server transactional operations was always a short-term fix to the underlying system, not a design goal. Sharding emerged as one way to distribute the data across multiple RDBMS servers.

What’s sharding?

Relational database tables are sharded by partitioning them via a key. By hashing this key one can partition a busy table across a number of servers and use the hash function to lookup where to process/access table data. An alternative to hashing is to use a search lookup function to determine which server has the table data you need and process it there.

In any case, sharding causes a number of new problems. Namely,

Cross-shard joins – anytime you need data from more than one shard server you lose the advantages of distributing data across nodes. Thus, cross-shard joins need to be avoided to retain performance.
Load balancing shards – to spread workload you need to split the data by processing activity. But, knowing ahead of time what the table processing will look like is hard and one weeks processing may vary considerably from the next weeks load. As such, it’s hard to load balance shard servers.
Non-consistent shards – by spreading transactions across multiple database servers and partitions, transactional consistency can no longer be guaranteed. While for some applications this may not be a concern, traditional RDBMS activity is consistent.

These are just some of the issues with sharding and I am certain there are more.

What about Hadoop projects and its alternatives?

One possibility is to use Hadoop and its distributed database solutions. However, Hadoop systems were not intended to be used for transaction processing. Nonetheless, Cassandra and HyperTable (see my post on Hadoop – Part 2) can be used for transaction processing and at least Casandra can be tailored to any consistency level. But both Cassandra and HyperTable are not really meant to support high throughput, consistent transaction processing.

Also, the other, non-Hadoop distributed database solutions support data analytics and most are not positioned as transaction processing systems (see Big Data – Part 3). Although Teradata might be considered the lone exception here and can be a very capable transaction oriented database system in addition to its data warehouse operations. But it’s probably not widely distributed or scaleable above a certain threshold.

The problems with most of the Hadoop and non-Hadoop systems above mainly revolve around the lack of support for ACID transactions, i.e., atomic, consistent, isolated, and durable transaction processing. In fact, most of the above solutions relax one or more of these characteristics to provide a scaleable transaction processing model.

NewSQL to the rescue

There are some new emerging database systems that are designed from the ground up to operate in distributed environments called “NewSQL” databases. Specifically,

Clustrix – is a MySQL compatible replacement, delivered as a hardware appliance that can be distributed across a number of nodes that retains fully ACID transaction compliance.
GenieDB – is a NoSQL and SQL based layered database that is consistent (atomic), available and partition tolerant (CAP) but not fully ACID compliant, offers a MySQL and popular content management systems plugins that allow MySQL and/or CMSs to execute using GenieDB clusters with minimal modification.
NimbusDB – is a client-cloud based SQL service which distributes copies of data across multiple nodes and offers a majority of SQL99 standard services.
VoltDB – is a fully SQL compatible, ACID compliant, distributed, in-memory database system offered as a software only solution executing on 64bit CentOS system but is compatible with any POSIX-compliant, 64bit Linux platform.
Xeround – is a cloud based, MySQL compatible replacement delivered as a (Amazon, Rackspace and others) service offering that provides ACID compliant transaction processing across distributed nodes.

I might be missing some, but these seem to be the main ones today. All the above seem to take a different tack to offer distributed SQL services. Some of the above relax ACID compliance in order to offer distributed services. But for all of them distributed scale out performance is key and they all offer purpose built, distributed transactional relational database services.

—–

RDBMS technology has evolved over the last century and have had at least ~35 years of running major transactional systems. But todays hardware architecture together with web scale performance requirements stretch these systems beyond their original design envelope. As such, NewSQL database systems have emerged to replace old SQL technology, with a new, intrinsically distributed system architecture providing high performing, scaleable transactional database services for today and the foreseeable future.

Comments?

SolidFire supplies scale-out SSD storage for cloud service providers

Posted on June 21, 2011April 10, 2012 by Ray in Block Storage, Data compression, Data QoS, Data reduction, Ethernet, SSD storage, Storage architecture, Storage performance, Strategic Inflection Points

SolidFire SF3010 node (c) 2011 SolidFire (from their website)

I was talking with a local start up called SolidFire the other day with an interesting twist on SSD storage. They were targeting cloud service providers with a scale-out, cluster based SSD iSCSI storage system. Apparently a portion of their team had come from Lefthand (now owned by HP) another local storage company and the rest came from Rackspace, a national cloud service provider.

The hardware

Their storage system is a scale-out cluster of storage nodes that can range from 3 to a theoretical maximum of 100 nodes (validated node count ?). Each node comes equipped with 2-2.4GHz, 6-core Intel processors and 10-300GB SSDs for a total of 3TB raw storage per node. Also they have 8GB of non-volatile DRAM for write buffering and 72GB read cache resident on each node.

The system also uses 2-10GbE links for host to storage IO and inter-cluster communications and support iSCSI LUNs. There are another 2-1GigE links used for management communications.

SolidFire states that they can sustain 50K IO/sec per node. (This looks conservative from my viewpoint but didn’t state any specific R:W ratio or block size for this performance number.)

The software

They are targeting cloud service providers and as such the management interface was designed from the start as a RESTful API but they also have a web GUI built out of their API. Cloud service providers will automate whatever they can and having a RESTful API seems like the right choice.

QoS and data reliability

The cluster supports 100K iSCSI LUNs and each LUN can have a QoS SLA associated with it. According to SolidFire one can specify a minimum/maximum/burst level for IOPS and a maximum or burst level for throughput at a LUN granularity.

With LUN based QoS, one can divide cluster performance into many levels of support for multiple customers of a cloud provider. Given these unique QoS capabilities it should be relatively easy for cloud providers to support multiple customers on the same storage providing very fine grained multi-tennancy capabilities.

This could potentially lead to system over commitment, but presumably they have some way to ascertain over commitment is near and not allowing this to occur.

Data reliability is supplied through replication across nodes which they call Helix(tm) data protection. In this way if an SSD or node fails, it’s relatively easy to reconstruct the lost data onto another node’s SSD storage. Which is probably why the minimum number of nodes per cluster is set at 3.

Storage efficiency

Aside from the QoS capabilities, the other interesting twist from a customer perspective is that they are trying to price an all-SSD storage solution at the $/GB of normal enterprise disk storage. They believe their node with 3TB raw SSD storage supports 12TB of “effective” data storage.

They are able to do this by offering storage efficiency features of enterprise storage using an all SSD configuration. Specifically they provide,

Thin provisioned storage – which allows physical storage to be over subscribed and used to support multiple LUNs when space hasn’t completely been written over.
Data compression – which searches for underlying redundancy in a chunk of data and compresses it out of the storage.
Data deduplication – which searches multiple blocks and multiple LUNs to see what data is duplicated and eliminates duplicative data across blocks and LUNs.
Space efficient snapshot and cloning – which allows users to take point-in-time copies which consume little space useful for backups and test-dev requirements.

Tape data compression gets anywhere from 2:1 to 3:1 reduction in storage space for typical data loads. Whether SolidFire’s system can reach these numbers is another question. However, tape uses hardware compression and the traditional problem with software data compression is that it takes lots of processing power and/or time to perform it well. As such, SolidFire has configured their node hardware to dedicate a CPU core to each physical disk drive (2-6 core processors for 10 SSDs in a node).

Deduplication savings are somewhat trickier to predict but ultimately depends on the data being stored in a system and the algorithm used to deduplicate it. For user home directories, typical deduplication levels of 25-40% are readily attainable. SolidFire stated that their deduplication algorithm is their own patented design and uses a small fixed block approach.

The savings from thin provisioning ultimately depends on how much physical data is actually consumed on a storage LUN but in typical environments can save 10-30% of physical storage by pooling non-written or free storage across all the LUNs configured on a storage system.

Space savings from point-in-time copies like snapshots and clones depends on data change rates and how long it’s been since a copy was made. But, with space efficient copies and a short period of existence, (used for backups or temporary copies in test-development environments) such copies should take little physical storage.

Whether all of this can create a 4:1 multiplier for raw to effective data storage is another question but they also have a eScanner tool which can estimate savings one can achieve in their data center. Apparently the eScanner can be used by anyone to scan real customer LUNs and it will compute how much SolidFire storage will be required to support the scanned volumes.

—–

There are a few items left on their current road map to be delivered later, namely remote replication or mirroring. But for now this looks to be a pretty complete package of iSCSI storage functionality.

SolidFire is currently signing up customers for Early Access but plan to go GA sometime around the end of the year. No pricing was disclosed at this time.

I was at SNIA’s BoD meeting the other week and stated my belief that SSDs will ultimately lead to the commoditization of storage. By that I meant that it would be relatively easy to configure enough SSD hardware to create a 100K IO/sec or 1GB/sec system without having to manage 1000 disk drives. Lo and behold, SolidFire comes out the next week. Of course, I said this would happen over the next decade – so I am off by a 9.99 years…

Comments?