Pure Storage surfaces

1 controller X 1 storage shelf (c) 2011 Pure Storage (from their website)

We were talking with Pure Storage last week, another SSD startup, which just emerged from stealth mode today.  Somewhat like SolidFire, which we discussed a month or so ago, Pure Storage uses only SSDs to provide primary storage.  In this case, they support an FC front end with an all-SSD backend and implement internal data deduplication and compression to address the needs of enterprise tier 1 storage.

Pure Storage is in final beta testing with their product and plans to GA sometime around the end of the year.

Pure Storage hardware

Their system is built around MLC SSDs, which are available from many vendors, but given a strategic investment from Samsung they currently use that vendor's drives.  As we know, MLC has write endurance limitations, but Pure Storage was designed from the ground up knowing it would use this technology and has built its IP to counteract these issues.

The system is available in one- or two-controller configurations, with an InfiniBand interconnect between the controllers, a 6Gbps SAS backend, 48GB of DRAM per controller for caching, and NV-RAM to ride through power outages.  Each controller has 12 cores supplied by two Intel Xeon processor chips.

With the first release they are limiting configurations to one or two controllers (the HA option), but their storage system is capable of clustering together many more, maybe even up to 8 controllers, using the InfiniBand back end.

Each storage shelf provides 5.5TB of raw storage using 2.5″ 256GB MLC SSDs.  It looks like each controller can handle up to 2 storage shelves, with the HA (dual controller) option supporting 4 drive shelves for up to 22TB of raw storage.
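Doing some back-of-the-envelope math on those capacity figures (the drives-per-shelf count is my inference from the stated numbers, not something Pure Storage has published):

```python
# Back-of-the-envelope capacity math for the configurations described above.
# The drives-per-shelf figure is inferred from the stated numbers, not confirmed.

DRIVE_GB = 256          # 2.5" MLC SSD capacity
SHELF_TB = 5.5          # raw capacity per storage shelf
MAX_SHELVES = 4         # dual controller (HA) configuration

drives_per_shelf = SHELF_TB * 1000 / DRIVE_GB
max_raw_tb = SHELF_TB * MAX_SHELVES

print(f"~{drives_per_shelf:.0f} drives per shelf")        # ~21-22 SSDs
print(f"{max_raw_tb:.0f}TB raw in a 4-shelf HA system")    # 22TB
```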

Pure Storage Performance

Although these numbers are not independently verified, the company says a single controller (with one storage shelf) can do 200K sustained 4K random read IOPS, 2GB/sec of bandwidth, 140K sustained write IOPS, or 500MB/sec of write bandwidth.  A dual controller system (with two storage shelves) can achieve 300K random read IOPS, 3GB/sec of bandwidth, 180K write IOPS, or 1GB/sec of write bandwidth.  They also claim that they can do all this IO with under 1 msec of latency.
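As a rough sanity check, and taking the vendor numbers at face value, the 4K IOPS claims convert to small-block bandwidth as follows:

```python
# Rough conversion of the quoted IOPS figures into bandwidth, assuming 4KB IOs.
def iops_to_mbps(iops, block_kb=4):
    """Convert an IOPS figure into MB/sec of small-block bandwidth."""
    return iops * block_kb / 1024

print(f"{iops_to_mbps(200_000):.0f} MB/s of 4K random reads (single controller)")   # ~781 MB/s
print(f"{iops_to_mbps(140_000):.0f} MB/s of 4K random writes")                      # ~547 MB/s
# The separate 2GB/sec and 500MB/sec figures are presumably large-block sequential
# bandwidth, so they are not directly comparable to the 4K numbers.
```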

One of the things they pride themselves on is consistent performance.  They have built their storage to deliver that consistency even under heavy load.

Given the number of SSDs in their system this isn't screaming performance, but it is certainly up there with many enterprise class systems sporting over 1000 disks.  The random write performance is not bad considering this is MLC.  On the other hand, the sequential write bandwidth is probably their weakest spec and reflects their use of MLC flash.

Purity software

One key to Pure Storage (and SolidFire for that matter) is their use of inline data compression and deduplication. By using these techniques and basing their system storage on MLC, Pure Storage believes they can close the price gap between disk and SSD storage systems.

The problem with data reduction technologies is that not all environments benefit from them and both techniques require lots of CPU power to perform well.  Pure Storage believes they have the horsepower (with 12 cores per controller) to support these services and are focusing their sales activities on those environments (VMware, Oracle, and SQL Server) which have historically proven to be good candidates for data reduction.

In addition, they perform a lot of optimizations in their backend data layout to prolong the life of MLC storage. Specifically, they use a write chunk size that matches the underlying MLC SSDs' page width so as not to waste endurance on partial-page writes.  They also migrate old data to new locations occasionally to maintain "data freshness", which can be a problem with MLC storage if the data is not touched often enough.  There is probably other work as well, but essentially they are tuning their backend to optimize the endurance and performance of their SSD storage.
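To illustrate the general idea of page-aligned writes (and only the general idea; this is a hypothetical sketch, not Pure Storage's code, and the 8KB page size is an assumption), small writes can be coalesced into full flash pages before being issued to the SSDs:

```python
# Hypothetical illustration of coalescing writes to a flash page boundary.
# The page size and the flush interface are assumptions for the sketch.
FLASH_PAGE_BYTES = 8 * 1024

class PageAlignedWriter:
    def __init__(self, backend_write):
        self.backend_write = backend_write   # callable that writes one full page
        self.buffer = bytearray()

    def write(self, data: bytes):
        """Buffer incoming writes and emit only full, page-aligned chunks."""
        self.buffer.extend(data)
        while len(self.buffer) >= FLASH_PAGE_BYTES:
            self.backend_write(bytes(self.buffer[:FLASH_PAGE_BYTES]))
            del self.buffer[:FLASH_PAGE_BYTES]

    def flush(self):
        """Pad the tail out to a full page (e.g., at destage time)."""
        if self.buffer:
            pad = b"\x00" * (FLASH_PAGE_BYTES - len(self.buffer))
            self.backend_write(bytes(self.buffer) + pad)
            self.buffer.clear()

pages = []
w = PageAlignedWriter(pages.append)
w.write(b"x" * 5000)
w.write(b"y" * 5000)   # crosses the 8KB boundary, so one full page is emitted
w.flush()
print(len(pages), "pages written")   # 2 pages for 10,000 bytes of data
```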

Furthermore, they have created a new RAID 3D scheme, an adaptive parity approach based on the number of available drives that protects against any dual SSD failure.  They provide triple parity: dual parity for drive failures and a third parity for unrecoverable bit errors within a data payload.  In most cases, a failed drive will not trigger an immediate rebuild but rather a reconfiguration of data and parity to accommodate the failing drive, with its data rebuilt onto new drives over time.

At the moment, they don’t have snapshots or data replication but they said these capabilities are on their roadmap for future delivery.

----

In the meantime, all-SSD storage systems seem to be coming out of the woodwork. We mentioned SolidFire, but WhipTail is another one, and I am sure there are plenty more in stealth waiting for the right moment to emerge.

I was at a conference about two months ago where I predicted that all-SSD systems would be coming out with little of the engineering development of the storage systems of yore. Based on the performance available from a single SSD, one wouldn't need hundreds of SSDs to generate 100K IOPS or more.  Pure Storage is doing this level of IO with only 22 MLC SSDs and a high-end, but essentially off-the-shelf, controller.

Just imagine what one could do if you threw some custom hardware at it…

Comments?

Data compression lives on

Macroblocking: demolish the eerie ▼oid by Rosa Menkman (cc) (from Flickr)

Last week NetApp announced the availability of data compression on many of their unified storage platforms, which include block and file storage.  Earlier this year EMC announced data compression for LUNs on CLARiiON and Celerra.  I must commend both of them for re-integrating data compression into primary storage systems, where it has been missing since IBM and Sun stopped marketing RVA and SVA.

Data compression algorithms

Essentially, data compression is an algorithm that eliminates redundancy in data streams.  Data compression can be "lossy" or "loss-less".  Data compression in storage subsystems is typically loss-less, which means that the original data can be reconstructed without any loss of information.  Lossy algorithms show up in video/audio compression, where they don't noticeably impact fidelity unless the data loss becomes significant.

One simple example of loss-less data compression is run-length encoding, which substitutes a trigger, count, and character for any character repeated more than 4 times in a block of data.  This compresses well text with lots of blanks, numerical data with lots of 0's, and newly formatted data initialized with a repeating character.
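A minimal sketch of that run-length scheme, with the trigger byte value and run-length threshold chosen purely for illustration:

```python
# Minimal run-length encoder/decoder along the lines described above.
# The trigger byte (0x1B) and the run threshold (4) are illustrative choices.
# A production encoder would also escape any literal trigger bytes in the data.
TRIGGER = b"\x1b"
MIN_RUN = 4

def rle_encode(data: bytes) -> bytes:
    out, i = bytearray(), 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        if run > MIN_RUN:
            out += TRIGGER + bytes([run, data[i]])   # trigger, count, character
        else:
            out += data[i:i + run]                   # short runs stay literal
        i += run
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    out, i = bytearray(), 0
    while i < len(data):
        if data[i:i + 1] == TRIGGER:
            count, char = data[i + 1], data[i + 2]
            out += bytes([char]) * count
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

block = b"ACCOUNT " + b" " * 40 + b"0" * 32 + b"END"
packed = rle_encode(block)
assert rle_decode(packed) == block
print(len(block), "->", len(packed), "bytes")   # 83 -> 16 bytes for this block
```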

There are other, more sophisticated compression algorithms like Huffman coding, which identify the most frequent bytes in a block of data and replace those bytes with shorter bit patterns. For example, if ~50% of the characters in a text file are the letters "a", "e", "i", "o", "t", and "n" (see Wikipedia, Frequency Analysis), then these characters can take up much less space if we encode them in 4 or fewer bits rather than the 8 bits of a byte.
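Here's a compact Huffman coding sketch (the standard textbook construction, nothing vendor-specific) showing how the frequent characters end up with the shortest codes:

```python
import heapq
from collections import Counter

def huffman_codes(data: str) -> dict:
    """Build a Huffman code table: frequent symbols get the shortest bit strings."""
    freq = Counter(data)
    if len(freq) == 1:                              # degenerate single-symbol input
        return {sym: "0" for sym in freq}
    # heap entries: (frequency, tie-breaker, {symbol: code-so-far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}    # left subtree gets a 0 bit
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

text = "a storage system that compresses its own data saves space"
codes = huffman_codes(text)
encoded_bits = sum(len(codes[ch]) for ch in text)
print(encoded_bits, "bits vs", 8 * len(text), "bits uncompressed")
for ch, code in sorted(codes.items(), key=lambda kv: len(kv[1]))[:5]:
    print(repr(ch), code)    # the most common characters carry the shortest codes
```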

I am certain that both EMC and NetApp are using much more sophisticated algorithms than either of these, and it wouldn't surprise me if they are using something like the open source zlib (gzip) or bzip2 algorithms (see my Poor deduplication with Oracle RMAN compressed backups post for an explanation), which use Huffman coding and add even more sophistication.  Data compression algorithms like these could offer something like 50% compression, i.e., your data could be stored in 50% less space.
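For a feel of what a general-purpose algorithm achieves, Python's built-in zlib (DEFLATE, i.e., LZ77 plus Huffman coding) can be run against a block of data; the resulting ratio is entirely data dependent, and this is only an illustration, not a stand-in for whatever EMC or NetApp actually implemented:

```python
import zlib

# Try zlib (DEFLATE: LZ77 + Huffman coding) on a block of text-like data.
block = b"customer_id,balance,status\n" + b"000012345,0.00,ACTIVE\n" * 200

compressed = zlib.compress(block, 6)
ratio = len(compressed) / len(block)
print(f"{len(block)} bytes -> {len(compressed)} bytes ({ratio:.0%} of original)")

assert zlib.decompress(compressed) == block   # loss-less: original fully recovered
```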

Data compression is often confused with data deduplication, but they are not the same. Deduplication looks for duplicate data across different blocks and files, while data compression examines only the data stream within a single block or file and doesn't depend on any other data.
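The difference is easy to see in a toy sketch, where deduplication fingerprints whole blocks and keeps one copy of each, while compression shrinks every block independently (the fixed 4KB block size and SHA-256 fingerprints are assumptions for the illustration):

```python
import hashlib
import zlib

BLOCK = 4096  # assumed fixed block size for the illustration

def dedupe(blocks):
    """Deduplication: identical blocks are stored once and referenced by fingerprint."""
    store, refs = {}, []
    for b in blocks:
        fp = hashlib.sha256(b).hexdigest()
        store.setdefault(fp, b)          # keep one copy per unique block
        refs.append(fp)
    return store, refs

def compress_each(blocks):
    """Compression: every block is reduced independently, no cross-block knowledge."""
    return [zlib.compress(b) for b in blocks]

data = [b"A" * BLOCK, b"A" * BLOCK, b"B" * BLOCK]   # two identical blocks, one unique
store, refs = dedupe(data)
print(len(store), "unique blocks kept out of", len(data))                 # 2 of 3
print(sum(len(c) for c in compress_each(data)), "bytes after per-block compression")
```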

Storage system data compression

In the past, data compression was relegated to separate appliances, tape storage systems, and/or host software.  By integrating these algorithms into their main storage engines, both NetApp and EMC are taking advantage of the recent processor speed increases embedded in their systems to offer this previously offline functionality for online data.

Historically, compression algorithms such as these were implemented in hardware, but nowadays they can easily be done in software by relegating them to run during off-peak IO periods or as the lowest priority task in the storage system. As such, there is no guarantee when your data will finally be compressed, but it will be compressed eventually.

Data compression like this is great for data that isn't modified frequently.  It takes some processing time to compress data, and that work would need to be repeated after every modification of a compressed block or file.  So if the data isn't modified much, compression's processing cost can be amortized over a longer data lifetime.

Further, data compression must be undone at read time, i.e., the data needs to be decompressed and handed off to the IO requesting it.  Decompression is a much faster operation than compression because, in the case of something like Huffman coding, the character dictionary is already known and it's just a matter of table lookup and bit field isolation. It would be convenient if the compressed data were already sitting in system DRAM someplace but, lacking that, it could be staged into DRAM quickly enough, decompressed there, and then transferred to the requesting IO.

As such, data compression may impact response time for compressed data reads.  How much of an impact is yet TBD.

Data writes will not be impacted at all because the compression activity happens much later. Whether the data stays in cache until compressed or is brought back in at some later time is another algorithm question, which may impact cache hit rates and compression performance, but this doesn't have to be a serious impediment.

NetApp is able to offer this capability for both block and file storage because of its WAFL backend data structure, which essentially allows it to create variable length blocks for file and block data.  EMC only offers this for LUN (block storage) data as of yet, but it's probably just a matter of time before it's available for other data as well.

Any questions?

Primary storage compression can work

Dans la nuit des images (Grand Palais) by dalbera (cc) (from flickr)

Since IBM announced its intent to purchase StorWize there has been much discussion on whether primary storage data compression can be made to work.  As far as I know, StorWize only offered primary storage compression for file data, but there is nothing that prohibits doing something similar for block storage, as long as you have some control over how blocks are laid down on disk.

Although secondary block data compression has been around for years in enterprise tape, and more recently in some deduplication appliances, primary storage compression pre-dates secondary storage compression.  STK delivered primary storage data compression with Iceberg in the early 90's, but it wasn't until a couple of years later that they introduced compression on tape.

In both primary and secondary storage, data compression works to reduce the space needed to store data.  Of course, not all data compresses well, most notably image data (as it's already compressed), but compression ratios of 2:1 were common for primary storage of that time and are normal for today's secondary storage.  I see no reason why such ratios couldn't be achieved for current primary storage block data.

Implementing primary block storage data compression

There is significant interest in implementing deduplication for primary storage, as NetApp has done, but supporting data compression is not much harder.  I believe much of the effort to deduplicate primary storage lies in creating a method to address partial blocks out of order, what I would call data block virtual addressing, which requires some sort of storage pool.  The remaining effort to deduplicate data involves implementing the chosen (dedupe) algorithm, indexing/hashing, and other administrative activities.  These latter activities aren't readily transferable to data compression, but the virtual addressing and space pooling should be usable by data compression as well.

Furthermore, block storage thin provisioning requires some sort of virtual addressing, as does automated storage tiering.  So in my view, once you have implemented some of these advanced capabilities, implementing data compression is not that big a deal.
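To make the virtual addressing idea concrete, here is a hypothetical sketch of a logical-to-physical map feeding a shared storage pool; the class and field names are mine, and no vendor's design is implied:

```python
import zlib

class VirtualBlockPool:
    """Toy logical-to-physical map: LBAs point at variable-length extents in a pool."""
    def __init__(self):
        self.pool = bytearray()   # shared backing store (the "storage pool")
        self.map = {}             # lba -> (offset, length, compressed?)

    def write(self, lba: int, block: bytes, compress: bool = True):
        payload = zlib.compress(block) if compress else block
        offset = len(self.pool)           # simple append-only allocation for the sketch
        self.pool += payload
        self.map[lba] = (offset, len(payload), compress)

    def read(self, lba: int) -> bytes:
        offset, length, compressed = self.map[lba]
        payload = bytes(self.pool[offset:offset + length])
        return zlib.decompress(payload) if compressed else payload

pool = VirtualBlockPool()
pool.write(0, b"\x00" * 4096)                       # highly compressible block
pool.write(1, bytes(range(256)) * 16, compress=False)
print(len(pool.pool), "bytes in the pool for 8KB of logical data")
assert pool.read(0) == b"\x00" * 4096
```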

The one question that remains is whether one implements compression with hardware or software (see Better storage through hardware for more). Considering that most deduplication is done via software today, data compression in software should be doable.  The compression phase could run in the background sometime after the data has been stored.  Real-time decompression in software might take some work, but would cost considerably less than any hardware solution, although the intensive bit fiddling required to perform data compression/decompression may argue for some sort of hardware assist.

Data compression complements deduplication

The problem with deduplication is that it needs duplicate data.  This is why it works so well for secondary storage (backing up the same data over and over) and for VDI/VMware primary storage (with duplicated O/S data).

But data compression is an orthogonal, complementary technique which uses the inherent redundancy within the data itself to reduce storage requirements.  For instance, Huffman-style entropy coding (one stage in LZ-based compressors such as DEFLATE) takes advantage of the fact that in text some letters occur more often than others (see letter frequency). In English, 'e', 't', 'a', 'o', 'i', and 'n' represent over 50% of the characters in most text documents, so using shorter bit combinations to encode these letters can reduce the bit length of any (English) text string substantially.  The LZ stage itself exploits a different redundancy, replacing repeated strings with short back-references to an earlier occurrence.  Another example is run-length encoding, which substitutes a trigger character, the repeated character, and a repetition count for any long run of a single character.
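To make the back-reference idea concrete, here is a bare-bones LZ-style sketch (a tiny sliding window, no entropy coding stage, purely illustrative and not any product's implementation):

```python
def lz_compress(text: str, window: int = 64):
    """Replace repeated substrings with (distance, length) references to earlier text."""
    i, tokens = 0, []
    while i < len(text):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            while i + length < len(text) and text[j + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= 3:                      # only worth a reference if the match is long enough
            tokens.append((best_dist, best_len))
            i += best_len
        else:
            tokens.append(text[i])             # literal character
            i += 1
    return tokens

def lz_decompress(tokens) -> str:
    out = []
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            for _ in range(length):
                out.append(out[-dist])         # byte-at-a-time copy handles overlapping runs
        else:
            out.append(tok)
    return "".join(out)

s = "the same block of data, the same block of data, the same block again"
tokens = lz_compress(s)
assert lz_decompress(tokens) == s
print(len(s), "chars ->", len(tokens), "tokens")   # 68 chars -> 30 tokens
```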

Moreover, the nice thing about data compression is that all these techniques can be readily combined to generate even better compression rates.  And of course compression could be applied after deduplication to reduce storage footprint even more.

Why would any vendor compress data?

For a couple of reasons:

  • Compression not only reduces storage footprint but with hardware assist it can also increase storage throughput. For example, if 10GB of data compresses down to 5GB, it should take ~1/2 the time to read.
  • Compression reduces the time it takes to clone, mirror or replicate, since there is less data to move.
  • Compression increases the amount of data that can be stored on a system, which arguably gives customers a reason to pay more for that storage.

In contrast, with data compression vendors might sell less storage.  But the advantage of enterprise storage lies in the advanced functionality/features and the higher reliability/availability/performance on offer.  I see data compression as just another advantage of enterprise class storage, and as a feature the user could enable or disable it and see how well it works for their data.

What do you think?