Pure Storage surfaces

1 controller X 1 storage shelf (c) 2011 Pure Storage (from their website)

We were talking with Pure Storage last week, another SSD startup, which just emerged from stealth mode today.  Somewhat like SolidFire, which we discussed a month or so ago, Pure Storage uses only SSDs to provide primary storage.  In this case, they support an FC front end with an all-SSD backend and implement internal data deduplication and compression to address the needs of enterprise tier 1 storage.

Pure Storage is in final beta testing with their product and plans to GA sometime around the end of the year.

Pure Storage hardware

Their system is built around MLC SSDs, which are available from many vendors; with a strategic investment from Samsung, they currently use that vendor’s drives.  As we know, MLC has write endurance limitations, but Pure Storage was built from the ground up knowing they were going to use this technology and have built their IP to counteract these issues.

The system is available in one or two controller configurations, with an InfiniBand interconnect between the controllers, a 6Gbps SAS backend, 48GB of DRAM per controller for caching purposes, and NV-RAM to ride through power outages.  Each controller has 12 cores supplied by two Intel Xeon processor chips.

With the first release they are limiting configurations to one or two controllers (the HA option), but their storage system is capable of clustering together many more, maybe even up to eight controllers, over the InfiniBand back end.

Each storage shelf provides 5.5TB of raw storage using 2.5″ 256GB MLC SSDs.  It looks like each controller can handle up to two storage shelves, with the HA (dual controller) option supporting four drive shelves for up to 22TB of raw storage.
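As a quick sanity check on those capacity numbers (a back-of-the-envelope calculation of my own, not anything Pure Storage has published), the shelf capacity works out to roughly 22 drives per shelf:

```python
# Back-of-the-envelope check on the quoted raw capacity numbers
# (assumes decimal GB/TB, as storage vendors typically quote them).
drive_gb = 256          # 2.5" MLC SSD capacity
shelf_tb = 5.5          # raw capacity per storage shelf

drives_per_shelf = shelf_tb * 1000 / drive_gb
print(f"~{drives_per_shelf:.0f} drives per shelf")        # ~21-22 drives

max_shelves = 4         # HA (dual controller) config, two shelves per controller
print(f"max raw capacity: {max_shelves * shelf_tb} TB")   # 22.0 TB
```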

Pure Storage performance

Although these numbers are not independently verified, the company says that a single controller (with one storage shelf) can do 200K sustained 4K random read IOPS, 2GB/sec of bandwidth, 140K sustained write IOPS, or 500MB/sec of write bandwidth.  A dual controller system (with two storage shelves) can achieve 300K random read IOPS, 3GB/sec of bandwidth, 180K write IOPS, or 1GB/sec of write bandwidth.  They also claim that they can do all this IO with under 1 msec of latency.

One of the things they pride themselves on is consistent performance.  They have built their storage such that they can deliver this consistent performance even under load conditions.

Given the number of SSDs in their system this isn’t screaming performance, but it is certainly up there with many enterprise class systems sporting over 1000 disks.  The random write performance is not bad considering this is MLC.  On the other hand, the sequential write bandwidth is probably their weakest spec and reflects their use of MLC flash.

Purity software

One key to Pure Storage (and SolidFire for that matter) is their use of inline data compression and deduplication. By using these techniques and basing their system storage on MLC, Pure Storage believes they can close the price gap between disk and SSD storage systems.

The problem with data reduction technologies is that not all environments can benefit from them, and both techniques require lots of CPU power to perform well.  Pure Storage believes they have the horsepower (with 12 cores per controller) to support these services and are focusing their sales activities on those environments (VMware, Oracle, and SQL Server) which have historically proven to be good candidates for data reduction.
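For readers unfamiliar with inline data reduction, here is a minimal sketch of what happens on the write path: fingerprint the block, skip it if it’s a duplicate, otherwise compress and store it. This is purely illustrative and says nothing about how Purity actually implements these services; the chunk size, hash, and data structures are all my own assumptions.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096                 # assumed fixed block size
dedupe_index = {}                 # fingerprint -> location of stored (compressed) chunk
chunk_store = []                  # stand-in for backend flash storage

def write_block(data: bytes) -> int:
    """Inline data reduction on the write path: dedupe first, then compress."""
    fingerprint = hashlib.sha256(data).hexdigest()
    if fingerprint in dedupe_index:              # duplicate: just add a reference
        return dedupe_index[fingerprint]
    compressed = zlib.compress(data)             # new data: compress before storing
    chunk_store.append(compressed)
    location = len(chunk_store) - 1
    dedupe_index[fingerprint] = location
    return location

def read_block(location: int) -> bytes:
    return zlib.decompress(chunk_store[location])

# Ten identical 4KB blocks consume the space of one compressed chunk
for _ in range(10):
    loc = write_block(b"A" * CHUNK_SIZE)
print(len(chunk_store), "unique chunks stored")   # 1
```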

In addition, they perform a lot of optimizations in their backend data layout to prolong the life of MLC storage. Specifically, they use a write chunk size that matches the underlying MLC SSD’s page size so as not to waste endurance on partial page writes.  Also, they occasionally migrate old data to new locations to maintain “data freshness”, which can be a problem with MLC storage if the data is not touched often enough.  Probably other stuff as well, but essentially they are tuning their backend use to optimize the endurance and performance of their SSD storage.
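To illustrate the page-alignment point (again just a sketch of the general technique, with an assumed 8KB flash page size, not Pure Storage’s actual layout), incoming writes get buffered and flushed only in full-page units so that no flash page is programmed for partial data:

```python
PAGE_SIZE = 8192          # assumed flash page size; real devices vary

class PageAlignedWriter:
    """Accumulate small writes and flush them in full flash pages."""
    def __init__(self, flash_write):
        self.flash_write = flash_write   # callback that programs one full page
        self.buffer = bytearray()

    def write(self, data: bytes):
        self.buffer.extend(data)
        # Flush only complete pages; leftovers wait for more data
        while len(self.buffer) >= PAGE_SIZE:
            page, self.buffer = self.buffer[:PAGE_SIZE], self.buffer[PAGE_SIZE:]
            self.flash_write(bytes(page))

pages_programmed = []
writer = PageAlignedWriter(pages_programmed.append)
for _ in range(5):
    writer.write(b"x" * 4096)            # five 4KB writes...
print(len(pages_programmed), "full pages programmed")   # ...become two 8KB page programs
```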

Furthermore, they have created a new RAID 3D scheme which adapts its parity layout to the number of available drives and protects against any dual SSD failure.  They provide triple parity: dual parity for drive failures plus another parity for unrecoverable bit errors within a data payload.  In most cases, a failed drive will not trigger an immediate rebuild but rather a reconfiguration of data and parity to accommodate the failing drive, with its data rebuilt onto new drives over time.
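Pure Storage hasn’t disclosed the math behind RAID 3D, but for flavor, here is what a generic dual-parity (RAID-6 style P+Q) computation plus a per-block checksum looks like; the Galois field arithmetic and CRC below are textbook techniques standing in for whatever Pure actually does.

```python
import zlib

# Standard GF(2^8) tables (primitive polynomial 0x11D), as used for RAID-6 Q parity
GF_EXP, GF_LOG = [0] * 512, [0] * 256
x = 1
for i in range(255):
    GF_EXP[i], GF_LOG[x] = x, i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 512):
    GF_EXP[i] = GF_EXP[i - 255]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else GF_EXP[GF_LOG[a] + GF_LOG[b]]

def dual_parity(stripes):
    """Compute P (XOR) and Q (Reed-Solomon) parity across data drives.

    P alone recovers one failed drive; P and Q together recover any two.
    """
    length = len(stripes[0])
    p, q = bytearray(length), bytearray(length)
    for drive, data in enumerate(stripes):
        g = GF_EXP[drive]                       # per-drive generator coefficient
        for i, byte in enumerate(data):
            p[i] ^= byte
            q[i] ^= gf_mul(g, byte)
    return bytes(p), bytes(q)

data_drives = [b"\x11" * 16, b"\x22" * 16, b"\x33" * 16]
p, q = dual_parity(data_drives)
checksums = [zlib.crc32(d) for d in data_drives]   # catches unrecoverable bit errors on read
```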

At the moment, they don’t have snapshots or data replication but they said these capabilities are on their roadmap for future delivery.

—-

In the meantime, all-SSD storage systems seem to be coming out of the woodwork. We mentioned SolidFire, but WhipTail is another one and I am sure there are plenty more in stealth waiting for the right moment to emerge.

I was at a conference about two months ago where I predicted that all-SSD systems would be coming out with little of the engineering development of the storage systems of yore. Based on the performance available from a single SSD, one wouldn’t need hundreds of SSDs to generate 100K IOPS or more.  Pure Storage is doing this level of IO with only 22 MLC SSDs and a high-end, but essentially off-the-shelf, controller.

Just imagine what one could do if you threw some custom hardware at it…

Comments?

Is FC dead?!

SNIA Tech Center Computer Lab 2 switching hw (c) 2011 Silverton Consulting, Inc.

I was at the Pacific Crest/Mosaic annual conference cocktail hour last night, surrounded by a bunch of iSCSI/NAS storage vendors, and they made the statement that FC is dead.

Apparently, 40GbE is just around the corner and 10GbE cards have started a steep drop in price and are beginning to proliferate through the enterprise.  The vendors present felt that an affordable 40GbE that does iSCSI and/or FCoE would be the death knell for FC as we know it.

As evidence they point to Brocade’s recent quarterly results, which show their storage business in decline, down 5-6% YoY for the quarter. In contrast, Brocade’s Ethernet business is up 12-13% YoY this quarter (albeit from a low starting point).  Further confusing the picture, Brocade is starting to roll out 16Gbps FC (16GFC) while the storage market is still trying to digest the changeover to 8Gbps FC.

But do we need the bandwidth?

One question is whether we need 16GFC or even 40GbE for the enterprise today.  Most vendors speak of the high bandwidth requirements of server virtualization as a significant consumer of enterprise bandwidth.  But it’s unclear to me whether this is reality or just the next wave of technology needing to find a home.

Let’s consider for the moment what 16GFC and 40GbE can do for data transfer. If we assume ~10 bits per byte then:

  • 16GFC can provide 1.6GB/s of data transfer,
  • 40GbE can provide 4GB/s of data transfer.

Using the Storage Performance Council’s SPC-2 results, the top data transfer subsystem (IBM DS8K) is rated at 9.7GB/s, so with 40GbE it would need about 3 links, and with 16GFC it would need about 7 links, to sustain this bandwidth.
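Spelling out that arithmetic with the ~10 bits per byte assumption above:

```python
import math

BITS_PER_BYTE = 10            # the ~10 bits/byte assumption (line encoding overhead)

def link_gbps_to_gbs(gbps):
    return gbps / BITS_PER_BYTE

spc2_top = 9.7                # GB/s, IBM DS8K SPC-2 result cited above
for name, gbps in (("16GFC", 16), ("40GbE", 40)):
    gbs = link_gbps_to_gbs(gbps)
    links = math.ceil(spc2_top / gbs)
    print(f"{name}: {gbs:.1f} GB/s per link, ~{links} links for {spc2_top} GB/s")
# 16GFC: 1.6 GB/s per link, ~7 links for 9.7 GB/s
# 40GbE: 4.0 GB/s per link, ~3 links for 9.7 GB/s
```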

So there’s at least one storage system out there that can utilize the extreme bandwidth such interfaces supply.

Now as for the server side, nailing down the true need is a bit harder to do.  Using Amdahl’s IO law, which states there is 1 IO for every 50K instructions, and with Intel’s Core i7 Extreme Edition rated at 159K MIPS, it should be generating about 3.2M IO/s, and at 4KB per IO that would be about 12GB/sec.  So the current crop of high-end processors seems able to consume this level of bandwidth, if present.
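Working through those numbers (the 4KB-per-IO figure is an assumed average, as above):

```python
MIPS = 159_000            # Intel Core i7 Extreme Edition, ~159 KMIPS
INSTR_PER_IO = 50_000     # Amdahl's IO law: one IO per 50K instructions
IO_SIZE_KB = 4            # assumed average IO size

ios_per_sec = MIPS * 1_000_000 / INSTR_PER_IO          # ~3.2M IO/s
bandwidth_gb_s = ios_per_sec * IO_SIZE_KB / 1_000_000  # ~12.7 GB/s
print(f"{ios_per_sec/1e6:.1f}M IO/s, ~{bandwidth_gb_s:.0f} GB/s")
```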

FC or Ethernet?

Now the critical question: which interface does the data center use to provide that bandwidth?  The advantages of FC are becoming less pronounced over time as FCoE becomes more widely adopted, and any speed advantage that FC had should go away with the introduction of data center 40GbE.

The other benefit that Ethernet offers is a “single data center backbone” which can handle all network/storage traffic.  Many large customers are almost salivating at the possibility of getting by with a single infrastructure for everything vs. having to purchase and support separate cabling, switches and server cards to use FC.

On the other hand, having separate networks, segregated switching, and isolation between network and storage traffic can provide better security, availability, and reliability that are hard to duplicate with a single network.

To summarize, one would have to say that there are some substantive soft benefits to having both Ethernet and FC infrastructure, but there are hard cost and operational advantages to having a single infrastructure based on 10GbE or, hopefully someday, 40GbE.

—-

So I would have to conclude that FC’s days are numbered especially when 40GbE becomes affordable and thereby, widely adopted in the data center.

Comments?

Why Open-FCoE is important

FCoE Frame Format (from Wikipedia, http://en.wikipedia.org/wiki/File:Ff.jpg)

I don’t know much about O/S drivers but I do know lots about storage interfaces. One thing that’s apparent from yesterday’s announcement from Intel is that Fibre Channel over Ethernet (FCoE) has taken another big leap forward.

Chad Sakac’s chart of FC vs. Ethernet target unit shipments (meaning storage interface types, I think) clearly indicates a transition to Ethernet is taking place in the storage industry today. Of course, Ethernet targets can be used for NFS, CIFS, object storage, iSCSI, and FCoE, so this doesn’t necessarily mean that FCoE is winning the game, just yet.

WikiBon did a great post on FCoE market dynamics as well.

The advantage of FC, and iSCSI for that matter, is that every server, every OS, and just about every storage vendor in the world supports them. Also, there is a plethora of economical fabric switches available from multiple vendors that can support multi-port switching with high bandwidth. And there are many support matrices identifying server HBAs, O/S drivers for those HBAs, and compatible storage products to ensure compatibility. So there is no real problem (other than wading through the support matrices) in implementing either one of these storage protocols.

Enter Open-FCoE, the upstart

What’s missing from 10GbE FCoE is perhaps a really cheap solution, one that is universally available, uses commodity parts, and can be had for next to nothing. The new Open-FCoE drivers together with Intel’s x520 10GbE NIC have the potential to answer that need.

But what is it? Essentially, Intel’s Open-FCoE is an O/S driver for Windows and Linux plus 10GbE NIC hardware from Intel. It’s unclear whether Intel’s Open-FCoE driver is a derivative of Open-FCoE.org’s Linux driver, but either way the driver performs some of the specialized FCoE functions in software rather than in hardware, as is done by the CNA cards available from other vendors. Using server processing MIPS rather than ASIC processing capabilities should make FCoE adoption even cheaper in the long run.
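To make the software/hardware split concrete, the specialized work is largely about wrapping native FC frames inside Ethernet frames. Here’s a rough sketch of that encapsulation; the field layout is simplified, the SOF/EOF code points are examples, and a real driver also handles FIP discovery and login, padding, DCB/pause, and the Ethernet FCS (normally appended by the NIC):

```python
import struct

FCOE_ETHERTYPE = 0x8906        # IEEE-assigned EtherType for FCoE
SOF_CODE, EOF_CODE = 0x36, 0x41   # example start/end-of-frame delimiter code points

def encapsulate_fc_frame(dst_mac: bytes, src_mac: bytes, fc_frame: bytes) -> bytes:
    """Wrap a native FC frame in an FCoE/Ethernet frame (simplified sketch)."""
    eth_header = dst_mac + src_mac + struct.pack("!H", FCOE_ETHERTYPE)
    fcoe_header = bytes(13) + bytes([SOF_CODE])       # version/reserved bytes + SOF
    fcoe_trailer = bytes([EOF_CODE]) + bytes(3)       # EOF + reserved bytes
    return eth_header + fcoe_header + fc_frame + fcoe_trailer

frame = encapsulate_fc_frame(b"\x0e\xfc\x00\x00\x00\x01",   # example destination MAC
                             b"\x02\x00\x00\x00\x00\x02",   # example source MAC
                             b"\x00" * 36)                  # placeholder FC frame
print(len(frame), "byte FCoE frame")
```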

What about performance?

The proof of this will be in benchmark results, but it could well be a non-issue, especially if there is not a lot of extra processing involved in an FCoE transaction. For example, if Open-FCoE only takes, let’s say, 2-5% of server MIPS and bandwidth to perform the added FCoE frame processing, then this might be in the noise for most standalone servers and would show up only minimally in storage benchmarks (which always use big, standalone servers).

Yes, but what about virtualization?

However, real-world virtualized servers are another matter. I believe that virtualized servers generally demand more intensive I/O activity anyway, and as one creates 5-10 VMs on an ESX server, it’s almost guaranteed to have 5-10X the I/O happening. If each VM requires 2-5% of a standalone processor to perform Open-FCoE processing, then it could easily represent 5-7 X 2-5% on a 10-VM ESX server (this assumes some optimization for virtualization; if virtualization degrades driver processing, it could be much worse), which would represent a serious burden.
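Putting rough numbers on that guess (these are just my assumptions from above, not measurements):

```python
# Back-of-the-envelope math using the guesses above; none of this is measured
per_vm_overhead = (0.02, 0.05)     # 2-5% of a standalone processor per VM
vms = 10
virtualization_factor = 0.6        # assume some optimization: ~5-7x effective, not a full 10x

low  = vms * virtualization_factor * per_vm_overhead[0]
high = vms * virtualization_factor * per_vm_overhead[1]
print(f"Open-FCoE overhead on a {vms}-VM ESX server: {low:.0%} to {high:.0%} of a CPU")
# roughly 12% to 30% of a server CPU, just for FCoE frame processing
```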

Now these numbers are just guesses on my part, but there is some price to pay for using host server MIPS for every FCoE frame, and it does multiply on virtualized servers, that much I can guarantee you.

But the (storage) world is better now

Nonetheless, I must applaud Intel’s Open-FCoE thrust as it can open up a whole new potential market space that today’s CNAs maybe couldn’t touch. If it does that, and introduces low-end systems to the advantages of FCoE, then as they grow, moving their environments to real CNAs should be a relatively painless transition. And this is where the real advantage lies: getting smaller data centers on the right path early in life will make any subsequent adoption of hardware-accelerated capabilities much easier.

But is it really open?

One problem I am having with the Intel announcement is the lack of other NIC vendors jumping in. In my mind, it can’t really be “open” until any 10GbE NIC can support it.

Which brings us back to Open-FCoE.org. I checked their website and could see no listing for a Windows driver, and there was no NIC compatibility list. So I am guessing their work has nothing to do with Intel’s driver, at least as presently defined. Too bad.

However, when Open-FCoE is really supported by any 10GbE NIC, then economies of scale can take off and it could really represent a low-end cost point for storage infrastructure.

It’s unclear to me what Intel has special in their x520 NIC to support Open-FCoE (maybe some TOE H/W with other special sauce), but anything special needs to be defined and standardized to allow broader adoption by other vendors. Then, and only then, will Open-FCoE reach its full potential.

—-

So great for Intel, but it could be even better if a standardized definition of an “Open-FCoE NIC” were available, so other NIC manufacturers could readily adopt it.

Comments?

Top 10 storage technologies over the last decade

Aurora's Perception or I Schrive When I See Technology by Wonderlane (cc) (from Flickr)

Some of these technologies were in development prior to 2000, some were available in other domains but not in storage, and some were in a few subsystems but had yet to become as popular as they are today.  In no particular order, here are my top 10 storage technologies for the decade:

  1. NAND-based SSDs – DRAM and other solid state drive (SSD) technologies were available last century, but over the last decade NAND flash based devices have come to dominate SSD technology and have altered the storage industry forevermore.  Today, it’s nigh impossible to find enterprise-class storage that doesn’t support NAND SSDs.
  2. GMR head – Giant magnetoresistance disk heads have become commonplace over the last decade and have allowed disk drive manufacturers to double data density every 18-24 months.  Now GMR heads are starting to transition over to tape storage and will enable that technology to increase data density dramatically.
  3. Data deduplication – Deduplication technologies emerged over the last decade as a complement to higher density disk drives and a means to back up data more efficiently.  Deduplication technology can be found in many different forms today, ranging from file and block storage systems and backup storage systems to backup-software-only solutions.
  4. Thin provisioning – Thin provisioning undoubtedly emerged last century, but it took the last decade for it to really find its place in the storage pantheon.  One almost cannot find a data center class storage device that does not support thin provisioning today.
  5. Scale-out storage – Last century, if you wanted higher IOPS from a storage subsystem, you could add cache or disk drives, but at some point you hit a subsystem performance wall.  With scale-out storage, one can now add more processing elements to a storage system cluster, without having to replace the controller, to obtain more IO processing power.  The link reference talks about the use of commodity hardware to provide added performance, but scale-out storage can also be done with non-commodity hardware (see Hitachi’s VSP vs. VMAX).
  6. Storage virtualization – Server virtualization has taken off as the dominant data center paradigm over the last decade, but its counterpart in storage has also become more viable.  Storage virtualization was originally used to migrate data from old subsystems to new storage, but today it can be used to manage and migrate data across PBs of physical storage, dynamically optimizing data placement for cost and/or performance.
  7. LTO tape – When IBM dominated IT in the mid-to-late last century, the tape format du jour always matched IBM’s tape technology.  As the decade dawned, IBM was no longer the dominant player and tape technology was starting to diverge into a babble of differing formats.  As a result, IBM, Quantum, and HP put their technology together and created a standard tape format, called LTO, which has become the new dominant tape format for the data center.
  8. Cloud storage – It’s unclear just when over the last decade cloud storage emerged, but it seemed to be a supplement to cloud computing, which also appeared this past decade.  Storage service providers had existed earlier but, due to bandwidth limitations and storage costs, didn’t survive the dotcom bubble. But over this past decade both bandwidth and storage costs have come down considerably, and cloud storage has now become a viable technological solution to many data center issues.
  9. iSCSI – SCSI has taken on many forms over the last couple of decades, but iSCSI has altered the dominant block storage paradigm from a single, pure FC-based SAN to a plurality of technologies.  Nowadays, SMB shops can have block storage without the cost and complexity of FC SANs, over the LAN networking technology they already use.
  10. FCoE – One could argue that this technology is still maturing today, but once again SCSI has opened up another way to access storage. FCoE has the potential to offer all the robustness and performance of FC SANs over data center Ethernet hardware, simplifying and unifying data center networking onto one technology.

No doubt others would differ on their top 10 storage technologies over the last decade, but I strived to find technologies that significantly changed data storage from what existed in 2000 to what we have today.  These 10 seemed to me to fit the bill better than most.

Comments?