HP Tech Day – StoreServ Flash Optimizations

Attended HP Tech Field Day late last month at Disneyland. I must say the venue was the best ever for HP, and getting in on the Nth Generation Conference was a plus. Sorry it has taken so long for me to get around to writing about it.

We spent a day going over HP's new converged storage, software-defined storage and other storage topics. HP has segmented the Software Defined Data Center (SDDC) storage requirements into cost-optimized Software Defined Storage and SLA-optimized Service Refined Storage. Under Software Defined Storage they talked about their StoreVirtual product line, an outgrowth of the LeftHand Networks VSA first introduced in 2007. This June, they extended their SDS lineup with a StoreOnce VSA product to go after SMB and ROBO backup storage requirements.

We also discussed some of HP's work to integrate their current block storage into OpenStack Cinder, as well as the integrations they plan for file and object storage.

However, what I mostly want to discuss in this post is the session on how HP StoreServ 3PAR has optimized their storage system for flash.

They showed an SPC-1 chart depicting various storage systems' IOPS levels and response times as they ramped from 10% to 100% of their IOPS rate. StoreServ 3PAR's latest entry showed a considerable band of IOPS (25K to over 250K) all within a sub-msec response time range, which was pretty impressive since at the time no other storage system seemed able to do this across its whole range of IOPS. (A more recent SPC-1 result from HDS, an all-flash VSP with Hitachi Accelerated Flash, was also able to sustain sub-msec response times throughout the whole benchmark, only in its case it reached over 600K IOPS; read about this in our latest performance report in our newsletter, sign up above right.) The flash optimizations HP discussed included:

  • Adaptive Read – As I understood it, this changes the size of backend reads to match the size requested by the front end. For disk systems one often sees a host read of, say, 4KB cause a 16KB read from the backend, on the assumption that the host will request additional data after the block is read off of disk; 90% of the time spent on a disk read goes to getting the head to the correct track, and once there it takes almost no extra effort to read more data. With flash, however, there is no positioning effort to get to a block of data, so there is no advantage to reading more than the host requests; if the host comes back for more, one can immediately read it from flash again.
  • Adaptive Write – Similar to adaptive read, adaptive write only writes the changed data to flash. So if a host writes a 4KB block, then only 4KB is written to flash. This doesn't help much for RAID 5 because of parity updates, but for RAID 1 (mirroring) it saves on flash writes, which ultimately lengthens flash life.
  • Adaptive Offload (destage) – This changes the frequency of destaging or flushing cache depending on the level of write activity. Slower destaging allows written (dirty) data to accumulate in cache when there's not much write activity going on, which means for RAID 5 parity may not need to be updated as often, since one could potentially accumulate a whole stripe's worth of data in cache. In low-activity situations destaging could occur every 200 msec, whereas with high write activity it could occur as often as every 3 msec. (A rough sketch of the adaptive read and adaptive destage ideas appears after this list.)
  • Multi-tenant IO processing – For disk drives and sequential reads, one wants the largest stripes possible (due to the head positioning penalty), but for SSDs one wants the smallest stripe sizes possible. The other problem with large stripes is that devices stay busy longer while performing the stripe writes (and reads). StoreServ changed the stripe size for SSDs to 32KB so that other IO activity doesn't have to wait as long for its turn in the (IO device) queue. The other advantage shows up during SSD rebuilds: with a 32KB stripe size one can intersperse more IO activity on the devices involved in the rebuild without impacting rebuild performance.
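To make the adaptive read and adaptive offload ideas a bit more concrete, here's a minimal Python sketch of how that kind of logic might look. This is my own illustration, not HP's implementation: the 16KB disk prefetch and the 200 msec / 3 msec destage endpoints come from the discussion above, while the function names, the dirty-ratio input and the linear interpolation between the endpoints are assumptions.

```python
# Illustrative sketch only -- not HP StoreServ code.
# Shows the general idea behind adaptive reads and adaptive offload (destage).

def backend_read_size(host_request_bytes: int, media: str) -> int:
    """Pick how much to read from the backend for a given host read."""
    if media == "disk":
        # For disk, the seek dominates, so prefetch beyond the host request
        # (e.g., a 4KB host read turns into a 16KB backend read).
        return max(host_request_bytes, 16 * 1024)
    # For flash there is no positioning penalty, so read only what was asked for.
    return host_request_bytes

def destage_interval_msec(dirty_ratio: float) -> float:
    """Pick how often to flush dirty cache data based on write activity.

    dirty_ratio: fraction of the write cache holding dirty data (0.0 - 1.0).
    The 200 msec and 3 msec endpoints come from the post; the linear
    interpolation between them is just an assumption for illustration.
    """
    slow, fast = 200.0, 3.0
    return slow - (slow - fast) * min(max(dirty_ratio, 0.0), 1.0)

if __name__ == "__main__":
    print(backend_read_size(4 * 1024, "disk"))   # 16384 -- prefetch for disk
    print(backend_read_size(4 * 1024, "flash"))  # 4096  -- read only what's needed
    print(destage_interval_msec(0.05))           # ~190 msec when mostly idle
    print(destage_interval_msec(0.95))           # ~13 msec under heavy writes
```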

Of course, the other major advantage the HP StoreServ 3PAR architecture provides for flash is its intrinsic wide striping across a storage pool. This way all the SSDs can be used optimally and equally to service customer IOs.

I am certain there were other optimizations HP made to support SSDs in StoreServ storage, but these are the ones they were willing to talk publicly about.

No mention of when Memristor SSDs will be available, but stay tuned; HP let slip that sooner or later Memristor-based storage will show up in HP storage and servers.

Comments?

Photo Credits: (c) 2013 Silverton Consulting, Inc

EMC buys XtremIO

Wow, $430M for a $25M startup that's been around since 2009 and hasn't generated any revenue yet. It probably compares well against Facebook's recent $1B acquisition of Instagram, but it still seems a bit much.

It certainly signals a significant ongoing interest in flash storage in whatever form that takes. Currently EMC offers PCIe flash storage (VFCache), SSD options in VMAX and VNX, and has plans for a shared flash cache array (project: Thunder).  An all-flash storage array makes a lot of sense if you believe this represents an architecture that can grab market share in storage.

I have talked with XtremIO in the past but they were pretty stealthy then (and still are, as far as I can tell). Not many details about their product architecture, performance specs, interfaces or anything substantive. The only thing they told me then was that they were in the flash array storage business.

In a presentation to SNIA's BOD last summer I said that the storage industry is in revolution. When a system of 20 or so devices can generate ~250K or more IO/second with a single controller, simple interfaces, and solid state drives, we are not in Kansas anymore.

Can a million IOPS storage system be far behind?

It seems to me that delivering enterprise storage performance has gotten much easier over the last few years. That doesn't mean enterprise storage reliability, availability or features have gotten easier, but just getting to that level of performance used to take 1000s of disk drives and racks of equipment. Today, you can almost do it in a 2U enclosure, and that's without breaking a sweat.

Well, that seems to be the point: with a gaggle of startups all vying after SSD storage in one form or another, the market is starting to take notice. Maybe EMC felt it was a good time to enter the market with their own branded product; they seem to already have all the other bases covered.

Their website mentions that XtremIO is a load-balanced, deduplicated, clustered storage system with enterprise-class services (this could mean anything). Nonetheless, a deduplicating, clustered SSD storage system built out of commodity servers could describe at least 3 other SSD startups I have recently talked with and a bunch I haven't talked with in a while.

Why EMC decided that XtremIO was the one to buy is somewhat of a mystery. There was some mention of an advanced data protection scheme for the flash storage but no real details.

Nonetheless, enterprise SSD storage services with relatively low valuation and potential to disrupt enterprise storage might be something to invest in.  Certainly EMC felt so.

~~~~

Comments? Anyone know anything more about XtremIO?

Thoughts on Spring SNW 2012 in Dallas

Viking Technology NAND/DIMM SSD 32TB/1U demo box

[Updated photo] Well the big news today was the tornado activity in the Dallas area. When the tornado warnings were announced customers were stuck on the exhibit floor and couldn’t leave (which made all the vendors very happy). Meetings with vendors still went on but were held in windowless rooms and took some ingenuity to get to. I offered to meet in the basement but was told I couldn’t go down there.

As for technology at the show, I was pretty impressed with the Viking booth. They had a 512GB NAND flash card that plugs into spare DIMM slots, available with MLC or SLC NAND flash on board; it draws power from the DIMM slot and uses separate SATA cabling to connect the SSD storage together. It could easily be connected to a MegaRAID card and RAIDed together. The cards are mainly sold to OEMs, but they are looking to gain some channel partners willing to sell them directly to end users.

In addition to the NAND/DIMM card, they had a demo box with a whole bunch of DIMM slots, where they modified the DIMM connections to also support a SATA interface through their motherboard. On display was a 1U storage box with 32TB of NAND/DIMM cards and a single power supply, supporting 6 lanes of SAS connectivity to the storage. It wasn't clear what they were trying to do with this other than stimulate thought and interest from OEMs, but it was a very interesting demo.

There were a few major vendors exhibiting at the show, including Fujitsu, HDS, HP, and Oracle, with a slew of minor ones as well. But noticeable by their absence were Dell, EMC, IBM, and NetApp, not to mention Brocade, Cisco and Emulex.

Someone noticed that a lot of the smaller SSD startups weren't here either, e.g., no PureStorage, NexGen, SolidFire, Whiptail, etc. Even FusionIO with their bank of video streams was missing from the show. In times past, smaller startups would use SNW to get vendor and end-user customer attention. I suppose nowadays they do this at VMworld, Oracle OpenWorld, Sapphire or other vertical-specific conferences.

Marc Farley of StorSimple discussing cloud storage

In the SSD space, Nimbus Data, TMS, Micron and OCZ were there showing off their latest technology. A few standard bearers like FalconStor, Veeam, Sepaton, Ultrium and Qlogic were exhibiting as well, along with a couple of pure cloud players like RackSpace, StorSimple and a new player, Symform.

Didn’t get to attend any technical sessions today but made the keynote last night which was pretty good. That talk was all about how the CIO has to start playing offense and getting ahead of where the business is heading rather than playing defense playing catchup to where the business needed to be before.

More on SNW USA tomorrow.

SCI SPC-1 results analysis: Top 10 $/IOPS – chart-of-the-month

Column chart showing the top 10 economically performing systems for SPC-1
(SCISPC120226-003) (c) 2012 Silverton Consulting, Inc. All Rights Reserved

Lower is better on this chart.  I can't remember the last time we showed this Top 10 $/IOPS™ chart from the Storage Performance Council SPC-1 benchmark. Recall that we prefer our IOPS/$/GB metric, which factors in subsystem size, but this past quarter two new submissions ranked well on $/IOPS. The two new systems were the all-SSD Huawei Symantec Oceanspace™ Dorado2100 (#2) and the latest Fujitsu ETERNUS DX80 S2 (#7) storage subsystems.

Most of the winners on $/IOPS are SSD systems (#1-5 and #10), and most of those are all-SSD storage systems. Such systems normally do better on $/IOPS by hitting high IOPS™ rates for the cost of their storage. But they often submit relatively small systems to SPC-1, reducing system cost and helping them place better on $/IOPS.

On the other hand, some disk-only storage systems do well by abandoning any form of protection, as with the two Sun J4400 (#6) and J4200 (#8) storage systems, which used RAID 0 but also had smaller capacities, coming in at 2.2TB and 1.2TB, respectively.

The other two disk-only storage systems here, the Fujitsu ETERNUS DX80 S2 (#7) and the Huawei Symantec Oceanspace S2600 (#9), also had relatively small capacities at 9.7TB and 2.9TB, respectively.

The ETERNUS DX80 S2 achieved ~35K IOPS at a cost of under $80K, which works out to $2.25 $/IOPS. Of course, the all-SSD systems blow that away; for example the Oceanspace Dorado2100 (#2), an all-SSD system, hit ~100K IOPS but cost nearly $90K, for a $0.90 $/IOPS.

Moreover, the largest capacity system here, with 23.7TB of storage, was the Oracle Sun ZFS (#10) hybrid SSD and disk system, which generated ~137K IOPS at a cost of ~$410K, hitting just under $3.00 $/IOPS.

I still prefer our own metric on economical performance, but each has its flaws. The SPC-1 $/IOPS metric is dominated by SSD systems and our IOPS/$/GB metric is dominated by disk-only systems. There's probably some way to do better on the cost of performance, but I have yet to see it.
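For anyone who wants to redo the arithmetic, here's a small Python sketch of the two cost-of-performance metrics being compared, using the rounded figures quoted above. Note that the IOPS/$/GB function reflects my reading of that metric, not an official formula, so treat it as illustrative only.

```python
# Rough sketch of the two cost-of-performance metrics discussed above.

def dollars_per_iops(price_usd: float, spc1_iops: float) -> float:
    """SPC-1 style $/IOPS: total tested system price divided by SPC-1 IOPS."""
    return price_usd / spc1_iops

def iops_per_dollar_per_gb(spc1_iops: float, price_usd: float, capacity_gb: float) -> float:
    """One reading of the IOPS/$/GB metric: IOPS divided by cost per GB.
    (The actual formula used in our reports may differ -- illustration only.)"""
    return spc1_iops / (price_usd / capacity_gb)

# Approximate figures quoted in the post:
print(round(dollars_per_iops(80_000, 35_000), 2))    # DX80 S2: ~2.29 with these rounded inputs (post quotes $2.25)
print(round(dollars_per_iops(90_000, 100_000), 2))   # Dorado2100: ~0.90 $/IOPS
print(round(iops_per_dollar_per_gb(35_000, 80_000, 9_700)))  # DX80 S2: ~4244 under this reading
```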

~~~~

The full SPC performance report went out in SCI’s February newsletter.  But a copy of the full report will be posted on our dispatches page sometime next month (if all goes well). However, you can get the full SPC performance analysis now and subscribe to future free newsletters by just sending us an email or using the signup form above right.

For a more extensive discussion of current SAN or block storage performance covering SPC-1 (top 30), SPC-2 (top 30) and ESRP (top 20) results please see SCI’s SAN Storage Buying Guide available on our website.

As always, we welcome any suggestions or comments on how to improve our analysis of SPC results or any of our other storage performance analyses.

 

Super Talent releases a 4-SSD, RAIDDrive PCIe card

RAIDDrive UpStream (c) 2012 Super Talent (from their website)

Not exactly sure what is happening, but PCIe cards are coming out containing multiple SSD drives.

For example, the recently announced Super Talent RAIDDrive UpStream card contains 4 embedded SAS SSDs that push storage capacity up to almost a TB of MLC NAND. They offer an optional SLC version, but no specs were provided on it.

It looks like the card uses an LSI RAID controller and SandForce NAND controllers. Unlike the other RAIDDrive cards that support RAID 5, the UpStream can be configured with RAID 0, 1 or 1E (a striped mirror that also works across an odd number of drives) and currently supports total capacities of 220GB, 460GB or 960GB.

Just like the rest of the RAIDDrive product line, the UpStream card uses a PCIe x8 connection and requires host software (drivers) for Windows, NetWare, Solaris and other OSs, but not for "most Linux distributions". Once the software is up, the RAIDDrive can be configured and then accessed just like any other "super fast" DAS device.

Super Talent's data sheet puts UpStream performance at 1GB/sec reads and 900MB/sec writes. However, I didn't see any SNIA SSD performance test results, so it's unclear how well performance holds up over time and whether these levels can be independently verified.

It seems just a year ago that I was reviewing Virident's PCIe SSD along with a few others at Spring SNW. At the time, I thought there were a lot of PCIe NAND cards being shown at the show. Given Super Talent's entry and the many other vendors sporting PCIe SSDs today, there are probably going to be a lot more this time.

No pricing information was available.

~~~~

Comments?

Storage performance matters, even for smartphones

Portrait of a Young Girl With an iPhone, after Agnolo Bronzino by Mike Licht,... (cc) (From Flickr)

 

Read an interesting article from MIT's Technology Review about a study presented at last week's USENIX FAST (File and Storage Technologies) conference on How Data Storage Cripples Mobile Apps. It seems storage performance can seriously slow down smartphone functioning, not unlike IT applications (see my IO throughput vs. response time and why it matters post for more).

The smartphone research was done by NEC. They took an Android phone and modified the OS to use an external memory card for all of the apps' data needs.

Then they ran a number of apps through their paces with various external memory cards. It turned out that, depending on the memory card in use, the phone's email and Twitter apps launched 2-3X faster. The native web app was also tested with over 50 page loads and saw, at best, a 3X faster page load time.

All the tests were done using a cable to simulate advanced network connections, above and beyond today's capabilities, in order to eliminate the network as the performance bottleneck. In the end, faster networking didn't have as much bearing on app performance as memory card speed.

(NAND) memory card performance

The problem, it turns out, is due to data writes. The non-volatile memory used in most external memory cards is NAND flash which, as we all know, has a much slower write time than read time, almost 1000X slower (see my post on why SSD performance is such a mystery). Most likely the memory cards are pretty "dumb", so many of the performance-boosting techniques used in enterprise-class SSDs (e.g., DRAM write buffering) are not available.

Data caching helps

The researchers did another experiment with the phone, using a more sophisticated version of data caching and a modified Facebook app. Presumably, this new "data caching" minimized the data write penalty by buffering writes in DRAM first and only destaging data to NAND flash when absolutely necessary. With this approach they were able to speed up the modified Facebook app by 4X.
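To illustrate the kind of write caching being described (this is not the researchers' code), here's a minimal Python sketch of a write-back buffer that absorbs writes in DRAM and only destages to NAND when a block is evicted or explicitly flushed. The class, method names and eviction policy are all my own assumptions.

```python
from collections import OrderedDict

class WriteBackCache:
    """Toy write-back cache: buffer writes in DRAM, destage to flash lazily.

    Only an illustration of the idea in the paper, not its implementation.
    """

    def __init__(self, flash_write_fn, capacity_blocks=64):
        self.flash_write = flash_write_fn        # slow NAND write path
        self.capacity = capacity_blocks          # DRAM budget, in blocks
        self.dirty = OrderedDict()               # block_id -> data, in write order

    def write(self, block_id, data):
        # Absorb the write in DRAM; repeated writes to the same block
        # never touch flash until eviction or flush.
        self.dirty[block_id] = data
        self.dirty.move_to_end(block_id)
        if len(self.dirty) > self.capacity:
            old_id, old_data = self.dirty.popitem(last=False)
            self.flash_write(old_id, old_data)   # destage least-recently-written block

    def flush(self):
        # Destage everything, e.g. when the app goes to the background.
        for block_id, data in self.dirty.items():
            self.flash_write(block_id, data)
        self.dirty.clear()

if __name__ == "__main__":
    flash_writes = []
    cache = WriteBackCache(lambda b, d: flash_writes.append(b), capacity_blocks=2)
    for i in range(5):
        cache.write(0, f"rev{i}")   # five app writes to the same block...
    cache.flush()
    print(flash_writes)             # ...cost only one flash write: [0]
```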

It seems that storage sophistication matters even in smartphones. I think I'm going to need to have someone port the caching portions of Data ONTAP® or Enginuity™ to run on my iPhone.

Comments?

 

Why EMC is doing Project Lightning and Thunder

Picture of atmospheric lightning striking ground near a building at night
rayo 3 by El Garza (cc) (from Flickr)

Although technically Project Lightning and Thunder represent some interesting offshoots of EMC software, hardware and system prowess, I wonder why they would decide to go after this particular market space.

There are plenty of alternative offerings in the PCIe NAND memory card space. Moreover, the PCIe card caching functionality, while interesting, is not that hard to replicate, and such software capability is not a serious barrier to entry for HP, IBM, NetApp and many, many others. And the margins cannot be that great.

So why get into this low margin business?

I can see a couple of reasons why EMC might want to do this.

  • Believing in the commoditization of storage performance.  I have had this debate with a number of analysts over the years but there remain many out there that firmly believe that storage performance will become a commodity sooner, rather than later.  By entering the PCIe NAND card IO buffer space, EMC can create a beachhead in this movement that helps them build market awareness, higher manufacturing volumes, and support expertise.  As such, when the inevitable happens and high margins for enterprise storage start to deteriorate, EMC will be able to capitalize on this hard won, operational effectiveness.
  • Moving up the IO stack.  From an application's IO request to the disk device that actually services it is a long journey, with multiple places to make money. Currently, EMC has a significant share of everything that happens after the fabric switch, whether it is FC, iSCSI, NFS or CIFS. What they don't have is a significant share of the switch infrastructure or anything on the other (host) side of that interface stack. Yes, they have Avamar, Networker, Documentum, and other software that helps manage, secure and protect IO activity, together with other significant investments in RSA and VMware. But these represent adjacent market spaces rather than primary IO stack endeavors. Lightning represents a hybrid software/hardware solution that moves EMC up the IO stack to inside the server. As such, it represents yet another opportunity to profit from all the IO going on in the data center.
  • Making big data more effective.  The fact that Hadoop doesn't really need or use high-end storage has not been lost on most storage vendors. With Lightning, EMC has a storage enhancement offering that can readily improve Hadoop cluster processing. Something like Lightning's caching software could easily be tailored to enhance HDFS file access and thus speed up cluster processing. If Hadoop and big data are to be the next big consumers of storage, then speeding up cluster processing will certainly help, and profiting by doing so only makes sense.
  • Believing that SSDs will transform storage. To many of us the age of disks is waning. SSDs, in some form or another, will be the underlying technology for the next age of storage. The densities, performance and energy efficiency of current NAND-based SSD technology are commendable, and they will only get better over time. The capabilities brought about by such technology will certainly transform the storage industry as we know it, if they haven't already. But where SSD technology actually ends up is still being played out in the marketplace. Many believe that when industry transitions like this happen, it's best to be engaged everywhere change is likely to occur, hoping that at least some of those bets will succeed. Perhaps PCIe SSD cards won't take over all server IO activity, but if they do, not being there, or being late, will certainly hurt a company's chances to profit from it.

There may be more reasons I missed here, but these seem to be the main ones. Of the above, I think the last one, that SSDs rule the next transition, is the most important to EMC.

They have been successful during industry transitions in the past. If anything, their acquisitions show a similar pattern of buying into transitions they don't own; witness Data Domain, RSA, and VMware. So I suspect the view inside EMC is that doubling down on SSDs will enable them to ride out the next storm and be in a profitable place for the next change, whatever that might be.

And following Lightning, Project Thunder

Similarly, Project Thunder seems to represent EMC doubling their bet yet again on SSDs. Just about every month I talk to another storage startup coming to market with another new take on storage using every form of SSD imaginable.

However, Project Thunder as envisioned today is not storage, but rather some form of external shared memory.  I have heard this before, in the IBM mainframe space about 15-20 years ago.  At that time shared external memory was going to handle all mainframe IO processing and the only storage left was going to be bulk archive or migration storage – a big threat to the non-IBM mainframe storage vendors at the time.

One problem then was that the shared DRAM memory of the time was way more expensive than sophisticated disk storage and the price wasn’t coming down fast enough to counteract increased demand.  The other problem was making shared memory work with all the existing mainframe applications was not easy.  IBM at least had control over the OS, HW and most of the larger applications at the time.  Yet they still struggled to make it usable and effective, probably some lesson here for EMC.

Fast forward 20 years and NAND based SSDs are the right hardware technology to make  inexpensive shared memory happen.  In addition, the road map for NAND and other SSD technologies looks poised to continue the capacity increase and price reductions necessary to compete effectively with disk in the long run.

However, the challenges then and now seem to have as much to do with the software needed to make shared external memory universally effective as with the hardware technology to implement it. Providing a new storage tier in Linux, Windows and/or VMware is easier said than done; most recent successes have been offshoots of SCSI (iSCSI, FCoE, etc.). Nevertheless, if it was good for mainframes then, it's certainly good for Linux, Windows and VMware today.

And that seems to be where Thunder is heading, I think.

Comments?

 


Latest SPECsfs2008 results, over 1 million NFS ops/sec – chart-of-the-month

Column chart showing the top 10 NFS throughput operations per second for SPECsfs2008
(SCISFS111221-001) (c) 2011 Silverton Consulting, All Rights Reserved

[We are still catching up on our charts for the past quarter but this one brings us up to date through last month]

There's just something about a million SPECsfs2008® NFS throughput operations per second that kind of excites me (weird, I know). Yes, it takes 44 nodes of Avere FXT 3500 with over 6TB of DRAM cache, 140 nodes of EMC Isilon S200 with almost 7TB of DRAM cache and 25TB of SSDs, or at least 16 nodes of NetApp FAS6240 in Data ONTAP 8.1 cluster mode with 8TB of FlashCache to get to that level.

Nevertheless, a million NFS throughput operations is something worth celebrating.  It’s not often one achieves a 2X improvement in performance over a previous record.  Something significant has changed here.

The age of scale-out

We have reached a point where scaling systems out can provide linear performance improvements, at least up to a point. For example, the EMC Isilon and NetApp FAS6240 had close to linear speedups in performance as they added nodes, indicating (to me at least) there may be more there if they just throw more storage nodes at the problem. Then again, maybe they saw some drop-off and didn't wish to show the world, or the costs became prohibitive and they had to stop someplace. On the other hand, Avere only benchmarked a 44-node system with their current hardware (FXT 3500); they must have figured winning the crown was enough.

However, I would like to point out that throwing just any hardware at these systems doesn't necessarily increase performance. Previously (see my CIFS vs. NFS corrected post), we had shown the linear regression of NFS throughput against spindle count, and although the regression coefficient was good (~R**2 of 0.82), it wasn't perfect. And of course we eliminated any SSDs from that prior analysis. (We should probably consider eliminating any system with more than a TB of DRAM as well, but that was before the 44-node Avere result was out.)
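For anyone who wants to redo that kind of fit on their own, here's a hedged sketch of the regression using numpy. The spindle counts and throughput numbers below are made-up placeholders, not the data from our analysis; substitute the actual SPECsfs2008 disk-only submissions before drawing any conclusions.

```python
import numpy as np

# Placeholder data -- substitute the actual SPECsfs2008 disk-only submissions.
spindles = np.array([56, 96, 224, 448, 900, 1472], dtype=float)
nfs_ops = np.array([20e3, 40e3, 95e3, 190e3, 380e3, 600e3])

# Ordinary least squares fit: nfs_ops ~ slope * spindles + intercept
slope, intercept = np.polyfit(spindles, nfs_ops, 1)
predicted = slope * spindles + intercept

# R**2 of the fit (the disk-only fit mentioned above came in around 0.82)
ss_res = np.sum((nfs_ops - predicted) ** 2)
ss_tot = np.sum((nfs_ops - np.mean(nfs_ops)) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"ops per spindle ~ {slope:.0f}, R**2 = {r_squared:.2f}")
```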

Speaking of disk drives, the FAS6240 system nodes each had 72 x 450GB 15Krpm disks, the Isilon nodes had 24 x 300GB 10Krpm disks, and each Avere node had 15 x 600GB 7.2Krpm SAS disks. However, the Avere system also had 4 Solaris ZFS file storage systems behind it, each of which had another 22 x 3TB (7.2Krpm, I think) disks. Given all that, the 16-node NetApp system, the 140-node Isilon and the 44-node Avere systems had a total of 1152, 3360 and 748 disk drives, respectively. Of course, this doesn't count the system disks for the Isilon and Avere systems nor any of the SSDs or FlashCache in the various configurations.

I would say that with this round of SPECsfs2008 benchmarks, scale-out NAS systems have arrived. It's too bad that neither NetApp nor Avere released comparable CIFS benchmark results, which would have helped in my perennial CIFS vs. NFS discussion.

But there’s always next time.

~~~~

The full SPECsfs2008 performance report went out to our newsletter subscribers last December. A copy of the full report will be up on the dispatches page of our site sometime later this month (if all goes well). However, you can see our full SPECsfs2008 performance analysis now and subscribe to our free monthly newsletter to receive future reports directly by just sending us an email or using the signup form above right.

For a more extensive discussion of file and NAS storage performance covering top 30 SPECsfs2008 results and NAS storage system features and functionality, please consider purchasing our NAS Buying Guide available from SCI’s website.

As always, we welcome any suggestions on how to improve our analysis of SPECsfs2008 results or any of our other storage system performance discussions.

Comments?