Fall SNWUSA wrap-up

Attended SNWUSA this week in San Jose. It's hard to see the show change gradually when you attend every one, but it does seem that end-user content and attendance are increasing proportionally, which should bode well for future SNWs. There has always been a good number of end users at the show, but in the past the bulk of the attendees came from storage vendors.

Another large storage vendor dropped their sponsorship. HDS no longer sponsors the show, and the last large vendor still standing is HP. Some of this is cyclical; perhaps the large vendors will come back for next year's spring show in Orlando, FL. But EMC, NetApp and IBM seem to have dropped sponsorship for at least the last couple of shows.

SSD startup of the show

Skyhawk hardware (c) 2012 Skyera, all rights reserved (from their website)

The best new SSD startup had to be Skyera: a 48TB (raw) flash, dual-controller system supporting the iSCSI block protocol and using real commercial-grade MLC. The team at Skyera seems to be all ex-SandForce executives and technical people.

Skyera's team has designed a 1U box called the Skyhawk, with a phalanx of NAND chips, their own controller(s) and other logic as well. They support software compression and deduplication, as well as specially designed RAID logic that they claim reduces write amplification to something just over 1 while providing RAID 6 (dual-drive-failure) equivalent protection.

Skyera's underlying belief is that, just as consumer HDAs took over from the big monster 14″ and 11″ disk drives in the '90s, sooner or later commercial MLC NAND will take over from eMLC and SLC. And if one elects to stay with eMLC and SLC technology, one is destined to be one to two technology nodes behind. That is, commercial MLC (in USB sticks, SD cards, etc.) is currently manufactured with 19nm technology, while eMLC and SLC NAND are back at 24 or 25nm. But 80-90% of the NAND market is being driven by commercial MLC NAND. Skyera came out this past August.

Coming in second place was Arkologic, an all-flash NAS box using SSD drives from multiple vendors. In their case a fully populated rack holds about 192TB (raw?) with an active-passive controller configuration. The main concern I have with this product is that all their metadata is held in UPS-backed DRAM (??), of which they have up to 128GB in the controller.

Arkologic's main differentiation is supporting QoS on a file-system basis and having some connection with a NIC vendor that can provide end-to-end QoS. The other thing they have is a new RAID-AS, which is specially designed for flash.

I just hope their UPS is pretty hefty and they don't sell it someplace where power is very flaky, because when that UPS gives out, kiss your data goodbye, as the metadata is held nowhere else – at least that's what they told me.

Cloud storage startup of the show

There was more cloud stuff going on at the show; I talked with at least three or four cloud gateway providers. But the cloud startup of the show had to be Egnyte. They supply storage services that span cloud and on-premises storage with an in-band or out-of-band solution, and they provide file synchronization services for file sharing across multiple locations. They have some hooks into NetApp and other major storage vendor products that allow them to be out-of-band in those environments, but they would need to be in-band for other storage systems. It seems an interesting solution that, if successful, may help accelerate the adoption of cloud storage in the enterprise, as it makes it transparent whether the storage you access is local or in the cloud. How they deal with the response-time differences is another question.

Different idea startup of the show

The new technology showcase had a bunch of vendors, some of which I had never heard of before, but one that caught my eye was Actifio. They were at VMworld, but I never got time to stop by. They seem to be taking another shot at storage virtualization, only in this case rather than focusing on non-disruptive file migration they are taking on the task of doing a better job of point-in-time copies for iSCSI- and FC-attached storage.

I assume they are in the middle of the data path in order to do this and they seem to be using copy-on-write technology for point-in-time snapshots.  Not sure where this fits, but I suspect SME and maybe up to mid-range.

Most enterprise vendors solved these problems a long time ago, but at the low end it's a little more variable. I wish them luck, but while most customers use snapshots if their storage supports them, those that don't seem unable to understand what they are missing. And then there's the matter of being in the data path?!

~~~~

If there was a hybrid storage startup at the show, I must have missed them. I did talk with Nimble Storage, and they seem to be firing on all cylinders. Maybe someday we can do a deep dive on their technology. Tintri was there as well in the new technology showcase; we talked with them earlier this year at Storage Tech Field Day.

The big news at the show was Microsoft purchasing StorSimple, a cloud storage gateway/cache vendor. Apparently StorSimple did a majority of their business with Microsoft's Azure cloud storage, so the deal seemed to make sense to everyone.

The SNIA suite was hopping as usual and the venue seemed to work well, although I would say the exhibit floor and lab area was a bit too big. But everything else seemed to work out fine.

On Wednesday, the CIO from Dish talked about what it took to completely transform their IT environment from a management and leadership perspective. It seemed like an awfully big risk, but they were able to pull it off.

All in all, SNW is still a great show to learn about storage technology, at least from an end-user perspective. I just wish more of the large vendors would return, but alas that seems to be a dream for now.

Dell Storage Forum 2012 – day 2

On the second day of Dell Storage Forum in Boston, Dell announced:

  • New FluidFS (Exanet) FS8600 front-end NAS gateway for Dell Compellent storage. The new gateway can be scaled from 1 to 4 dual-controller configurations and can support a single file system/namespace of up to 1PB in size. The FS8600 is available with 1GbE or 10GbE options and supports 8Gbps FC attachment to backend storage.
  • New Dell Compellent SC8000 controllers based on Dell's 2U, 12th-generation server hardware that can now be cooled with ambient air (115°F?) and consume less power than the previous Series 40 whitebox server controllers. The new hardware also comes with dual 6-core processors and supports 16 to 64GB of DRAM per controller, or up to 128GB with dual controllers. The new controllers GA this month and supply PCIe slots for backend 6Gbps SAS and frontend connectivity of 1GbE or 10GbE iSCSI, 10GbE FCoE or 8Gbps FC, with 16Gbps FC coming in 2H2012.
  • New Dell Compellent SC200 and SC220 drive enclosures, available in 2U 12-LFF-drive or 2U 24-SFF-drive configurations, supporting 6Gbps SAS connectivity.
  • New Dell Compellent SC6.0 operating software supporting a 64-bit O/S for larger memory and dual/multi-core processing.
  • New FluidFS FS7600 (1GbE)/FS7610 (10GbE) 12th-generation server front-end NAS gateways for Dell EqualLogic storage, which support asynchronous replication at the virtual file system level. The new gateways also support 10GbE iSCSI and can scale up to 507TB in a single namespace.
  • New FluidFS NX3600 (1GbE)/NX3610 (10GbE) 12th-generation server front-end NAS gateways for PowerVault storage systems, which can support up to 576TB of raw capacity for a single gateway or scale to two gateways for up to 1PB of raw storage in a single namespace/file system.
  • AppAssure 5, which includes better performance based on a new backend object store to protect even larger datasets. At the moment AppAssure is a Windows-only solution, but with block deduplication/compression and changed-block tracking it is already WAN optimized. Dell announced that Linux support will be available later this year.

Probably more interesting was the talk about, and demo of, a prototype from their RNA Networks acquisition, which supports cache-coherent PCIe SSD cards in Dell servers. The new capability is still on the drawing board but is intended to connect to Dell Compellent storage and move tier 1 out to the server. Lots more to come on this. They call it Project Hermes, for the Greek messenger god. Not sure, but something about having lightning bolts on his shoes comes to mind…

Comments?

 


OpenFlow part 2, Cisco’s response

 

organic growth by jurvetson

Cisco's CTO, Padmasree Warrior, was interviewed today by NetworkWorld to discuss Cisco's response to all the recent press on OpenFlow coming out of the Open Networking Summit (see my OpenFlow, the next wave in networking post). Apparently, Cisco is funding a new spin-in company to implement new networking technology congruent with Cisco's current and future switches and routers.

Spin-in to the rescue

We have seen this act before: Andiamo was another Cisco spin-in company (bought back in ~2002), only that time focused on FC or SAN switching technology. Andiamo was successful in that it created FC switch technology which allowed Cisco to go after the storage networking market and probably even helped them design and implement FCoE.

This time it's a little different, however. It's in Cisco's backyard, so to speak. The new spin-in is called Insieme and will be focused on "OpenStack switch hardware and distributed data storage".

Distributed data storage sounds a lot like cloud storage to me. OpenStack seems to be an open-source approach to defining cloud computing systems. What all that has to do with software-defined networking I am unable to understand.

Nonetheless, Cisco has invested $100M in the startup and has capped its acquisition cost at $750M if it succeeds.

But is it SDN?

Ms. Warrior does go on to say that software-programmable switches will be integrated across Cisco's product line sometime in the near future, but that OpenFlow and OpenStack are only two ways to do that. Other ways exist, such as adding new features to NX-OS today or modifying the Nexus 1000v (the software-only, VMware-based virtual switch) they have been shipping since 2009.

As for OpenFlow commoditizing networking technology, Ms. Warrior doesn't believe that any single technology is going to change the leadership in networking. Programmability is certainly of interest to one segment of users with massive infrastructure, but most data centers have no desire to program their own switches. And in the end, networking success depends as much on channels and go-to-market programs as it does on great technology.

Cisco's CTO was reluctant to claim that Insieme was their response to SDN, but it seems patently evident to the rest of us that that's at least one of its objectives. Something like this is a two-edged sword: on the one hand it helps Cisco go after and help define the new technology; on the other hand it legitimizes the current players.

~~~~

Nicira is probably rejoicing today what with all the news coming out of the Summit and the creation of Insieme.  Probably yet another reason not to label it SDN…

SCI SPC-1 results analysis: Top 10 $/IOPS – chart-of-the-month

Column chart showing the top 10 economically performing systems for SPC-1
(SCISPC120226-003) (c) 2012 Silverton Consulting, Inc. All Rights Reserved

Lower is better on this chart. I can't remember the last time we showed this Top 10 $/IOPS™ chart from the Storage Performance Council SPC-1 benchmark. Recall that we prefer our IOPS/$/GB metric, which factors in subsystem size, but this past quarter two new submissions ranked well on $/IOPS. The two new systems were the all-SSD Huawei Symantec Oceanspace™ Dorado2100 (#2) and the latest Fujitsu ETERNUS DX80 S2 (#7) storage subsystems.

Most of the winners on $/IOPS are SSD systems (#1-5 and 10), and most of these were all-SSD storage systems. These systems normally achieve better $/IOPS by hitting high IOPS™ rates for the cost of their storage. But they often submit relatively small systems to SPC-1, reducing system cost and helping them place better on $/IOPS.

On the other hand, some disk-only storage systems do well by abandoning any form of protection, as with the two Sun J4400 (#6) and J4200 (#8) storage systems, which used RAID 0 but also had smaller capacities, coming in at 2.2TB and 1.2TB, respectively.

The other two disk-only storage systems here, the Fujitsu ETERNUS DX80 S2 (#7) and the Huawei Symantec Oceanspace S2600 (#9), also had relatively small capacities, at 9.7TB and 2.9TB respectively.

The ETERNUS DX80 S2 achieved ~35K IOPS at a cost of under $80K, generating a $2.25 $/IOPS. Of course, the all-SSD systems blow that away; for example, the all-SSD Oceanspace Dorado2100 (#2) hit ~100K IOPS but cost nearly $90K, for a $0.90 $/IOPS.

Moreover, the largest-capacity system here, with 23.7TB of storage, was the Oracle Sun ZFS (#10) hybrid SSD-and-disk system, which generated ~137K IOPS at a cost of ~$410K, hitting just under $3.00 $/IOPS.
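
(For anyone who wants to check the arithmetic, $/IOPS is simply the total tested system price divided by the SPC-1 IOPS result. Here's a quick sketch using the rounded figures quoted above – the actual SPC-1 submissions carry more precision, so treat these as approximations.)

```python
# Rough $/IOPS check using the rounded figures quoted in the text above,
# not the exact numbers from the SPC-1 full disclosure reports.
systems = {
    "Fujitsu ETERNUS DX80 S2":    {"price": 80_000,  "iops": 35_000},
    "Huawei Symantec Dorado2100": {"price": 90_000,  "iops": 100_000},
    "Oracle Sun ZFS hybrid":      {"price": 410_000, "iops": 137_000},
}

for name, s in systems.items():
    dollars_per_iops = s["price"] / s["iops"]   # total price / SPC-1 IOPS
    print(f"{name:28s} ~${dollars_per_iops:.2f}/IOPS")
```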

I still prefer our own metric for economical performance, but each has its flaws. The SPC-1 $/IOPS metric is dominated by SSD systems and our IOPS/$/GB metric is dominated by disk-only systems. There's probably some way to do better on the cost of performance, but I have yet to see it.

~~~~

The full SPC performance report went out in SCI’s February newsletter.  But a copy of the full report will be posted on our dispatches page sometime next month (if all goes well). However, you can get the full SPC performance analysis now and subscribe to future free newsletters by just sending us an email or using the signup form above right.

For a more extensive discussion of current SAN or block storage performance covering SPC-1 (top 30), SPC-2 (top 30) and ESRP (top 20) results please see SCI’s SAN Storage Buying Guide available on our website.

As always, we welcome any suggestions or comments on how to improve our analysis of SPC results or any of our other storage performance analyses.

 

Latest SPC-1 results – IOPS vs drive counts – chart-of-the-month

Scatter plot of SPC-1 IOPS against spindle count, with a linear regression line showing Y = 186.18X + 10227 with R**2 = 0.96064
(SCISPC111122-004) (c) 2011 Silverton Consulting, All Rights Reserved

[As promised, I am trying to get up to date on my performance charts from our monthly newsletters. This one brings us current through November.]

The above chart plots Storage Performance Council SPC-1 IOPS against spindle count.  On this chart, we have eliminated any SSD systems, systems with drives smaller than 140 GB and any systems with multiple drive sizes.

Alas, the coefficient of determination (R**2) of 0.96 tells us that SPC-1 IOPS performance is mainly driven by drive count. But what's more interesting here is that as drive counts get higher than, say, 1000, the variance around the linear regression line widens – implying that system sophistication starts to matter more.
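
For anyone who wants to reproduce this sort of fit, the regression line and R**2 fall straight out of an ordinary least-squares fit of IOPS against spindle count. A minimal sketch follows – the (spindle count, IOPS) pairs are made-up stand-ins, not the actual SPC-1 submissions plotted above.

```python
import numpy as np

# Hypothetical (spindle count, SPC-1 IOPS) pairs standing in for the real
# submissions -- the actual data comes from SPC-1 full disclosure reports.
spindles = np.array([ 140,  300,   480,   800,  1152,  1280,  1536,  1920])
iops     = np.array([38e3, 65e3, 100e3, 160e3, 270e3, 225e3, 290e3, 380e3])

slope, intercept = np.polyfit(spindles, iops, 1)   # least-squares line
predicted = slope * spindles + intercept
ss_res = np.sum((iops - predicted) ** 2)           # residual sum of squares
ss_tot = np.sum((iops - iops.mean()) ** 2)         # total sum of squares
r_squared = 1 - ss_res / ss_tot                    # coefficient of determination

print(f"IOPS ~= {slope:.2f} * spindles + {intercept:.0f}, R**2 = {r_squared:.3f}")
```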

Processing power matters

For instance, if you look at the three systems centered around 2000 drives, they are (from lowest to highest IOPS) a 4-node IBM SVC 5.1, a 6-node IBM SVC 5.1 and an 8-node HP 3PAR V800 storage system. This tells us that the more processing power (nodes) you throw at an IOPS workload, given similar spindle counts, the better the result.

System sophistication can matter too

The other interesting facet on this chart comes from examining the three systems centered around 250K IOPS that span from ~1150 to ~1500 drives.

  • The 1156 drive system is the latest HDS VSP 8-VSD (virtual storage directors, or processing nodes) running with dynamically (thinly) provisioned volumes – which is the first and only SPC-1 submission using thin provisioning.
  • The 1280 drive system is a (now HP) 3PAR T800 8-node system.
  • The 1536 drive system is an IBM SVC 4.3 8-node storage system.

One would think that thin provisioning would degrade storage performance, and maybe it did, but without a non-dynamically provisioned HDS VSP benchmark to compare against it's hard to tell. However, the fact that the HDS VSP performed as well as the other systems with a much lower drive count seems to tell us that thin provisioning potentially uses hard drives more efficiently than fat provisioning, that the 8-VSD HDS VSP is more effective than an 8-node IBM SVC 4.3 or an 8-node (HP) 3PAR T800 system, or perhaps some combination of these.

~~~~

The full SPC performance report went out to our newsletter subscribers last November. [The one change to this chart from the full report is that the date in the chart's title was wrong and is fixed here.] A copy of the full report will be up on the dispatches page of our website sometime this month (if all goes well). However, you can get performance information now and subscribe to future newsletters to receive these reports even earlier by just sending us an email or using the signup form above right.

For a more extensive discussion of block or SAN storage performance covering SPC-1&-2 (top 30) and ESRP (top 20) results please consider purchasing our recently updated SAN Storage Buying Guide available on our website.

As always, we welcome any suggestions on how to improve our analysis of SPC results or any of our other storage system performance discussions.

Comments?

Top 10 blog posts for 2011

Merry Christmas! Buon Natale! Frohe Weihnachten! by Jakob Montrasio (cc) (from Flickr)

Happy Holidays.

I ranked my blog posts using a ratio of hits to post age and have identified the top 10 most popular posts of 2011 (so far):

  1. vSphere 5 storage enhancements – We discuss some of the more interesting storage-oriented vSphere 5 announcements, which included a new DAS storage appliance, a host-based (software) replication service, storage DRS and other capabilities.
  2. Intel's 320 SSD 8MB problem – We discuss a recent bug (since fixed) which left the Intel 320 SSD drive with only 8MB of storage; we presumed the bug was in the wear-leveling/block-mapping logic of the drive controller.
  3. Analog neural simulation or digital neuromorphic computing vs AI – We talk about recent advances in providing both analog (MIT) and digital (IBM) versions of neural computation vs. the more traditional AI approaches to intelligent computing.
  4. Potential data loss using SSD RAID groups – We note the possibility of catastrophic data loss when using equally worn SSDs in RAID groups.
  5. How has IBM research changed – We examine some of the changes at IBM Research over the past 50 years or so which have led to much more productive research results.
  6. HDS buys BlueArc – We consider the implications of the recent acquisition of BlueArc storage systems by their major OEM partner, Hitachi Data Systems.
  7. OCZ's latest Z-Drive R4 series PCIe SSD – Not sure why this got so much traffic, but it's OCZ's latest PCIe SSD device with 500K IOPS performance.
  8. Will Hybrid drives conquer enterprise storage – We discuss the unlikely possibility that Hybrid drives (NAND/Flash cache and disk drive in the same device) will be used as backend storage for enterprise storage systems.
  9. SNIA CDMI plugfest for cloud storage and cloud data services – We were invited to sit in on a recent SNIA Cloud Data Management Initiative (CDMI) plugfest and talk to some of the participants about where CDMI is heading and what it means for cloud storage and data services.
  10. Is FC dead?! – What with the introduction of 40GbE FCoE just around the corner, 10GbE cards coming down in price and Brocade's poor YoY quarterly storage revenue results, we discuss the potential implications for FC infrastructure and its future in the data center.

~~~~

I would have to say #3, 5, and 9 were the most fun for me to do. Not sure why, but #10 probably generated the most Twitter traffic. Why the others were so popular is hard for me to understand.

Comments?

SCI’s latest SPC-2 performance results analysis – chart-of-the-month

SCISPC110822-002 (c) 2011 Silverton Consulting, All Rights Reserved

There really weren't that many new submissions for the Storage Performance Council SPC-1 or SPC-2 benchmarks this past quarter (just the new Fujitsu DX80 S2 SPC-2 run), so we thought it time to roll out a new chart.

The chart above shows a scatter plot of the number of disk drives in a submission vs. the MB/sec attained for the Large Database Query (LDQ) component of an SPC-2 benchmark.

As anyone who follows this blog and our Twitter feed knows, we have an ongoing, long-running discussion on how I/O benchmarks such as this are mostly just a measure of how much hardware (disks and controllers) is thrown at them. We added a linear regression line to the above chart to evaluate the validity of that claim, and as clearly shown above, disk drive count is NOT highly correlated with SPC-2 performance.

We necessarily exclude from this analysis any system results that used NAND-based caching or SSD devices, so as to focus specifically on the relevance of disk drive count. There are not a lot of these in SPC-2 results, but there are enough to make the correlation look even worse.

We chose to display only the LDQ segment of the SPC-2 benchmark because it has the best correlation, or highest R**2, at 0.41 between workload and disk count. The aggregate MBPS, as well as the other components of the SPC-2 benchmark, video on demand (VOD) and large file processing (LFP), all had R**2's of less than 0.36.
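
Picking the component to display is just a matter of computing R**2 for each workload against drive count and taking the largest. A small sketch of that selection follows – the drive counts and MB/s figures are invented stand-ins, not actual SPC-2 results.

```python
import numpy as np

# Invented (drive count, MB/s) samples per SPC-2 component, standing in for
# the actual submissions -- the real numbers come from the SPC-2 reports.
drives = np.array([96, 192, 288, 480, 776, 776, 1280])
components = {
    "LDQ": np.array([1800, 3200, 3900, 5200, 6000, 11500, 10200]),
    "LFP": np.array([2100, 2500, 4800, 3900, 9800,  5600, 12500]),
    "VOD": np.array([1500, 4200, 2600, 7800, 4400, 10900,  8100]),
}

for name, mbps in components.items():
    r = np.corrcoef(drives, mbps)[0, 1]     # Pearson correlation vs. drive count
    print(f"{name}: R**2 = {r**2:.2f}")     # square it to get R**2
```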

For instance, just look at the vertical centered around 775 disk drives. There are two systems that show up here, one doing ~6,000 MBPS and the other doing ~11,500 MBPS – quite a difference. The fact that these are two different storage architectures from the same vendor is even more informative.

Why is the overall correlation so poor?

One can only speculate, but there must be something about system sophistication at work in SPC-2 results. It's probably tied to better caching, better data layout on disk, and better IO latency, but that's only an educated guess. For example:

  • Most of the SPC-2 workload is sequential in nature. How a storage system detects sequentiality in a seemingly random IO mix is an art form, and what a system does armed with that knowledge is probably more of a science (see the sketch after this list for one simplistic detection approach).
  • In the old days of big, expensive CKD DASD, sequential data was all laid out consecutively (barring lacing) around a track and up a cylinder. In these days of zoned FBA disks, one can only hope that sequential data resides in laced sectors along consecutive tracks on the media, minimizing any head-seek activity. Another approach, popular this last decade, has been to throw more disks at the problem, resulting in many more seeking heads to handle the workload – and who cares where the data lies.
  • IO latency is another factor. We have discussed this before (see Storage throughput vs. IO response time and why it matters). One key to system throughput is how quickly data gets out of cache and into the hands of servers. Of course, the other part of this is how fast the storage system gets the data from disk into cache.
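
To make the sequentiality point a little more concrete, below is a deliberately simplistic sketch of the kind of stream detection a storage system might do: remember where the last request ended and count how many requests arrive right behind it. Real array firmware is far more elaborate (multiple stream slots, tolerated gaps, per-LUN state), so treat this as an illustration only, not any vendor's actual algorithm.

```python
# Simplistic sequential-stream detector: if enough recent requests land
# right after a previously seen request, treat the stream as sequential
# and (in a real system) kick off read-ahead/prefetch.
SEQ_THRESHOLD = 4   # contiguous hits before we call the stream sequential

class StreamDetector:
    def __init__(self):
        self.last_end = None   # LBA just past the previous request
        self.run_length = 0    # contiguous requests seen in a row

    def observe(self, lba: int, blocks: int) -> bool:
        """Feed one IO; return True if the stream now looks sequential."""
        if self.last_end is not None and lba == self.last_end:
            self.run_length += 1
        else:
            self.run_length = 0          # gap or backwards seek: reset
        self.last_end = lba + blocks
        return self.run_length >= SEQ_THRESHOLD

detector = StreamDetector()
for lba in (0, 8, 16, 24, 32, 40, 1000, 1008):
    sequential = detector.observe(lba, 8)
    print(f"IO at LBA {lba:5d}: sequential={sequential}")
```

A real system would use that signal to drive read-ahead and cache-residency decisions, which is where the "science" part comes in.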

Systems that do these things better will perform better on SPC-2-like benchmarks that focus on raw sequential throughput.

Comments?

—–

The full SPC performance report went out to our newsletter subscribers last month.  A copy of the full report will be up on the dispatches page of our website later next month. However, you can get this information now and subscribe to future newsletters to receive these reports even earlier by just sending us an email or using the signup form above right.

As always, we welcome any suggestions on how to improve our analysis of SPC results or any of our other storage system performance discussions.

 

 

Latest Microsoft ESRP v3 (Exchange 2010) 1K to 5K mailbox performance results – chart of the month

SCIESRP110726-004 (c) 2011 Silverton Consulting, All Rights Reserved

Microsoft specifies two different metrics for sequential read rates during database backup activity in their Exchange Solution Reviewed Program (ESRP) reports:

  • MB read/sec per database
  • MB read/sec total per server

Our problem with these metrics is that they don't say much about the storage system's performance. Some ESRP submissions have a single database while others have hundreds of databases. The same thing applies to servers, although 20 servers seems to be about the most we have seen. So the MB/s/DB or MB/s/server can vary all over the place depending on the Exchange configuration used, even for the exact same storage system.

In the above chart, we have attempted to move beyond some of these problems and use the information supplied in the ESRP reports to aggregate DB backup throughput across all databases. As such, we have derived a new metric called "total database backup". (Pretty simple actually: just multiply the MB/s/DB by the number of databases in the Exchange configuration.)
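
In code form the whole metric is one multiplication. The per-database rate and database count below are hypothetical, not taken from any particular ESRP submission.

```python
# "Total database backup" throughput: MB/s per database times the number
# of databases in the Exchange configuration. Figures below are hypothetical.
mb_per_sec_per_db = 35.0     # ESRP-reported backup rate per database
database_count = 40          # databases in the tested configuration

total_db_backup = mb_per_sec_per_db * database_count
print(f"Total database backup ~= {total_db_backup:.0f} MB/sec")   # ~1400 MB/sec
```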

A couple of problems with our approach.

  • Current ESRP reports typically utilize a shadow storage system and shadow Exchange servers which host 50% of the databases and email activity. So what I am showing for those ESRP reports is what two storage systems can accomplish, not one.
  • Another potential way to get the same result would be to multiply the number of servers by the MB/sec/server metric. (But try as I might, these two approaches didn't yield the same answer, so I am using the computation above – it must be the way I am recording the number of [shadow] servers.)
  • Although ESRP reports the average MB/sec/database to back up a single database, it's not clear that these measurements were taken while backing up all active databases at the same time, especially for submissions with hundreds of databases.

The last is probably the most problematic critique of our new measure, but it may not matter much for smaller configurations. Nonetheless, we produced the above chart and published it in last month's review of ESRP results for the 1,001 to 5,000 mailbox category.

One item we discussed in our report was that the number of disk drives didn't seem to correlate well with high positions on this chart. The number ten position (Fujitsu ETERNUS JX40) used over 140 disks, the number two position (Dell PowerEdge R510) had only 12 disk drives, and the number one solution (HP E5700) consisted of 56 drives, close to the average for this category.

One striking finding using this measure is that performance varies considerably, from the top system providing over 1,600 MB/sec of database backup to the lowest of the group providing only ~800 MB/sec. What with Exchange 2010 and lagged DAGs, some people feel that backup activity is no longer needed, but we would disagree. We continue to believe that taking backups of Exchange data still makes a whole lot of sense and shouldn't go away, ever.

It's our hope that this or some similar follow-on metric will keep Exchange configuration parameters from confounding ESRP-reported storage system performance results. We realize that this quixotic quest may never be entirely successful; nevertheless, we perform this duty in the hope that it will benefit today's and future storage performance analysts everywhere.

Comments?

—–

The full ESRP report went out to our newsletter subscribers last month.  A copy of the full report will be up on the dispatches page of our website later next month. However, you can get this information now and subscribe to future newsletters to receive these reports even earlier by just emailing us at SubscribeNews@SilvertonConsulting.com?Subject=Subscribe_to_NewsletterR or using the signup form above and to the right.

As always, we welcome any suggestions on how to improve our analysis of ESRP or any of our other storage system performance discussions.