Dreaming of SCM but living with NVDIMMs…

Last months GreyBeards on Storage podcast was with Rob Peglar, CTO and Sr. VP of Symbolic IO. Most of the discussion was on their new storage product but what also got my interest is that they are developing their storage system using NVDIMM technologies.

In the past I would have called NVDIMMs NonVolatile RAM but with the latest incarnation it’s all packaged up in a single DIMM and has both NAND and DRAM on board. It looks a lot like 3D XPoint but without the wait.

The first time I saw similar technology was at SFD5 with Diablo Technologies and SANdisk, a Western Digital company (videos here and here). At that time they were calling them UltraDIMMs and memory class storage. UltraDIMMs had an onboard SSD and DRAM and provided a sort of virtual memory (paged) access to the substantial (SSD) storage behind the DRAM page(s). I wrote two blog posts about UltraDIMM and MCS (called MCS, UltraDIMM and memory IO, the new path ahead part1 and part2).

 

NVDIMM defined

NVDIMMs are currently available from Micron, Crucial, NetList, Viking, and probably others. With today’s NVDIMMs there is no large SSD (unlike UltraDIMMs), just backing flash, and the complete storage capacity is available from the DRAM in the NVDIMM. When power is restored, the NVDIMM sort of acts like virtual memory, paging data in from the flash until all the data is back in DRAM.

NVDIMM hardware includes control logic, DRAM, NAND and SuperCAPs/batteries together in one DIMM. DRAM is used for normal memory traffic, but in the case of a power outage the data from DRAM is offloaded onto the NAND in the NVDIMM, using the SuperCAP/battery to hold up the DRAM just long enough to transfer its contents to flash.

The problem with good, old DRAM is that it is volatile, which means when power is gone so is your data. With NVDIMMs (a characteristic shared by 3D XPoint and other new non-volatile storage class memories), when power goes away your data is still available and persists across power outages.

For example, Micron offers an 8GB, JEDEC DDR4-compliant, 288-pin NVDIMM that has 8GB of DRAM and 16GB of SLC flash in a single DIMM. Depending on the part, it has 14.9-16.2GB/s of bandwidth and runs at 1866-2400 MT/s (million memory transfers per second). Roughly translating MT/s to bandwidth and then to IOPS: at ~17GB/sec and an 8KB block size, the device should be able to do ~2.1 MIO/s (million IO operations per second [never thought I would need an acronym for that]).
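
For the record, here’s that back-of-the-envelope arithmetic as a small sketch. The 64-bit DDR4 data bus width and the 8KB block size are my assumptions, and the figures are approximate:

```python
# Back-of-the-envelope translation of memory transfer rate to bandwidth and
# then to "IOPS". Assumes a 64-bit (8-byte wide) DDR4 data bus and 8KB blocks.

def ddr_bandwidth_gbps(mt_per_sec):
    """Million transfers/sec x 8 bytes per transfer, in GB/s."""
    return mt_per_sec * 1e6 * 8 / 1e9

print(ddr_bandwidth_gbps(1866))    # ~14.9 GB/s, the low end of Micron's quoted range

block = 8 * 1024                   # 8KB block size
print(17e9 / block / 1e6)          # ~2.1 million 8KB "IOs" per second at ~17GB/s
```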

Another thing that makes NVDIMMs unique in the storage world is that they are byte addressable.

Hardware – check, Software?

SNIA has a NVM Programming (NVMP) Technical Working Group (TWG), which has been working to help adoption of the new technology. In addition to the NVMP TWG, there’s pmem.io, SANdisk’s NVMFS (2013 FMS paper, formerly known as DirectFS) and Intel’s pmfs (persistent memory file system) GitHub repository.  Couldn’t find any GitHub for NVMFS but both pmem.io and pmfs are well along the development path for Linux.

The TWG identified a three-pronged approach to NVDIMM adoption: crawl, walk, run (see the pmem.io blog post for more info).

  • The Crawl approach uses standard block and file system drivers on Linux to talk to an NVDIMM driver. This way has the benefit of being well tested, well known and widely available (except for the NVDIMM driver). The downside is that you have a full block IO or file IO stack in front of a device that can potentially do 2.1 MIO/s, and that stack is likely to add a lot of overhead, reducing this potential significantly.
  • The Walk approach uses a persistent memory file system (pmfs?) to directly access the NVDIMM storage using memory-mapped IO. The advantage here is that there’s no kernel code in the data path during an NVDIMM data access. But building a file system or block store up around this may require some application-level code (see the sketch after this list).
  • The Run approach wasn’t described well in the blog post, but it seems like SANdisk’s NVMFS approach, which uses both standard NVMe SSDs and non-volatile memory to build a hybrid (NVDIMM-SSD) file system.
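
To make the Walk idea concrete, here’s a minimal sketch of memory-mapped access to a file that would sit on a persistent-memory-aware (DAX) mount. The /mnt/pmem path and file name are hypothetical, and a real application would use the pmem.io libraries to flush CPU caches properly rather than relying on msync alone:

```python
# Minimal sketch of the "walk" approach: memory-map a file on a DAX-style
# persistent memory mount and access it with ordinary loads/stores -- no
# block or file IO stack in the data path. Path and file name are hypothetical.
import mmap
import os

path = "/mnt/pmem/mydata"          # hypothetical DAX-mounted file system
size = 4096

fd = os.open(path, os.O_CREAT | os.O_RDWR)
os.ftruncate(fd, size)

with mmap.mmap(fd, size) as buf:
    buf[0:11] = b"hello, pmem"     # a plain store into the mapping
    buf.flush()                    # msync; a real pmem library would also flush CPU caches
    print(bytes(buf[0:11]))

os.close(fd)
```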

Symbolic IO as another run approach?

Symbolic IO’s computationally defined storage is intended to make use of NVDIMM technology and, in the Store [update 12/16/16] appliance version, has SSD storage as well, in a hybrid NVDIMM-SSD, run-like solution. The appliance has a full version of Linux, SymCE, which doesn’t use a file system or the PMEM library to access the data; it’s just byte-addressable storage with a PMEM file system embedded within [update 12/16/16]. This means that applications can use standard Linux file APIs to (directly) reference the NVDIMM and backend SSD storage.

It’s computationally defined because they use compute power to symbolically transform the data, reducing the data footprint in NVDIMM and subsequently in the SSD backing tier. Check out the podcast to learn more.

I came away from the podcast thinking that NVDIMMs are more prevalent than I thought. So, that’s what prompted this post.

Comments?

Photo Credit(s): UltraDIMM photo taken by Ray at SFD5, Architecture picture from pmem.io blog post

 

All flash storage performance testing

There are some serious problems with measuring the IO performance of all-flash arrays using the benchmarks we use on disk storage systems. Mostly, these are due to the inherent differences between current flash- and disk-based storage.

NAND garbage collection

First off, garbage collection is required by any SSD or NAND storage in order to write data. Garbage collection coalesces free space by moving still-valid data to new pages/blocks and freeing up the space held by old, no-longer-current data.
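
A toy calculation illustrates the cost: to erase a block that still holds valid pages, those pages must be copied elsewhere first, so each host write can turn into several flash writes. The page count and occupancy levels below are made up for illustration:

```python
# Toy illustration of garbage collection overhead: reclaiming a block that
# still holds valid pages forces those pages to be copied first, so one
# block's worth of host writes costs more than one block's worth of flash writes.
PAGES_PER_BLOCK = 256   # hypothetical geometry

def write_amplification(valid_fraction):
    valid = int(PAGES_PER_BLOCK * valid_fraction)   # pages that must be copied out
    freed = PAGES_PER_BLOCK - valid                 # pages available for new host data
    return (freed + valid) / freed                  # flash writes per host write

for vf in (0.25, 0.50, 0.75):
    print(f"block {vf:.0%} valid at GC time -> write amplification ~{write_amplification(vf):.1f}x")
```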

The problem is that NAND garbage collection only kicks in after a suitable amount of write activity, and measuring all-flash array performance without taking garbage collection into account is misleading at best and dishonest at worst.

The only way to control for garbage collection is to write lots of data to an all-flash storage system and measure its performance over a protracted period of time. How long this takes depends on the amount of storage in the array, but filling it to 75% of its capacity and then measuring IO performance as you fill up another 10-15% of its capacity with new data should suffice. Of course this would all have to be done consecutively, without any time off between runs (which would allow garbage collection to sneak in).
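
Here’s a rough sketch of that fill-then-measure sequence, just to pin down the order of operations. The device path, capacity and block size are placeholders, and a real test would use a tool like fio or vdbench with direct IO rather than this simplistic loop:

```python
# Sketch of the fill-then-measure idea: precondition to ~75% full, then
# measure while writing the next 10-15% with no idle time in between.
# (A real run would bypass the page cache, e.g. with O_DIRECT.)
import os, time

DEV = "/dev/hypothetical_flash_lun"   # placeholder device path
CAPACITY = 100 * 2**30                # pretend it's a 100GiB array
BLOCK = 128 * 1024

def write_span(fd, start_byte, fraction):
    """Sequentially write `fraction` of capacity from `start_byte`; return MB/s."""
    to_write = int(CAPACITY * fraction)
    buf = os.urandom(BLOCK)           # incompressible data (see the next section)
    os.lseek(fd, start_byte, os.SEEK_SET)
    begin = time.time()
    for _ in range(to_write // BLOCK):
        os.write(fd, buf)
    return to_write / (time.time() - begin) / 1e6

fd = os.open(DEV, os.O_WRONLY)
write_span(fd, 0, 0.75)                               # precondition: fill to ~75% full
rate = write_span(fd, int(CAPACITY * 0.75), 0.15)     # then measure the next 10-15%
print(f"post-preconditioning write throughput ~{rate:.0f} MB/s")
os.close(fd)
```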

Flash data reduction

Second, many all flash arrays offer data reduction like data compression or deduplication. Standard IO benchmarks today don’t control for data reduction.

What we need is a standard corpus of reducible data for an IO workload. Such data would need to be both compressible and dedupable. It’s unclear where such a corpus could be found, but one is needed to properly measure all-flash system performance. What would also help is some real-world data reduction statistics, from a large number of customer installations, to identify what real-world dedup and compression ratios look like. Then we could use these statistics to construct a suitable data load that can be scaled and tailored to required performance needs.
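
As an illustration of what such a corpus generator might look like, here’s a sketch that produces a block stream with roughly tunable dedup and compression ratios. The 2:1 targets are invented for the example, which is exactly why real-world statistics are needed:

```python
# Sketch of a synthetic, reducible block stream with (roughly) tunable dedup
# and compression ratios -- the 2:1 targets below are made up for illustration.
import os, random, zlib

BLOCK = 8 * 1024

def make_corpus(n_blocks, dedup_target=2.0, compress_target=2.0):
    n_unique = int(n_blocks / dedup_target)
    zero_fill = BLOCK - int(BLOCK / compress_target)   # zero padding compresses away
    uniques = [os.urandom(BLOCK - zero_fill) + b"\0" * zero_fill
               for _ in range(n_unique)]
    # repeat the unique blocks so the dedup ratio lands near the target
    return [random.choice(uniques) for _ in range(n_blocks)]

blocks = make_corpus(1000)
dedup = len(blocks) / len(set(blocks))
compress = BLOCK / len(zlib.compress(blocks[0]))
print(f"dedupable ~{dedup:.1f}:1, compressible ~{compress:.1f}:1")
```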

Perhaps SNIA or maybe a big (government) customer could support the creation of this data corpus that can be used for “standard” performance testing. With real world statistics and a suitable data corpus, standard IO benchmarks could control for data reduction on flash arrays and better measure system performance.

Block IO differences

Third, block heat maps (access patterns) need to become much more realistic. For disk-based systems it was important to randomize the IO stream to minimize the advantage of DRAM caching. But with all-flash storage arrays cache is less useful, and because flash can’t be rewritten in place, repeatedly hitting the same blocks (especially with overwrites) causes NAND page fragmentation and more NAND write overhead.
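
As one way to build a more realistic heat map, here’s a sketch that draws block addresses from a heavy-tailed (Zipf-like) distribution instead of a uniform random one. The skew parameter and LBA range are arbitrary, just to show the shape of a skewed workload:

```python
# Sketch of a skewed ("hot/cold") block access pattern instead of a purely
# uniform random one. The Zipf parameter and LBA range are arbitrary.
import numpy as np

TOTAL_BLOCKS = 1_000_000
N_IOS = 100_000

ranks = np.random.zipf(a=1.2, size=N_IOS)   # heavy-tailed popularity ranks
lbas = (ranks - 1) % TOTAL_BLOCKS           # map ranks onto the LBA space

hot_blocks = np.unique(lbas[:1000]).size
print(f"first 1000 IOs touched only {hot_blocks} distinct blocks")
```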

~~~~

Only by controlling for garbage collection, using a standard, data-reducible data load and returning to a cache-friendly (or at least write-cache-friendly) workload will we truly understand all-flash storage performance.

Comments?

Thanks to Larry Freeman (@Larry_Freeman) for the idea for today’s post.

Photo Credit(s): Race Faces by Jerome Rauckman

Coho Data, hyperloglog and the quest for IO performance

We were at SFD6 last month and Coho Data‘s CTO & Co-Founder, Andy Warfield, got up to tell us what’s happening at Coho. (We also met with Andy at SFD4; check out the video links to learn more.)

What’s new at Coho Data

Coho Data has been shipping GA product for about three quarters. It’s a simple-to-use, scale-out, hybrid (SSD & disk) storage system for VMware NFS datastores. Coho Data storage uses Software Defined Networking (SDN) switches to perform faster networking handoffs and optimize data flow across storage nodes. They use standard servers and an SDN switch and can scale from two nodes (micro-arrays) to lots (100 or more?).

Version 2.0 will add remote async replication and API enhancements. We won’t discuss the update any further here, but if you want your storage to tweet its messages/alerts, check it out. Thank Chris Wahl when you start seeing storage system tweets pollute your Twitter feed.

The highlight of the session, was Andy’s discussion of HyperLogLog, a new approach to understanding customer workloads.

HyperLogLog

Coho Data was designed from the start using Microsoft IO traces (one week of MSR Cambridge datacenter block IO traces, available at the SNIA IO trace repository). But Coho also recorded developer desktop IO activity for a year, amounting to ~7.6B IOs and multiple TBs of data. I just got a call looking for some file activity tracing, so everybody in storage could use more IO traces. But detailed IO traces take up CPU cycles and lots of space. HyperLogLogs can solve a portion of this.

Before we go there, a little background. With a Bloom filter you can tell whether a block has been referenced or not. In a Bloom filter you hash a key, term or whatever multiple times and then OR the results into separate bitfields, one per hash. Bloom filters have a small possibility of a false positive (block-id present in the filter but not really in the IO stream) but no possibility of a false negative (block-id not present in the filter but really in the IO stream). However, Bloom filters tell us nothing about how frequently blocks were read.
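
Here’s a minimal Bloom filter sketch along those lines. Note it uses the common single-bit-array variant rather than one bitfield per hash, and the sizes and hash count are arbitrary:

```python
# Minimal Bloom filter sketch: hash each block-id k times and set a bit per
# hash; membership tests can give false positives but never false negatives.
import hashlib

class BloomFilter:
    def __init__(self, m_bits=1 << 20, k_hashes=4):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, key):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

bf = BloomFilter()
bf.add("lba:42")
print("lba:42" in bf, "lba:43" in bf)   # True, (almost certainly) False
```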

With a HyperLogLog, one can approximate (within ~2%) how many distinct blocks were referenced. By capturing HyperLogLog snapshots over time, one can estimate working set sizes and block access frequency during application processing. Each HyperLogLog only occupies ~2KB, so recording one per hour takes ~50KB/day. The math is beyond me but there’s plenty of info online (e.g. here).
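
For a feel of why these sketches are so small, here’s a bare-bones HyperLogLog of my own, without the bias and small-range corrections of the real algorithm. The register count is chosen to land near the ~2KB figure above:

```python
# Bare-bones HyperLogLog sketch: a few thousand tiny registers estimate how
# many *distinct* keys were seen, no matter how many keys streamed past.
import hashlib

P = 11                    # 2^11 = 2048 registers, on the order of ~2KB
M = 1 << P

def _hash(key):
    return int.from_bytes(hashlib.sha256(str(key).encode()).digest()[:8], "big")

class HyperLogLog:
    def __init__(self):
        self.reg = [0] * M

    def add(self, key):
        x = _hash(key)
        j = x & (M - 1)                       # low p bits pick a register
        w = x >> P                            # remaining bits
        rho = (64 - P) - w.bit_length() + 1   # position of the leftmost 1-bit
        self.reg[j] = max(self.reg[j], rho)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / M)
        z = 1.0 / sum(2.0 ** -r for r in self.reg)
        return alpha * M * M * z

hll = HyperLogLog()
for lba in range(100_000):
    hll.add(lba % 50_000)                     # 50K distinct blocks, each hit twice
print(int(hll.count()))                       # ~50,000, within a few percent
```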

HyperLogLog functionality will be included in a future Coho Data release, as part of what they call “Counter Stacks” (see Jake Wires’ USENIX session video/PDF). Once present, Coho Data will save HyperLogLog counter stack data, analyze it, and use it to better characterize customer IO, with the goal of better optimizing their storage system to actual workloads.

For more info please see other SFD6 blogger posts on Coho Data:

~~~~

Now if someone could just develop a super-efficient algorithm/storage structure to record block sequences, I think we’d have this licked.

Disclosure statement: I have done work for Coho Data over the last year.

Picture credits: (Lego) Me holding a Coho (Data) Salmon 🙂

Fall SNWUSA 2013

Here’s my thoughts on SNWUSA which occurred this past week in the Long Beach Convention Center.

First, it was a great location. I saw a number of users I haven’t seen at SNWUSA ever before, some of which I have known for years from other (non-storage) venues.

Second, the exhibit hall was sparsely populated. There were no major storage vendors at the show at all. Gold sponsors included NEC, Riverbed, & Sepaton, representing the largest exhibitors present. Making up the next (Contributing) tier were Western Digital, Toshiba, Active Archive Alliance, and the LTO consortium, with a smattering of smaller companies. Finally, there were another 12 vendors with kiosks around the floor, the largest being Veeam Software.

I suspect VMWorld Europe happening the same time in Barcelona might have had something to do with the sparse exhibit floor but the trend has been present for the past few shows.

That being said there were still a few surprises in store, at least for me.  Two of the most interesting ones were:

  • Coho Data, who came out of stealth with a scale-out, RAIN (Redundant Array of Independent Nodes) based storage cluster, with distributed, mirrored customer data across nodes and software defined networking. They currently support NFS for VMware with a management UI reminiscent of iOS 7, sans touch support. The product comes as a series of nodes with SSDs, disk storage and SDN. The SDN allows Coho Data to relocate front-end (client) connections to where the customer data lies. The distributed, mirrored backend storage provides redundancy in the case of a node/disk failure, at which time the system understands what data is now at risk and rebuilds the now-mirrorless data onto other nodes. It reminds me a lot of Bycast/Archivas-like architectures, with SDN and NFS support. I suppose the reason they are supporting VMware VMDKs is that the files are fairly large and thus easier to supply.
  • CloudPhysics was not exhibiting but they sponsored a break, so they were there talking with analysts and the press about their product. Their product installs as a VMware VM service and propagates VMware management agents to ESX servers, which then pipe information back to their app about how your VMware environment is running, how VMs are performing, how your network and storage are performing for the VMs, etc. This data is then sent to the cloud, where it’s anonymized. There, customers can use apps (called Cards) to analyze the data, which can help them understand problem areas, predict what configuration changes can do for them, show them how VMs are performing, etc. It essentially logs all this information to the cloud and provides ways to analyze the data to optimize your VMware environment.

Coming in just behind these two was Jeda Networks with their Software Defined Storage Network (SDSN). They use commodity (OpenFlow-compatible) 10GbE switches to support a software FCoE storage SAN. Jeda Networks says that over the past two years most 10GbE switch hardware has started to support DCB in hardware, and with that in place, plus OpenFlow compatibility, they can provide an SDSN on top just by emulating a control layer for FCoE switches. Of course one would still need FCoE storage and CNAs, but with those in place one could use much cheaper switches to support FCoE.

CloudPhysics has a subscription based pricing model which offers three tiers:

  • Free where you get their Vapp, the management agents and a defined set of Free Card Apps for no cost;
  • Standard level where you get all the above plus a set of Card Apps which provide more VMware manageability for $50/ESX server/Month; and
  • Enterprise level where you get all the above plus all the Card Apps presently available for $150/ESX server/Month.

Jeda Networks and Coho Data are still developing their pricing and had none they were willing to disclose.

One of the CloudPhysics Card apps could predict how certain VMs would benefit from host-based (PCIe or SSD) IO caching. They had a chart which showed working set inflection points for (I think) one VM running an OLTP application. I have asked for this chart to discuss further in a future post. In essence, the application shows three potential break points where, say, adding 500MB, 2000MB or 10000MB of SSD cache can speed up application performance by 10%, 30% or 50% (numbers here made up for example purposes and not off the chart they showed me).

A few other companies made announcements at the show. For example, Sepaton announced their new VirtuoSO, a scale-out, hybrid deduplication appliance.

That’s about it. I would have to say that SNW needs to rethink their business model, the frequency of shows, or what they are trying to do at their conferences. However, on the plus side, most of the users I talked with came away with a lot of information and thought the show was worthwhile, and I came away with a few surprises.

~~~~

Comments?

Fall SNWUSA wrap-up

Attended SNWUSA this week in San Jose. It’s hard to see the show gradually change when you attend each one, but it does seem that the end-user content and attendance is increasing proportionally. This should bode well for future SNWs. There was always a good number of end users at the show, but the bulk of the attendees in the past were from storage vendors.

Another large storage vendor dropped their sponsorship.  HDS no longer sponsors the show and the last large vendor still standing at the show is HP.  Some of this is cyclical, perhaps the large vendors will come back for the spring show, next year in Orlando, Fl.  But EMC, NetApp and IBM seemed to have pretty much dropped sponsorship for the last couple of shows at least.

SSD startup of the show

Skyhawk hardware (c) 2012 Skyera, all rights reserved (from their website)

The best, new SSD startup had to be Skyera. A 48TB raw flash dual controller system supporting iSCSI block protocol and using real commercial grade MLC.  The team at Skyera seem to be all ex-SandForce executives and technical people.

Skyera’s team has designed a 1U box called the Skyhawk, with a phalanx of NAND chips, their own controller(s) and other logic as well. They support software compression and deduplication, as well as specially designed RAID logic that claims to reduce extraneous writes to something just over 1 for RAID 6, dual-drive-failure-equivalent protection.

Skyera’s underlying belief is that just as consumer HDAs took over from the big monster 14″ and 11″ disk drives in the ’90s, sooner or later commercial NAND will take over from eMLC and SLC. And if one elects to stay with eMLC and SLC technology, you are destined to be one to two technology nodes behind. That is, commercial MLC (in USB sticks, SD cards, etc.) is currently manufactured with 19nm technology, while eMLC and SLC NAND technology is back at 24 or 25nm. But 80-90% of the NAND market is being driven by commercial MLC NAND. Skyera came out this past August.

Coming in second place was Arkologic, an all-flash NAS box using SSD drives from multiple vendors. In their case a fully populated rack holds about 192TB (raw?) with an active-passive controller configuration. The main concern I have with this product is that all their metadata is held in UPS-backed DRAM (??), and they have up to 128GB of DRAM in the controller.

Arkologic’s main differentiation is supporting QoS on a file system basis and having some connection with a NIC vendor that can provide end-to-end QoS. The other thing they have is a new RAID-AS, which is specially designed for flash.

I just hope their UPS is pretty hefty and they don’t sell it someplace where power is very flaky, because when that UPS gives out, kiss your data goodbye as your metadata is held nowhere else – at least that’s what they told me.

Cloud storage startup of the show

There was more cloud stuff going on at the show. Talked to at least three or four cloud gateway providers. But the cloud startup of the show had to be Egnyte. They supply storage services that span cloud storage and on-premises storage with an in-band or out-of-band solution and provide file synchronization services for file sharing across multiple locations. They have some hooks into NetApp and other major storage vendor products that allow them to be out-of-band for those environments, but they would need to be in-band for other storage systems. Seems an interesting solution that, if successful, may help accelerate the adoption of cloud storage in the enterprise, as it makes it transparent whether the storage you access is local or in the cloud. How they deal with the response time differences is another question.

Different idea startup of the show

The new technology showplace had a bunch of vendors, some I had never heard of before, but one that caught my eye was Actifio. They were at VMworld but I never got time to stop by. They seem to be taking another shot at storage virtualization. Only in this case, rather than focusing on non-disruptive file migration, they are taking on the task of doing a better job of point-in-time copies for iSCSI and FC attached storage.

I assume they are in the middle of the data path in order to do this and they seem to be using copy-on-write technology for point-in-time snapshots.  Not sure where this fits, but I suspect SME and maybe up to mid-range.

Most enterprise vendors solved these problems a long time ago, but at the low end it’s a little more variable. I wish them luck, but although most customers use snapshots if their storage has them, those that don’t seem unable to understand what they are missing. And then there’s the matter of being in the data path?!

~~~~

If there was a hybrid startup at the show I must have missed them. Did talk with Nimble Storage and they seem to be firing on all cylinders.  Maybe someday we can do a deep dive on their technology.  Tintri was there as well in the new technology showcase and we talked with them earlier this year at Storage Tech Field Day.

The big news at the show was Microsoft purchasing StorSimple, a cloud storage gateway/cache. Apparently StorSimple did a majority of their business with Microsoft’s Azure cloud storage, and the deal seemed to make sense to everyone.

The SNIA suite was hopping as usual and the venue seemed to work well. Although I would say the exhibit floor and lab area were a bit too big, everything else seemed to work out fine.

On Wednesday, the CIO from Dish talked about what it took to completely transform their IT environment from a management and leadership perspective.  Seemed like an awful big risk but they were able to pull it off.

All in all, SNW is still a great show to learn about storage technology at least from an end-user perspective.  I just wish some more large vendors would return once again, but alas that seems to be a dream for now.

SNIA CDMI plugfest for cloud storage and cloud data services

Plug by Samuel M. Livingston (cc) (from Flickr)

Was invited to the SNIA tech center to witness the CDMI (Cloud Data Management Interface) plugfest that was going on down in Colorado Springs.

It was somewhat subdued. I always imagine racks of servers, with people crawling all over them with logic analyzers, laptops and other electronic probing equipment.  But alas, software plugfests are generally just a bunch of people with laptops, ethernet/wifi connections all sitting around a big conference table.

The team was working to define an errata sheet for CDMI v1.0 to be completed prior to ISO submission for official standardization.

What’s CDMI?

CDMI is an interface standard for clients talking to cloud storage servers and provides a standardized way to access all such services. With CDMI you can create a cloud storage container, define its attributes, and deposit and retrieve data objects within that container. Mezeo had announced support for CDMI v1.0 a couple of weeks ago at SNW in Santa Clara.

CDMI provides for attributes to be defined at the cloud storage server, container or data object level such as: standard redundancy degree (number of mirrors, RAID protection), immediate redundancy (synchronous), infrastructure redundancy (across same storage or different storage), data dispersion (physical distance between replicas), geographical constraints (where it can be stored), retention hold (how soon it can be deleted/modified), encryption, data hashing (having the server provide a hash used to validate end-to-end data integrity), latency and throughput characteristics, sanitization level (secure erasure), RPO, and RTO.

A CDMI client is free to implement compression and/or deduplication as well as other storage efficiency characteristics on top of CDMI server characteristics.  Probably something I am missing here but seems pretty complete at first glance.

SNIA has defined a reference implementation of a CDMI v1.0 server [and I think client] which can be downloaded from their CDMI website. [After filling out the “information on me” page, SNIA sent me an email with the download information, but I could only recognize the CDMI server in the download, not the client (although it could have been there). The CDMI v1.0 specification is freely available as well.] The reference implementation can be used to test your own CDMI clients if you wish. It is Java-based and apparently runs on Linux systems but shouldn’t be too hard to run elsewhere. (One CDMI server at the plugfest was running on a Mac laptop.)
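
To give a flavor of what plugfest clients exercise, here’s a rough sketch of CDMI-style requests against a local reference server. The endpoint URL is hypothetical and the header and media-type details are paraphrased from my reading of the v1.0 spec, so check the spec before relying on them:

```python
# Rough sketch of CDMI client calls (endpoint URL hypothetical; header and
# media-type names from memory of the v1.0 spec -- verify against the spec).
import json
import requests

BASE = "http://localhost:8080/cdmi"        # hypothetical reference-server endpoint
HDRS = {"X-CDMI-Specification-Version": "1.0"}

# Create a container
requests.put(f"{BASE}/mycontainer/",
             headers={**HDRS,
                      "Content-Type": "application/cdmi-container",
                      "Accept": "application/cdmi-container"},
             data=json.dumps({"metadata": {}}))

# Deposit a data object into it
requests.put(f"{BASE}/mycontainer/hello.txt",
             headers={**HDRS,
                      "Content-Type": "application/cdmi-object",
                      "Accept": "application/cdmi-object"},
             data=json.dumps({"mimetype": "text/plain", "value": "hello, cloud"}))

# Retrieve it -- or, as noted below, just point a browser at the same URL
obj = requests.get(f"{BASE}/mycontainer/hello.txt", headers=HDRS).json()
print(obj["value"])
```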

Plugfest participants

There were a number people from both big and small organizations at SNIA’s plugfest.

Mark Carlson from Oracle was there and seemed to be leading the activity. He said I was free to attend but couldn’t say anything about what was and wasn’t working.  Didn’t have the heart to tell him, I couldn’t tell what was working or not from my limited time there. But everything seemed to be working just fine.

Carlson said that SNIA’s CDMI reference implementations had been downloaded 164 times with the majority of the downloads coming from China, USA, and India in that order. But he said there were people in just about every geo looking at it.  He also said this was the first annual CDMI plugfest although they had CDMI v0.8 running at other shows (i.e, SNIA SDC) before.

David Slik, from NetApp’s Vancouver Technology Center, was there showing off his demo CDMI Ajax client and laptop CDMI server. He was able to use the Ajax client to access all the CDMI capabilities of the cloud data object he was presenting and to display the binary contents of the object. Then he showed me that the exact same data object (file) could be easily accessed by just typing the proper URL into any browser; it turned out the binary was a GIF file.

The other thing that Slik showed me was a display of a cloud data object created via a “cron job” referencing a satellite image website and depositing the data directly into cloud storage, entirely at the server level. Slik said that CDMI also specifies a cloud-storage-to-cloud-storage protocol which could be used to move cloud data from one cloud storage provider to another without having to retrieve the data back to the user. Such a capability would be ideal for exporting user data from one cloud provider and importing it to another using their high speed backbone, rather than having to transmit the data to and from the user’s client.

Slik was also instrumental in SNIA’s XAM interface standard for archive storage. He said that CDMI is much more lightweight than XAM, as there is no requirement for a runtime library whatsoever and it only depends on HTTP standards as the underlying protocol. From his viewpoint CDMI is almost XAM 2.0.

Gary Mazzaferro from AlloyCloud was talking like CDMI would eventually take over not just cloud storage management but local data management as well. He called CDMI a strategic standard that could potentially be implemented in OSs, hypervisors and even embedded systems to provide a standardized interface for all data management – cloud or local storage. When I asked what happens in this future with SMI-S, he said they would co-exist as independent but cooperative management schemes for local storage.

Not sure how far this goes.  I asked if he envisioned a bootable CDMI driver? He said yes, a BIOS CDMI driver is something that will come once CDMI is more widely adopted.

Other people I talked with at the plugfest consider CDMI as the new web file services protocol akin to NFS as the LAN file services protocol.  In comparison, they see Amazon S3 as similar to CIFS (SMB1 & SMB2) in that it’s a proprietary cloud storage protocol but will also be widely adopted and available.

There were a few people from startups at the plugfest, working on various client and server implementations.  Not sure they wanted to be identified nor for me to mention what they were working on. Suffice it to say the potential for CDMI is pretty hot at the moment as is cloud storage in general.

But what about cloud data consistency?

I had to ask about how the CDMI standard deals with eventual consistency – it doesn’t. The crowd chimed in: relaxed consistency is inherent in any distributed service. You really have three characteristics for any distributed service: Consistency, Availability and Partition tolerance (CAP). You can elect to have any two of these, but must give up the third. Sort of like the Heisenberg uncertainty principle applied to data.

They all said that consistency is mainly a CDMI client issue outside the purview of the standard, associated with server SLAs, replication characteristics and other data attributes.   As such, CDMI does not define any specification for eventual consistency.

Although, Slik said that the standard does guarantee if you modify an object and then request a copy of it from the same location during the same internet session, that it be the one you last modified.  Seems like long odds in my experience.   Unclear how CDMI, with relaxed consistency can ever take the place of primary storage in the data center but maybe it’s not intended to.

—–

Nonetheless, what I saw was impressive, cloud storage from multiple vendors all being accessed from the same client, using the same protocols.  And if that wasn’t simple enough for you, just use your browser.

If CDMI can become popular it certainly has the potential to be the new web file system.

Comments?

 

SNIA illuminates storage power efficiency

Untitled by johnwilson1969 (cc) (from Flickr)
Untitled by johnwilson1969 (cc) (from Flickr)

At SNW, a couple of weeks back, SNIA announced the coming out of their green storage initiative’s new SNIA Emerald Program and the first public draft release of their storage power efficiency test specification. Up until now, other than SPC and some pronouncements from the EPA, there hasn’t been much standardization activity on how to measure storage power efficiency.

SNIA’s Storage Power Efficiency Specification

As such, SNIA felt there was a need for an industry standard on how to measure storage power use.  SNIA’s specification supplies a taxonomy for storage systems that can be used to define and categorize various storage systems. Their extensive taxonomy should minimize problems like comparing consumer storage power use against data center storage power use.  Also, the specification identifies storage use attributes such as deduplication and thin provisioning or capacity optimization features that can impact power efficiency.

In addition, the specification has two appendices:

  • Appendix A specifies the valid power and environmental meters that are to be used to measure power efficiency of the system under test.
  • Appendix B specifies the benchmark tool that is used to drive the system under test while its power efficiency is being measured.

Essentially, there are two approved benchmark drivers used to drive IOs in the online storage category: Iometer and vdbench, both of which are freely available. Iometer has been employed for quite a while now in vendor benchmarking activity. In contrast, vdbench is a relative newcomer, but I have worked with its author, Henk Vandenbergh, over many years now and he is a consummate performance analyst. I look forward to seeing how Henk’s vdbench matures over time.

Given the spec’s taxonomy and the fact that it lists online, near-online, removable media, virtual media and adjunct storage device categories with multiple sub-categories for each, we will focus only on the online family of storage and save the rest for later.

SPC energy efficiency measures

As my readers should recall, the Storage Performance Council (SPC) also has benchmarks that measure energy use with their SPC-1/E and SPC-1C/E reports (see our SPC-1 IOPS per Watt post).  The interesting part about SPC-1/E results is that there are definite IOPS levels where storage power use undergoes significant transitions.

One can examine an SPC-1/E Executive Summary report and see power use at various IO intensity levels, i.e., 100%, 95%, 90%, 85%, 80%, 50%, 10% and 0% (or idle), for a storage subsystem under test. SPC summarizes these detailed power measurements by defining profiles for “Low”, “Medium” and “Heavy” storage system use. But the devil’s often in the details, and having all the above measurements allows one to calculate whatever activity profile works best for you.
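
As a toy example of rolling your own profile from those detailed measurements (the wattage readings and duty-cycle weights below are invented for illustration):

```python
# Toy calculation of a custom activity profile from SPC-1/E style power
# readings -- all wattages and weights here are hypothetical.
watts_at_load = {1.00: 1250, 0.95: 1230, 0.90: 1215, 0.80: 1180,
                 0.50: 1050, 0.10: 930, 0.00: 880}     # hypothetical readings

# e.g. a shop that runs 8 busy hours, 8 moderate hours and 8 nearly idle hours
my_profile = {0.80: 8/24, 0.50: 8/24, 0.10: 8/24}

avg_watts = sum(watts_at_load[level] * weight for level, weight in my_profile.items())
print(f"profile-weighted average power ~{avg_watts:.0f}W")
```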

Unfortunately, only a few SPC-1/E reports have been submitted to date and it has yet to take off.

SNIA alternative power efficiency metrics

Enter SNIA’s Emerald program, which is supposed to be an easier and quicker way to measure storage power use.  In addition to the specification, SNIA has established a website (see above) to hold SNIA approved storage power efficiency results and a certification program for auditors that can be used to verify vendor power efficiency testing meet all specification requirements.

What’s missing from the present SNIA power efficiency test specification are the following:

  • More strict IOPS level definitions – the specification refers to IO intensity but doesn’t provide an adequate definition from my perspective. It says that subsystem response time cannot exceed 30msec and uses this to define 100% IO intensity for the workloads. However, given this definition, it could apply to random read, random write, or mixed workloads, and there is no separate specification for sequential versus random (and/or mixed) workloads. This could be tightened up.
  • More IO intensity levels measured – the specification calls for power measurements at an IO intensity of 100% for all workloads and 25% for 70:30 R:W workloads for online storage.  However we would be more interested in also seeing 80% and 10%.  From a user perspective, 80% probably represents a heavy sustainable IO workload and 10% looks like a complete cache hit workload.  We would only measure these levels for the “Mixed workload” so as to minimize effort.
  • More write activity in “Mixed workloads” – the specification defines the mixed workload as 70% read and 30% write random IO activity. Given today’s O/S propensity to buffer read data, it would seem more prudent to use a 50:50 read-to-write mix (see the sketch after this list).
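
To make the suggested additions concrete, here’s a sketch of the measurement matrix they would imply. The levels and mixes are this post’s suggestions, not anything in the SNIA spec:

```python
# Sketch of the measurement matrix argued for above -- intensity levels and
# read/write mixes are this post's suggestions, not the SNIA specification.
intensity_levels = [1.00, 0.80, 0.25, 0.10]       # fractions of max IO intensity
workloads = {
    "random read": (1.0, 0.0),
    "random write": (0.0, 1.0),
    "mixed": (0.5, 0.5),                          # 50:50 rather than 70:30
}

# Only the mixed workload gets the extra levels, to keep test time reasonable
test_points = [("mixed", lvl) for lvl in intensity_levels] + \
              [(wl, 1.00) for wl in workloads if wl != "mixed"]

for workload, level in test_points:
    r, w = workloads[workload]
    print(f"measure watts & IOPS: {workload:12s} {r:.0%}R/{w:.0%}W at {level:.0%} intensity")
```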

Probably other items need more work as well, such as defining a standardized reporting format containing a detailed description of HW and SW of system under test, benchmark driver HW and SW, table for reporting all power efficiency metrics and inclusion of full benchmark report including input parameter specifications and all outputs, etc. but these are nits.

Finally, SNIA’s specification goes into much detail about capacity optimization testing, which includes things like compression, deduplication, thin provisioning, delta-snapshotting, etc., with an intent to measure storage system power use when utilizing these capabilities. Defining how each of these storage features will be configured and used during power measurement testing is a significant and complex undertaking. Although SNIA should be commended for their efforts here, this seems too much to take on at the start. We suggest that capacity optimization testing definitions be deferred to a later release, focusing now on the more standard storage power efficiency measurements.

—-

I critique specifications at my peril. Being wrong in the past has caused me to redouble efforts to ensure a correct interpretation of any specification. However, if there’s something I have misconstrued or missed here that is worthy of note, please feel free to comment.

Database appliances!?

The Sun Oracle Database Machine by Oracle OpenWorld San Francisco 2009 (cc) (from Flickr)

Was talking with Oracle the other day and discussing their Exadata database system.  They have achieved a lot of success with this product.  All of which got me to wondering whether database specific storage ever makes sense.  I suppose the ultimate arbiter of “making sense” is commercial viability and Oracle and others have certainly proven this, but from a technologist perspective I still wonder.

In my view, the Exadata system combines database servers and storage servers in one rack (with extensions to other racks). They use an Infiniband bus between the database and storage servers and have a proprietary storage access protocol between the two.

With their proprietary protocol they can provide hints to the storage servers as to what’s coming next and how to manage the database data which make the Exadata system a screamer of a database machine.  Such hints can speed up database query processing, more efficiently store database structures, and overall speed up Oracle database activity.  Given all that it makes sense to a lot of customers.

Now, there are other systems which compete with Exadata like Teradata and Netezza (am I missing anyone?) that also support onboard database servers and storage servers.  I don’t know much about these products but they all seem targeted at data warehousing and analytics applications similar to Exadata but perhaps more specialized.

  • As far as I can tell Teradata has been around for years since they were spun out of NCR (or AT&T) and have enjoyed tremendous success.  The last annual report I can find for them shows their ’09 revenue around $772M with net income $254M.
  • Netezza started in 2000 and seems to be doing OK in the database appliance market given their youth.  Their last annual report for ’10 showed revenue of ~$191M and net income of $4.2M.  Perhaps not doing as well as Teradata but certainly commercially viable.

The only reason database appliances or machines exist is to speed up database processing.  If they can do that then they seem able to build a market place for themselves.

Database to storage interface standards

The key question from a storage analyst perspective is shouldn’t there be some sort of standards committee, like SNIA or others, that work to define a standard protocol between database servers and storage that can be adopted by other storage vendors.  I understand the advantage that proprietary interfaces can supply to an enlightened vendor’s equipment but there are more database vendors out there than just Oracle, Teradata and Netezza and there are (at least for the moment) many more storage vendors out there as well.

A decade or so ago, when I was with another storage company we created a proprietary interface for backup activity and it sold ok but in the end it didn’t sell enough to be worthwhile for either the backup or storage company to continue the approach.  At the time we were looking to support another proprietary interface for sorting but couldn’t seem to justify it.

Proprietary interfaces tend to lock customers in, and most customers will only accept lock-in if there is a significant advantage to your functionality. But customer lock-in can lull vendors into not investing R&D funding in the latest technology, and over time this effect will cause the vendor to lose any advantage they previously enjoyed.

It seems to me that the more successful companies (with the possible exception of Apple) tend to focus on opening up their interfaces rather than closing them down.  By doing so they introduce more competition which serves their customers better, in the long run.

I am not saying that if Oracle would standardize/publicize their database server to storage server interface that there would be a lot of storage vendors going after that market.  But the high revenues in this market, as evident from Teradata and Netezza, would certainly interest a few select storage vendors.  Now not all of Teradata’s or Netezza’s revenues derive from pure storage sales but I would wager a significant part do.

Nevertheless, a standard database storage protocol could readily be defined by existing database vendors in conjunction with SNIA.  Once defined, I believe some storage vendors would adopt this protocol along with every other storage protocol (iSCSI, FCoE, FC, FCIP, CIFS, NFS, etc.). Once that occurs, customers across the board would benefit from the increased competition and break away from the current customer lock-in with today’s database appliances.

Any significant success in the market from early storage vendor adopters of this protocol would certainly interest other vendors, inducing a cycle of increased adoption, higher competition, and better functionality. In the end, database customers worldwide will benefit from the increased price performance available in the open market. And in the end that makes a lot more sense to me than the database appliances of today.

As to why Apple has excelled within a closed system environment, that will need to wait for a future post.