Microsoft ESRP database access latencies – chart of the month

[Chart: Microsoft Exchange (2013) ESRP database access latencies, over-5,000-mailbox category]
The above chart was included in last month's SCI e-Newsletter and depicts recent Microsoft Exchange (2013) Solution Reviewed Program (ESRP) results for database access latencies. Storage systems new to this over-5,000-mailbox category include Oracle, Pure and Tegile. As these new systems are all-flash arrays (AFAs), we are starting to see significant reductions in database access latencies; the #4 system (Nimble Storage) was a hybrid (disk and flash) array.

As you may recall, ESRP reports three database access latencies: database read, database write and log write. All three are shown above, but we sort and rank this list on database read latency alone.

It's hard to see above, but reading the ESRP reports one finds that the top 3 systems had average database read latencies of 1.04, 1.06 and 1.07 milliseconds. So the separation between the top 3 is only about 30 microseconds.

The top 3 systems' database write latencies were 1.75, 1.62 and 3.07 milliseconds, respectively. So if we were ranking the above on write response time, Pure would have come in #1.

The top 3 systems' log write latencies were 0.67, 0.41 and 0.82 milliseconds, and once again, if we were ranking based on log write response time, Pure would be #1.

It's unclear whether Exchange customers would want to deploy AFAs for their database and log files, but these three ESRP reports (and Nimble's) show that there should be no problem with AFA performance in these environments.

What about data reduction?

What's unclear to me is how much of a role data reduction technologies played in the AFA and hybrid solutions' ESRP performance. Data reduction advantages would most likely show up in database IOPS counts more than in response times, but if present, they may still reduce access latencies, as there would potentially be less data to transfer between the storage system's backend and its cache.

ESRP reports do not officially report on a vendor’s data reduction effectiveness, so we are left with whatever the vendor decides to say.

In that respect, Pure indicated in their FlashArray//m20 ESRP report that their "data reduction is significantly higher" than what they normally see (4:1) because Jetstress (the ESRP benchmark program) generates lots of duplicated data.

I couldn't find any similar statement in the Tegile T3800 or Nimble Storage reports indicating how well their data reduction technologies worked in Jetstress compared to normal workloads. They did reference their compression effectiveness via database size, but I have found that number to be a less useful measure: historically it mostly reflected the amount of over provisioning used by disk-only systems, and for AFAs and hybrid storage it's unclear how much of it is data reduction effectiveness vs. over provisioning.

For example, Pure, Tegile and Nimble reported a "database capacity utilization" of 4.2%, 60% and 74.8%, respectively. And Nimble did report that over their entire customer base, Exchange data shows, on average, a 61.2% capacity savings.

So you tell me: what was the effective data reduction for Pure's, Tegile's and Nimble's respective Jetstress runs? From my perspective, Pure's report of 4.2% looks about right (it says the actual database data fit in 4.2% of the SSD storage, for a ~23.8:1 reduction effectiveness on Jetstress/ESRP data). I find Tegile's and Nimble's numbers harder to believe, as they would imply only a 1.7:1 and 1.33:1 reduction effectiveness for Jetstress/ESRP data.
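
To make the conversion explicit, here's a minimal sketch (in Python, using the utilization figures quoted above) of how a "database capacity utilization" percentage translates into an effective reduction ratio; note this simple inversion ignores any over provisioning baked into the configured capacity.

```python
# Convert ESRP "database capacity utilization" into an effective data
# reduction ratio: if the database fits in X% of the configured storage,
# the implied reduction is roughly 1 / (X/100) : 1.

utilization = {          # percentages quoted in the ESRP reports above
    "Pure":   4.2,
    "Tegile": 60.0,
    "Nimble": 74.8,
}

for vendor, pct in utilization.items():
    ratio = 100.0 / pct  # e.g. 100 / 4.2 ~= 23.8
    print(f"{vendor}: {pct:.1f}% utilization -> ~{ratio:.1f}:1 reduction")

# Output:
# Pure: 4.2% utilization -> ~23.8:1 reduction
# Tegile: 60.0% utilization -> ~1.7:1 reduction
# Nimble: 74.8% utilization -> ~1.3:1 reduction
```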

Oracle FS1-2 doesn’t seem to have any data reduction capabilities and reported a 100% storage capacity used by Exchange database.

So that's it: Jetstress data is "significantly reducible" on some AFA systems. But in the field, the advantage of data reduction techniques is much smaller.

I think it’s time that ESRP stopped using significantly reducible data in their Jetstress program and tried to more closely mimic real world data.

Want more?

The October 2016 and our other ESRP reports have much more information on Microsoft Exchange performance. Moreover, there's a lot more performance information, covering email and other (OLTP and throughput-intensive) block storage workloads, in our SAN Storage Buying Guide, available for purchase on our website. More information on file and block protocol/interface performance is also included in SCI's SAN-NAS Buying Guide, also available from our website. And if you're interested in file system performance, please consider purchasing our NAS Buying Guide, also available on our website.

~~~~

The complete ESRP performance report went out in SCI's October 2016 Storage Intelligence e-newsletter. A copy of the report will be posted on our SCI dispatches (posts) page over the next quarter or so (if all goes well). However, you can get the latest storage performance analysis now, and subscribe to future free SCI Storage Intelligence e-newsletters, by using the signup form in the sidebar, or you can subscribe here.

As always, we welcome any suggestions or comments on how to improve our ESRP  performance reports or any of our other storage performance analyses.

 

Oracle (finally) releases StorageTek VSM6

[Full disclosure: I helped develop the underlying hardware for VSM 1-3 and also way back, worked on HSC for StorageTek libraries.]

Virtual Storage Manager System 6 (VSM6) is here. I'm not exactly sure when VSM5 or VSM5E was released, but it seems like an awfully long time ago in Internet years. The new VSM6 migrates the platform to Solaris software and hardware while expanding capacity and improving performance.

What’s VSM?

Oracle StorageTek VSM is a virtual tape system for mainframe (System z) environments. It provides multi-tiered storage, which includes both physical disk and (optional) tape, for the long-term big data requirements of z/OS applications.

VSM6 emulates up to 256 virtual IBM tape transports but actually moves data to and from VSM Virtual Tape Storage Subsystem (VTSS) disk storage and backend real tape transports housed in automated tape libraries.  As VSM data ages, it can be migrated out to physical tape such as a StorageTek SL8500 Modular [Tape] Library system that is attached behind the VSM6 VTSS or system controller.

VSM6 offers a number of replication solutions for DR to keep data at multiple sites in sync and to copy data to offsite locations. In addition, real tape channel extension can be used to extend the VSM storage to span onsite and offsite repositories.

One can cluster up to 256 VSM VTSSs into a tapeplex, which is then managed under one pane of glass as a single large data repository using HSC software.

What’s new with VSM6?

The new VSM6 hardware increases volatile cache to 128GB, from 32GB in VSM5. Non-volatile cache goes up as well, now supporting up to ~440MB, up from 256MB in the previous version. Power, cooling and weight all seem to have gone up as well (the wrong direction??) vis-à-vis VSM5.

The new VSM6 removes the ESCON option of previous generations and moves to 8 FICON and 8 GbE Virtual Library Extension (VLE) links. FICON channels are used for both host access (frontend) and real tape drive access (backend).  VLE was introduced in VSM5 and offers a ZFS based commodity disk tier behind the VSM VTSS for storing data that requires longer residency on disk.  Also, VSM supports a tapeless or disk-only solution for high performance requirements.

System capacity moves from 90TB (gosh, that was a while ago) to up to 1.2PB of data. I believe much of this comes from supporting the new T10000C tape cartridge and drive (5TB uncompressed). With the ability to cluster up to 256 VSM systems in a tapeplex, system capacity can now reach over 300PB (256 × 1.2PB ≈ 307PB).

Somewhere along the way, VSM started supporting triple redundancy for the VTSS disk storage, which provides better availability than RAID6. I'm not sure why they thought this was important, but it does help deal with increasing disk failures.

Oracle stated that VSM6 supports up to 1.5GB/sec of throughput. Presumably this is either landing data on disk or transferring data to backend tape, but not both. There doesn't appear to be any standard benchmark for these sorts of systems, so we'll take their word for it.

Why would anyone want one?

Well, it turns out plenty of mainframe shops use tape for a number of things, such as data backup, HSM, and big data batch applications. Once you get past the sunk costs for tape transports, automation, cartridges and VSMs, VSM storage can be a pretty competitive data storage solution for the mainframe environment.

The fact that most mainframe environments grew up with tape and have long ago invested in transports, automation and new cartridges probably makes VSM6 an even better buy.  But tape is also making a comeback in open systems with LTO-5 and now LTO-6 coming out and with Oracle’s 5TB T10000C cartridge and IBM’s 4TB 3592 JC cartridge.

Not to mention the Linear Tape File System (LTFS), a new tape format that provides a file system view of tape data and has brought renewed interest in all sorts of tape storage applications.

Competition not standing still

EMC introduced their Disk Library for Mainframe 6000 (DLm6000), which supports two different backends to deal with the diversity of tape use in the mainframe environment. Moreover, IBM has continuously enhanced their virtual tape server, the TS7700, but I would have to say it doesn't come close to these capacities.

Lately, when I've talked with long-time StorageTek mainframe tape customers, they have all said the same thing: when is VSM6 coming out, and when will Oracle get their act in gear and start supporting us again? Hopefully this release signals a new emphasis on this market. Although who is winning and who is losing in the mainframe tape market is the subject of much debate, there is no doubt that the lack of any update to VSM has hurt Oracle's StorageTek tape business.

Something tells me that Oracle may have fixed this problem.  We hope that we start to see some more timely VSM enhancements in the future, for their sake and especially for their customers.

~~~~

Comments?

~~~~

Image credit: Interior of StorageTek tape library at NERSC (2) by Derrick Coetzee

 

Tape vs. Disk, the saga continues

Inside a (Spectra Logic) T950 library by ChrisDag (cc) (from Flickr)

I was on a call late last month where Oracle introduced their latest generation T10000C tape system (media and drive), holding 5TB native (uncompressed) capacity. In the last 6 months I have been hearing about the coming of a 3TB SATA disk drive from Hitachi GST and others. And last month, EMC announced a new Data Domain Archiver, a disk-only archive appliance (see my post on EMC Data Domain products entering the archive market).

Oracle assures me that tape density is keeping up with, if not gaining on, disk density trends and capacity. But density and capacity are not the only issues causing data to move off of tape in today's enterprise data centers.

“Dedupe Rulz”

A problem with the data density trends discussion is that it's one dimensional (well, literally it's two dimensional). With data compression, disk or tape systems can easily double the effective density on a piece of media. But with data deduplication, the multiples start becoming more like 5X to 30X, depending on the frequency of full backups and the amount of duplicated data. Numbers like those dwarf any discussion of density ratios and, as such, get everyone's attention.
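
As a rough illustration only (the ratios below are representative assumptions, not measured results), here's how compression and deduplication multiples translate into effective capacity on the same piece of media:

```python
# Effective capacity of a 5TB (native) tape or 3TB disk under different
# data reduction multiples. The ratios are illustrative, not measured.

native_tb = {"tape cartridge": 5.0, "SATA disk": 3.0}
reduction = {"compression only": 2.0, "dedupe (low)": 5.0, "dedupe (high)": 30.0}

for media, tb in native_tb.items():
    for technique, ratio in reduction.items():
        print(f"{media}: {tb}TB native x {ratio}:1 {technique} = {tb * ratio:.0f}TB effective")

# A 2:1 compression edge is swamped by a 5X-30X dedupe multiple, which is
# why dedupe, not raw areal density, dominates the backup storage discussion.
```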

I can remember talking to an avowed tape engineer years ago, and he described deduplication at the VTL level as architecturally impure and inefficient. From his perspective, it needed to be done much earlier in the data flow. But what he failed to see was the ability of VTL deduplication to be plug-compatible with the tape systems of that time. Such ease of adoption allowed deduplication systems to build a beachhead and economies of scale. From there, such systems have now been able to move upstream, into earlier stages of the backup data flow.

Nowadays, what with Avamar, Symantec PureDisk and others, source-level deduplication, or near-source deduplication, is a reality. But all this came about because these systems were able to offer 30X the density on a piece of backup storage.

Tape’s next step

Tape could easily fight back. All that would be needed is some system in front of a tape library that provided deduplication capabilities not just for the disk media but for the tape media as well. That way the 30X density advantage over non-deduplicated storage could follow through all the way to the tape media.

In the past this made little sense, because restoring a particular set of data from a deduplicated tape could potentially require multiple volumes. However, with today's 5TB per cartridge, maybe that doesn't have to be the case anymore. In addition, a deduplication system in front of the tape library could service most immediate restore activity, while data restored from tape would be more like pulling something out of an archive and, as such, might take longer to perform. In any event, with LTO's multi-partitioning, and the other enterprise-class tapes having multiple domains, creating a structure with a metadata partition and a data partition is easier than ever.
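
To make the idea concrete, here's a minimal, entirely hypothetical sketch (in Python; not any vendor's implementation) of the bookkeeping such a dedupe front-end would need: hash each chunk, store only chunks not seen before, and keep a per-backup catalog of fingerprints so data can later be reassembled from the metadata and data partitions.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunks; real systems often use variable-size chunking

class DedupeFrontEnd:
    """Hypothetical dedupe layer sitting in front of a tape library."""

    def __init__(self):
        self.chunk_store = {}   # fingerprint -> chunk bytes (would live on disk/tape)
        self.catalog = {}       # backup name -> ordered list of fingerprints

    def ingest(self, name: str, data: bytes) -> None:
        fingerprints = []
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            # Only new chunks consume media capacity; duplicates are just references.
            self.chunk_store.setdefault(fp, chunk)
            fingerprints.append(fp)
        self.catalog[name] = fingerprints

    def restore(self, name: str) -> bytes:
        return b"".join(self.chunk_store[fp] for fp in self.catalog[name])

# Two full backups that are 90+% identical store their common chunks only once.
fe = DedupeFrontEnd()
fe.ingest("full_monday", b"A" * 40_000 + b"B" * 4_000)
fe.ingest("full_tuesday", b"A" * 40_000 + b"C" * 4_000)
print(len(fe.chunk_store))   # 5 unique chunks stored vs. 22 chunk references ingested
assert fe.restore("full_monday") == b"A" * 40_000 + b"B" * 4_000
```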

“Got Dedupe”

There are plenty of places where today's tape vendors can obtain deduplication capabilities. Permabit offers dedupe code for OEM use by those that have no dedupe systems today. FalconStor, Sepaton and others offer deduplication systems that can be OEMed. IBM, HP and Quantum already have both tape libraries and their own dedupe systems available today, all of which could readily support a deduplicating front-end to their tape libraries, if they don't already.

Where “Tape Rulz”

There are places where data deduplication doesn't work very well today, mainly rich media, physics, biopharm and other non-compressible big-data applications. For these situations, tape still has a home, but for the rest of the data center world today, deduplication is taking over, if it hasn't already. The sooner tape gets on the deduplication bandwagon, the better for the IT industry.

—-

Of course there are other problems hurting tape today. I know of at least one large conglomerate that has moved all backup off tape altogether, even data which doesn't deduplicate well (see my previous Oracle RMAN posts), and at least one other rich media conglomerate that is considering the very same move. For now, tape has a safe harbor in big science, but it won't last long.

Comments?

Poor deduplication with Oracle RMAN compressed backups

Oracle offices by Steve Parker (cc) (from Flickr)

I was talking with one large enterprise customer today, and he was lamenting how poorly Oracle RMAN compressed backupsets dedupe. Apparently, non-compressed RMAN backupsets generate anywhere from 20:1 to 40:1 deduplication ratios, but when they use RMAN backupset compression, their deduplication ratios drop to 2:1. Given that RMAN compression probably only adds another 2:1 compression ratio, the overall data reduction comes to something like 4:1.
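
Laid out explicitly, the arithmetic behind that comparison is just that reduction factors multiply when applied in sequence (a quick sketch using the customer's own ratios):

```python
# Overall data reduction = deduplication ratio x compression ratio,
# using the ratios quoted by the customer above.

def overall(dedupe_ratio: float, compression_ratio: float) -> float:
    return dedupe_ratio * compression_ratio

uncompressed = overall(dedupe_ratio=40.0, compression_ratio=1.0)  # best case, no RMAN compression
compressed = overall(dedupe_ratio=2.0, compression_ratio=2.0)     # RMAN-compressed backupsets

print(f"non-compressed backupsets: ~{uncompressed:.0f}:1 total reduction")  # ~40:1
print(f"RMAN compressed backupsets: ~{compressed:.0f}:1 total reduction")   # ~4:1
```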

RMAN compression

It turns out Oracle RMAN supports two different compression algorithms: zlib (or gzip) and bzip2. I assume the default is zlib, and one can specify bzip2 for even higher compression rates, with commensurately slower, more processor-intensive compression activity.

  • Zlib is pretty standard repeated-string elimination (LZ77) followed by Huffman coding, which uses shorter bit strings to represent more frequent characters and longer bit strings to represent less frequent ones.
  • Bzip2 also uses Huffman coding, but only after a number of other transforms: run length encoding (changing runs of duplicated characters to a count:character sequence), the Burrows–Wheeler transform (reorders the data stream so that repeating characters come together), a move-to-front transform (replaces each character with its index in a recently-used list, so clustered repeats become runs of small numbers), another run length encoding step, and then Huffman encoding, followed by a couple more steps to decrease the data length even more…

The net of all this is that a block of data that is bzip2 encoded may look significantly different if even one character is changed. Similarly, zlib compressed data will look different after a single character insertion, though perhaps not as much. How much will depend on the character and where it's inserted, but even if the new character doesn't change the Huffman encoding tree, adding a few bits to the data stream will necessarily alter its byte groupings significantly downstream from that insertion. (See Huffman coding to learn more.)
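
A quick way to see this effect is to compress two nearly identical buffers and compare how much of the compressed output still matches. The sketch below uses Python's standard zlib and bz2 modules; the sample data is made up purely for illustration.

```python
import bz2
import zlib

def common_prefix_len(a: bytes, b: bytes) -> int:
    """Length of the identical prefix shared by two byte strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Two ~1MB buffers that differ by a single inserted byte near the front.
original = (b"backup block " * 80_000)[:1_000_000]
modified = original[:100] + b"!" + original[100:]

for name, compress in (("zlib", zlib.compress), ("bzip2", bz2.compress)):
    ca, cb = compress(original), compress(modified)
    same = common_prefix_len(ca, cb)
    print(f"{name}: compressed size {len(ca)}, identical prefix only {same} bytes")

# Downstream of the insertion the compressed streams diverge, so a dedupe
# appliance looking for repeated sequences finds very little to match.
```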

Deduplicating RMAN compressed backupsets

Sub-block level deduplication often depends on recognizing the same sequence of data even when it is skewed or shifted by one to N bytes between two data blocks. But as discussed above, with bzip2 or zlib (or any Huffman-encoded) compression algorithm, the byte sequence looks distinctly different downstream of any character insertion.
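
For contrast, the usual way dedupe systems cope with shifted data in uncompressed streams is content-defined chunking: a rolling hash picks chunk boundaries from the data itself, so an insertion only disturbs the chunk it lands in. The sketch below is a simplified, hypothetical illustration of that technique (not any appliance's actual algorithm), and it is exactly what stops working once the stream has been through zlib or bzip2.

```python
import hashlib
import random

WINDOW = 16    # minimum chunk size before a boundary is allowed (illustrative)
MASK = 0x3FF   # declare a boundary when the low 10 hash bits are zero (~1KB chunks)

def chunks(data: bytes):
    """Content-defined chunking with a simple 32-byte windowed hash (illustrative only)."""
    start, h = 0, 0
    for i, byte in enumerate(data):
        # h depends only on the last 32 bytes: older contributions shift out mod 2**32.
        h = ((h << 1) + byte) & 0xFFFFFFFF
        if i - start >= WINDOW and (h & MASK) == 0:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]

def fingerprints(data: bytes) -> set:
    return {hashlib.sha256(c).hexdigest() for c in chunks(data)}

# A one-byte insertion near the front of an *uncompressed* stream disturbs only
# the chunk it lands in; boundaries downstream resynchronize and still match.
base = random.Random(0).randbytes(1_000_000)
shifted = base[:100] + b"!" + base[100:]
a, b = fingerprints(base), fingerprints(shifted)
print(f"{len(a & b)} of {len(a)} chunks unchanged despite the one-byte shift")
```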

One way to obtain decent deduplication rates from RMAN compressed backupsets would be to decompress the data at the dedupe appliance and then run the deduplication algorithm on it, although dedupe appliance ingestion rates would suffer accordingly. Another approach is to not use RMAN compressed backupsets at all, but the advantages of compression are very appealing: less network bandwidth, faster backups (because they transfer less data), and quicker restores.

Oracle RMAN OST

On the other hand, what might work is some form of Data Domain OST/Boost-like support in Oracle RMAN, which would partially deduplicate the data at the RMAN server and then send the deduplicated stream to the dedupe appliance. This would reduce network bandwidth and speed up backups, but may not do anything for restores. Perhaps a tradeoff worth investigating.
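
The core of any Boost-style, source-side dedupe exchange is a fingerprint negotiation: the backup server sends chunk fingerprints first and ships only the chunks the appliance doesn't already hold. The sketch below is a purely hypothetical illustration of that idea; the class and method names are mine, not Oracle's, EMC's or Symantec's APIs.

```python
import hashlib
import random

CHUNK = 64 * 1024  # 64KB chunks, purely illustrative

class DedupeAppliance:
    """Stands in for the target appliance; holds chunks already stored."""
    def __init__(self):
        self.store = {}

    def missing(self, fps: list) -> set:
        # Tell the client which fingerprints it has never seen.
        return {fp for fp in fps if fp not in self.store}

    def put(self, fp: str, chunk: bytes) -> None:
        self.store[fp] = chunk

def boost_style_send(data: bytes, appliance: DedupeAppliance) -> int:
    """Send only unseen chunks; returns bytes actually shipped over the 'network'."""
    pieces = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    fps = [hashlib.sha256(p).hexdigest() for p in pieces]
    needed = appliance.missing(fps)          # small fingerprint exchange first
    sent = 0
    for fp, piece in zip(fps, pieces):
        if fp in needed:
            appliance.put(fp, piece)         # full chunk ships only when it's new
            sent += len(piece)
            needed.discard(fp)               # don't re-send duplicates within this backup
    return sent

appliance = DedupeAppliance()
monday = random.Random(1).randbytes(5_000_000)   # made-up backup stream
print(boost_style_send(monday, appliance))       # first backup: all 5,000,000 bytes ship
print(boost_style_send(monday, appliance))       # identical backup: 0 bytes ship, only fingerprints travel
```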

As for the likelihood that Oracle would make such services available to deduplication vendors, I would have said it was unlikely, but ultimately the customers have a say here. It's unclear why Symantec created OST, but it turned out to be a money maker for them, and something similar could be supported by Oracle. Once an Oracle RMAN OST-like capability was in place, it shouldn't take much to provide Boost functionality on top of it. (Although, so far, EMC Data Domain is the only dedupe vendor with Boost, for OST or for their own NetWorker Boost version.)

—-

When I first started this post, I thought that if the dedupe vendors just understood the format of RMAN compressed backupsets, they would be able to achieve the same dedupe ratios as seen for normal RMAN backupsets. As I investigated the compression algorithms being used, I became convinced that extracting duplicate data from RMAN compressed backupsets is a computationally "hard" problem, and that solving it would probably not be worth the effort.

So, if you use RMAN backupset compression, you probably ought to avoid deduplicating that data for now.

Anything I missed here?

Database appliances!?

The Sun Oracle Database Machine by Oracle OpenWorld San Francisco 2009 (cc) (from Flickr)

I was talking with Oracle the other day, discussing their Exadata database system. They have achieved a lot of success with this product. All of which got me wondering whether database-specific storage ever makes sense. I suppose the ultimate arbiter of "making sense" is commercial viability, and Oracle and others have certainly proven that, but from a technologist's perspective I still wonder.

In my view, the Exadata system combines database servers and storage servers in one rack (with extensions to other racks). They use an InfiniBand interconnect between the database and storage servers and a proprietary storage access protocol between the two.

With their proprietary protocol they can provide hints to the storage servers as to what's coming next and how to manage the database data, which makes the Exadata system a screamer of a database machine. Such hints can speed up database query processing, store database structures more efficiently, and overall speed up Oracle database activity. Given all that, it makes sense to a lot of customers.
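
To give a flavor of what such a database-aware storage request might look like, here's a purely hypothetical sketch; the message fields and names are mine (Oracle's actual protocol is proprietary). The point is simply that the request carries query context (table, columns, predicate, prefetch hints) so the storage server can filter and read ahead rather than just return raw blocks.

```python
from dataclasses import dataclass, field

@dataclass
class ScanRequest:
    """Hypothetical database-aware read request sent to a storage server."""
    table: str
    columns: list                         # project only these columns
    predicate: str                        # filter rows at the storage tier
    prefetch_next_extents: int = 0        # hint: sequential scan, read ahead

@dataclass
class ScanResponse:
    rows: list = field(default_factory=list)   # already filtered and projected

def storage_server_scan(req: ScanRequest, extent: list) -> ScanResponse:
    """Toy storage-side scan: apply the pushed-down filter before shipping rows back."""
    keep = [{c: r[c] for c in req.columns}
            for r in extent
            if eval(req.predicate, {}, r)]      # toy predicate evaluation; a real
                                                # protocol would ship a structured predicate
    return ScanResponse(rows=keep)

# Instead of shipping every block to the database server, only matching
# columns/rows cross the interconnect.
extent = [{"id": i, "region": "west" if i % 3 else "east", "amount": i * 10}
          for i in range(1_000)]
req = ScanRequest(table="orders", columns=["id", "amount"],
                  predicate="region == 'east' and amount > 5000",
                  prefetch_next_extents=4)
print(len(storage_server_scan(req, extent).rows), "rows returned, not 1,000 blocks")
```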

Now, there are other systems which compete with Exadata like Teradata and Netezza (am I missing anyone?) that also support onboard database servers and storage servers.  I don’t know much about these products but they all seem targeted at data warehousing and analytics applications similar to Exadata but perhaps more specialized.

  • As far as I can tell, Teradata has been around for years, since they were spun out of NCR (or AT&T), and has enjoyed tremendous success. The last annual report I can find for them shows their '09 revenue at around $772M, with net income of $254M.
  • Netezza started in 2000 and seems to be doing OK in the database appliance market given their youth.  Their last annual report for ’10 showed revenue of ~$191M and net income of $4.2M.  Perhaps not doing as well as Teradata but certainly commercially viable.

The only reason database appliances or machines exist is to speed up database processing. If they can do that, then they seem able to build a marketplace for themselves.

Database to storage interface standards

The key question, from a storage analyst's perspective, is: shouldn't there be some sort of standards body, like SNIA or others, working to define a standard protocol between database servers and storage that could be adopted by other storage vendors? I understand the advantage that a proprietary interface can give an enlightened vendor's equipment, but there are more database vendors out there than just Oracle, Teradata and Netezza, and there are (at least for the moment) many more storage vendors as well.

A decade or so ago, when I was with another storage company, we created a proprietary interface for backup activity. It sold OK, but in the end it didn't sell enough to be worthwhile for either the backup or the storage company to continue the approach. At the time we were also looking to support another proprietary interface for sorting, but couldn't seem to justify it.

Proprietary interfaces tend to lock customers in, and most customers will only accept lock-in if there is a significant advantage to your functionality. But customer lock-in can lull vendors into not investing R&D funding in the latest technology, and over time this effect will cause the vendor to lose any advantage they previously enjoyed.

It seems to me that the more successful companies (with the possible exception of Apple) tend to focus on opening up their interfaces rather than closing them down.  By doing so they introduce more competition which serves their customers better, in the long run.

I am not saying that if Oracle were to standardize/publicize their database-server-to-storage-server interface, there would be a lot of storage vendors going after that market. But the high revenues in this market, as evidenced by Teradata and Netezza, would certainly interest a few select storage vendors. Now, not all of Teradata's or Netezza's revenue derives from pure storage sales, but I would wager a significant part does.

Nevertheless, a standard database storage protocol could readily be defined by existing database vendors in conjunction with SNIA.  Once defined, I believe some storage vendors would adopt this protocol along with every other storage protocol (iSCSI, FCoE, FC, FCIP, CIFS, NFS, etc.). Once that occurs, customers across the board would benefit from the increased competition and break away from the current customer lock-in with today’s database appliances.

Any significant success in the market by early storage vendor adopters of this protocol would certainly interest other vendors, inducing a cycle of increased adoption, higher competition, and better functionality. In the end, database customers worldwide would benefit from the increased price-performance available in an open market. And in the end, that makes a lot more sense to me than the database appliances of today.

As to why Apple has excelled within a closed system environment, that will need to wait for a future post.