One platform to rule them all – Compellent&EqualLogic&Exanet from Dell

Compellent drive enclosure (c) 2010 Compellent

Dell and Compellent may be a great match because Compellent uses commodity hardware combined with specialized software to create their storage subsystem. If there’s any company out there that can take advantage of commodity hardware, it’s probably Dell. (Of course commodity hardware always loses in the end, but that’s another story.)

Similarly, Dell’s EqualLogic iSCSI storage system uses commodity hardware to provide its iSCSI storage services.  It doesn’t take a big leap of imagination to picture one storage system that combines the functionality of EqualLogic’s iSCSI and Compellent’s FC storage capabilities.  Of course there are others already doing this, including Compellent itself, which has iSCSI support already built into its FC storage system.

Which way to integrate?

Does EqualLogic survive such a merger?  I think so.  It’s easy to imagine that EqualLogic has the bigger market share today. If that’s so, the right thing might be to merge Compellent FC functionality into EqualLogic.  If Compellent has the larger market, the correct approach is the opposite. The answer probably lies with a little of both.  It seems easier to add iSCSI functionality to an FC storage system than the converse, but the FC-into-iSCSI approach may be the optimum path for Dell because of the popularity of their EqualLogic storage.

What about NAS?

The only thing missing from this storage system is NAS.  Of course the Compellent storage offers a NAS option through the use of a separate Windows Storage Server (WSS) front end.  Dell’s EqualLogic does much the same to offer NAS protocols for their iSCSI system.  Neither of these is a bad solution, but they are not fully integrated NAS offerings such as those available from NetApp and others.

However, there is a little-discussed piece of the puzzle: the Dell-Exanet acquisition, which happened earlier this year. Perhaps the right approach is to integrate Exanet with Compellent first and target this at the high-end enterprise/HPC marketplace, keeping EqualLogic at the SMB end of the marketplace.  It’s been a while since I have heard about Exanet, and nothing since the acquisition earlier this year.  Does it make sense to back-end a clustered NAS solution with FC storage – probably.


Much of this seems doable to me, but it all depends on making the right moves once the purchase is closed.   But if I look at where Dell is weakest (barring their OEM agreement with EMC), it’s in the high-end storage space.  Compellent probably didn’t have much of a footprint there, possibly due to their limited distribution and support channel.  A Dell acquisition could easily eliminate these problems and open up this space without Dell having to do much other than start marketing, selling and supporting Compellent.

In the end, a storage solution supporting clustered NAS, FC, and iSCSI that combines functionality equivalent to Exanet, Compellent and EqualLogic based on commodity hardware (ouch!) could make a formidable competitor to what’s out there today if done properly. Whether Dell could actually pull this off, and in a timely manner, even if they purchase Compellent, is another question.


Storage throughput vs. IO response time and why it matters

Fighter Jets at CNE by lifecreation (cc) (from Flickr)

Lost in much of the discussions on storage system performance is the need for both throughput and response time measurements.

  • By IO throughput I generally mean data transfer speed in megabytes per second (MB/s or MBPS), however another definition of throughput is IO operations per second (IO/s or IOPS).  I prefer the MB/s designation for storage system throughput because it’s very complementary with respect to response time whereas IO/s can often be confounded with response time.  Nevertheless, both metrics qualify as storage system throughput.
  • By IO response time I mean the time it takes a storage system to perform an IO operation from start to finish, usually measured in milliseconds, although lately some subsystems have dropped below the 1 msec threshold.  (See my last year’s post on SPC LRT results for information on some top response time results).
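The relationship between the two throughput metrics runs through average transfer size, which is why IOPS alone says little about data transfer speed; a quick sketch:

```python
def mb_per_sec(iops: float, xfer_kb: float) -> float:
    """Convert an IOPS rate to MB/s given the average transfer size in KB."""
    return iops * xfer_kb / 1024.0

# A workload doing 50,000 IOPS at 8 KB per IO moves ~390 MB/s, while a
# workload doing only 5,000 IOPS at 256 KB per IO moves 1,250 MB/s --
# the IOPS number by itself would rank them backwards for data transfer.
print(mb_per_sec(50_000, 8))    # 390.625
print(mb_per_sec(5_000, 256))   # 1250.0
```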

Benchmark measurements of response time and throughput

Both Standard Performance Evaluation Corporation’s SPECsfs2008 and Storage Performance Council’s SPC-1 provide response time measurements although they measure substantially different quantities.  The problem with SPECsfs2008’s measurement of ORT (overall response time) is that it’s calculated as a mean across the whole benchmark run rather than a strict measurement of least response time at low file request rates.  I believe any response time metric should measure the minimum response time achievable from a storage system although I can understand SPECsfs2008’s point of view.

On the other hand SPC-1 measurement of LRT (least response time) is just what I would like to see in a response time measurement.  SPC-1 provides the time it takes to complete an IO operation at very low request rates.

In regards to throughput, once again SPECsfs2008’s measurement of throughput leaves something to be desired as it’s strictly a measurement of NFS or CIFS operations per second.  Of course this includes a number (>40%) of non-data transfer requests as well as data transfers, so it confounds any measurement of how much data can be transferred per second.  But, from their perspective, a file system needs to do more than just read and write data, which is why they mix these other requests in with their measurement of NAS throughput.

Storage Performance Council’s SPC-1 reports throughput results as IOPS and provides no direct measure of MB/s unless one looks to their SPC-2 benchmark results.  SPC-2 reports a direct measure of MB/s, which is an average of three different data intensive workloads including large file access, video-on-demand and a large database query workload.

Why response time and throughput matter

Historically, we used to say that OLTP (online transaction processing) activity performance was entirely dependent on response time – the better the storage system response time, the better your OLTP systems performed.  Nowadays it’s a bit more complex, as some of today’s database queries can depend as much on sequential database transfers (or throughput) as on individual IO response time.  Nonetheless, I feel that there is still a large component of response time critical workloads out there that perform much better with shorter response times.

On the other hand, high throughput has its growing gaggle of adherents as well.  When it comes to high sequential data transfer workloads such as data warehouse queries, video or audio editing/download or large file data transfers, throughput as measured by MB/s reigns supreme – higher MB/s can lead to much faster workloads.

The only question that remains is who needs higher throughput as measured by IO/s rather than MB/s.  I would contend that mixed workloads which contain components of random as well as sequential IOs and typically smaller data transfers can benefit from high IO/s storage systems.  The only confounding matter is that these workloads obviously benefit from better response times as well.   That’s why throughput as measured by IO/s is a much more difficult number to understand than any pure MB/s numbers.


Now there is a contingent of performance gurus today that believe that IO response times no longer matter.  In fact if one looks at SPC-1 results, it takes some effort to find its LRT measurement.  It’s not included in the summary report.

Also, in the post mentioned above there appears to be a definite bifurcation of storage subsystems with respect to response time, i.e., some subsystems are focused on response time while others are not.  I would have liked to see some more of the top enterprise storage subsystems represented in the top LRT subsystems but alas, they are missing.

1954 French Grand Prix - Those Were The Days by Nigel Smuckatelli (cc) (from Flickr)

Call me old fashioned but I feel that response time represents a very important and orthogonal performance measure with respect to throughput of any storage subsystem and as such, should be much more widely disseminated than it is today.

For example, there is a substantive difference between a fighter jet’s or race car’s top speed and its maneuverability.  I would compare top speed to storage throughput and maneuverability to IO response time.  Perhaps this doesn’t matter as much for a jet liner or family car but it can matter a lot in the right domain.

Now do you want your storage subsystem to be a jet fighter or a jet liner – you decide.

The future of data storage is MRAM

Core Memory by teclasorg

We have been discussing NAND technology for quite a while now but this month I ran across an article in IEEE Spectrum titled “a SPIN to REMEMBER – Spintronic memories to revolutionize data storage“. The article discussed a form of magneto-resistive random access memory, or MRAM, that uses quantum mechanical spin effects (spintronics) to record data. We have talked about MRAM technology before and progress has been made since then.

Many in the industry will recall that current GMR (Giant Magneto-resistance) heads and TMR (Tunnel magneto-resistance) next generation disk read heads already make use of spintronics to detect magnetized bit values in disk media. GMR heads detect bit values on media via changes in the head’s electrical resistance.

Spintronics however can also be used to record data as well as read it. These capabilities are being exploited in MRAM technology which uses a ferro-magnetic material to record data in magnetic spin alignment – spin UP, means 0; spin down, means 1 (or vice versa).

The technologists claim that when MRAM reaches its full potential it could conceivably replace DRAM, SRAM, NAND, and hard disk drives, i.e., all current electrical and magnetic data storage. Some of MRAM’s advantages include unlimited write passes, fast reads and writes and data non-volatility.

MRAM reminds me of old fashioned magnetic core memory (in photo above) which used magnetic polarity to record non-volatile data bits. Core was a memory mainstay in the early years of computing before the advent of semiconductor devices like DRAM.

Back to the future – MRAM

However, the problems with MRAM today are that it is low-density, takes lots of power and is very expensive. But technologists are working on all these problems with the view that the future of data storage will be MRAM. In fact, researchers at the North Carolina State University (NCSU) Electrical Engineering department have been having some success with reducing power requirements and increasing density.

As for data density, NCSU researchers now believe they can record data in cells approximating 20 nm across, better than current bit patterned media, which is the next generation disk recording media. However reading data out of such a small cell will prove difficult and may require a separate read head on top of each cell. The fact that all of this is created with normal silicon fabrication methods makes doing so at least feasible, but the added chip costs may be hard to justify.

Regarding high power, their most recent design records data by electronically controlling the magnetism of a cell. They are using dilute magnetic semiconductor material doped with gallium manganese which can hold spin value alignment (see the article for more information). They are also using a semiconductor p-n junction on top of the MRAM cell. Apparently at the p-n junction they can control the magnetization of the MRAM cells by applying or removing -5 volts. Today the magnetization is temporary but they are working on solutions for this as well.

NCSU researchers would be the first to admit that none of this is ready for prime time and they have yet to demonstrate in the lab an MRAM memory device with 20nm cells, but the feeling is it’s all just a matter of time and lots of research.

Fortunately, NCSU has lots of help. It seems Freescale, Honeywell, IBM, Toshiba and Micron are also looking into MRAM technology and its applications.


Let’s see, using electron spin alignment in a magnetic medium to record data bits, needing a read head to read out the spin values – couldn’t something like this be used in some sort of next generation disk drive that uses the ferromagnetic material as a recording medium? Hey, aren’t disks already using a ferromagnetic material for recording media? Could MRAM be fabricated/laid down as a form of magnetic disk media? Maybe there’s life in disks yet….

What do you think?

What’s wrong with the iPad?

Apple iPad (wi-fi)

We have been using the wi-fi iPad for just under 6 months now and I have a few suggestions to make it even easier to use.


Aside from the lack of Flash support, there are a few things that would make web surfing easier on the iPad:

  • Tabbed windows option – I use tabbed windows on my desktop/laptop all the time but for some reason on the iPad Apple chose to use a grid of distinct windows accessible via a Safari special purpose icon.  While this approach probably makes a lot of sense for the iPhone/iPod, there is little reason to do it this way on the iPad.  There is ample screen real-estate to show tabs selectable with the touch of a finger.  As it is now, it takes two touches to select an alternate screen for web browsing, not to mention some time to paint the thumbnail screens when you have multiple web pages open.
  • Non-mobile mode – It seems that many websites nowadays detect whether one is accessing a web page from a mobile device and, if so, shrink their text/window displays to accommodate its much smaller display screen.  With the iPad this shows up as wasted screen space and takes more screen paging than necessary to get to data that would be retrievable on a single desktop/laptop screen.  Not sure whether the problem is in the web server or in how the iPad signals what device it is; however, it seems to me that if the iPad/Safari app could signal to web servers that it is a laptop/small-desktop, web browsing would be better.
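That mobile detection is usually driven by the User-Agent header the browser sends. A minimal sketch of the kind of server-side check many sites use (the token list is illustrative, not any particular site’s logic) shows why the iPad gets lumped in with phones:

```python
# Many sites serve a reduced layout whenever the User-Agent header
# contains a mobile token. Safari on the iPad identifies itself with
# "iPad", so despite its large screen it often gets the mobile page.
MOBILE_TOKENS = ("iPhone", "iPod", "iPad", "Android", "Mobile")

def serve_mobile_layout(user_agent: str) -> bool:
    return any(token in user_agent for token in MOBILE_TOKENS)

ipad_ua = ("Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X) "
           "AppleWebKit/531.21.10 Mobile/7B334b Safari/531.21.10")
desktop_ua = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6) Safari/533.16"

print(serve_mobile_layout(ipad_ua))     # True -- hence the shrunken pages
print(serve_mobile_layout(desktop_ua))  # False
```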

Other Apps

There are a number of Apps freely available on the iPhone/iPod that are not available on the iPad without purchase.  For some reason, I find I can’t live without some of these:

  • Clock app – On the iPhone/iPod I use the clock app at least 3 times a day.  I time my kids’ use of video games, my own time until I have to do something, how much time I am willing/able to spend on a task, and myriad other things.  It’s one reason why I keep the iPhone on my body or close by whenever I am at home.  I occasionally use the clock app as a stop watch and a world clock but what I really need on the iPad is a timer of some sort.  I have been unable to find an app for the iPad that matches the functionality of the iPhone/iPod Clock app.
  • Calculator app – On the iPhone/iPod I use the calculator sporadically, mostly when I am away from my desktop/office (probably because I have a calculator on my desk).  However, I don’t have other calculators that are easily accessible throughout my household and having one on the iPad would just make my life easier.  BTW, I ended up purchasing a calculator app that Apple says is equivalent to the iPhone Calc app; it works fine but it should have come free.
  • Weather app – This is probably the next most popular app on my iPhone.  I know this information is completely available on the web, but by the time I have to enter the url/scan my bookmarks it takes at least 3-4 touches to get the current weather forecast.  By having the Weather app available on the iPhone it takes just one touch to get this same information.  I believe there is some way to transform a web page into an app icon on the iPad but this is not the same.

IOS software tweaks

There are some things I think could make IOS much better from my standpoint and I assume all the stuff in IOS 4.2 will be coming shortly so I won’t belabor those items:

  • File access – This is probably heresy but I would really like a way to cross application boundaries and access all files on the iPad.  That is, have something besides Mail, iBook and Pages be able to access PDF files, and Mail, Photo, and Pages/Keynote be able to access photos. Specifically, some of the FTP upload utilities should be able to access any file on the iPad.  Not sure where this belongs but there should be some sort of data viewer at the IOS level that can allow access to any file on the iPad.
  • Dvorak soft keypad – Ok, maybe I am a bit weird, but I spent the time and effort to learn the Dvorak keyboard layout to be able to type faster and would like to see this same option available for the iPad soft keypad.  I currently use Dvorak with the iPad’s external BT keyboard hardware but I see no reason that it couldn’t work for the soft keypad as well.
  • Widgets – The weather app discussed above looks to me like the weather widget on my desktop iMac.  It’s unclear why IOS couldn’t also support other widgets so that app developers/users could easily reuse their desktop widgets on the iPad.

iPad hardware changes

There are some things that scream out to me for hardware changes.

  • Ethernet access – I have been burned before and wish not to be burned again but some sort of adaptor that would allow an Ethernet plug connection would make the tethered iPad a much more complete computing platform.  I don’t care if such a thing comes as a Bluetooth converter or has to use the same plug as the power adaptor but having this would just make accessing the internet (under some circumstances) that much easier.
  • USB access – This just opens up another whole dimension to storage access and information/data portability that is sorely missing from the iPad.  It would probably need some sort of “file access” viewer discussed above but it would make the iPad much more extensible as a computing platform.
  • Front facing camera – I am not an avid user of FaceTime (yet) but if I were, I would really need a front camera on the iPad.  Such a camera would also provide some sort of snapshot capability with the iPad (although a rear facing camera would make more sense for this).  In any event, a camera is a very useful device to record whiteboard notes, scan paper documents, and record other items of the moment and even a front-facing one could do this effectively.
  • Solar panels – Probably off the wall, but having to lug a power adaptor everywhere I go with the iPad is just another thing to misplace/lose.  Of course, when traveling to other countries, one also needs a plug adaptor for each country as well.  It seems to me that some sort of solar panel on the back or front that could provide adequate power to charge the iPad would make things that much simpler.


Well that’s about it for now.   We are planning on taking a vacation soon and we will be taking both a laptop and the iPad (because we can no longer live without it).  I would rather just leave the laptop home but can’t really do that given my problems in the past with the iPad.  Some changes described above could make hauling the laptop on vacation a much harder decision.

As for how the iPad fares on the beach, I will have to let you know…

Commodity hardware always loses

Herman Miller's Embody Chair by johncantrell (cc) (from Flickr)
A recent post by Stephen Foskett has revisited a blog discussion that Chuck Hollis and I had on commodity vs. special purpose hardware.  It’s clear to me that commodity hardware is a losing proposition for the storage industry and for storage users as a whole.  Not sure why everybody else disagrees with me about this.

It’s all about delivering value to the end user.  If one can deliver equivalent value with commodity hardware as with special purpose hardware then obviously commodity hardware wins – no question about it.

But, and it’s a big BUT, when some company invests in special purpose hardware, they have an opportunity to deliver better value to their customers.  Yes it’s going to be more expensive on a per unit basis but that doesn’t mean it can’t deliver commensurate benefits to offset that cost disadvantage.

Supercar Run 23 by VOD Cars (cc) (from Flickr)

Look around, one sees special purpose hardware everywhere. For example, just check out Apple’s iPad, iPhone, and iPod to name a few.  None of these would be possible without special, non-commodity hardware.  Yes, if one disassembles these products, you may find some commodity chips, but I venture that the majority of the componentry is special purpose, one-off designs that aren’t readily purchasable from any chip vendor.  And the benefits it brings, aside from the coolness factor, are significant miniaturization with advanced functionality.  The popularity of these products proves my point entirely – value sells and special purpose hardware adds significant value.

One may argue that the storage industry doesn’t need such radical miniaturization.  I disagree of course, but even so, there are other more pressing concerns worthy of hardware specialization, such as reduced power and cooling, increased data density and higher IO performance, to name just a few.  Can some of this be delivered with SBB and other mass-produced hardware designs? Perhaps.  But I believe that with judicious selection of special purpose hardware, the storage value delivered along these dimensions can be 10 times more than what can be done with commodity hardware.

Cuba Gallery: France / Paris / Louvre / architecture / people / buildings / design / style / photography by Cuba Gallery (cc) (from Flickr)

Special purpose HW cost and development disadvantages denied

The other advantage claimed for commodity hardware is the belief that it’s just easier to develop and deliver functionality in software than in hardware.  (I disagree; software functionality can be much harder to deliver than hardware functionality, maybe a subject for a different post.)  But hardware development is becoming more software-like every day.  Most hardware engineers do as much coding as any software engineer I know, and then some.

Then there’s the cost of special purpose hardware, but ASIC manufacturing is getting more commodity-like every day.  Several hardware design shops exist that sell off-the-shelf processor and other logic designs one can readily incorporate into an ASIC, and fabs can be found that will manufacture any ASIC design at a moderate price in reasonable volumes.  And, if one doesn’t need the cost advantage of ASICs, use FPGAs and CPLDs to develop special purpose hardware with programmable logic.  This will cut engineering and development lead-times considerably but will cost commensurately more than ASICs.

Do we ever stop innovating?

Probably the hardest argument to counteract is that over time, commodity hardware becomes more proficient at providing the same value as special purpose hardware.  Although this may be true, products don’t have to stand still.  One can continue to innovate and always increase the market delivered value for any product.

If there comes a time when further product innovation is not valued by the market, then and only then does commodity hardware win.  However, chairs, cars, and buildings have all been around for many years, decades, even centuries now and innovation continues to deliver added value.  I can’t see where the data storage business will be any different a century or two from now…

Cloud storage replication does not suffice for backups – revisited

Free Whipped Cream Clouds on True Blue Sky Creative Commons by Pink Sherbet Photography (cc) (from Flickr)

I was talking with another cloud storage gateway provider today and I asked them if they do any sort of backup for data sent to the cloud.  Their answer disturbed me – they said they depend on the backend cloud storage provider’s replication services to provide data protection – sigh. Curtis and I have written about this before (see my Does Cloud Storage need Backup? post and Replication is not backup by W. Curtis Preston).

Cloud replication is not backup

Cloud replication is not data protection for anything but hardware failures!   Much more common than hardware failures are mistakes by end-users who inadvertently delete, overwrite, or corrupt files, or systems that corrupt files, any of which would just be replicated in error throughout the cloud storage multi-verse.  (In fact, cloud storage itself can lead to corruption, see Eventual data consistency and cloud storage.)

Replication does a nice job of covering a data center or hardware failure which leaves data at one site inaccessible but allows access to a replica of the data from another site.  As far as I am concerned there’s nothing better than replication for these sorts of DR purposes but it does nothing for someone deleting the wrong file. (I once did a “rm * *” command on a shared Unix directory – it wasn’t pretty).

Some cloud storage (backend) vendors delay the deletion of blobs/containers until sometime later  as one solution to this problem.  By doing this, the data “stays around” for “sometime” after being deleted and can be restored via special request to the cloud storage vendor. The only problem with this is that “sometime” is an ill-defined, nebulous concept which is not guaranteed/specified in any way.  Also, depending on the “fullness” of the cloud storage, this time frame may be much shorter or longer.  End-user data protection cannot depend on such a wishy-washy arrangement.

Other solutions to data protection for cloud storage

One way is to have a local backup of any data located in cloud storage.  But this kind of defeats the purpose of cloud storage and has the cloud data being stored both locally (as backups) and remotely.  I suppose the backup data could be sent to another cloud storage provider but someone/somewhere would need to support some sort of versioning to be able to keep multiple iterations of the data around, e.g., 90 days worth of backups.  Sounds like a backup package front-ending cloud storage to me…

Another approach is to have the gateway provider supply some sort of backup internally using the very same cloud storage to hold various versions of data.  As long as the user can specify how many days or versions of backups can be held this works great, as cloud replication supports availability in the face of hardware failures and multiple versions support availability in the face of finger checks/logical corruptions.
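The gateway-side approach boils down to keeping the last N versions of each object and pruning the oldest, so a bad write can be rolled back even though the backend faithfully replicates it. A minimal sketch (class and API are hypothetical, not any vendor’s product):

```python
from collections import defaultdict, deque

class VersionedStore:
    """Hypothetical gateway-side versioning: each write to a key keeps
    up to max_versions copies, so a finger check or logical corruption
    can be rolled back even though the backend replicates every (bad)
    write faithfully."""

    def __init__(self, max_versions: int = 3):
        self.max_versions = max_versions
        self.versions = defaultdict(deque)  # key -> deque of blobs

    def put(self, key: str, blob: bytes) -> None:
        q = self.versions[key]
        q.append(blob)
        while len(q) > self.max_versions:
            q.popleft()  # prune the oldest retained version

    def get(self, key: str, version: int = -1) -> bytes:
        return self.versions[key][version]  # -1 = latest

store = VersionedStore(max_versions=3)
store.put("report.doc", b"v1 good data")
store.put("report.doc", b"v2 good data")
store.put("report.doc", b"v3 corrupted!")       # oops
print(store.get("report.doc", -2))  # b'v2 good data' -- still recoverable
```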

This problem can be solved in many ways, but just using cloud replication is not one of them.

Listen up folks, whenever you think about putting data in the cloud, you need to ask about backups among other things.  If they say “we only offer the data replication provided by the cloud storage backend” – go somewhere else. Trust me, there are solutions out there that really back up cloud data.

Latest SPECsfs2008 results: NFSops vs. #disks – chart of the month

(c) 2010 Silverton Consulting, All Rights Reserved

Above one can see a chart from our September SPECsfs2008 Performance Dispatch displaying the scatter plot of NFS Throughput Operations/Second vs. number of disk drives in the solution.  Over the last month or so there has been a lot of Twitter traffic on the theory that benchmark results such as this and Storage Performance Council‘s SPC-1&2 are mostly a measure of the number of disk drives in a system under test and have little relation to the actual effectiveness of a system.  I disagree.

As proof of my disagreement I offer the above chart.  On the chart we have drawn a linear regression line (supplied by Microsoft Excel) and displayed the resultant regression equation.  A couple of items to note on the chart:

  1. Regression coefficient – Even though there are only 37 submissions, which span anywhere from 1K to over 330K NFS throughput operations per second, we do not have a perfect correlation (R**2=~0.8, not 1.0) between #disks and NFS ops.
  2. Superior systems exist – Any of the storage systems above the linear regression line have superior effectiveness or utilization of their disk resources than systems below the line.
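For reference, the R**2 in item 1 is straightforward to compute from a least-squares fit. A sketch in pure Python, with made-up (#disks, NFS ops/sec) pairs standing in for the actual 37 submissions:

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def r_squared(xs, ys):
    slope, intercept = linear_fit(xs, ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - sum(ys) / len(ys)) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical (#disks, NFS ops/sec) pairs -- NOT the real submission
# data, just an illustration of why R**2 falls short of 1.0 when some
# systems get far more (or less) work out of each spindle than others.
disks = [79, 100, 224, 448, 592, 1000]
ops = [131_500, 40_000, 80_000, 140_000, 119_600, 300_000]
print(round(r_squared(disks, ops), 2))  # less than 1.0
```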

As one example, take a look at the two circled points on the chart.

  • The one above the line is from Avere Systems and is a 6-node FXT 2500 tiered NAS storage system which has internal disk cache (8-450GB SAS disks per node) and an external mass storage NFS server (24-1TB SATA disks) for data, with each node having a system disk as well, totaling 79 disk drives in the solution.  The Avere system was able to attain ~131.5K NFS throughput ops/sec on SPECsfs2008.
  • The one below the line is from Exanet Ltd. (recently purchased by Dell) and is an 8-node ExaStore clustered NAS system which has attached storage (576-146GB SAS disks) as well as mirrored boot disks (16-73GB disks), totaling 592 disk drives in the solution.  They were only able to attain ~119.6K NFS throughput ops/sec on the benchmark.

Now the two systems’ respective architectures were significantly different, but if we count the data drives alone, Avere Systems (with 72 data disks) was able to attain 1.8K NFS throughput ops per second per data disk spindle while Exanet (with 576 data disks) was able to attain only 0.2K NFS throughput ops per second per data disk spindle.  A 9X difference in per drive performance on the same benchmark.
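As a quick arithmetic check of that per-spindle comparison:

```python
def ops_per_spindle(nfs_ops: float, data_disks: int) -> float:
    """NFS throughput ops/sec delivered per data disk spindle."""
    return nfs_ops / data_disks

# Figures from the two circled SPECsfs2008 submissions discussed above.
avere = ops_per_spindle(131_500, 72)    # ~1,826 ops/sec per spindle
exanet = ops_per_spindle(119_600, 576)  # ~208 ops/sec per spindle
print(round(avere / exanet, 1))         # 8.8 -- roughly the 9X claimed
```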

As far as I am concerned this definitively disproves the contention that benchmark results are dictated by the number of disk drives in the solution.  Similar comparisons can be seen looking horizontally at any points with equivalent NFS throughput levels.

Ray’s reading: NAS system performance is driven by a number of factors and the number of disk drives is not the lone determinant of benchmark results.  Indeed, one can easily see differences of almost 10X in throughput ops per second per disk spindle for NFS storage without looking very hard.

We would contend that similar results can be seen for block and CIFS storage benchmarks as well which we will cover in future posts.

The full SPECsfs2008 performance report will go up on SCI’s website next month in our dispatches directory.  However, if you are interested in receiving this sooner, just subscribe by email to our free newsletter and we will send you the current issue with download instructions for this and other reports.

As always, we welcome any suggestions on how to improve our analysis of SPECsfs2008 performance information so please comment here or drop us a line.

Poor deduplication with Oracle RMAN compressed backups

Oracle offices by Steve Parker (cc) (from Flickr)

I was talking with one large enterprise customer today and he was lamenting how poorly Oracle RMAN compressed backupsets dedupe. Apparently, non-compressed RMAN backup sets generate anywhere from 20 to 40:1 deduplication ratios but when they use RMAN backupset compression, their deduplication ratios drop to 2:1.  Given that RMAN compression probably only adds another 2:1 compression ratio, the overall data reduction becomes something like 4:1.
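Since reduction ratios compose multiplicatively, the arithmetic looks like this:

```python
def overall_reduction(dedupe_ratio: float, compression_ratio: float) -> float:
    """Data reduction ratios compose multiplicatively."""
    return dedupe_ratio * compression_ratio

print(overall_reduction(2, 2))   # 4.0 -- compressed backupsets
print(overall_reduction(20, 1))  # 20.0 -- uncompressed, at the low end
```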

RMAN compression

It turns out Oracle RMAN supports two different compression algorithms: zlib (as used by gzip) and bzip2.  I assume the default is zlib; if one wants even higher compression rates, one can specify bzip2, with commensurately slower or more processor intensive compression activity.

  • Zlib is pretty standard repeated-string elimination followed by Huffman coding, which uses shorter bit strings to represent more frequent characters and longer bit strings to represent less frequent characters.
  • Bzip2 also uses Huffman coding but only after a number of other transforms, such as run-length encoding (changing runs of duplicated characters to a count:character sequence), the Burrows–Wheeler transform (reordering the data stream so that repeating characters come together), a move-to-front transform (recoding the stream so that recently used symbols get small values), and another run-length encoding step, and then the Huffman coding itself, followed by another couple of steps to decrease the data length even more…

The net of all this is that a block of data that is bzip2 encoded may look significantly different if even one character is changed.  Similarly, even zlib compressed data will look different with a single character insertion, though perhaps not as much.  This will depend on the character and where it’s inserted, but even if the new character doesn’t change the Huffman encoding tree, adding a few bits to a data stream will necessarily alter its byte groupings significantly downstream from that insertion. (See Huffman coding to learn more).
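The effect is easy to demonstrate with Python’s standard zlib module: insert a single byte near the front of a stream and compare the compressed outputs:

```python
import zlib

# ~9,000 bytes with plenty of repetition, then the same stream with a
# single byte inserted near the front.
base = b"The quick brown fox jumps over the lazy dog. " * 200
shifted = base[:10] + b"X" + base[10:]

a = zlib.compress(base)
b = zlib.compress(shifted)

# Count how many leading bytes the two compressed streams share.
common = 0
for x, y in zip(a, b):
    if x != y:
        break
    common += 1

# Typically only a handful of (header) bytes match: the one-byte
# insertion changes symbol frequencies, hence the Huffman codes, and
# hence nearly the entire downstream bit stream -- which is what
# defeats sub-block deduplication.
print(len(a), len(b), common)
```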

Deduplicating RMAN compressed backupsets

Sub-block level deduplication often depends on seeing the same sequence of data that may be skewed or shifted by one to N bytes between two data blocks.  But as discussed above, with bzip2 or zlib (or any huffman encoded) compression algorithm the sequence of bytes looks distinctly different downstream from any character insertion.
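Shifted-but-identical data is exactly what content-defined chunking handles: a rolling hash picks chunk boundaries from the data itself, so an insertion only perturbs chunks near the edit instead of shifting every downstream block. A toy sketch (hash and parameters chosen arbitrarily for illustration):

```python
import hashlib
import random

def chunks(data: bytes, mask: int = 0x3F, min_len: int = 32):
    """Toy content-defined chunking: cut wherever the low bits of a
    rolling hash of recent bytes are all ones, so boundaries follow
    the content rather than fixed byte offsets."""
    out, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF
        if (h & mask) == mask and i - start >= min_len:
            out.append(data[start:i + 1])
            start = i + 1
    out.append(data[start:])
    return out

random.seed(0)
base = bytes(random.randrange(256) for _ in range(10_000))
# Same data with one byte inserted in the middle.
shifted = base[:5_000] + b"X" + base[5_000:]

a = {hashlib.sha1(c).hexdigest() for c in chunks(base)}
b = {hashlib.sha1(c).hexdigest() for c in chunks(shifted)}
# Most chunk fingerprints survive the insertion, so the duplicate data
# is still found despite every downstream byte having moved by one.
print(len(a & b), "of", len(a), "chunk fingerprints still match")
```

With compressed input, though, the insertion rewrites the downstream bytes themselves rather than merely shifting them, so even content-defined chunking finds nothing to match.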

One way to obtain decent deduplication rates from RMAN compressed backupsets would be to decompress the data at the dedupe appliance and then run the deduplication algorithm on it – but dedupe appliance ingestion rates would suffer accordingly.  Another approach is to not use RMAN compressed backupsets at all, but the advantages of compression are very appealing, such as less network bandwidth, faster backups (because they are not transferring as much data), and quicker restores.


On the other hand, what might work is some form of Data Domain OST/Boost like support from Oracle RMAN which would partially deduplicate the data at the RMAN server and then send the deduplicated stream to the dedupe appliance.  This would provide less network bandwidth and faster backups but may not do anything for restores.  Perhaps a tradeoff worth investigating.

As for the likelihood that Oracle would make such services available to deduplication vendors, I would have said this was unlikely, but ultimately the customers have a say here.   It’s unclear why Symantec created OST but it turned out to be a money maker for them, and something similar could be supported by Oracle.  Once an Oracle RMAN OST-like capability was in place, it shouldn’t take much to provide Boost functionality on top of it.  (Although EMC Data Domain is the only dedupe vendor that has Boost yet, either for OST or for their own NetWorker Boost version.)


When I first started this post I thought that if the dedupe vendors just understood the format of the RMAN compressed backupsets they would be able to have the same dedupe ratios as seen for normal RMAN backupsets.  As I investigated the compression algorithms being used I became convinced that it’s a computationally “hard” problem to extract duplicate data from RMAN compressed backupsets and ultimately would probably not be worth it.

So, if you use RMAN backupset compression, you probably ought to avoid deduplicating this data for now.

Anything I missed here?