Latest Publications

To iPad or not to iPad?

iPad (from wikipedia.org)

I am going to a big conference next week, 2 full days out of the office. In times of yore, I would haul my trusty Macbook along, lugging it with me on both days as I moved from pavilion to briefing hall, from lunch back to pavilion and from beer hall to bed.

A couple of months ago, I tried using an iPad for a different conference. I purchased an Apple Bluetooth (BT) keyboard and carried it with the iPad for most of the show.  With the BT keypad, power input was just as fast as on the laptop and even faster as I didn’t need to boot anything up.

The other nice thing about the BT keyboard with the iPad is you have fine cursor controls (arrow keys) which can be used to position the input pointer. I did find having to take my hand off the keyboard and touch the screen for some clicking action disconcerting, and there were some iPad applications that didn’t handle the arrow keys appropriately, but other than that it worked great for power input, answering emails, and web searches.

The internal, soft iPad keyboard worked ok but wasn’t nearly as fast and didn’t support Dvorak. Also, the soft keyboard in portrait mode only shows 6 lines of Pages text, which makes power input with feedback more difficult. In any case, I would use it to rip off quick emails, tweets, and other short stuff, which worked well enough. I still took notes on paper (probably too old now to take notes on the iPad/laptop). Having the keyboard available with a moment’s delay made it easy to decide to take it out and use it when I had the time or leave it in the backpack when I didn’t.

Another positive note was that the iPad took up very little desk space.  Most briefing halls nowadays have these smallish retractable desk tops that can barely hold a legal pad let alone a laptop.  The iPad fit these postage stamp desktops just fine.

Not sure how to quantify the weight advantage of the iPad+BT Keyboard vs. Macbook without weighing them but it is significant.  Given all the junk I carry along with the laptop vs. the iPad+BT keyboard, the iPad/BT keyboard wins hands down.  It’s almost like I am not carrying a computer at all.

Problems with using the iPad

There are a couple of web applications (e.g., the WordPress visual editor) that seem dependent on Flash to work properly, which made using the iPad to create blog posts problematic. Also, scrolling in the WordPress post editor seems to depend on Flash as well, which made dealing with any long post edits problematic at best. WordPress has an iPhone/iPad application which is just as good as the non-visual editor in web-based WordPress, which comes in handy at these times.

Now in all honesty, I haven’t tried these in a while and these may not be Flash issues as much as iPad issues. Nonetheless, I will guarantee that you will run into some websites that you use in your daily activities that use Flash and won’t work. With the iPad you will just need to forego these websites and find alternatives.

In the office I am a heavy TweetDeck user. For some reason this application doesn’t work that well on the iPad. I have the latest version and all but find using Twitterrific or the official Twitter app a better solution on the iPad.

I purchased the WiFi version of the iPad and iPads do not come with Ethernet ports. Now most conference centers these days have WiFi, but it may not always work that well. Also some hotels only have WiFi in certain locations and not in the hotel rooms. All this makes having internet access somewhat sporadic. But you can always buy the 3G version if you want to and I always have my iPhone for internet access in a pinch (assuming AT&T has adequate conference center/hotel coverage).

I was told that the iPad power converter and connection would also charge up my 3G iPhone but this turned out not to work. Luckily, I brought along the power converter for the 3G iPhone by mistake and the cable connection between the power converter and iPad worked just fine for the iPhone. Also the cable from the power adaptor to iPad is somewhat short, so bring the extension cord in order to be able to work with the iPad while it’s charging.

I ended up purchasing the Apple case for the iPad. I wanted to be able to have it upright portrait or landscape while I was typing on the keyboard, have it slant upward while using the soft keypad and otherwise lie flat. The Apple iPad case does all this without problem.

Microsoft Office documents

Word documents get converted into Pages documents pretty easily but you lose all change tracking, some of the formatting, and other esoteric stuff.  It’s probably ok for internal documents but I find putting together a final document using Pages still a problem. But  I must say I am a novice here.  Also converting Pages documents back into Word seems easy enough.

I have spent even less time with Numbers and Keynote but they seem adequate for minor stuff. They convert .XLS and .PPT files to Numbers and Keynote files (but not back to .XLS and .PPT), and if I used them more they would probably be ok for much more sophisticated work. There are other applications that seem to provide better iPhone support for Microsoft Office editing but I have yet to try them on either the iPad or iPhone. Also, beware that converting Numbers documents to Excel and Keynote to PowerPoint requires the Mac desktop versions of these programs.

Document availability is somewhat problematic.  I met one person who emailed work documents to themselves to solve this problem.  Email works ok as long as they don’t scroll out of iPad (iPad keeps the latest 200 emails max for any account which includes spam).  For this purpose, I used a not-so-well-known email address and emailed my current work documents to that account.  iTunes supports a way to copy files to and from the Mac or iPad which seems painless enough but the email interface worked just as well for me and I didn’t have to synch up to have the files transferred.

Beware of changing headers and footers in Pages and trying to alter them in Word once you get it back to the office.  It never worked for me.  I had to copy the text of the document to another fresh Word file and work the header/footers in that.

iPad security

Mac based passwords, logins, and security characteristics are a bit difficult and time-consuming to transfer to the iPad. You can manually load them in for any websites and applications you need but there is no way to transfer a whole keychain from Mac to iPad. As such, if you neglect to transfer security credentials for an important website to the iPad you’re out of luck. Now there are some apps that profess to be able to transfer and maintain keychains on the iPhone or the iPad but I haven’t tried them yet.

Other iPad security aspects are even more problematic. The iPad can be set up to require entry of a 4 numeric character string to access it. Another setting will erase the contents of the iPad after 10 failed login attempts. And MobileMe probably supports some way to erase an iPad that’s out of your hands (it does this for iPhones so I would think the same service would be available for the iPad but I haven’t looked into it).

But despite all that, I don’t feel the iPad is as secure as the Macbook. For one thing, I encrypt the data on the Macbook and the system password can be alphanumeric and considerably longer than 4 characters. In any case the hard drive can be removed from the Macbook but without the passkey, the data on the drive would be useless. In contrast the SSD-Flash memory on the iPad could be pulled out and analyzed without any trouble whatsoever and, with proper understanding of iOS storage formatting, be read in the clear.

Also, because it’s smaller and lighter, it could easily be forgotten and left behind, making it more lose-able. And it’s certainly more prone to being stolen for the same reason.

—–

At this point I will probably  use the iPad for the upcoming VMworld conference just to see if it works as well the 2nd time as it did the first.  It’s only two full days, what can go wrong?

Enterprise data storage defined and why 3PAR?

More SNW hall servers and storage

Recent press reports about a bidding war for 3PAR bring into focus the expanding need for enterprise class data storage subsystems.  What exactly is enterprise storage?

Defining enterprise storage is fraught with problems but I will take a shot. Enterprise class data storage has:

  • Enhanced reliability, high availability and serviceability – meaning it hardly ever fails, it keeps operating (on redundant components) when it does fail, and repairing the storage when the rare failure occurs can be accomplished without disrupting ongoing storage services
  • Extreme data integrity – goes beyond just RAID storage, meaning that these systems lose data very infrequently, provide the latest data written to a location when read and will tell you when data cannot be accessed.
  • Automated I/O performance – meaning sophisticated caching algorithms that try to keep ahead of sequential I/O streams, buffer actively read data, and buffer write data in non-volatile cache before destaging to disk or other media.
  • Multiple types of storage – meaning the system supports SATA, SAS and/or FC disk drives and SSDs or Flash storage
  • PBs of storage – meaning behind one enterprise class storage (sub-)system one can support over 1PB of storage
  • Sophisticated functionality – meaning the system supports multiple forms of offsite replication, thin provisioning, storage tiering, point-in-time copies, data cloning, administration GUIs/CLIs, etc.
  • Compatibility with all enterprise O/Ss – meaning the storage has been tested and is on hardware compatibility lists for every major operating system in use by the enterprise today.

As for storage protocol, it seems best to leave this off the list.  I wanted to just add block storage, but enterprises today probably have as much if not more external file storage (CIFS or NFS) as they have block storage (FC or iSCSI).  And the proportion in file systems seems to be growing (see IDC report referenced below).

In addition, while I don’t like the non-determinism of iSCSI or file access protocols, this doesn’t seem to stop such storage from putting up pretty impressive performance numbers (see our performance dispatches). Anything that can crack 100K I/O or file operations per second probably deserves to call itself enterprise storage as long as it meets the other requirements. So, maybe I should add high-performance storage to the list above.

Why the sudden interest in enterprise storage?

Enterprise storage has been around arguably since the 2nd half of last century (for mainframe systems) but lately has become even more interesting as applications deploy to the cloud and server virtualization (from VMware, Microsoft Hyper-V and others) takes over the data center.

Cloud storage and cloud computing services are lowering the entry points for storage and processing, enabling application deployments which were heretofore unaffordable.  These new cloud applications consume storage at increasing rates and don’t seem to be slowing down any time soon.  Arguably, some cloud storage is not enterprise storage but as service levels go up for these applications, providers must ultimately turn to enterprise storage.

In addition, server virtualization transforms the enterprise data center from a single application per server to easily 5 or more applications per physical server.  This trend is raising server utilization, driving more I/O, and requiring higher capacity.  Such “multi-application” storage almost always requires high availability, reliability and performance to work well, generating even more demand for enterprise data storage systems.

Despite all the demand, world wide external storage revenues dropped 12% last year according to IDC.  Now the economy had a lot to do with this decline but another factor reducing external storage revenue is the ongoing drop in the price of storage on a $/GB basis.  To this point, that same IDC report stated that external storage capacity increased 33% last year.
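To put a rough number on that $/GB decline, here is a back-of-the-envelope sketch in Python. It uses only the two IDC figures quoted above and assumes (my assumption, not IDC's) that the revenue and capacity numbers cover the same set of external storage systems, so that dividing one by the other gives an average price per GB.

```python
# Figures quoted above from the IDC report.
revenue_change = -0.12   # external storage revenue, year over year
capacity_change = 0.33   # external storage capacity, year over year


def implied_price_per_gb_change(revenue_change, capacity_change):
    """Relative change in average $/GB implied by revenue and capacity changes.

    Assumes revenue and capacity describe the same population of systems.
    """
    return (1 + revenue_change) / (1 + capacity_change) - 1


change = implied_price_per_gb_change(revenue_change, capacity_change)
print(f"Implied change in average $/GB: {change:.1%}")
```

Under those assumptions the average price per GB fell by roughly a third in a single year, which is why revenue can shrink even while demand for capacity grows.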

Why do Dell & HP want 3PAR storage?

Margins on enterprise storage are good, some would say very good.  While raw disk storage can be had at under $0.50/GB, enterprise class storage is often 10 or more times that price.  Now that has to cover redundant hardware, software/firmware engineering and other characteristics, but this still leaves pretty good margins.

In my mind, Dell would see enterprise storage as a natural extension of their current enterprise server business. They already sell to and support these customers, so including enterprise class storage just adds another product to the mix. Developing enterprise storage from scratch is probably a 4-7 year journey with the right people; buying 3PAR puts them in the market today with a competitive product.

HP is already in the enterprise storage market today, with their XP and EVA storage subsystems. However, having their own 3PAR enterprise class storage may get them better margins than their current XP storage OEMed from HDS. But I think Chuck Hollis’s post on HP’s counter bid for 3PAR may have revealed another side to this discussion – sometimes M&A is as much about constraining your competition as it is about adding new capabilities to a company.

——

What do you think?

Cloud storage, CDP & deduplication

Strange Clouds by michaelroper (cc) (from Flickr)

Somebody needs to create a system that encompasses continuous data protection, deduplication and cloud storage.  Many vendors have various parts of such a solution but none to my knowledge has put it all together.

Why CDP, deduplication and cloud storage?

We have written about cloud problems in the past (eventual data consistency and what’s holding back the cloud). Despite all that, backup is a killer app for cloud storage. Many of us would like to keep backup data around for a very long time. But storage costs govern how long data can be retained. Cloud storage with its low cost/GB/month can help minimize such concerns.

We have also blogged about dedupe in the past (describing dedupe) and have written in industry press and our own StorInt dispatches on dedupe product introductions/enhancements.  Deduplication can reduce storage footprint and works especially well for backup which often saves the same data over and over again.  By combining deduplication with cloud storage we can reduce the data transfers and data stored on the cloud, minimizing costs even more.

CDP is more troublesome and yet still worthy of discussion. Continuous data protection has always been sort of a stepchild in the backup business. As a technologist, I understand its limitations (application consistency) and understand why it has been unable to take off effectively (false starts). But, in theory at some point CDP will work, at some point CDP will use the cloud, at some point CDP will embrace deduplication and when that happens it could be the start of an ideal backup environment.

Deduplicating CDP using cloud storage

Let me describe the CDP-Cloud-Deduplication appliance that I envision.  Whether through O/S, Hypervisor or storage (sub-)system agents, the system traps all writes (forks the write) and sends the data and meta-data in real time to another appliance.  Once in the CDP appliance, the data can be deduplicated and any unique data plus meta data can be packaged up, buffered, and deposited in the cloud.  All this happens in an ongoing fashion throughout the day.

Sometime later, a restore is requested. The appliance looks up the appropriate mapping for the data being restored, issues requests to read the data from the cloud and reconstitutes (un-deduplicates) the data before copying it to the restoration location.
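The capture and restore flow described above can be sketched in a few lines of Python. This is only a toy model under my own assumptions: fixed 4KB chunks, SHA-256 fingerprints, an in-memory dictionary standing in for the cloud object store, and a simple sequence number as the recovery point. The class and method names are hypothetical and do not come from any product.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size for this sketch


class DedupingCdpAppliance:
    """Toy model: journal every forked write, store only unique chunks."""

    def __init__(self):
        self.cloud = {}     # stand-in for cloud storage: fingerprint -> chunk bytes
        self.journal = []   # time-ordered metadata: (sequence, volume, offset, fingerprint)
        self.sequence = 0

    def capture_write(self, volume, offset, data):
        """Record one forked write; returns the sequence number of its last chunk."""
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            if fingerprint not in self.cloud:
                # Only chunks not seen before are deposited in the "cloud".
                self.cloud[fingerprint] = chunk
            self.sequence += 1
            self.journal.append((self.sequence, volume, offset + i, fingerprint))
        return self.sequence

    def restore(self, volume, as_of_sequence):
        """Rebuild a volume image (offset -> chunk) as of a journal sequence number."""
        latest = {}
        for sequence, vol, offset, fingerprint in self.journal:
            if vol == volume and sequence <= as_of_sequence:
                latest[offset] = fingerprint  # later writes replace earlier ones
        return {offset: self.cloud[fp] for offset, fp in latest.items()}


appliance = DedupingCdpAppliance()
block_a = b"A" * CHUNK_SIZE
block_b = b"B" * CHUNK_SIZE
point1 = appliance.capture_write("vol0", 0, block_a + block_a)  # two identical chunks
point2 = appliance.capture_write("vol0", 0, block_b)            # overwrite first chunk
image_at_point1 = appliance.restore("vol0", point1)
image_at_point2 = appliance.restore("vol0", point2)
```

Three chunks were written but only two unique chunks land in the store, and either point in time can be reconstituted from the journal. A real appliance would also need buffering, batching of uploads, and the application consistency markers discussed below.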

Problems?

The problems with this solution include:

  • Application consistency
  • Data backup timeframes
  • Appliance throughput
  • Cloud storage throughput

By tying the appliance to a storage (sub-)system one may be able to get around some of these problems.

One could configure the appliance throughput to match the typical write workload of the storage.  This could provide an upper limit as to when the data is at least duplicated in the appliance but not necessarily backed up (pseudo backup timeframe).

As for throughput, if we could somehow understand the average write and deduplication rates we could configure the appliance and cloud storage pipes accordingly. In this fashion, we could match appliance throughput to the deduplicated write workload (appliance and cloud storage throughput).
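As a rough illustration of that sizing exercise, the sketch below divides an average write rate by a deduplication ratio to estimate the cloud pipe needed. The function name, the metadata overhead parameter, and the sample numbers (100MB/sec of writes, 5:1 dedupe) are all hypothetical; they are there to show the arithmetic, not to describe any measured workload.

```python
def required_cloud_bandwidth_mb_s(avg_write_mb_s, dedupe_ratio, metadata_overhead=0.0):
    """Estimate the sustained cloud bandwidth needed for a deduplicated write stream.

    avg_write_mb_s    -- average write workload arriving at the appliance (MB/sec)
    dedupe_ratio      -- e.g. 5.0 for 5:1 deduplication
    metadata_overhead -- assumed fractional overhead for meta-data (0.05 = 5%)
    """
    if dedupe_ratio <= 0:
        raise ValueError("dedupe_ratio must be positive")
    unique_mb_s = avg_write_mb_s / dedupe_ratio
    return unique_mb_s * (1 + metadata_overhead)


# Hypothetical workload: the appliance must ingest the full 100 MB/sec,
# but only the unique data has to travel to the cloud.
appliance_ingest_mb_s = 100.0
cloud_pipe_mb_s = required_cloud_bandwidth_mb_s(appliance_ingest_mb_s, dedupe_ratio=5.0)
print(f"Appliance ingest: {appliance_ingest_mb_s} MB/sec, cloud pipe: {cloud_pipe_mb_s} MB/sec")
```

Note that this only covers averages. Write bursts would have to be absorbed by buffering in the appliance, which is where the pseudo backup timeframe comes from.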

Application consistency is a more substantial concern. For example, copying every write to a file doesn’t mean one can recover the file. The problem is at some point the file is actually closed and that’s the only time it is in an application consistent state. Recovering to a point before or after this leaves a partially updated, potentially corrupted file, of little use to anyone without major effort to transform it into a valid and consistent file image.

To provide application consistency, one needs to somehow understand when files are closed or applications quiesced. Application consistency needs would argue for some sort of O/S or hypervisor agent rather than a storage (sub-)system interface. Such an approach could be more cognizant of file closure or application quiesce, allowing a synch point to be inserted in the meta-data stream for the captured data.

Most backup software has long mastered application consistency through the use of application and/or O/S APIs/other facilities to synchronize backups to when the application or user community is quiesced.  CDP must take advantage of the same facilities.

Seems simple enough, tie cloud storage behind a CDP appliance that supports deduplication.  Something like this could be packaged up in a cloud storage gateway or similar appliance.  Such a system could be an ideal application for cloud storage and would make backups transparent and very efficient.

What do you think?

Microsoft Exchange Performance, ESRP v3.0 results – chart of the month

(c) 2010 Silverton Consulting, Inc.

There have been a number of Microsoft ESRP submissions this past quarter, especially in the over 5K mailbox category and they now total 12 submissions in this category alone.

The above chart is one of a series of charts from our recent StorInt(tm) dispatch on Exchange performance. This chart displays an Exchange email counterpart to last month’s SpecSFS 2008 CIFS ORT chart, only this time depicting the Top 10 Exchange database read, write and log latencies (sorted by read latency).

Except for the HP Smart Array (at #4) and Dell PowerVault MD1200 (#7), all the remaining submissions are FC attached subsystems.  The HP Smart Array and Dell exceptions used SAS attached storage.

For some reason the HP Smart Array had an almost immeasurable log write response time (<~0.1msec.) and a very respectable database read response time of 8.4msec.

As log writes are essentially sequential, we would expect a SAS/JBOD to do well here. But the random database reads and writes seem indicative of a well tuned, caching (sub-)system, not a JBOD!?

One secret to good Exchange 2010 JBOD performance appears to be matching your Exchange email database and log LUN size to disk drive size.  This seems to be a significant difference between Dell’s SAS storage and HP’s SAS storage.  For instance, both systems had 15Krpm SAS drives at ~600GB, but Dell’s LUN size was 13.4TB while HP’s database and log LUN size was 558GB.   Database and log LUN size relative to disk size didn’t seem to significantly impact Exchange performance for FC subsystems.

The other secret to good SAS Exchange 2010 performance is to stick with relatively small mailbox counts.  Both the HP and Dell JBODs had the smallest mailbox counts of this category at 6K and 7.2K respectively.

Exchange database write latency

There appears to be little correlation between read and write latencies in this data. All of these results used Exchange database resiliency or DAGs, so they had similar types of database activity to contend with. Also the number of DAGs typically increased with higher mailbox counts but this wasn’t universal, e.g., the HDS AMS 2100 (#1) with 17.2K mailboxes had four DAGs while the last two IBM XIVs (#9&10) with 40K mailboxes had one each. But the number of database availability groups shouldn’t matter much to Exchange database latencies.

On the other hand, the number of DAG copies may matter to Exchange write performance.  It is unclear how DAG copy writes are measured/simulated in Jetstress, the program used to drive ESRP workloads.   But, the number of database copies stood between two (#1,2,5,8&10) and three (#3,4,6,7&9) for all these submissions with no significant advantage for fewer copies.  So that’s not the answer.

I will make a stand here and say that the high variability between read and write database latencies has something to do with storage (sub-)system caching effectiveness and Exchange 2010’s larger block sizes, but it’s not clear from the available data. However, this could easily be an artifact of the limited data available.

Why we like database access latency metrics

In our view, database read latencies correlate well with average Microsoft Exchange user experience for email read/search activities. Also, log write and database write times can be good substitutes for Exchange Server email send times. We like to think of database latencies as an end-user view of Exchange email performance.

The full ESRP v3.0 performance report will go up on SCI’s website next month in our dispatches directory.  However, if you are interested in receiving this sooner, just subscribe by email to our free newsletter and we will send you the current issue with download instructions for this and other reports.

Exchange 2010 is just a year old now and everyone is still trying to figure out how to perform well within the new architecture, so I expect some significant revisions to this chart over time.  Nonetheless, the current crop clearly indicates that there is a wide disparity in Exchange storage performance.

As always, we welcome any constructive comments on how to improve our analysis of ESRP results.

Micron’s new P300 SSD and SSD longevity

Micron P300 (c) 2010 Micron Technology

Micron just announced a new SSD drive based on their 34nm SLC NAND technology with some pretty impressive performance numbers.  They used an independent organization, Calypso SSD testing, to supply the performance numbers:

  • Random Read 44,000 IO/sec
  • Random Writes 16,000 IO/sec
  • Sequential Read 360MB/sec
  • Sequential Write 255MB/sec

Even more impressive considering this performance was generated using SATA 6Gb/s and measuring after reaching “SNIA test specification – steady state” (see my post on SNIA’s new SSD performance test specification).

The new SATA 6Gb/s interface is a bit of a gamble but one can always use an interposer to support FC or SAS interfaces. In addition, today many storage subsystems already support SATA drives so the interface may not even be an issue. The P300 can easily support 3Gb/s SATA if that’s what’s available; sequential performance suffers but random IOPS won’t be too impacted by interface speed.

The advantage of SATA 6Gb/sec is that it’s a simple interface and it costs less to implement than SAS or FC. The downside is the loss of performance until 6Gb/sec SATA takes over enterprise storage.

P300’s SSD longevity

I have done many posts discussing SSDs and their longevity or write endurance but this is the first time I have heard any vendor describe drive longevity using “total bytes written” to a drive. Presumably this is a new SSD write endurance standard coming out of JEDEC but I was unable to find any reference to the standard definition.

In any case, the P300 comes in 50GB, 100GB and 200GB capacities and the 200GB drive has a “total bytes written” to the drive capability of 3.5PB with the smaller versions having proportionally lower longevity specs. For the 200GB drive, it’s almost 5 years of 10 complete full drive writes a day, every day of the year.  This seems enough from my perspective to put any SSD longevity considerations to rest.  Although at 255MB/sec sequential writes, the P300 can actually sustain ~10X that rate per day – assuming you never read any data back??
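For anyone who wants to check the arithmetic, here it is as a short Python sketch. The only assumption I am adding is that all capacities are in decimal units (1GB = 10^9 bytes, 1PB = 10^15 bytes), as drive vendors usually quote them.

```python
MB = 10**6
GB = 10**9
PB = 10**15
SECONDS_PER_DAY = 24 * 60 * 60

total_bytes_written = 3.5 * PB     # quoted endurance for the 200GB P300
capacity = 200 * GB
full_drive_writes_per_day = 10
sequential_write_rate = 255 * MB   # bytes/sec, quoted sequential write speed

# Lifetime at 10 full drive writes per day.
days_at_10_per_day = total_bytes_written / (capacity * full_drive_writes_per_day)
years_at_10_per_day = days_at_10_per_day / 365

# How many full drive writes per day the interface could actually sustain,
# if the drive did nothing but sequential writes around the clock.
max_drive_writes_per_day = sequential_write_rate * SECONDS_PER_DAY / capacity

# Lifetime if the drive really were written flat out all day, every day.
days_at_full_speed = total_bytes_written / (sequential_write_rate * SECONDS_PER_DAY)

print(f"{days_at_10_per_day:.0f} days (~{years_at_10_per_day:.1f} years) at 10 drive writes/day")
print(f"~{max_drive_writes_per_day:.0f} drive writes/day possible at full sequential speed")
print(f"~{days_at_full_speed:.0f} days of endurance at that rate")
```

So the "almost 5 years" works out to 1750 days, and writing flat out would be about 110 drive writes a day, roughly 11X the assumed rate, which would use up the rated endurance in a bit over five months.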

I am sure over provisioning, wear leveling and other techniques were used to attain this longevity. Nonetheless, whatever they did, the SSD market could use more of it.  At this level of SSD longevity the P300 could almost be used in a backup dedupe appliance, if there was need for the performance.

You may recall that Micron and Intel have a joint venture to produce NAND chips.  But the joint venture doesn’t include applications of their NAND technology.  This is why Intel has their own SSD products and why Micron has started to introduce their own products as well.

—–

So which would you rather see for an SSD longevity specification:

  • Drive MTBF
  • Total bytes written to the drive,
  • Total number of Program/Erase cycles, or
  • Total drive lifetime, based on some (undefined) predicted write rate per day?

Personally I like total bytes written because it defines the drive reliability in terms everyone can readily understand but what do you think?

Why cloud, why now?

Moore’s Law by Marcin Wichary (cc) (from Flickr)

I have been struggling for some time now to understand why cloud computing and cloud storage have suddenly become so popular. We have previously discussed some of the cloud’s problems (here and here) but we have never touched on why cloud has become so popular.

In my view, SaaS or ASPs and MSPs have been around for a decade or more now and have been renamed cloud computing and storage but they have rapidly taken over the IT discussion.  Why now?

At first I thought this new popularity was due to the prevalence of higher bandwidth today. But later I determined that this was too simplistic. Now I would say the reasons cloud services have become so popular include:

  • Bandwidth costs have decreased substantially
  • Hardware costs have decreased substantially
  • Software costs remain flat

Given the above one would think that non-cloud computing/storage would also be more popular today and you would be right.  But, there is something about the pricing reduction available from cloud services which substantially increases interest.

For example, at $10,000 per widget, a market size may be ok, at $100/widget the market becomes larger still, and at $1/widget the market can be huge.  This is what seems to have happened to Cloud services.  Pricing has gradually decreased, brought about through hardware and bandwidth cost reductions and has finally reached a point where the market has grown significantly.

Take email for example:

Now with Google or Exchange Online you have to supply internet access or the bandwidth required to access the email account. For Exchange, you would also need to provide the internet access to get email in and out of your environment, servers and storage to run Exchange server, and would use internal LAN resources to distribute that email to internally attached clients. I would venture to say that similar pricing differences apply to CRM, ERP, storage etc. which could be hosted in your data center or used as a cloud service. Also, over the last decade these prices have been coming down for cloud services but have remained (relatively) flat for on premises services.

How does such pricing affect market size?

Well, when it costs ~$1034 (+ server costs + admin time) to field 5 Exchange email accounts vs.  $250 for 5 Gmail ($300 for 5 Exchange Online) accounts the assumption is that the market will increase, maybe not ~12X but certainly 3X or more.  At ~$3000 or more, I need a substantially larger justification to introduce enterprise email services but at $250,  justification becomes much simpler.

Moreover, because the entry pricing is substantially lower, i.e., ~$2800 for one Exchange Standard Edition account vs. $50 for one (Gmail) email account, justification becomes almost a non-issue and the market size grows geometrically. In the past, pricing for such services may have prohibited small business use, but today cloud pricing makes them very affordable and as such, more widely adopted.
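The price ratios behind these statements are easy to work out. The sketch below uses only the dollar figures quoted in the text above; treating the ~$3000 figure as the 5-account Exchange cost once server costs are included is my reading of the paragraph, not a separately sourced number.

```python
# Dollar figures as quoted in the text.
exchange_5_accounts = 1034     # excludes server costs and admin time
exchange_5_with_servers = 3000 # "~$3000 or more" (assumed to include servers)
gmail_5_accounts = 250
exchange_online_5_accounts = 300
exchange_1_account = 2800      # entry price, Exchange Standard Edition
gmail_1_account = 50


def price_ratio(on_premises, cloud):
    """How many times more expensive the on premises option is."""
    return on_premises / cloud


ratio_5_accounts = price_ratio(exchange_5_accounts, gmail_5_accounts)
ratio_5_with_servers = price_ratio(exchange_5_with_servers, gmail_5_accounts)
ratio_entry = price_ratio(exchange_1_account, gmail_1_account)

print(f"5 accounts, software only: {ratio_5_accounts:.1f}X")
print(f"5 accounts, with servers:  {ratio_5_with_servers:.1f}X")
print(f"Entry price, 1 account:    {ratio_entry:.1f}X")
```

The 5-account comparison comes out a little over 4X on software alone and 12X with servers, while the entry price gap for a single account is 56X, which is why the low end of the market opens up so dramatically.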

I suppose there is another inflection point at  $0.50/mail user that would increase market size even more.  However, at some point anybody in the world with internet access could afford enterprise email services and I don’t think the market could grow much larger.

So there you have it. Why cloud, why now: hardware and bandwidth pricing have come down, giving rise to much more affordable cloud services and opening the market to more participants at the low end. But it’s not just SMB customers that can now take advantage of these lower priced services; large companies can also now afford to implement applications which were too costly to introduce before.

Yes, cloud services can be slow and yes, cloud services can be insecure but, the price can’t be beat.

Why software pricing has remained flat must remain a mystery for now but may be treated in some future post.

Any other thoughts as to why cloud’s popularity has increased so much?

Strategy is dead, again

American War Cemetary - Remembering by tienvijftien (cc) (from Flickr)

I was talking with a friend of mine this week and he said that strategic planning has been deemphasized these last few years, mainly due to the economic climate. We have discussed this before (see Strategy, as we know it, is dead). Most companies are in a struggle to survive and have had little time or resources to spend on thinking about their long term future, let alone next year.

Yes, the last few years have been tough on everyone but the lack of strategic planning is hard for me to accept.  As I look around, products are still being developed, functionality is being enhanced, technology continues to move forward.  All of these exemplify some strategic planning/thinking, albeit prior efforts from 24 to 30 months ago.

Nonetheless, development has not ceased in the interim. New features and products are still being planned for introduction over the next year or so.  Development pipelines seem as full as ever.

One could read the apparent dichotomy between deemphasizing strategic planning but continuing to roll out new products/enhancements as indicating that strategic planning has little impact on product development.  Another, more subtle interpretation is that strategic thinking goes beyond near term product improvements to something longer term, perhaps outside the 3 year window we see for current product enhancements.

In that case, the evidence for reduced strategic planning will not show up until a couple more years have passed. Thus, we should eventually see a slowdown in new or revolutionary technology offerings.

Such a slowdown is hard to see. Apple seems to introduce a revolutionary product every 4.5 yrs or so (iPod ’01, iPhone ’07, iPad ’10). Other companies probably have longer cycles. But any evidence for a strategic planning reduction may ultimately show up as a slowdown in the rate of new/revolutionary product introductions.

Other potential indicators of decreased strategic planning include margin erosion, loss of core competencies, reduction in market value, etc.  Some of these are objective, some subjective but they all sound like a better topic for an MBA thesis than a blog post or at least my blog posts.

For example, Kodak over the last 15 years or so comes to mind as a strategic corporate catastrophe playing out.  They almost invented digital photography/imaging.  But for whatever reason they failed to react to this coming transition until it was too late.  The result is a much diminished company today; over the last 15 years their stock price has fallen by a factor of 12 or more.

There are probably many more examples of both business strategy failure and success but from my perspective the choices are obvious:  Ignore strategic planning for too long and your company struggles to survive or implement strategic planning today and your company may thrive.

What other examples of strategic failure and successes can you think of?

Primary storage compression can work

Dans la nuit des images (Grand Palais) by dalbera (cc) (from flickr)


Since IBM announced its intent to purchase StorWize, there has been much discussion on whether primary storage data compression can be made to work.  As far as I know StorWize only offered primary storage compression for file data, but there is nothing that prohibits doing something similar for block storage as long as you have some control over how blocks are laid down on disk.

Although secondary block data compression has been around for years in enterprise tape and more recently in some deduplication appliances, primary storage compression pre-dates secondary storage compression.  STK delivered primary storage data compression with Iceberg in the early ’90s, but it wasn’t until a couple of years later that they introduced compression on tape.

In both primary and secondary storage, data compression works to reduce the space needed to store data.  Of course, not all data compresses well, most notably image data (as it’s already compressed) but compression ratios of 2:1 were common for primary storage of that time and are normal for today’s secondary storage.  I see no reason why such ratios couldn’t be achieved for current primary storage block data.

Implementing primary block storage data compression

There is significant interest in implementing deduplication for primary storage, as NetApp has done, but supporting data compression is not much harder.  I believe much of the effort to deduplicate primary storage lies in creating a method to address partial blocks out of order.  I would call this data block virtual addressing, and it requires some sort of storage pool.  The remaining effort to deduplicate data involves implementing the chosen (dedupe) algorithm, indexing/hashing, and other administrative activities.  These latter activities aren’t readily transferable to data compression, but the virtual addressing and space pooling should be usable by data compression.

Furthermore, block storage thin provisioning requires some sort of virtual addressing as does automated storage tiering.  So in my view, once you have implemented some of these advanced capabilities, implementing data compression is not that big a deal.
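To make the virtual addressing idea a bit more concrete, here is a minimal Python sketch of how a block map and a storage pool could support compression.  Everything in it is an assumption for illustration: the class name, the 4KB block size, the append-only pool, and the use of zlib are mine, not how any vendor actually does it.

```python
import zlib

BLOCK_SIZE = 4096  # assumed logical block size in bytes


class CompressedBlockStore:
    """Toy block store: logical block address -> (offset, length) in a pool."""

    def __init__(self):
        self.pool = bytearray()  # append-only storage pool (stands in for disk)
        self.block_map = {}      # virtual address map: lba -> (offset, length)

    def write(self, lba, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one logical block")
        packed = zlib.compress(data)
        offset = len(self.pool)
        self.pool += packed
        # A rewrite just remaps the block; the old space is left as garbage
        # that a real system would have to reclaim.
        self.block_map[lba] = (offset, len(packed))

    def read(self, lba):
        offset, length = self.block_map[lba]
        return zlib.decompress(bytes(self.pool[offset:offset + length]))
```

The point is that once logical blocks no longer have to sit at fixed physical locations, storing them at variable (compressed) lengths is mostly a matter of bookkeeping in the map.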

The one question that remains is whether to implement compression in hardware or software (see Better storage through hardware for more). Considering that most deduplication is done via software today, it seems that data compression in software should be doable.  The compression phase could run in the background sometime after the data has been stored.  Real time decompression using software might take some work, but would cost considerably less than any hardware solution.  That said, the intensive bit fiddling required to perform data compression/decompression may argue for some sort of hardware assist.

Data compression complements deduplication

The problem with deduplication is that it needs duplicate data.  This is why it works so well for secondary storage (backing up the same data over and over) and for VDI/VMware primary storage (with duplicated O/S data).

But data compression is an orthogonal or complementary technique which uses the inherent redundancy in information to reduce storage requirements.  For instance, something like Huffman coding takes advantage of the fact that in text some letters occur more often than others (see letter frequency). In English, ‘e’, ‘t’, ‘a’, ‘o’, ‘i’, and ‘n’ represent over 50% of the letters in most text documents.  By using shorter bit combinations to encode these letters one can reduce the bit-length of any (English) text string substantially.  Another example is run length encoding, which takes any repeated character and substitutes a trigger character, the character itself, and a count of the number of repetitions for the repeated string.
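Run length encoding is simple enough to sketch in a few lines of Python.  This is only an illustration under my own assumptions: the trigger character (‘~’), the single-digit count (so runs are capped at 9), and the minimum run length of 4 are all arbitrary choices, not anything a particular product uses.

```python
TRIGGER = "~"  # assumed escape character marking an encoded run


def rle_encode(text, min_run=4):
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        run = 1
        # Count repeats of ch, capped at 9 so the count fits in one digit.
        while i + run < len(text) and text[i + run] == ch and run < 9:
            run += 1
        # Short runs are cheaper left alone; a literal trigger is always escaped.
        if run >= min_run or ch == TRIGGER:
            out.append(f"{TRIGGER}{ch}{run}")
        else:
            out.append(ch * run)
        i += run
    return "".join(out)


def rle_decode(encoded):
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == TRIGGER:
            # Trigger, then the character, then a one-digit repeat count.
            out.append(encoded[i + 1] * int(encoded[i + 2]))
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)
```

With these choices, rle_encode("aaaaaaabcc") gives "~a7bcc": the seven a’s collapse to three characters while the short runs pass through untouched.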

Moreover, the nice thing about data compression is that all these techniques can be readily combined to generate even better compression ratios.  And of course compression could be applied after deduplication to reduce storage footprint even more.
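As a rough sketch of compression applied after deduplication, the Python below fingerprints fixed-size blocks, keeps one copy of each unique block, and compresses only those unique copies.  The function names, the 4KB block size, SHA-256 fingerprints, and zlib are all assumptions I picked for illustration; real products use their own algorithms and layouts.

```python
import hashlib
import zlib


def dedupe_then_compress(data, block_size=4096):
    store = {}   # fingerprint -> compressed copy of each unique block
    recipe = []  # ordered fingerprints needed to rebuild the original data
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        fp = hashlib.sha256(block).digest()
        if fp not in store:
            # First time this block is seen: compress and keep it.
            store[fp] = zlib.compress(block)
        recipe.append(fp)
    return store, recipe


def rebuild(store, recipe):
    # Decompress the referenced blocks in order to get the original bytes back.
    return b"".join(zlib.decompress(store[fp]) for fp in recipe)
```

Deduplication removes the repeated blocks and compression then squeezes the redundancy inside whatever blocks are left, which is why the two work well together.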

Why would any vendor compress data?

For a couple of reasons:

  • Compression not only reduces storage footprint but with hardware assist it can also increase storage throughput. For example, if 10GB of data compresses down to 5GB, it should take ~1/2 the time to read.
  • Compression reduces the time it takes to clone, mirror or replicate data.
  • Compression increases the amount of data that can be stored, which should incentivise customers to pay more for your storage.
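The throughput point above is just arithmetic, sketched here in Python.  The function and its parameters are hypothetical, and it assumes decompression keeps up with the media (the hardware assist case) so that read time depends only on how many bytes come off disk.

```python
def read_seconds(logical_gb, compression_ratio, disk_gb_per_s):
    # Only the compressed (physical) bytes have to come off the media.
    physical_gb = logical_gb / compression_ratio
    return physical_gb / disk_gb_per_s
```

So at an assumed 0.5GB/s, 10GB of uncompressed data takes 20 seconds to read, while the same data compressed 2:1 takes 10 seconds.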

On the other hand, with data compression vendors may sell less storage.  But the advantage of enterprise storage lies in the advanced functionality/features and higher reliability/availability/performance that are available.  I see data compression as just another advantage of enterprise class storage, and as a feature, the user could enable or disable it and see how well it works for their data.

What do you think?