Hitachi’s VSP vs. VMAX

Today’s announcement of Hitachi’s VSP brings another round to the competition between EMC and Hitachi/HDS in the enterprise. The VSP, which is GA and orderable today, takes the rivalry to a whole new level.

I was on SiliconANGLE’s live TV feed earlier today discussing the merits of the two architectures with David Floyer and Dave Vellante from Wikibon. In essence, there seems to be a religious war going on between the two.

Examining VMAX, it’s obviously built around a concept of standalone nodes, each with cache, frontend, backend and processing components built in. Scaling the VMAX, aside from storage and perhaps cache, involves adding more VMAX nodes to the system. VMAX nodes talk to one another via an external switching fabric (currently RapidIO). The hardware, sophisticated packaging and IO connection technology notwithstanding, looks very much like a 2U server one could purchase from any number of vendors.

On the other hand, Hitachi’s VSP is a purpose-built storage engine (or storage computer, as Hu Yoshida says). While the architecture is not a radical revision of the USP-V, it’s a major upleveling of all the component technology, from the 5th-generation crossbar switch, to the new ASIC-driven front-end and back-end directors, the shared control L2 cache memory and the use of quad-core Intel Xeon processors. Much of this hardware is unique, sophistication abounds, and it looks very much like a blade system for the storage controller community.

The VSP and VMAX comparison is sort of like an open source vs. closed source discussion. VMAX plays the role of open source champion, largely depending on commodity hardware and sophisticated packaging but with minimal ASIC technology. As evidence of the commodity hardware, VPLEX, EMC’s storage virtualization engine, reportedly runs on VMAX hardware. Commodity hardware lets EMC ride the technology curve as it advances for other applications.

Hitachi’s VSP plays the role of closed source champion. Its functionality is locked inside a proprietary hardware architecture, ASICs and interfaces. The functionality it provides is tightly coupled to this internal architecture, and Hitachi probably believes that by doing so they can provide better performance and more tightly integrated functionality to the enterprise.

Perhaps this doesn’t do justice to either development team. There is plenty of unique proprietary hardware and sophisticated packaging in VMAX, but EMC has taken the approach of separate-but-equal nodes. Hitachi, in contrast, has distributed this functionality across various components such as front-end directors (FEDs), back-end directors (BEDs), cache adaptors (CAs) and virtual storage directors (VSDs), each of which can scale independently, i.e., adding FEDs or CAs doesn’t require more BEDs, and ditto for VSDs. Each can be scaled separately up to the maximum that fits inside a controller chassis, and then, if needed, you can add a whole other controller chassis.

One has an internal switching infrastructure (the VSP crossbar switch) and the other uses an external switching infrastructure (the VMAX RapidIO fabric). The promise of external switching, like that of commodity hardware, is that the R&D funding to enhance the technology is shared with other users. The disadvantage is that, architecturally, you may incur more latency propagating an IO to other nodes for handling.

With VSP’s crossbar switch, you may still need to move IO activity between VSDs, but this can be done much faster, and any VSD can access any CA, BED or FED resource required to perform the IO, so the need to move IO is reduced considerably. This provides a global pool of resources that any IO can take advantage of.

In the end, blade systems like VSP and separate server systems like VMAX can both work their magic. Both have their place today and in the foreseeable future. Where blade systems shine is in dense packaging, power and cooling efficiency, and bringing a lot of horsepower to a small package. On the other hand, server systems are simple to deploy and connect together, with minimal limitations on the number of servers that can be brought together.

Blade systems can probably bring more compute (storage IO) power to bear within the same volume than multiple server systems, but the hardware is much more proprietary and costs a lot of R&D dollars to maintain leading-edge capabilities.

I typed this out after the show; hopefully I characterized the two products properly. If I am missing anything, please let me know.

[Edited for readability, grammar and numerous misspellings – last time I do this on an iPhone. Thanks to Jay Livens (@SEPATONJay) and others for catching my errors.]

To iPad or not to iPad – part 2

iPad with Bluetooth keyboard

(Long post warning – 1200+ words)

We discussed using the iPad in a prior post and, although it was uncertain up to the last minute, I ended up taking the iPad to a conference early this month.  My uncertainty was all related to getting our monthly newsletter out.

The newsletter is mainly a text file, but it links to a number of Storage Intelligence (StorInt(tm)) report PDFs which reside on my website.  Creating and editing these documents is done using Microsoft Word.  Oftentimes the edits to these documents involve tracked changes, which aren’t handled very well by the iPad’s Pages app (they’re all accepted).

In addition, these .DOC files are converted to .PDFs and uploaded to the website.  While Pages handles importing .DOC files and publishing PDF files from them, I am still unclear how to upload a Pages PDF file to a website. There are many FTP apps for the iPad/iPhone, but none seem able to upload a PDF file out of the Pages app.

All this was going to require the use of a laptop, but I finally got all the file edits in and, before I left, was able to send out the newsletter.

Twitter troubles

While at the conference I noticed that there really isn’t a proper Twitter client for the iPad.  Most desktop/laptop Twitter clients allow one to see the Twitter stream while composing a tweet.  But the free Twitter/TweetDeck/Twitterrific apps on the iPad all seem to obscure the Twitter stream(s) when one enters a new tweet – probably assuming one is using the soft keypad, which would obscure the stream anyway.  Nonetheless, this makes responding to Twitter queries more difficult than necessary.

Docs debacle

As always, loading up my current working set (client information, office docs, PDFs, etc.) was cumbersome. I have taken to using a special email address, used only for this purpose, and creating one email per client, which works all right.

Working on a project with the iPad Pages app worked OK, but:

  • The font/special character changes between .DOC and Pages files seem awkward.  For example, I was using the large bullet in Pages, and when I transformed the file to a .DOC file, the bullet became HUGE.
  • Also, Pages defaults to a different font than Microsoft Word does.
  • Watermark images didn’t seem to be as transparent when converting between .DOC and Pages files.

Mostly these were nuisances that I had to deal with when importing a file from iPad to desktop or vice versa.

However, working on one project I realized I needed some metrics I normally keep in a spreadsheet on my desktop/laptop.  I ended up calling the home office, walking my associate through accessing the information, and having them tell me what I needed to know.  I also asked them to send that spreadsheet to me so that I would have it for future reference.

Bluetooth blessings/bunglings

At the conference I was blessed with a table to sit at during the keynotes (passing myself off as a blogger), which made using the Bluetooth (BT) keyboard and iPad much easier.  I also used the combination on the airplane on the way home and found it much more flexible than a laptop, although it’s unclear whether this would work as well sitting on my lap in normal conference seating.

Also, I really wish there were some other indicator light(s) on the BT keyboard.  It only has one green LED, which makes for rather limited communications.  I tried to connect it to the iPad on the plane ride out but it failed.  I thought perhaps the batteries had run down and needed to be replaced.  When I got to my destination I tried again, after looking up what the BT keyboard’s green LED signals meant, and it worked just fine.  FYI:

  • A flashing green LED means the BT keyboard is pairing with a target device.
  • To turn the BT keyboard on, push and hold the side button until the green LED starts to blink.
  • To turn the BT keyboard off, push and hold the side button until the green LED comes on and eventually goes off.

For some reason this was difficult to find online, but it was probably in the printed doc that came with the keyboard (filed away and never seen again).  More lights might help, like green for on, yellow for discoverable, and red for (going) off.  Or maybe I just need to use it more often. I may have tried to pair it with my iPhone, which didn’t help (I can’t be sure, and it’s also unclear how to clear its prior pairing).

Nevertheless, it might make sense to carry some extra batteries and/or a battery charger for just these types of problems.  There were quite a few people who commented on the BT keyboard/iPad combination.  They seemed unaware that a keyboard could be used with the iPad.

Spellcheck saga

The other problem I had was with the iPad’s spell checker.  It turns out there are two levels of spell checking in the iPad and they are both active within Pages.  One can be disabled under Pages’ Tools=>Check Spelling and the other under iPad Settings at General=>Keyboard=>Auto-Correction.  I was able to quickly find the Pages setting, but it took some effort to uncover the Keyboard one.

Nonetheless, while pounding in conference notes, I often employ vendor acronyms.  Oftentimes the spell checker/auto-corrector would transform these acronyms into something completely different.  Of course my typing is not perfect, so my other issue is that I mistype words, which after auto-correction had little relation to what I was trying to type.

I realize that this is an attribute of soft keypad corrections, probably coming from the iPhone, where people often mistype due to the size of the keys.  However, when using the iPad, and especially when using the BT keyboard, it would be nice if auto-correction were turned off by default.

Other iPad incredulity

I was surprised to see some analysts with both an iPad and a laptop (and probably an iPhone/BlackBerry).  Personally, I can’t see why anyone would want both, other than for more screen space.  But I was a bit jealous when I had to change apps to tweet something or check email/websites while inputting notes in real time.

Also, I was afraid that depending on hotel/conference WiFi would place me at a disadvantage to other analysts/bloggers.  Ultimately, I found that for my use of the internet (mostly Twitter and email) during conferences, WiFi was adequate, and I always had my iPhone if it didn’t work.

After 2hrs+ of keynotes and another 2hrs+ of presentations, I was running low on iPad power.  So I started to power the iPad off between notes and tweets.  Funny thing, all I had to do to power on the screen was to start typing on the BT keyboard – cool.  As I recall, it occasionally missed the first keystroke or so but worked fine after that.  Following lunch about an hour later, I pulled out my power cord extension, plugged it into the table outlet, and kept the iPad plugged in for the rest of the day.  Thankfully, I remembered to bring the extension cord (that came with the laptop charger).

Well, that’s about it. I have another short conference next week and will probably try to bring the iPad again, but that pesky monthly newsletter is due out again…

Data storage features for virtual desktop infrastructure (VDI) deployments

The Planet Data Center by The Planet (cc) (from Flickr)

I was talking with someone yesterday about one of my favorite topics, data storage for virtual desktop infrastructure (VDI) deployments.  In my mind there are a few advanced storage features that help considerably with VDI implementations:

  • Deduplication – almost every one of your virtual desktops will share 75-90% of its O/S disk data with every other virtual desktop.  Having sub-file/sub-block deduplication can be a godsend for all this replicated data and can reduce O/S storage requirements considerably (see the rough capacity sketch just after this list).
  • 0 storage snapshots/clones – another solution to the duplication of O/S data is to use some sort of space-conserving snapshot.  For example, one creates a master (gold) disk image and makes 100s if not 1000s of snapshots of it, taking almost no additional space.
  • Highly available/highly reliable storage – when you have a lone desktop dependent on DAS for its O/S, it doesn’t impact a lot of users if that device fails. However, when you have 100s to 1000s of users dependent on DAS device(s) for their O/S software, any DAS failure could impact all of them at the same time.  As such, one needs to move off DAS and invest in highly reliable and available external storage of some kind to sustain reasonable uptime for your user community.
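
To make the capacity impact concrete, here is a rough back-of-the-envelope sketch in Python. All the numbers are assumptions picked only for illustration (1,000 desktops, a 20GB O/S image, 85% commonality from the 75-90% range above, and ~2GB of unique writes per clone), not measurements from any particular storage system.

    # VDI capacity sketch -- all figures are illustrative assumptions.
    desktops = 1000
    os_image_gb = 20.0          # per-desktop O/S (C: drive) image size
    shared_fraction = 0.85      # portion of O/S data common across desktops (75-90% above)
    delta_gb = 2.0              # per-desktop unique writes on top of a gold image

    # Plain "dumb" block storage: every desktop stores a full O/S copy.
    raw_gb = desktops * os_image_gb

    # Sub-file/sub-block dedupe: shared data stored once, unique remainder per desktop.
    dedup_gb = os_image_gb * shared_fraction + desktops * os_image_gb * (1 - shared_fraction)

    # 0 storage snapshots/clones: one gold image plus a small delta per desktop.
    clone_gb = os_image_gb + desktops * delta_gb

    print(f"raw:   {raw_gb:,.0f} GB")     # 20,000 GB
    print(f"dedup: {dedup_gb:,.0f} GB")   # ~3,017 GB
    print(f"clone: {clone_gb:,.0f} GB")   # 2,020 GB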

Those seem to me to be the most important attributes for VDI storage, but there are a couple more features/facilities which can also help:

  • NAS systems with NFS – VDI deployments will generate lots of VMDKs for all the user desktop C: drives.  Although this can be managed with block-level storage as separate LUNs or multi-VMDK LUNs, who wants to configure 100 to 1000 LUNs?  NFS files can perform just as well and are much easier to create on the fly, so for VDI it’s hard to beat NFS storage.
  • Boot storm enhancements – Another problem with VDI is that everyone gets to work at 8am Monday and proceeds to boot up their (virtual) machines, which drives an awful lot of IO to the virtual C: drives.  Deduplication and 0 storage snapshots can help manage the boot storm as long as these characteristics are retained throughout system cache, i.e., deduplication exists in cache as well as on backend disk (see the boot-storm sketch just after this list).  But there are other approaches to the problem as well, available from various vendors, to better manage boot storms.
  • Anti-virus scan enhancements – Similar to boot storms, A-V scans also typically happen around the same time for many desktop users and can be just as bad for virtual C: drive performance.  Again, deduplication or 0 storage snapshots can help (with the above caveats), but some vendor storage can offload these activities from the desktop altogether.  Also, last week’s VMworld release of VMware’s vShield Endpoint (see VMworld 2010 review) supports some A-V scan enhancements. Any of these approaches should be able to help.
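
To illustrate why deduplication needs to be retained in cache, and not just on backend disk, here is a tiny boot-storm sketch; the desktop and block counts are purely illustrative assumptions.

    # Boot-storm sketch -- all figures are illustrative assumptions.
    desktops = 500
    boot_blocks = 10_000   # blocks each desktop reads from its virtual C: drive to boot

    # Dedupe-aware cache: identical blocks from every desktop map to one cached
    # copy, so backend disk sees each unique block roughly once.
    backend_reads_dedup_cache = boot_blocks

    # Dedupe only on backend disk: duplicates expand again in cache, and if the
    # cache can't hold every desktop's working set the backend can see up to one
    # read per desktop per block.
    backend_reads_no_dedup_cache = desktops * boot_blocks

    print(backend_reads_dedup_cache)      # 10,000 backend reads
    print(backend_reads_no_dedup_cache)   # 5,000,000 backend reads (worst case)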

Regular “dumb” block storage will always work but it will require a lot more raw storage, performance will suffer just when everybody gets back to work, and the administrative burden will be much higher.

I may seem biased, but enterprise-class reliability & availability, along with some of the advanced storage features described above, can help make your deployment of VDI that much better for you and all your knowledge workers.

Anything I missed?

VMworld 2010 review

The start of VMworld 2010's 1st keynote session

I got back from VMworld last week and had a great time. I met a number of new and old friends and talked a lot about the new VMware technology coming online. Some highlights from the keynote sessions I attended:

vCloud Director

Previously known as Redwood, vCloud Director is VMware’s rollout of support for cloud services, tying them into their data center services. vCloud Director supports the definition of virtual data centers with varying SLA characteristics. It is expected that virtual data centers would each support different service levels, something like “Gold”, “Silver” and “Bronze”. A virtual data center now represents a class of VM service and aggregates all VMware data center resources into massive resource pools, which can be better managed and allocated to the VMs that need them.

For example, with vCloud Director, one only needs to select a virtual data center to specify the SLAs for a VM. New VMs will be allocated to the virtual data center that provides the requested service. This takes DRS, HA and FT to a whole new level.

Even more, it now allows vCloud Data Center Service partners to enter the picture and provide a virtual data center class of service to the customer. In this way, a customer’s onsite data center could supply Gold and Silver virtual data center services while Bronze services could be provided by a service partner.

vShield

With VM cloud capabilities coming online, the need for VM security is becoming much more pressing. To address these concerns, VMware rolled out their vShield services, which come in two levels today: vShield Endpoint and vShield Edge.

  • Endpoint – offloads anti-virus scans from running in the VM and interfaces with standard anti-virus vendors to run the scan at the higher (ESX) levels.
  • Edge – provides VPN and firewall services surrounding the virtual data center and interfaces with Cisco, Intel/McAfee, Symantec, and RSA to ensure tight integration with these data center security providers.

The combination of vShield and vCloud Director allows approved vCloud Data Center Service providers to supply end-to-end data center security surrounding VMs and virtual data centers. There are currently five approved vShield/vCloud Data Center Services partners – Terremark, Verizon, SingTel, Colt, and Bluelock – with more coming online shortly. Using vShield services, VMs could have secure access to onsite data center services even though they were executing offsite in the cloud.

VMware View

A new version of VMware’s VDI interface was released, which now includes an offline mode for users that occasionally operate outside normal network access and need a standalone desktop environment. With the latest VMware View offline mode, one can check out (download) a desktop virtual machine to a laptop and then run all one’s desktop applications without network access.

vStorage API for Array Integration (VAAI)

VAAI supports advanced storage capabilities such as cloning, snapshots and thin provisioning and improves the efficiency of VM I/O. These changes should make thin provisioning much more efficient to use and should enable VMware to take advantage of storage hardware services, such as snapshots and clones, to offload VMware software services.

vSphere Essentials

Essentials is an SMB-targeted VMware solution licensable for ~$18 per VM on an 8-core server, lowering the entry costs for VMware to very reasonable levels. The SMB data center’s number one problem is lack of resources, and this should enable more SMB shops to adopt VMware services at an entry level and grow up with VMware solutions in their environment.

VMforce

VMforce allows applications developed with SpringSource, the enterprise Java application development framework of the future, to run in the cloud via Salesforce.com’s cloud infrastructure. VMware is also working with Google and other cloud computing providers to provide similar services on their cloud infrastructures.

Other News

In addition to these feature/functionality announcements, VMware discussed their two most recent acquisitions, Integrien and TriCipher.

  • Integrien – is both a visualization and a resource analytics application. It lets administrators see at a glance, via a dashboard, how their VMware environment is operating and then allows one to drill down to see what is wrong with any items flagged by red or yellow lights. Integrien integrates with vCenter and other services to provide the analytics needed to determine resource status and the details needed to resolve any flagged situation.
  • TriCipher – is a security service that will ultimately provide a single sign-on/login for all VMware services. As discussed above, security is becoming ever more important in VMware environments, and separate sign-ons to all VMware services would be cumbersome at best. With TriCipher, one need only sign on once to have access to any and all VMware services in a securely authenticated fashion.

VMworld lowlights

Most of these are nits and not worth dwelling on, but the exhibitors and other non-top-level sponsors all seemed to complain about the lack of conference rooms, and they were not allowed in the press & analyst rooms. Finding seating to talk with these vendors was difficult at best around the conference sessions, on the exhibit floor, or in the restaurants/cafes surrounding Moscone Center, although once you got offsite, facilities were much more accommodating.

I would have to say another lowlight was all the late-night parties that occurred – not that I didn’t partake in my fair share of partying. There were rumors of one incident where a conference-goer was running around a hotel hall in only undergarments, blowing kisses to any female within sight. Some people shouldn’t be allowed to leave home.

The only other real negative in a pretty flawless show was the lines of people waiting to get into the technical sessions. They were pretty orderly, but I have not seen this amount of interest in technical presentations before. Perhaps I have just been going to the wrong conferences. In any event, I suspect VMworld will need to change venues soon, as its technical sessions seem to be outgrowing their session rooms, although the exhibit floor could have used a few more exhibitors. Too bad – I loved San Francisco, and Moscone Center was so easy to get to…

—-

But all in all, a great conference: I learned lots of new stuff, talked with many old friends, and met many new ones. I look forward to next year.

Anything I missed?

To iPad or not to iPad?

iPad (from wikipedia.org)

I am going to a big conference next week, two full days out of the office. In times of yore, I would haul my trusty MacBook along, lugging it with me on both days as I moved from pavilion to briefing hall, from lunch back to pavilion, and from beer hall to bed.

A couple of months ago, I tried using an iPad for a different conference. I purchased an Apple Bluetooth (BT) keyboard and carried it with the iPad for most of the show.  With the BT keyboard, power input was just as fast as on the laptop, and even faster considering I didn’t need to boot anything up.

The other nice thing about the BT keyboard with the iPad is that you have fine cursor controls (arrow keys) which can be used to position the input pointer.  I did find having to take my hand off the keyboard and touch the screen for some clicking actions disconcerting, and there were some iPad applications that didn’t handle the arrow keys appropriately, but other than that, it worked great for power input, answering emails, and web searches.

The internal, soft iPad keyboard worked OK but wasn’t nearly as fast and didn’t support Dvorak.  Also, the soft keyboard in portrait mode only provides about six lines of Pages text, which makes power input with feedback more difficult.  In any case, I would use it to rip off quick emails, tweets, and other short stuff, which worked well enough. I still took notes on paper (probably too old now to take notes on the iPad/laptop).  Having the keyboard available with only a moment’s delay made it easy to take it out when I had the time or leave it in the backpack when I didn’t.

Another positive note was that the iPad took up very little desk space.  Most briefing halls nowadays have these smallish retractable desktops that can barely hold a legal pad, let alone a laptop.  The iPad fit these postage-stamp desktops just fine.

I am not sure how to quantify the weight advantage of the iPad+BT keyboard vs. the MacBook without weighing them, but it is significant.  Given all the junk I carry along with the laptop vs. the iPad+BT keyboard, the iPad/BT keyboard wins hands down.  It’s almost like I am not carrying a computer at all.

Problems with using the iPad

There are a couple of web applications (e.g., the WordPress visual editor) that seem dependent on Flash to work properly, which made using the iPad to create blog posts problematic.  Also, scrolling in the WordPress post editor seems to be a Flash function as well, which made dealing with long post edits problematic at best.  WordPress has an iPhone/iPad application which is just as good as the non-visual editor in web-based WordPress, and it comes in handy at these times.

Now, in all honesty, I haven’t tried these in a while, and these may not be Flash issues as much as iPad issues. Nonetheless, I guarantee you will run into some websites that you use in your daily activities that rely on Flash and won’t work.  With the iPad you will just need to forgo these websites and find alternatives.

In the office I am a heavy TweetDeck user.  For some reason this application doesn’t work that well on the iPad. I have the latest version and all, but I find Twitterrific or the official Twitter app a better solution on the iPad.

I purchased the WiFi version of the iPad, and iPads do not come with Ethernet plug-ins.  Most conference centers these days have WiFi, but it may not always work that well.  Also, some hotels only have WiFi in certain locations and not in the hotel rooms.  All this makes internet access somewhat sporadic. But you can always buy the 3G version if you want, and I always have my iPhone for internet access in a pinch (assuming AT&T has adequate conference center/hotel coverage).

I was told that the iPad power adapter and cable would also charge up my 3G iPhone, but this turned out not to work.  Luckily, I brought along the power adapter for the 3G iPhone by mistake, and the cable connection between that adapter and the iPad worked just fine for the iPhone.  Also, the cable from the power adapter to the iPad is somewhat short, so bring the extension cord in order to be able to work with the iPad while it’s charging.

I ended up purchasing the Apple case for the iPad. I wanted to be able to have it upright in portrait or landscape while I was typing on the keyboard, have it slanted upward while using the soft keypad, and otherwise lie flat. The Apple iPad case does all this without a problem.

Microsoft Office documents

Word documents get converted into Pages documents pretty easily, but you lose all change tracking, some of the formatting, and other esoteric stuff.  It’s probably OK for internal documents, but I still find putting together a final document using Pages a problem. I must say I am a novice here, though.  Also, converting Pages documents back into Word seems easy enough.

I have spent even less time with Numbers and Keynote, but they seem adequate for minor stuff.  They convert .XLS and .PPT files to Numbers and Keynote files (but not back to .XLS and .PPT), and if I used them more they would probably be OK for much more sophisticated work.  There are other applications that seem to provide better support for Microsoft Office editing, but I have yet to try them on either the iPad or iPhone.  Also, beware that converting Numbers documents to Excel and Keynote to PowerPoint requires the Mac desktop versions of these programs.

Document availability is somewhat problematic.  I met one person who emailed work documents to themselves to solve this problem.  Email works OK as long as the messages don’t scroll off the iPad (the iPad keeps the latest 200 emails max for any account, which includes spam).  For this purpose, I used a not-so-well-known email address and emailed my current work documents to that account.  iTunes supports a way to copy files to and from the Mac or iPad which seems painless enough, but the email interface worked just as well for me and I didn’t have to sync up to have the files transferred.

Beware of changing headers and footers in Pages and then trying to alter them in Word once you get back to the office.  It never worked for me.  I had to copy the text of the document into a fresh Word file and redo the headers/footers in that.

iPad security

Mac-based passwords, logins, and security credentials are a bit difficult and time-consuming to transfer to the iPad.  You can manually load them in for any websites and applications you need, but there is no way to transfer a whole keychain from Mac to iPad.  As such, if you neglect to transfer security credentials for an important website to the iPad, you’re out of luck.  There are some apps that profess to be able to transfer and maintain keychains on the iPhone or the iPad, but I haven’t tried them yet.

Other iPad security aspects are even more problematic.  The iPad can be set up to require entry of a four-digit numeric passcode to access it.  Another setting will erase the contents of the iPad after 10 failed login attempts. And MobileMe probably supports some way to erase an iPad that’s out of your hands (it does this for iPhones, so I would think the same service would be available for the iPad, but I haven’t looked into it).

But despite all that, I don’t feel the iPad is as secure as the MacBook. For one thing, I encrypt the data on the MacBook, and the system password can be alphanumeric and considerably longer than four characters.  The hard drive can be removed from the MacBook, but without the passkey the data on the drive would be useless.  In contrast, the flash memory on the iPad could be pulled out and analyzed without much trouble and, with a proper understanding of iOS storage formatting, be read in the clear.

Also, because it’s smaller and lighter, it could easily be forgotten and left behind, making it more losable.  And it’s certainly more prone to being stolen for the same reasons.

—–

At this point I will probably use the iPad for the upcoming VMworld conference, just to see if it works as well the second time as it did the first.  It’s only two full days – what can go wrong?

Cloud storage, CDP & deduplication

Strange Clouds by michaelroper (cc) (from Flickr)

Somebody needs to create a system that encompasses continuous data protection, deduplication and cloud storage.  Many vendors have various parts of such a solution but none to my knowledge has put it all together.

Why CDP, deduplication and cloud storage?

We have written about cloud problems in the past (eventual data consistency and what’s holding back the cloud); despite all that, backup is a killer app for cloud storage.  Many of us would like to keep backup data around for a very long time, but storage costs govern how long data can be retained.  Cloud storage, with its low cost/GB/month, can help minimize such concerns.

We have also blogged about dedupe in the past (describing dedupe) and have written in the industry press and our own StorInt dispatches on dedupe product introductions/enhancements.  Deduplication can reduce the storage footprint and works especially well for backup, which often saves the same data over and over again.  By combining deduplication with cloud storage we can reduce the data transferred to and stored in the cloud, minimizing costs even more.

CDP is more troublesome and yet still worthy of discussion.  Continuous data protection has always been something of a stepchild in the backup business.  As a technologist, I understand its limitations (application consistency) and understand why it has been unable to take off effectively (false starts).  But in theory, at some point CDP will work, at some point CDP will use the cloud, and at some point CDP will embrace deduplication, and when that happens it could be the start of an ideal backup environment.

Deduplicating CDP using cloud storage

Let me describe the CDP-Cloud-Deduplication appliance that I envision.  Whether through O/S, hypervisor or storage (sub-)system agents, the system traps all writes (forks the write) and sends the data and metadata in real time to another appliance.  Once in the CDP appliance, the data can be deduplicated, and any unique data plus metadata can be packaged up, buffered, and deposited in the cloud.  All this happens in an ongoing fashion throughout the day.

Some time later, a restore is requested. The appliance looks up the appropriate mapping for the data being restored, issues requests to read the data from the cloud, and reconstitutes (un-deduplicates) the data before copying it to the restore location.
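
To make the capture, dedupe and restore flow concrete, here is a minimal sketch of what such an appliance might do internally.  It is purely illustrative: the chunking is fixed-size, a Python dict stands in for the cloud object store, and every name (CdpAppliance, capture_write, restore, etc.) is hypothetical rather than any vendor’s API.

    import hashlib

    class CdpAppliance:
        """Illustrative CDP appliance: captures forked writes, dedupes them and
        stores unique chunks plus metadata in a (stand-in) cloud store."""

        CHUNK = 4096  # fixed-size chunking; real products often use variable-size chunks

        def __init__(self):
            self.cloud = {}     # stand-in for cloud object storage: hash -> chunk data
            self.journal = []   # time-ordered metadata stream: (volume, offset, chunk hashes)

        def capture_write(self, volume, offset, data):
            """Called for every forked write from the O/S, hypervisor or storage agent."""
            hashes = []
            for i in range(0, len(data), self.CHUNK):
                chunk = data[i:i + self.CHUNK]
                h = hashlib.sha256(chunk).hexdigest()
                if h not in self.cloud:      # dedupe: only unique chunks go to the cloud
                    self.cloud[h] = chunk    # a real appliance would buffer/batch uploads
                hashes.append(h)
            self.journal.append((volume, offset, hashes))

        def mark_sync_point(self, label):
            """Application-consistency hook: record a quiesce/file-close point in the
            metadata stream (see the application consistency discussion below)."""
            self.journal.append(("SYNC", label, []))

        def restore(self, volume):
            """Rebuild a volume's contents by replaying metadata and re-reading chunks."""
            image = {}
            for vol, offset, hashes in self.journal:
                if vol != volume:
                    continue
                image[offset] = b"".join(self.cloud[h] for h in hashes)  # un-deduplicate
            return image

    # Example: two volumes writing identical data store the unique chunks only once.
    appliance = CdpAppliance()
    appliance.capture_write("vol1", 0, b"A" * 8192)
    appliance.capture_write("vol2", 0, b"A" * 8192)  # duplicate data, no new cloud chunks
    print(len(appliance.cloud))                      # 1 unique chunk stored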

Problems?

The problems with this solution include:

  • Application consistency
  • Data backup timeframes
  • Appliance throughput
  • Cloud storage throughput

By tying the appliance to a storage (sub-)system, one may be able to get around some of these problems.

One could configure the appliance throughput to match the typical write workload of the storage.  This would provide an upper bound on when the data is at least duplicated in the appliance, though not necessarily backed up (a pseudo backup timeframe).

As for throughput, if we could somehow understand the average write and deduplication rates, we could configure the appliance and cloud storage pipes accordingly.  In this fashion, we could match appliance throughput to the deduplicated write workload (appliance and cloud storage throughput).

Application consistency is a more substantial concern.  For example, copying every write to a file doesn’t mean one can recover the file.  The problem is that at some point the file is actually closed, and that’s the only time it is in an application-consistent state.  Recovering to a point before or after this leaves a partially updated, potentially corrupted file, of little use to anyone without major effort to transform it into a valid and consistent file image.

To provide application consistency, one needs to somehow understand when files are closed or applications quiesced.  Application consistency needs would argue for some sort of O/S or hypervisor agent rather than a storage (sub-)system interface.  Such an approach could be more cognizant of file closure or application quiesce, allowing a sync point to be inserted in the metadata stream for the captured data.

Most backup software long ago mastered application consistency through the use of application and/or O/S APIs and other facilities to synchronize backups to when the application or user community is quiesced.  CDP must take advantage of the same facilities.

It seems simple enough: put cloud storage behind a CDP appliance that supports deduplication.  Something like this could be packaged up in a cloud storage gateway or similar appliance.  Such a system could be an ideal application for cloud storage and would make backups transparent and very efficient.

What do you think?

Why cloud, why now?

Moore’s Law by Marcin Wichary (cc) (from Flickr)

I have been struggling for some time now to understand why cloud computing and cloud storage have suddenly become so popular.  We have previously discussed some of the cloud’s problems (here and here) but never touched on why the cloud has become so popular.

In my view, SaaS, ASPs and MSPs have been around for a decade or more now and have simply been renamed cloud computing and cloud storage, yet they have rapidly taken over the IT discussion.  Why now?

At first I thought this new popularity was due to the prevalence of higher bandwidth today, but later I determined that this was too simplistic.  Now I would say the reasons cloud services have become so popular include:

  • Bandwidth costs have decreased substantially
  • Hardware costs have decreased substantially
  • Software costs remain flat

Given the above, one would think that non-cloud computing/storage would also be more popular today, and you would be right.  But there is something about the pricing reduction available from cloud services which substantially increases interest.

For example, at $10,000 per widget a market may be of moderate size, at $100/widget the market becomes larger still, and at $1/widget the market can be huge.  This is what seems to have happened to cloud services.  Pricing has gradually decreased, brought about through hardware and bandwidth cost reductions, and has finally reached a point where the market has grown significantly.

Take email for example:

With Google (Gmail) or Exchange Online, all you have to supply is the internet access or bandwidth required to reach the email accounts.  For on-premises Exchange, you would need to provide the internet access to get email in and out of your environment, the servers and storage to run Exchange Server, and the internal LAN resources to distribute that email to internally attached clients.  I would venture to say that similar pricing differences apply to CRM, ERP, storage, etc., which can be hosted in your data center or used as a cloud service.  Also, over the last decade these prices have been coming down for cloud services but have remained (relatively) flat for on-premises services.

How does such pricing affect market size?

Well, when it costs ~$1034 (plus server costs and admin time) to field 5 on-premises Exchange email accounts vs. $250 for 5 Gmail accounts ($300 for 5 Exchange Online accounts), the assumption is that the market will increase, maybe not ~12X but certainly 3X or more.  At ~$3000 or more, I need a substantially larger justification to introduce enterprise email services, but at $250, justification becomes much simpler.
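
As a rough back-of-the-envelope check on those figures (assuming, and this is my reading only, that the ~$3000 number represents the fully loaded 5-account on-premises cost once server and admin costs are added to the ~$1034 of licenses), the price ratios work out roughly as follows:

    # Rough reading of the email pricing figures above -- assumptions for
    # illustration, not vendor quotes.
    exchange_licenses_only = 1034   # ~$1034: 5 on-premises Exchange accounts, licenses only
    exchange_fully_loaded = 3000    # ~$3000: adding server + admin costs (assumed)
    gmail = 250                     # $250: 5 Gmail accounts
    exchange_online = 300           # $300: 5 Exchange Online accounts

    print(exchange_licenses_only / gmail)  # ~4X difference on licenses alone
    print(exchange_fully_loaded / gmail)   # ~12X once server/admin costs are included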

Moreover, given that the entry pricing is substantially smaller, i.e., ~$2800 for one Exchange Standard Edition account vs. $50 for one Gmail account, justification becomes almost a non-issue and the market size grows geometrically.  In the past, pricing for such services may have prohibited small business use, but today cloud pricing makes them very affordable and, as such, more widely adopted.

I suppose there is another inflection point at $0.50/email user that would increase market size even more.  However, at some point anybody in the world with internet access could afford enterprise email services, and I don’t think the market could grow much larger than that.

So there you have it.  Why cloud, why now?  The reason is that hardware and bandwidth pricing have come down, giving rise to much more affordable cloud services and opening up the market to more participants at the low end.  But it’s not just SMB customers that can now take advantage of these lower-priced services; large companies can also now afford to implement applications which were too costly to introduce before.

Yes, cloud services can be slow, and yes, cloud services can be insecure, but the price can’t be beat.

Why software pricing has remained flat must remain a mystery for now, but it may be treated in some future post.

Any other thoughts as to why cloud’s popularity has increased so much?

Better storage through hardware

Apple's Xserve (from Apple.com)

Chuck Hollis from EMC wrote a post last week, Storage is software, about how hardware parts are becoming so commoditized and so highly functional that future storage differentiation will only come from software.  I commented that hardware differentiation is also becoming much easier with FPGAs and their ilk.  Chuck replied that this may be so, but will anyone pay the cost for such differentiation?

My reply deserves a longer discussion.  Chuck mentioned Apple as one company differentiating successfully in hardware but thought that this would not apply to storage.

Better storage through hardware differentiation

I am a big fan of Apple, and so it’s hard for me to see why something similar could not apply to storage.  IMHO, what Apple has done better than the rest is to reconstruct the user experience, in totality, from one of frustration to one of delight.

Can such a thing be done for storage, and if so, will it sell? I believe the answer is yes to both questions.

Will such a new storage product necessarily require hardware/FPGA development as much as software/firmware development?  Again, yes.

Will anyone create this “better” storage? No easy answers here.

Developing better storage

Such a task involves a complete remaking, from the ground up, of a new storage product from the user/admin experience perspective.  But the hard part is that the O/Ses and virtualization systems, not the storage, govern/control much of the storage user/admin experience.  As such, much of this functionality will necessarily be done in software, not hardware.

However, that doesn’t mean that hardware differentiation can’t help. For example, consider storage interfaces.  Today, it’s not unusual to have six or more interfaces for a storage system.  It’s hard for me to see how this couldn’t be better served with two 10GbE ports, two 8Gb FC ports and WiFi for an alternate admin interface.  In a similar fashion, look at internal storage interfaces: it’s hard for me to see any absolute requirement for cabling here.  Ditto for power cabling. And all this just improves the out-of-the-box experience.

Could something similar be done for normal storage configuration, monitoring, and protection activities? Most certainly.  Even so, much of this differentiation would be done via software/firmware and the O/S APIs being used.  However, perhaps some small portion can make use of hardware/packaging differentiation.

I like to think that “I will know it when I see it”.  But when someone can take storage out of a box and “install, use and protect it” on any O/S or virtualization environment with nothing more than a poster with 5 to 7 blocks as a guide, such “Apple-like” storage will have arrived.

Until then, storage admins will need training, a “storage admin” will be part of a job description, and storage will be something “we do” rather than something “we use”.

So my final answer to Chuck is: will anyone do it? I don’t know.

What do you think?