New cloud storage and Hadoop managed service offering from Spring SNW

Strange Clouds by michaelroper (cc) (from Flickr)
Strange Clouds by michaelroper (cc) (from Flickr)

Last week I posted my thoughts on Spring SNW in Dallas, but there were two more items that keep coming back to me (aside from the tornados).  The first was a new startup called Symform in cloud storage and the other was an announcement from SunGard about their new Hadoop managed services offering.


Symform offers an interesting alternative on cloud storage that avoids the build up of large multi-site data centers and uses your desktop storage as a sort of crowd-sourced storage cloud, sort of bit-torrent cloud storage.

You may recall I discussed such a Peer-to-Peer cloud storage and computing services in a posting a couple of years ago.  It seems Symform has taken this task on, at least for storage.

A customer downloads (Windows or Mac) software which is installed and executes on your desktop.  The first thing you have to do after providing security credentials is to identify which directories will be moved to the cloud and the second is to tell whether you wish to contribute to Symform’s cloud storage and where this storage is located.  Symform maintains a cloud management data center which records all the metadata about your cloud resident data and everyone’s contributed storage space.

Symform cloud data is split up into 64MB blocks and encrypted (AES-256) using a randomly generated key (known only to Symform). Then this block is broken up into 64 fragments with 32 parity fragments (using erasure coding) added to the stream which is then written to 96 different locations.  With this arrangement, the system could potentially lose 31 fragments out of the 96 and still reconstitute your 64MB of data.  The metadata supporting all this activity sits in Symform’s data center.

Unclear to me what you have to provide as far as ongoing access to your contributed storage.  I would guess you would need to provide 7X24 access to this storage but the 32 parity fragments are there for possible network/power failures outside your control.

Cloud storage performance is an outcome of the many fragments that are disbursed throughout their storage cloud world. It’s similar to a bit torrent stream with all 96 locations participating in reconstituting your 64MB of data.  Of course, not all 96 locations have to be active just some > 64 fragment subset but it’s still cloud storage so data access latency is on the order of internet time (many seconds).  Nonetheless, once data transfer begins, throughput performance can be pretty high, which means your data should arrive shortly thereafter.

Pricing seemed comparable to other cloud storage services with a monthly base access fee and a storage amount fee over that.  But, you can receive significant discounts if you contribute storage and your first 200GB is free as long as you contribute 200GB of storage space to the Symform cloud.

Sungard’s new Apache Hadoop managed service

Hadoop Logo (from website)
Hadoop Logo (from website)

We are well aware of Sungard’s business continuity/disaster recovery (BC/DR) services, an IT mainstay for decades now. But sometime within the last decade or so Sungard has been expanding outside this space by moving into managed availability services.

Apparently this began when Sungard noticed the number of new web apps being deployed each year exceeded the number of client server apps. Then along came virtualization, which reduced the need for lots of server and storage hardware for BC/DR.

As evident of this trend, last year Sungard announced a new enterprise class computing cloud service.  But in last week’s announcement, Sungard has teamed up with EMC Greenplum to supply an enterprise ready Apache Hadoop managed service offering.

Recall, that EMC Greenplum is offering their own Apache Hadoop supported distribution, Greenplum HD.  Sungard is basing there service on this distribution. But there’s more.

In conjunction with Hadoop, Sungard adds Greenplum appliances.  With this configuration Sungard can load Hadoop processed and structured data into a Greenplum relational database for high performance data analytics.  Once there, any standard SQL analytics and queries can be used against to analyze the data.

With these services Sungard is attempting to provide a unified analytics service that spans all structured, semi-structured and unstructured data.


Probably more to Spring SNW but given my limited time on the exhibition floor and time in vendor discussions these and my previously published post are what I seem of most interest to me.

Why EMC is doing Project Lightening and Thunder

Picture of atmospheric lightening striking ground near a building at night
rayo 3 by El Garza (cc) (from Flickr)

Although technically Project Lightening and Thunder represent some interesting offshoots of EMC software, hardware and system prowess,  I wonder why they would decide to go after this particular market space.

There are plenty of alternative offerings in the PCIe NAND memory card space.  Moreover, the PCIe card caching functionality, while interesting is not that hard to replicate and such software capability is not a serious barrier of entry for HP, IBM, NetApp and many, many others.  And the margins cannot be that great.

So why get into this low margin business?

I can see a couple of reasons why EMC might want to do this.

  • Believing in the commoditization of storage performance.  I have had this debate with a number of analysts over the years but there remain many out there that firmly believe that storage performance will become a commodity sooner, rather than later.  By entering the PCIe NAND card IO buffer space, EMC can create a beachhead in this movement that helps them build market awareness, higher manufacturing volumes, and support expertise.  As such, when the inevitable happens and high margins for enterprise storage start to deteriorate, EMC will be able to capitalize on this hard won, operational effectiveness.
  • Moving up the IO stack.  From an applications IO request to the disk device that actually services it is a long journey with multiple places to make money.  Currently, EMC has a significant share of everything that happens after the fabric switch whether it is FC,  iSCSI, NFS or CIFS.  What they don’t have is a significant share in the switch infrastructure or anywhere on the other (host side) of that interface stack.  Yes they have Avamar, Networker, Documentum, and other software that help manage, secure and protect IO activity together with other significant investments in RSA and VMware.   But these represent adjacent market spaces rather than primary IO stack endeavors.  Lightening represents a hybrid software/hardware solution that moves EMC up the IO stack to inside the server.  As such, it represents yet another opportunity to profit from all the IO going on in the data center.
  • Making big data more effective.  The fact that Hadoop doesn’t really need or use high end storage has not been lost to most storage vendors.  With Lightening, EMC has a storage enhancement offering that can readily improve  Hadoop cluster processing.  Something like Lightening’s caching software could easily be tailored to enhance HDFS file access mode and thus, speed up cluster processing.  If Hadoop and big data are to be the next big consumer of storage, then speeding cluster processing will certainly help and profiting by doing this only makes sense.
  • Believing that SSDs will transform storage. To many of us the age of disks is waning.  SSDs, in some form or another, will be the underlying technology for the next age of storage.  The densities, performance and energy efficiency of current NAND based SSD technology are commendable but they will only get better over time.  The capabilities brought about by such technology will certainly transform the storage industry as we know it, if they haven’t already.  But where SSD technology actually emerges is still being played out in the market place.  Many believe that when industry transitions like this happen it’s best to be engaged everywhere change is likely to happen, hoping that at least some of them will succeed. Perhaps PCIe SSD cards may not take over all server IO activity but if it does, not being there or being late will certainly hurt a company’s chances to profit from it.

There may be more reasons I missed here but these seem to be the main ones.  Of the above, I think the last one, SSD rules the next transition is most important to EMC.

They have been successful in the past during other industry transitions.  If anything they have shown similar indications with their acquisitions by buying into transitions if they don’t own them, witness Data Domain, RSA, and VMware.  So I suspect the view in EMC is that doubling down on SSDs will enable them to ride out the next storm and be in a profitable place for the next change, whatever that might be.

And following lightening, Project Thunder

Similarly, Project Thunder seems to represent EMC doubling their bet yet again on the SSDs.  Just about every month I talk to another storage startup coming out in the market providing another new take on storage using every form of SSD imaginable.

However, Project Thunder as envisioned today is not storage, but rather some form of external shared memory.  I have heard this before, in the IBM mainframe space about 15-20 years ago.  At that time shared external memory was going to handle all mainframe IO processing and the only storage left was going to be bulk archive or migration storage – a big threat to the non-IBM mainframe storage vendors at the time.

One problem then was that the shared DRAM memory of the time was way more expensive than sophisticated disk storage and the price wasn’t coming down fast enough to counteract increased demand.  The other problem was making shared memory work with all the existing mainframe applications was not easy.  IBM at least had control over the OS, HW and most of the larger applications at the time.  Yet they still struggled to make it usable and effective, probably some lesson here for EMC.

Fast forward 20 years and NAND based SSDs are the right hardware technology to make  inexpensive shared memory happen.  In addition, the road map for NAND and other SSD technologies looks poised to continue the capacity increase and price reductions necessary to compete effectively with disk in the long run.

However, the challenges then and now seem as much to do with software that makes shared external memory universally effective as with the hardware technology to implement it.  Providing a new storage tier in Linux, Windows and/or VMware is easier said than done. Most recent successes have usually been offshoots of SCSI (iSCSI, FCoE, etc).  Nevertheless, if it was good for mainframes then, it certainly good for Linux, Windows and VMware today.

And that seems to be where Thunder is heading, I think.




Latest SPECsfs2008 results, over 1 million NFS ops/sec – chart-of-the-month

Column chart showing the top 10 NFS througput operations per second for SPECsfs2008
(SCISFS111221-001) (c) 2011 Silverton Consulting, All Rights Reserved

[We are still catching up on our charts for the past quarter but this one brings us up to date through last month]

There’s just something about a million SPECsfs2008(r) NFS throughput operations per second that kind of excites me (weird, I know).  Yes it takes over 44-nodes of Avere FXT 3500 with over 6TB of DRAM cache, 140-nodes of EMC Isilon S200 with almost 7TB of DRAM cache and 25TB of SSDs or at least 16-nodes of NetApp FAS6240 in Data ONTAP 8.1 cluster mode with 8TB of FlashCache to get to that level.

Nevertheless, a million NFS throughput operations is something worth celebrating.  It’s not often one achieves a 2X improvement in performance over a previous record.  Something significant has changed here.

The age of scale-out

We have reached a point where scaling systems out can provide linear performance improvements, at least up to a point.  For example, the EMC Isilon and NetApp FAS6240 had a close to linear speed up in performance as they added nodes indicating (to me at least) there may be more there if they just throw more storage nodes at the problem.  Although maybe they saw some drop off and didn’t wish to show the world or potentially the costs became prohibitive and they had to stop someplace.   On the other hand, Avere only benchmarked their 44-node system with their current hardware (FXT 3500), they must have figured winning the crown was enough.

However, I would like to point out that throwing just any hardware at these systems doesn’t necessary increase performance.  Previously (see my CIFS vs NFS corrected post), we had shown the linear regression for NFS throughput against spindle count and although the regression coefficient was good (~R**2 of 0.82), it wasn’t perfect. And of course we eliminated any SSDs from that prior analysis. (Probably should consider eliminating any system with more than a TB of DRAM as well – but this was before the 44-node Avere result was out).

Speaking of disk drives, the FAS6240 system nodes had 72-450GB 15Krpm disks, the Isilon nodes had 24-300GB 10Krpm disks and each Avere node had 15-600GB 7.2Krpm SAS disks.  However the Avere system also had a 4-Solaris ZFS file storage systems behind it each of which had another 22-3TB (7.2Krpm, I think) disks.  Given all that, the 16-node NetApp system, 140-node Isilon and the 44-node Avere systems had a total of 1152, 3360 and 748 disk drives respectively.   Of course, this doesn’t count the system disks for the Isilon and Avere systems nor any of the SSDs or FlashCache in the various configurations.

I would say with this round of SPECsfs2008 benchmarks scale-out NAS systems have come out.  It’s too bad that both NetApp and Avere didn’t release comparable CIFS benchmark results which would have helped in my perennial discussion on CIFS vs. NFS.

But there’s always next time.


The full SPECsfs2008 performance report went out to our newsletter subscriber’s last December.  A copy of the full report will be up on the dispatches page of our site sometime later this month (if all goes well). However, you can see our full SPECsfs2008 performance analysis now and subscribe to our free monthly newsletter to receive future reports directly by just sending us an email or using the signup form above right.

For a more extensive discussion of file and NAS storage performance covering top 30 SPECsfs2008 results and NAS storage system features and functionality, please consider purchasing our NAS Buying Guide available from SCI’s website.

As always, we welcome any suggestions on how to improve our analysis of SPECsfs2008 results or any of our other storage system performance discussions.


EMCWorld day 3 …

Sometime this week EMC announced a new generation of Isilon NearLine storage which now includes HGST 3TB SATA disk drives.  With the new capacity the multi-node (144) Isilon cluster using the 108NL nodes can support 15PB of file data in a single file system.

Some of the booths along the walk to the solutions pavilion highlight EMC innovation winners. Two that caught my interest included:

  • Constellation computing – not quite sure how to define this but it’s distributed computing along with distributed data creation.  The intent is to move the data processing to the source of the data creation and keep the data there.  This might be very useful for applications that have many data sources and where data processing capabilities can be moved out to the nodes where the data was created. Seems highly scaleable but may depend on the ability to carve up the processing to work on the local data. I can see where compression, encryption, indexing and some statistical summarization can be done at the data creation site before it’s sent elsewhere. Sort of like both a sensor mesh with a processing nodes attached to the sensors configured as a sensor-proccessing grid.  Only one thing concerned me, there didn’t seem to be any central repository or control to this computing environment.  Probably what they intended, as the distributed solution is more adaptable and more scaleable than a centrally controlled environment.
  • Developing world healthcare cloud – seemed to be all about delivering healthcare to the bottom of the pyramid.  They won EMC’s social innovation award and are working with a group in Rwanda to try to provide better healthcare to remote villages.  It’s built around OpenMRS as a backend medical record archive hosted on EMC DC powered Iomega NAS storage and uses Google’s OpenDataKit to work with the data on mobile and laptop devices.  They showed a mobile phone which could be used to create, record and retrieve healthcare information (OpenMRS records) remotely and upload it sometime later when in range of a cell tower.  The solution also supports the download of a portion of the medical center’s health record database (e.g., a “cohort” slice, think a village’s healthcare records) onto a laptop, usable offline by a healthcare provider to update and record  patient health changes onsite and remotely.  Pulling all the technology together and delivering this as an application stack usable on mobile and laptop devices with minimal IT sophistication, storage and remote/mobile access are where the challenges lie.

Went to Sanjay’s (EMC’s CIO) keynote on EMC IT’s journey to IT-as-a-Service. As you can imagine it makes extensive use of VMware’s vSphere, vCloud, and vShield capabilities primarily in a private cloud infrastructure but they seem agnostic to a build-it or buy-it approach. EMC is about 75% virtualized today, and are starting to see significant and tangible OpEx and energy savings. They designed their North Carolina data center around the vCloud architecture and now are offering business users self service portals to provision VMs and business services…

Only caught the first section of BJ’s (President of BRS) keynote but he said recent analyst data (think IDC?) said that EMC was the overall leader (>64% market share) in purpose built backup appliances (Data Domain, Disk Library, Avamar data stores, etc.).  Too bad I had to step out but he looked like he was on a roll.


EMCWorld day 2

Day 2 saw releases for new VMAX  and VPLEX capabilities hinted at yesterday in Joe’s keynote. Namely,

VMAX announcements

VMAX now supports

  • Native FCoE with 10GbE support now VMAX supports directly FCoE, 10GbE iSCSI and SRDF
  • Enhanced Federated Live Migration supports other multi-pathing software, specifically it now adds MPIO to PowerPath and soon to come more multi-pathing solutions
  • Support for RSA’s external key management (RSA DPM) for their internal VMAX data security/encryption capability.

It was mentioned more than once that the latest Enginuity release 5875 is being adopted at almost 4x the rate of the prior generation code.  The latest release came out earlier this year and provided a number of key enhancements to VMAX capabilities not the least of which was sub-LUN migration across up to 3 storage tiers called FAST VP.

Another item of interest was that FAST VP was driving a lot of flash sales.  It seems its leading to another level of flash adoption. According to EMC they feel that almost 80-90% of customers can get by with 3% of their capacity in flash and still gain all the benefits of flash performance at significantly less cost.

VPLEX announcements

VPLEX announcements included:

  • VPLEX Geo – a new asynchronous VPLEX cluster-to-cluster communications methodology which can have the alternate active VPLEX cluster up to 50msec latency away
  • VPLEX Witness –  a virtual machine which provides adjudication between the two VPLEX clusters just in case the two clusters had some sort of communications breakdown.  Witness can run anywhere with access to both VPLEX clusters and is intended to be outside the two fault domains where the VPLEX clusters reside.
  • VPLEX new hardware – using the latest Intel microprocessors,
  • VPLEX now supports NetApp ALUA storage – the latest generation of NetApp storage.
  • VPLEX now supports thin-to-thin volume migration- previously VPLEX had to re-inflate thinly provisioned volumes but with this release there is no need to re-inflate prior to migration.


The new Geo product in conjuncton with VMware and Hyper V allows for quick migration of VMs across distances that support up to 50msec of latency.  There are some current limitations with respect to specific VMware VM migration types that can be supported but Microsoft Hyper-V Live Migration support is readily available at full 50msec latencies.  Note,  we are not talking about distance here but latency as the limiting factor to how far the VPLEX clusters can be apart.

Recall that VPLEX has three distinct use cases:

  • Infrastructure availability which proides fault tolerance for your storage and system infrastructure
  • Application and data mobility which means that applications can move from data center to data center and still access the same data/LUNs from both sites.  VPLEX maintains cache and storage coherency across the two clusters automatically.
  • Distributed data collaboration which means that data can be shared and accessed across vast distances. I have discussed this extensively in my post on Data-at-a-Distance (DaaD) post, VPLEX surfaces at EMCWorld.

Geo is the third product version for VPLEX, from VPLEX Local that supports within data center virtualization, to Vplex Metro which supports two VPLEX clusters which are up to 10msec latency away which generally is up to metropolitan wide distances apart, and Geo which moves to asynchronous cache coherence technologies. Finally coming sometime later is VPLEX Global which eliminates the restriction of two VPLEX clusters or data centers and can support 3-way or more VPLEX clusters.

Along with Geo, EMC showed some new partnerships such as with SilverPeak, Cienna and others used to reduce bandwidth requirements and cost for their Geo asynchronous solution.  Also announced and at the show were some new VPLEX partnerships with Quantum StorNext and others which addresses DaaD solutions

Other announcements today

  • Cloud tiering appliance – The new appliance is a renewed RainFinity solution which provides policy based migration to and from the cloud for unstructured data. Presumably the user identifies file aging criteria which can be used to trigger cloud migration for Atmos supported cloud storage.  Also the new appliance can support archiving file data to the Data Domain Archiver product.
  • Google enterprise search connector to VNX – Showing a Google search appliance (GSA) to index VNX stored data. Thus bringing enterprise class and scaleable search capabilities for VNX storage.

A bunch of other announcements today at EMCWorld but these seemed most important to me.


VMware disaster recovery

Thunderstorms over Alexandria, VA by mehul.antani (cc) (from Flickr)
Thunderstorms over Alexandria, VA by mehul.antani (cc) (from Flickr)

I did an article awhile ago for TechTarget on Virtual (machine) Disaster Recovery and discussed what was then the latest version of VMware Site Recovery Manager (SRM) v1.0 and some of it’s capabilities.

Well its been a couple of years since that came out and I thought it would be an appropriate time to discuss some updates to that product and other facilities that bear on virtual machine disaster recovery of today.

SRM to the rescue

Recall that VMware’s SRM is essentially a run book automation tool for system failover.  Using SRM, an administrator defines the physical and logical mapping between a primary site configuration of (protected site in SRM parlance) virtual machines, networking, and data stores and a secondary site (recovery site to SRM) configuration.

Once this mapping is complete, the administrator then creates recovery scripts (recovery plans to SRM) which take the recovery site in a step-by-step fashion from an “inactive” to an “active” state.  With the recovery scripts in hand, data replication can then be activated and monitoring (using storage replication adaptors, SRAs to SRM) can begin.  Once all that was ready and operating, SRM can provide one button failover to the recovery site.

SRM v4.1 supports the following:

  • NFS data stores can now be protected as well as iSCSI and FC LUN data stores.  Recall that a VMFS  (essentially a virtual machine device or drive letter) or a VM data store can be hosted on LUNs or as NFS files.  NFS data stores have recently become more popular with the proliferation of virtual machines under vSphere 4.1.
  • Raw device mode (RDM) LUNs can now be protected. Recall that RDM is another way to access devices directly for performance sensitive VMs eliminating the need to use a data store and  hyper-visor IO overhead.
  • Shared recovery sites are now supported. As such, one recovery site can now support multiple protected sites.  In this way a single secondary site can support failover from multiple primary sites.
  • Role based access security is now supported for recovery scripts and other SRM administration activities. In this way fine grained security roles can be defined that allow protection over unauthorized use of SRM capabilities.
  • Recovery site alerting is now supported. SRM now fully monitors recovery site activity and can report on and alert operations staff when problems occur which may impact failover to the recovery site.
  • SRM test and actual failover can now be initiated and monitored directly from vCenter serve. This provides the vCenter administrator significant control over SRM activities.
  • SRM automated testing can now use storage snapshots.  One advantage of SRM is the ability to automate DR testing which can be done onsite using local equipment. Snapshots eliminates the need for storage replication in local DR tests.

There were many other minor enhancements to SRM since v1.0 but these seem the major ones to me.

The only things lacking seem to be some form of automated failback and three way failover.  I’ll talk about 3-way failover later.

But without automated failback, the site administrator must reconfigure the two sites and reverse the designation of protected and recovery sites, re-mirror the data in the opposite direction and recreate recovery scripts to automate bringing the primary site back up.

However, failback is likely not to be as time sensitive as failover and could very well be a scheduled activity, taking place over a much longer time period. This can, of course all be handled automatically by SRM or be done in a more manual fashion.

Other DR capabilities

At last year’s EMCWorld VPLEX was announced which provided for a federation of data centers or as I called it at the time Data-at-a-Distance (DaaD).  DaaD together with VMware’s Vmotion could provide a level of  disaster avoidance (see my post on VPLEX surfaces at EMCWorld) previously unattainable.

No doubt cluster services from Microsoft Cluster Server (MSCS), Symantec Veritas Cluster Services (VCS)  and others have also been updated.  In some (mainframe) cluster services, N-way or cascaded failover is starting to be supported.  For example, a 3 way DR scenario has a primary site synchronously replicated to a secondary site which is asynchronously replicated to a third site.  If the region where the primary and secondary site is impacted by a disaster, the tertiary site can be brought online. Such capabilities are not yet available for virtual machine DR but it’s only a matter of time.


Disaster recovery technologies are not standing still and VMware SRM is no exception. I am sure a couple of years from now SRM will be even more capable and other storage vendors will provide DaaD capabilities to rival VPLEX.   What the cluster services folks will be doing by that time I can’t even imagine.



Services and products, a match made in heaven

wrench rust by HVargas (cc) (from Flickr)
wrench rust by HVargas (cc) (from Flickr)

In all the hoopla about company’s increasing services revenues what seems to be missing is that hardware and software sales automatically drive lots of services revenues.

A recent Wikibon post by Doug Chandler (see Can cloud pull services and technology together …) showed a chart of leading IT companies percent of revenue from services.  The percentages ranged from a high of 57% for  IBM to a low of 12% for Dell, with the median being ~26.5%.

In the beginning, …

It seems to me that services started out being an adjunct to hardware and software sales – i.e., maintenance, help to install the product, provide operational support, etc. Over time, companies like IBM and others went after service offerings as a separate distinct business activity, outside of normal HW and SW sales cycles.

This turned out to be a great revenue booster, and practically turned IBM around in the 90s.   However, one problem with hardware and software vendors reporting of service revenue is that they also embed break-fix, maintenance and infrastructure revenue streams in these line items.

The Wikibon blog mentioned StorageTek’s great service revenue business when Sun purchased them.  I recall that at the time, this was primarily driven by break-fix, maintenance and infrastructure revenues and not mainly from other non-product related revenues.

Certainly companies like EDS (now with HP), Perot Systems (now with Dell), and other pure service companies generate all their revenue from services not associated with selling HW or SW.  Which is probably why HP and Dell purchased them.

The challenge for analysts is to try to extract the more ongoing maintenance, break-fix and infrastructure revenues from other service activity in order to understand how to delineate portions of service revenue growth:

  • IBM seems to break out their GBS (consulting and application mgnt) from their GTS (outsourcing, infrastructure, and maint) revenues (see IBM’s 10k).  However extracting break-fix and maintenance revenues from the other GTS revenues is impossible outside IBM.
  • EMC has no breakdown whatsoever in their services revenue line item in their 10K.
  • HP similarly, has no breakdown for their service revenues in their 10K.

Some of this may be discussed in financial analyst calls, but I could locate nothing but the above in their annual reports/10Ks.

IBM and Dell to the rescue

So we are all left to wonder how much of reported services revenue is ongoing maintenance and infrastructure business versus other services business.  Certainly IBM, in reporting both GBS and GTS gives us some inkling of what this might be in their annual report: GBS is $18B and GTS is $38B. So that means maint and break-fix must be some portion of that GTS line item.

Perhaps we could use Dell as a proxy to determine break-fix, maintenance and infrastructure service revenues. Not sure where Wikibon got the reported service revenue % for Dell but their most recent 10K shows services are more like 19% of annual revenues.

Dell had a note in their “Results from operations” section that said Perot systems was 7% of this.  Which means previous services, primarily break-fix, maintenance and other infrastructure support revenues accounted for something like 12% (maybe this is what Wikibon is reporting).

Unclear how well Dell revenue percentages are representative of the rest of the IT industry but if we take their ~12% of revenues off the percentages reported by Wikibon then the new ranges are from 45% for IBM to 7% for Dell with an median around 14.5% for non-break fix, maintenance and infrastructure service revenues.

Why is this important?

Break-fix, maintenance revenues and most infrastructure revenues are entirely associated with product (HW or SW) sales, representing an annuity once original product sales close.  The remaining service revenues are special purpose contracts (which may last years), much of which are sold on a project basis representing non-recurring revenue streams.


So the next time some company tells you their service revenues are up 25% YoY, ask them how much of this is due to break-fix and maintenance.  This may tell you whether their product footprint expansion or their service offerings success is driving service revenue growth.


Real-time data analytics from customer interactions

the ghosts in the machine by MelvinSchlubman (cc) (From Flickr)
the ghosts in the machine by MelvinSchlubman (cc) (From Flickr)

At a recent EMC product launch in New York, there was a customer question and answer session for industry analysts with four of EMC’s leading edge customers. One customer, Marco Pacelli, was the CEO of ClickFox, a company providing real-time data analytics to retailers, telecoms, banks and other high transaction volume companies.

Interactions vs. transactions

Marco was very interesting, mostly because at first I didn’t understand what his company was doing or how they were doing it.  He made the statement that for every transaction (customer activity that generates revenue) companies encounter (and their are millions of them), there can be literally 10 to a 100 distinct customer interactions.  And it’s the information in these interactions which can most help companies maximize transaction revenue, volume and/or throughput.

Tracking and tracing through all these interactions in real-time, to try to make sense of the customer interaction sphere is a new and emerging discipline.  Apparently, ClickFox makes extensive use of GreenPlum, one of EMC’s recent acquisitions to do all this but I was more interested in what they were trying to achieve than the products used to accomplish this.

Banking interactions

For example, it seems that the websites, bank tellers, ATM machines and myriad of other devices one uses to interact with a bank are all capable of recording any interaction or actions we perform. What ClickFox seems to do is to track customer interactions across all these mechanisms to trace what transpired that led to any transaction, and determines how it can be done better. The fact that most banking interactions are authenticated to one account, regardless of origin, makes tracking interactions across all facets of customer activity possible.

By doing this, ClickFox can tell companies how to generate more transactions, faster.  If a bank can somehow change their interactions with a customer across websites, bank tellers, ATM machines, phone banking and any other touchpoint, so that more transactions can be done with less trouble, it can be worth lots of money.

How all that data is aggregated and sent offsite or processed onsite is yet another side to this problem but ClickFox is able to do all this with the help of GreenPlum database appliances.  Moreover, ClickFox can host interaction data and perform analytics at their own secure site(s) or perform their analysis on customer premises depending on company preference.


Marco’s closing comments were something like the days of offloading information to a data warehouse, asking a question and waiting weeks for an answer are over, the time when a company can optimize their customer interactions by using data just gathered, across every touchpoint they support, are upon us.

How all this works for non-authenticated interactions was another mystery to me.  Marco indicated in later discussions that it was possible to identify patterns of behavior that led to transactions and that this could be used instead to help trace customer interactions across company touchpoints for similar types of analyses!?  Sounds like AI on top of database machines…