More on data growth from NetApp analyst days customers

Installing a power line at the Tram. Pat, Allan and Chris by bossco (cc) (from Flickr)

Some customers at NetApp’s Analyst Days were discussing their deployments of NetApp storage with Dave Hitz, the new storage efficiency czar, and others, but I was more interested in their comments on storage growth issues. Jonathan Bartes of Virginia Farm Bureau mentioned that the “natural growth rate of unstructured data” seemed to be about 20% per year, but some of the other customers had even higher growth rates.

Tucson Electric Power

Christopher Jeffry Rima from Tucson Electric Power is dealing with a 70% CAGR in data. The primary drivers are regulations (power companies are heavily regulated utilities in the USA), high-resolution imagery/GIS data, and power management/smart metering. It turns out imagery resolution has increased by about 10X in a matter of years, and they use such images as work plan overlays for field work to fix, upgrade or retire equipment. It seems they have hi-res images of all the power equipment and lines in their jurisdiction, which are updated periodically via flyovers.
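To put the two growth rates mentioned so far in perspective, here is a minimal sketch of how a 20% and a 70% compound annual growth rate play out. The function names and the 5-year horizon are my own choices for illustration, not anything the customers presented.

```python
import math


def growth_multiple(annual_rate: float, years: int) -> float:
    """Factor by which data grows after `years` at a compound annual rate."""
    return (1.0 + annual_rate) ** years


def doubling_time_years(annual_rate: float) -> float:
    """Years needed for data to double at a compound annual rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


# 20% is the "natural" unstructured growth rate quoted above, 70% is Rima's CAGR.
for rate in (0.20, 0.70):
    print(
        f"{rate:.0%}/yr: x{growth_multiple(rate, 5):.1f} after 5 years, "
        f"doubles every {doubling_time_years(rate):.1f} years"
    )
```

At 20% per year storage roughly doubles every 3.8 years; at 70% it doubles in about 1.3 years and is around 14X larger after five.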

The other thing that’s driving their data growth is smart metering and demand power management. I have talked about smart metering’s data appetite before, but demand management was new to me.

Rima said that demand management is similar to smart metering but adds real-time modeling of demand and capacity, plus bi-directional transmissions that request consumers to shed demand when required. Smart meter and real-time generation data feed the load management model, which predicts peak demand over the next time period; that prediction is then used to determine whether to shed demand or not. It turns out that the power grid is much more cost effective at ~60% utilization than at 80%, because the higher level requires turning on gas generators, which cost more than coal. In any case, when their prediction model shows utilization will top ~60-70%, they start shunting load.
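The decision step at the end of that process can be sketched as a simple threshold check. This is only an illustration under my own assumptions: Rima did not describe the actual model, the function names and megawatt figures are hypothetical, and the 0.65 default is just the midpoint of the ~60-70% range he mentioned.

```python
def predicted_utilization(predicted_peak_mw: float, capacity_mw: float) -> float:
    """Fraction of generating capacity the predicted peak demand would consume."""
    if capacity_mw <= 0:
        raise ValueError("capacity_mw must be positive")
    return predicted_peak_mw / capacity_mw


def should_shed_load(
    predicted_peak_mw: float,
    capacity_mw: float,
    threshold: float = 0.65,  # assumed midpoint of the ~60-70% range
) -> bool:
    """True when predicted utilization for the next period tops the threshold."""
    return predicted_utilization(predicted_peak_mw, capacity_mw) > threshold


# Hypothetical numbers: 2,000 MW of capacity, two different peak forecasts.
print(should_shed_load(1500.0, 2000.0))  # 75% predicted utilization
print(should_shed_load(1000.0, 2000.0))  # 50% predicted utilization
```

The storage angle is that the forecast feeding such a check has to be recomputed every period from meter and generation feeds, and all of that input data has to live somewhere.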


Arup

Another customer, Neil Clover from Arup (a construction/engineering firm), started talking about 3D building/site modeling and fire simulation flow dynamics modeling. Clover lamented that it’s not unusual to have a TB of data show up out of nowhere for a project they just took on.

incendio en el edificio 04 by donrenexito (cc) (from Flickr)

Clover said the fire flow modeling’s increasing resolution and multiple iterations under varying conditions were generating lots of data. The 3D models are also causing serious data growth and need to be maintained across the design, build and operate cycle of buildings. A TB of data showing up on your data center storage with no advance notice is incredible. All this and more is causing Clover’s data growth to average around 70% per year.

University Hospitals Leuven, Belgium

The day before, at the analyst meeting, Reinoud Reynders from University Hospitals Leuven, Belgium mentioned some key drivers of data growth at their hospital: digital pathology studies, which generate about 100GB each and which they do about 100 times a day, and DNA studies, which generate about 1TB of data each and take about a week to create. This seems higher than I predicted, almost 16X higher. However, Reynders said the DNA studies are still pretty expensive at $15K USD each, but he forecasts costs decreasing dramatically over the coming years, with a commensurate increase in volume.

But the more critical current issue might be the digital pathology exams at ~10TB per day. The saving grace for pathology exams is that such studies can be archived when completed rather than kept online. Reynders also mentioned that digital radiology and imaging studies are creating massive amounts of data as well, but unfortunately this data must be kept online because it is re-referenced often and unpredictably.
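The ~10TB per day figure follows directly from Reynders’ numbers (about 100GB per study, about 100 studies a day). A quick back-of-the-envelope sketch, assuming decimal units (1TB = 1,000GB) and, purely for illustration, that studies run all 365 days of the year, which he did not say:

```python
GB_PER_TB = 1000  # decimal units assumed


def daily_volume_tb(gb_per_study: float, studies_per_day: float) -> float:
    """Raw data generated per day, in TB."""
    return gb_per_study * studies_per_day / GB_PER_TB


def yearly_volume_tb(daily_tb: float, days_per_year: int = 365) -> float:
    """Raw data generated per year, in TB (365 days is an assumption)."""
    return daily_tb * days_per_year


pathology_daily_tb = daily_volume_tb(gb_per_study=100, studies_per_day=100)
print(f"Digital pathology: {pathology_daily_tb:.0f} TB/day")
print(f"Digital pathology: {yearly_volume_tb(pathology_daily_tb):.0f} TB/year generated")
```

The yearly number is what gets generated, not what has to stay online, since completed pathology studies can be archived.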

While data growth was an understated concern during many of the conference sessions, the main topic of the customers’ presentations was how they dealt with such (ab?)normal growth using NetApp storage and Ontap functionality. An explanation of this NetApp functionality, and how effective it was at managing data growth, will have to await another day.