At the Pacific Crest conference this week there was some lively discussion about the differences in the rates of data growth. Some believe that object storage is growing much faster than structured and unstructured data. For proof they point to the growth in Amazon S3 data objects and Microsoft Azure data objects.
- Azure data objects more than quadrupled between June 2011 and June 2012, from 0.93T to over 4.0T objects. At the recent Microsoft Build Conference, Microsoft indicated that Azure now stores over 8T objects and that this count is doubling every six months. (See here and here).
- Amazon S3 has also been growing: in June 2012 it held over 1T objects, and by April 2013 it was storing over 2T objects. (See here).
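To put the Azure and S3 figures on a common footing, the Python sketch below converts each pair of observations into an annualized (compound) growth rate and a doubling time. It assumes constant compound growth between the two dates and treats the "over" figures as exact, so the results are rough estimates. The function names are illustrative only.

```python
import math


def annualized_growth(start: float, end: float, months: float) -> float:
    """Compound annual growth rate implied by going from start to end over `months` months."""
    return (end / start) ** (12.0 / months) - 1.0


def doubling_time_months(start: float, end: float, months: float) -> float:
    """Months needed to double at the constant compound rate implied by the two observations."""
    return months * math.log(2) / math.log(end / start)


# Object counts in trillions, taken from the figures quoted above.
# (start count, end count, months between observations)
observations = {
    "Azure (Jun 2011 to Jun 2012)": (0.93, 4.0, 12),
    "Amazon S3 (Jun 2012 to Apr 2013)": (1.0, 2.0, 10),
}

for name, (start, end, months) in observations.items():
    rate = annualized_growth(start, end, months)
    doubling = doubling_time_months(start, end, months)
    print(f"{name}: ~{rate:.0%} per year, doubling every ~{doubling:.1f} months")
```

With these inputs, Azure works out to roughly 330% annual growth with a doubling time of about 5.7 months, in line with the doubling-every-six-months remark, while S3 comes in at roughly 130% per year with a 10-month doubling time.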
For comparison purposes, note that an Amazon S3 object is not equivalent in size to an Azure data object. I believe Amazon S3 objects are significantly larger (10 to 1000X larger) than Azure data objects, but I have no proof for this statement.
Nonetheless, Azure and S3 object storage growth rates are going off the charts.
Comparing object storage growth to structured-unstructured data growth
How does the growth in objects compare to the growth in structured and unstructured storage? Most analysts claim that data is growing by 40-50% per year, and most of that is unstructured. However, I would contend that when you dig deeper into the unstructured aggregate, you find vastly different growth trajectories.
Historically, unstructured data meant file data as well as object data, and only recently has anyone considered tracking them separately. But if you split object data out of the aggregate, how fast is file data growing?
The key is file data growth
The latest IDC numbers tell us that NAS market revenue is declining, while open-SAN (NAS and non-mainframe SAN) revenues were up slightly in 2Q2013 (See here for more information). Bear in mind that revenue numbers don’t necessarily track data growth, and that the NAS numbers don’t include unified (combined NAS and SAN) storage, which is how most enterprise vendors sell file storage these days. Another consideration is that flash performance may be reducing storage overprovisioning, while data reduction technologies (dedupe, compression, thin provisioning, etc.) are increasing capacity utilization; both are driving down storage growth.
Also, the amount of data held in structured and unstructured (file) forms is probably orders of magnitude larger than the amount of object data.
So object storage is starting from a much smaller capacity base. But Amazon S3 and Azure data objects are also only part of the object storage space. Most pure object storage solutions only hit their stride at 1PB or larger and may grow significantly from there.
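The point about base size is easy to lose when comparing percentages. Here is a toy calculation in Python; the capacities and growth rates are hypothetical placeholders, not measurements, picked only so that file data starts 100x larger than object data while object data grows four times faster.

```python
import math

# Hypothetical starting capacities (arbitrary units) and annual growth rates.
# These are illustrative assumptions, not measured values.
FILE_BASE, FILE_RATE = 100.0, 0.25
OBJECT_BASE, OBJECT_RATE = 1.0, 1.00


def added_in_one_year(base: float, rate: float) -> float:
    """Absolute capacity a store of size `base` adds in one year at growth `rate`."""
    return base * rate


def years_to_catch_up(small: float, small_rate: float, large: float, large_rate: float) -> float:
    """Years until the smaller store matches the larger one, assuming constant compound rates."""
    if small_rate <= large_rate:
        raise ValueError("the smaller store must grow faster to ever catch up")
    return math.log(large / small) / math.log((1 + small_rate) / (1 + large_rate))


print("file data adds", added_in_one_year(FILE_BASE, FILE_RATE))        # 25.0 units
print("object data adds", added_in_one_year(OBJECT_BASE, OBJECT_RATE))  # 1.0 unit
years = years_to_catch_up(OBJECT_BASE, OBJECT_RATE, FILE_BASE, FILE_RATE)
print(f"years to catch up: {years:.1f}")
```

With these made-up inputs, object data doubles each year yet adds only 1 unit versus 25 for file data, and it would take roughly a decade of sustained growth to catch up. Different assumptions will move that number a lot.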
Given all the foregoing, what’s my take on the growth rates of structured, unstructured and object storage when, in aggregate, data is growing by 40-50% per year?
Assuming a 50% baseline data growth rate, my best guess (and that’s all it is) is that:
- Structured data growth accounts for 15 percentage points of the 50% overall data growth
- Unstructured (file) data growth accounts for 25 percentage points
- Object storage accounts for 10 percentage points
You could easily convince me that object storage is more like 5% today, with the remainder divided between structured and unstructured data.
So how much data is this?
IDC claimed that the world created and replicated 2.8ZB of data in 2012, and predicted that 4ZB would be created/replicated in 2013 (a ~43% growth rate). So of the additional 1.2ZB of data created in 2013, ~0.36ZB will be structured, 0.6ZB will be unstructured-file data and 0.24ZB will be unstructured-object storage data.
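The arithmetic above is easy to reproduce. The Python sketch below reads the three guesses as percentage points of the 50% baseline, so their shares of new data are 30%, 50% and 20%, and applies those shares to IDC's 1.2ZB increase as the text does (even though IDC's implied growth is ~43% rather than 50%). Variable names are illustrative only.

```python
# IDC figures quoted above, in zettabytes (ZB).
CREATED_2012_ZB = 2.8
CREATED_2013_ZB = 4.0

growth_rate = CREATED_2013_ZB / CREATED_2012_ZB - 1  # ~0.43
new_data_zb = CREATED_2013_ZB - CREATED_2012_ZB      # ~1.2 ZB

# The guesses above, read as percentage points of a 50% baseline growth rate.
points = {"structured": 15, "unstructured-file": 25, "unstructured-object": 10}
total_points = sum(points.values())  # 50

# Each category's share of new data is its points / total points (30%, 50%, 20%).
split_zb = {kind: new_data_zb * p / total_points for kind, p in points.items()}

print(f"growth rate: {growth_rate:.0%}, new data: {new_data_zb:.1f} ZB")
for kind, zb in split_zb.items():
    print(f"{kind}: {zb:.2f} ZB")
```

Running it prints a 43% growth rate and splits of 0.36, 0.60 and 0.24 ZB, matching the figures in the paragraph above.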
At first blush, the object storage component looks much too large until you start thinking about all the media, satellite and mobile data being created these days. And then it seems about right to me.
What do you think?