5 killer apps for $0.10/TB/year

Biblioteca José Vasconcelos / Vasconcelos Library by * CliNKer * (from flickr) (cc)

Cloud storage keeps getting more viable and I see storage pricing going down considerably over time.  All of which got me thinking: what could be done with storage at a dime per TB per year ($0.10/TB/yr)?  Most cloud providers today charge 10 cents or more per GB per month, so this is at least 12,000 times less expensive, but it seems inevitable at some point in time.

So here are my 5 killer apps for $0.10/TB/yr cloud storage (the arithmetic behind each estimate is worked out in the sketch after the list):

  1. Photo record of life – something akin to glasses which would record a wide-angle, high-megapixel photo record of everything I looked at, for every second of my waking life.  At a photo shot every second for 12 hrs/day, 365 days/yr, that would be ~16M photos, and at 4 MB per photo about ~64TB per person per year.  For my 4 person family this would cost ~$26/year for each year of family life; over a 40 year family time span, the last payment would be ~$1040 and the average payment ~$520/year.
  2. Audio recording of life – something akin to an always-on bluetooth headset which would record an audio feed to go with the photo record above.  Being always on, it would automatically catch cell phone as well as in-person conversations, though it would need to plug into landlines as well.  As discussed in my YB by 2015 archive post, one minute of MP3 audio takes up roughly a MB of storage.  Let's say I converse with someone ~33% of my waking day, or about 4 hrs of MP3 audio/day, 365 days/yr – roughly 0.09TB per person per year.  For my family this would cost ~$0.04/year for each year of recordings; over a 40 year span the last payment would be ~$1.40, or an average of ~$0.70/yr.
  3. Home security cameras – with ethernet-based security cameras, it wouldn't be hard to record 360-degree coverage outside plus the points of entry inside.  The data rates from the photo record above would suffice here too, recorded 24 hours a day, but the data needn't be retained for a whole year – a rolling 30 day record would do.  That works out to about 10TB of storage per camera; assuming 8 cameras inside and outside, that's about 80TB of storage, or $8/year, and it would not grow over time.
  4. No more deletes/version everything – if storage were cheap enough we would never delete data.  Normal data change activity runs at 5 to 10% per week, but that doesn't account for retaining deleted data, so let's say we would need to store an additional 20% of primary/active data per week.  For a 1TB primary working set, a ~20% deletion rate per week is ~10TB of deleted data per year per person; for my family that's ~$4/yr for each year of deletions, and my last yearly payment after 40 years would be ~$160.  If we were to factor in data growth of ~20%/year, this would go up substantially, averaging ~$7.3k/yr over 40 years.
  5. Customized search engines – if storage AND bandwidth were cheap enough it would be nice to have my own customized search engine, one that follows all my web clicks, spawns a search spider for every website I traverse, and provides customized “deep” searching for every web page I view.  Such an index might take 50% of the size of a page; my old website averaged ~18KB per page, so at 50% the index would need ~9KB per page.  Assume I look at ~250 web pages per business day, of which ~170 are unique, and each unique page links to 2 more unique pages, which link to 2 more, and so on.  Going 10 pages deep with an average branching factor of 2, those 170 pages viewed mean indexing ~174K pages/day, or about 0.6TB of page index per year.  For my household, a customized search engine would cost ~$0.25 of additional storage per year, and over 40 years my last payment would be $10.
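To keep the napkin math honest, here's a small Python sketch that reproduces the estimates above; the unit conversions and the $0.10/TB/yr price are just the assumptions already stated in the text:

```python
# Back-of-envelope arithmetic for the five apps, at $0.10/TB/year.
TB = 1e6      # MB per TB (decimal; close enough for napkin math)
PRICE = 0.10  # $ per TB per year
FAMILY = 4    # people

# 1. Photo record: 1 shot/sec, 12 hr/day, 365 days/yr, 4 MB/photo
photo_tb_yr = 12 * 3600 * 365 * 4 / TB        # ~63 TB/person/year

# 2. Audio record: ~1 MB/min of MP3, 4 hr of conversation/day
audio_tb_yr = 4 * 60 * 365 / TB               # ~0.09 TB/person/year

# 3. Security cameras: 8 cameras, 24 hr/day, 4 MB/sec, rolling 30 days
cam_tb = 8 * 24 * 3600 * 30 * 4 / TB          # ~83 TB, held steady

# 4. Never delete: 20%/week of a 1 TB working set, retained forever
deleted_tb_yr = 0.20 * 1 * 52                 # ~10 TB/person/year

# 5. Search index: 170 unique pages/day, branching 2, 10 deep, 9 KB/page
index_tb_yr = 170 * 2**10 * 9e-3 * 365 / TB   # ~0.6 TB/person/year

for name, tb_yr in [("photos", photo_tb_yr), ("audio", audio_tb_yr),
                    ("deletes", deleted_tb_yr), ("search", index_tb_yr)]:
    yearly = tb_yr * FAMILY * PRICE           # cost per year of data kept
    print(f"{name}: ~{tb_yr:.2f} TB/person/yr, family ~${yearly:.2f}/yr "
          f"per year of data, ~${yearly * 40:.2f}/yr after 40 years")
print(f"cameras: ~{cam_tb:.0f} TB rolling, ~${cam_tb * PRICE:.2f}/yr flat")
```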

I struggled to come up with ideas that would cost between $10 and $500 a year, as every other storage use came out significantly less than $1/year for a family of four.  This seems to say there might be plenty of applications viable at under $10 per TB per year – still ~120X cheaper than current cloud storage costs.

Any other applications out there that could take advantage of a dime/TB/year?

What if there were no backup?

Data Center by Mathieu Ramage (Flickr)
If backup didn’t exist and you had to start over to protect your data, how would you do it today?

I think five things are important to protect data in today’s data center:

  • Any data ever created in the data center or on the road needs to be protected,
  • Data restores must be under end-user control,
  • Data needs to be copied/replicated/mirrored offsite to support disaster recovery,
  • Multiple data copies should exist only to satisfy some data protection policy – one copy is mandatory, two copies (not co-located) would be required to support higher availability, and
  • Data protection activities should not interfere with or interrupt ongoing data center operations.

All of this can be and is being done with backup and other systems today, but most of these products and features grew out of earlier phases of computing. With today’s technology, many of these products may no longer be necessary if one could just rethink data protection from the ground up.

Data Versioning

I think some form of data/file/block versioning could easily support the requirement of restoring any data ever created. Versioning systems have existed in the past and could certainly be re-constituted today with some sort of standards. The cost of storing all those versions might be a concern, but storage costs continue to decrease, and if the multiple copies retained for data protection today can be eliminated, it might just be a wash. Versioning could just as easily be provided for the laptop; once new versions of data are created, old versions could be moved off the laptop to the data center for safekeeping and to free up space.
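To make the idea concrete, here's a minimal sketch of a content-addressed version store in Python. It's purely illustrative – the class and method names are mine, not any shipping product's – but it shows how every save can become an immutable, deduplicated version:

```python
import hashlib, os, time

class VersionStore:
    """Toy version store: every save keeps an immutable, content-addressed copy."""
    def __init__(self, root):
        self.root = root
        os.makedirs(os.path.join(root, "objects"), exist_ok=True)

    def _log(self, path):
        return os.path.join(self.root, path.replace(os.sep, "_") + ".log")

    def save(self, path):
        """Record a new version of `path`; old versions are never overwritten."""
        with open(path, "rb") as f:
            data = f.read()
        digest = hashlib.sha256(data).hexdigest()
        obj = os.path.join(self.root, "objects", digest)
        if not os.path.exists(obj):               # identical content stored once
            with open(obj, "wb") as f:
                f.write(data)
        with open(self._log(path), "a") as log:   # append to the version history
            log.write(f"{time.time()}\t{digest}\n")
        return digest

    def versions(self, path):
        """List (timestamp, digest) pairs for `path`, oldest first."""
        with open(self._log(path)) as f:
            return [tuple(line.split("\t")) for line in f.read().splitlines()]

    def restore(self, path, digest, dest):
        """Copy a chosen version back out – the end-user-driven restore."""
        with open(os.path.join(self.root, "objects", digest), "rb") as f:
            data = f.read()
        with open(dest, "wb") as f:
            f.write(data)
```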

End-user visibility

End-user restoration requires some facility to explore the end-user’s protected file-name and block space. Once that’s available, identifying which version needs to be restored and where to restore it should be straightforward. All backup applications provide a backup directory, and a few even allow end-user access to perform data restores. While all this works well with files, having an end-user do the same for block storage would require more sophistication. Nonetheless, both file and block restores seem entirely feasible once data versioning is in place.
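Given a version store like the sketch above, the end-user restore path could be as simple as browsing the version log and picking one – again, purely illustrative, and it assumes the file has been saved at least twice:

```python
store = VersionStore("./versions")
store.save("report.doc")                        # called on every change
for ts, digest in store.versions("report.doc"):
    print(ts, digest)                           # the user browses the history
ts, digest = store.versions("report.doc")[-2]   # pick the next-to-last version
store.restore("report.doc", digest, "report.doc.restored")
```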

Ubiquitous replication

The requirement to have data copies offsite is certainly feasible today. Replication can be done in hardware or software, synchronously, semi-synchronously, and/or asynchronously. Replication today can solve this problem, but replicating to a separate data center costs too much. Enter the storage cloud. With the storage cloud we could pay for just the bandwidth and storage needed to support our data protection needs and no more. Old data versions could be replicated as new versions are created. Protecting data written to the newest version is more problematic, but some sort of write splitter (à la CDP) could be used to create a replica of this data as well.
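The write splitter mentioned above could be sketched as a thin wrapper that sends every write to primary storage synchronously and queues a copy for the (possibly cloud-resident) replica; the design and names here are mine, just to make the idea concrete:

```python
import queue, threading

class WriteSplitter:
    """Forward each write to primary storage and, asynchronously, to a replica."""
    def __init__(self, primary, replica):
        self.primary, self.replica = primary, replica
        self.pending = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def write(self, offset, data):
        self.primary.write(offset, data)   # synchronous, in the I/O path
        self.pending.put((offset, data))   # replication happens off to the side

    def _drain(self):
        while True:
            offset, data = self.pending.get()
            self.replica.write(offset, data)   # e.g., a cloud storage target
```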

Policy driven

Having a policy-driven data protection system that stores only a minimal number of copies of data seems difficult to support. Yet this is essentially what incremental-only backup software and archive products do today, and other backup software paired with a deduplicating VTL comes very close. Adding some policy sophistication to coordinate data protection copies across multiple (potentially cloud) nodes and deduplicate all the unnecessary copies seems entirely feasible.
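The policy itself needn't be complicated. Something like the following toy rule set, checked against a catalog of known copies, captures the idea – the field names and rules here are hypothetical:

```python
# Hypothetical policy: one copy mandatory, two non-co-located for availability.
policy = {"min_copies": 2, "min_offsite": 1, "max_copies": 3}

def compliance_actions(copies, policy):
    """Given copy records like {"site": "primary"}, say what needs fixing."""
    actions = []
    offsite = sum(1 for c in copies if c["site"] != "primary")
    if len(copies) < policy["min_copies"]:
        actions.append("create another copy")
    if offsite < policy["min_offsite"]:
        actions.append("replicate a copy off-site (e.g., to a cloud node)")
    if len(copies) > policy["max_copies"]:
        actions.append("deduplicate/retire surplus copies")
    return actions or ["compliant"]

print(compliance_actions([{"site": "primary"}], policy))
# -> ['create another copy', 'replicate a copy off-site (e.g., to a cloud node)']
```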

Operationally transparent

Not interrupting ongoing operations also seems a tough nut to crack. Yet many storage vendors provide snapshot technologies that copy block and/or file data without interrupting operations. However, coordinating those vendor snapshot technologies from some central data protection manager is an essential piece of integration that continues to be lacking.
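Such a central manager would essentially run a quiesce/snapshot/resume protocol against every vendor's array. A skeleton might look like the following – the interface is imagined, since no such vendor-neutral standard exists today:

```python
class SnapshotTarget:
    """Imagined vendor-neutral interface a central DP manager would call."""
    def quiesce(self): ...    # flush caches and briefly hold new writes
    def snapshot(self): ...   # vendor-specific point-in-time copy
    def resume(self): ...     # release held I/O

def coordinated_snapshot(targets):
    """Take a consistent snapshot across arrays without stopping applications."""
    for t in targets:
        t.quiesce()
    try:
        return [t.snapshot() for t in targets]
    finally:
        for t in targets:     # I/O is held only for the snapshot instant
            t.resume()
```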

Can pieceparts solve the problem?

Yes, most of these features are purchasable as separate product offerings (except data versioning), but what’s missing is any one product that pulls all of this together and offers one integrated solution to data protection as I’ve described it.

The problem, of course, is that such functionality probably belongs in the O/S or the hypervisor, but they long ago relinquished any responsibility for data protection. Aside from the anti-trust and anti-competitive nature of such a future data protection O/S offering, I see only isolated steps and no coordinated attack on today’s overall data protection problem.

Backup software vendors do a great job with what they have under their control, but they can’t do it all; ditto for VTL providers, CDP vendors, replication products, etc. Piecemeal solutions can only take us so far down this path, but they’re all we have today and, I fear, for the foreseeable future.

Dream time over for now, gotta back up some data…