A couple of years back I was talking with a storage person from PG&E, and he was concerned about the storage performance aspects of installing smart meters in California. I saw a website devoted to another California electric company that is installing 1.4M smart meters, each sending a reading to the electric company every 15 minutes. At 96 readings per meter per day, that represents ~134M electricity recording transactions per day, which seems entirely doable, especially since this must cover only a small portion of California. But even at only 128 bytes per transaction, ~17GB a day of electric metering data is ingested for this company’s service area. Naturally, this power company wants to extend smart metering to gas usage as well, which should not quite double the data load.
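The per-utility arithmetic is simple enough to check in a few lines. This is a minimal sketch that takes the post's figures as inputs (1.4M meters, a reading every 15 minutes, 128 bytes per reading) and assumes decimal units (1 GB = 10^9 bytes); the function names are illustrative, not from any metering system.

```python
# Hedged sketch: daily metering volume for one utility's service area.
# Inputs come from the text (1.4M meters, 15-minute readings, 128 bytes
# per reading); decimal units are an assumption (1 GB = 10**9 bytes).

MINUTES_PER_DAY = 24 * 60


def daily_transactions(meters: int, interval_min: int) -> int:
    """Readings per day when every meter reports once per interval."""
    return meters * (MINUTES_PER_DAY // interval_min)


def daily_bytes(meters: int, interval_min: int, bytes_per_txn: int = 128) -> int:
    """Raw bytes ingested per day, before any copies are made."""
    return daily_transactions(meters, interval_min) * bytes_per_txn


if __name__ == "__main__":
    txns = daily_transactions(1_400_000, 15)
    size = daily_bytes(1_400_000, 15)
    print(f"{txns / 1e6:.1f}M transactions/day, {size / 1e9:.1f} GB/day")
```

Run as a script, this prints `134.4M transactions/day, 17.2 GB/day`, matching the ~134M and ~17GB figures above.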
According to US census data, there were ~129M households in 2008. At that same 15-minute interval, smart metering for the whole US would generate ~12B transactions a day, which at 128 bytes per transaction would represent ~1.5TB/day. Of course, that’s only households and only electricity usage.
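Scaling the same per-meter rate to every US household can be sketched as follows, assuming the post's ~129M households, 96 readings per day, and 128 bytes per reading, in decimal units. The exact product comes out a little under 1.6TB/day, slightly above the rounded ~1.5TB figure in the text.

```python
# Hedged sketch: nationwide household electricity metering per day.
# Inputs from the text: ~129M US households (2008 census figure),
# 15-minute readings, 128 bytes per reading; decimal units assumed.

HOUSEHOLDS = 129_000_000
READINGS_PER_DAY = 24 * 60 // 15  # 96 readings per meter per day
BYTES_PER_READING = 128

household_txns = HOUSEHOLDS * READINGS_PER_DAY
household_bytes = household_txns * BYTES_PER_READING

print(f"{household_txns / 1e9:.2f}B transactions/day")
print(f"{household_bytes / 1e12:.2f} TB/day")
```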
That same census website indicates there were 7.7M businesses in the US in 2007. Smart metering these businesses at the same interval would add another ~740M transactions a day, or ~95GB of data. But fifteen-minute intervals may be too long for some companies (and their power suppliers), so perhaps businesses should be metered every minute instead. At one-minute intervals, businesses would add ~1.4TB of electricity metering data per day to the households’ ~1.5TB, for a total of ~3TB of data/day.
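The business estimates, and the combined total, can be reproduced with the sketch below. It assumes the post's inputs (7.7M businesses, ~129M households at 15-minute readings, businesses at either 15-minute or 1-minute readings, 128 bytes per reading) and decimal units; `daily_bytes` is an illustrative helper, not an existing API.

```python
# Hedged sketch: add US businesses to the household load.
# Inputs from the text: ~7.7M businesses (2007 census figure), ~129M
# households at 15-minute readings, businesses at 15- or 1-minute
# readings, 128 bytes per reading; decimal units (1 TB = 10**12 bytes).

MINUTES_PER_DAY = 24 * 60
BYTES_PER_READING = 128


def daily_bytes(meters: int, interval_min: int) -> int:
    """Bytes per day for `meters` meters reporting every `interval_min` minutes."""
    return meters * (MINUTES_PER_DAY // interval_min) * BYTES_PER_READING


households = daily_bytes(129_000_000, 15)
business_15min = daily_bytes(7_700_000, 15)
business_1min = daily_bytes(7_700_000, 1)
total = households + business_1min

print(f"businesses @15 min: {business_15min / 1e9:.1f} GB/day")
print(f"businesses @1 min:  {business_1min / 1e12:.2f} TB/day")
print(f"households + businesses: {total / 1e12:.2f} TB/day")
```

The one-minute business load (~1.42TB) is 15 times the 15-minute load, which is why it nearly matches the entire household load despite there being far fewer businesses.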
Storage multiplication tables:
- That 3TB a day must be backed up, so that’s at least another 3TB a day of backup load (deduplication notwithstanding).
- That 3TB of data must be processed offline as well as online, so that’s another 3TB a day of data copies.
- That 3TB of data is probably considered part of the power company’s critical infrastructure and, as such, must be mirrored to another data center, which is another 3TB a day of mirrored data.
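The multiplication table above amounts to one extra full copy per use. A minimal sketch, assuming the 3TB/day base load and exactly one copy each for backup, offline processing, and remote mirroring, with deduplication ignored as in the text:

```python
# Hedged sketch of the storage multiplication table.
# Assumptions from the text: a ~3TB/day base load plus one backup copy,
# one copy for offline processing, and one remote mirror (dedup ignored).

BASE_TB_PER_DAY = 3

copies = {
    "backup": 1,
    "offline processing": 1,
    "remote mirror": 1,
}

extra_tb_per_day = BASE_TB_PER_DAY * sum(copies.values())
total_tb_per_day = BASE_TB_PER_DAY + extra_tb_per_day

print(f"copies: {extra_tb_per_day} TB/day, total: {total_tb_per_day} TB/day")
```

This prints `copies: 9 TB/day, total: 12 TB/day`. Keeping the copies in a dictionary makes it easy to add further uses (test/dev, archive) and see how quickly the total grows.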
So with this relatively “small” base data load of 3TB a day, we are creating an additional 9TB/day of copies. Over the course of a year, this 12TB/day generates ~4.4PB of data. A study done by StorageTek in the late ’90s showed that, on average, data was copied 6 times, so the 3 copies above may be conservative. If the study results held true today for metering data, the original plus six copies (21TB/day) would generate ~7.7PB/year.
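The yearly range follows from multiplying the daily total by 365. This sketch assumes the 3TB/day base load, a 365-day year, and decimal units (1 PB = 1,000 TB), and reads the StorageTek figure as six copies in addition to the original, which is what the ~7.7PB result implies.

```python
# Hedged sketch: yearly storage for the metering data under two copy counts.
# Assumptions: 3TB/day base load (from the text), either 3 extra copies or
# the 6 copies reported by the StorageTek study, a 365-day year, and
# decimal units (1 PB = 1,000 TB).

BASE_TB_PER_DAY = 3
DAYS_PER_YEAR = 365


def petabytes_per_year(extra_copies: int) -> float:
    """PB/year for the original data plus `extra_copies` full copies."""
    tb_per_day = BASE_TB_PER_DAY * (1 + extra_copies)
    return tb_per_day * DAYS_PER_YEAR / 1000


for n in (3, 6):
    print(f"{n} copies: ~{petabytes_per_year(n):.1f} PB/year")
```

With three copies this gives ~4.4PB/year, and with six it gives ~7.7PB/year, bracketing the 4-8PB range.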
To paraphrase Senator E. Dirksen – a petabyte here, a petabyte there, and pretty soon you’re talking real storage.
In prior posts, we discussed the 1.5PB of data generated by CERN each year, the expectation that the world would generate an exabyte (EB) of data a day in 2009, and NSA’s need to capture and analyze a yottabyte (YB) of voice data a year by 2015. Here we show how another 4-8PB of storage could be created each year just by rolling out smart electricity metering to US businesses and homes.
As more and more aspects of home and business become digitized, more data is created each day, and it all must be stored someplace – data storage. Other technology arenas may also benefit from this digitization of life, leisure, and economy, but today we would contend that storage benefits most from this trend. Why storage benefits more than other technology domains is a discussion we must defer to a future post.