An Exabyte-a-day

snp microarray data by mararie (cc) (from flickr)

At HPTechDay this week Jim Pownell, office of CTO, HP StorageWorks Division, reported on an IDC study that said this year the world is creating about an Exabyte of data each day.  An Exabyte (XB) is 10**18 bytes or 1000 PB of data.  Seems a bit high from my perspective.

Data creation by individuals

Population Growth and Income Level Chart by mattlemmon (cc) (from flickr)

The US Census Bureau estimates today’s worldwide population at around 6.8 billion people. Given that estimate, the XB/day number says that the average person is creating about 150MB/day.
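For anyone who wants to check the division, here is a quick sketch in Python. It assumes decimal units (1MB = 10**6 bytes), and the variable names are mine, not anything from the IDC study.

```python
# Back-of-the-envelope check of the per-person figure.
EXABYTE = 10**18   # bytes, as defined earlier in the post
MEGABYTE = 10**6   # bytes (decimal units assumed)

world_population = 6.8e9  # US Census Bureau estimate quoted above

# Spread one exabyte per day evenly over everyone on the planet.
mb_per_person_per_day = EXABYTE / world_population / MEGABYTE

print(f"{mb_per_person_per_day:.0f} MB per person per day")
```

The result lands a little under 150MB, which is where the rounded figure above comes from.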

Now I don’t know about you but we probably create that much data during our best week. That being said our family average over the last 3.5 years is more like 30.1MB/day. This average, over the last year, has been closer to 75.1MB/day (darn new digital camera).

If I take our 75.1 MB/day as a reasonable approximate average for our family and with 2 adults in our family, this would say each adult creates ~37.6MB of data per day.

About 50% of today’s worldwide population probably has no way to create any data whatsoever. Of the remaining 50%, maybe 33% are at an age where data creation is insignificant. That leaves about 2.3B people actively creating data at around 37.6MB/day, which would account for about 86.5PB of data creation a day.
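The chain of guesses above can be written out as a short Python sketch. The percentages are my rough assumptions from the text, not measured values, and decimal units are assumed throughout. Carrying the unrounded numbers through gives a total slightly below the rounded 86.5PB.

```python
# Rough estimate of data created per day by individuals worldwide.
MB = 10**6   # bytes (decimal units assumed)
PB = 10**15  # bytes

family_mb_per_day = 75.1  # our family's average over the last year
adults_in_family = 2
mb_per_adult = family_mb_per_day / adults_in_family

world_population = 6.8e9
share_with_access = 0.50     # guess: half the world can create data at all
share_too_young_etc = 0.33   # guess: a third of those create almost nothing

active_creators = world_population * share_with_access * (1 - share_too_young_etc)

# Total bytes per day from individuals, expressed in petabytes.
individual_pb_per_day = active_creators * mb_per_adult * MB / PB

print(f"{active_creators / 1e9:.2f}B active creators")
print(f"{individual_pb_per_day:.1f} PB/day from individuals")
```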

Naturally, I would consider myself a power data creator, but:

  • We are not doing much with video production, which creates gobs of data.
  • Also, my wife retains camera rights and I only take the occasional photo with my cell phone. So I wouldn’t say we are heavy into photography.

Nonetheless, 37.6MB/day on average seems exceptionally high, even for us.

Data creation by companies

However, that XB a day covers corporate data generation as well as individuals. Hoover’s, a US corporate database, lists about 33M companies worldwide. These are probably the biggest 33M and are no doubt creating lots of data each day.

Given the estimate above that individuals account for about 86.5PB/day, that leaves ~913.5PB/day for the 33M companies in Hoover’s DB to create. By my calculations each of these companies would be generating ~27.7GB/day. No doubt there are plenty of companies out there doing this each day, but the average company generates 27.7GB a day?? I don’t think so.

Ok, my count of companies could be wildly off. Perhaps the 33M companies in Hoover’s DB represent only the top 20% of companies worldwide, which means that maybe there are another 132M smaller companies out there, for a total of 165M companies. Now the 913.5PB/day says the average company generates ~5.5GB/day. This still seems high to me, especially considering this is an average over all 165M companies worldwide.
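Both company scenarios can be checked with a few lines of Python. This is only a sketch of the arithmetic above: the 20% coverage figure is my guess, the helper name is made up for illustration, and decimal units are assumed.

```python
# Per-company share of the daily exabyte, after subtracting individuals.
PB = 10**15  # bytes (decimal units assumed)
GB = 10**9   # bytes

total_pb_per_day = 1000       # one exabyte per day
individual_pb_per_day = 86.5  # rounded estimate for individuals from above
corporate_pb_per_day = total_pb_per_day - individual_pb_per_day


def gb_per_company(company_count):
    """Average GB/day per company if the corporate share is split evenly."""
    return corporate_pb_per_day * PB / company_count / GB


hoovers_companies = 33e6
hoovers_coverage = 0.20  # guess: Hoover's lists only the top 20%
all_companies = hoovers_companies / hoovers_coverage

print(f"{corporate_pb_per_day:.1f} PB/day left for companies")
print(f"{gb_per_company(hoovers_companies):.1f} GB/day each across 33M companies")
print(f"{gb_per_company(all_companies):.1f} GB/day each across {all_companies / 1e6:.0f}M companies")
```

Either way the average comes out at roughly 27.7GB or 5.5GB per company per day, which is the number I have trouble believing.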

Most analysts predict data creation is growing by over 100% per year, so this year’s XB/day will be at least 2XB/day next year.
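To see where that growth rate leads, here is a minimal Python sketch. It assumes exactly 100% growth per year (the low end of what the analysts predict), and the function name is hypothetical.

```python
def projected_xb_per_day(years_ahead, start_xb_per_day=1.0, annual_growth=1.0):
    """Daily data volume in XB after compounding growth for some years.

    annual_growth=1.0 means 100% per year, i.e. doubling every year.
    """
    return start_xb_per_day * (1 + annual_growth) ** years_ahead


for years in range(4):
    print(f"year +{years}: {projected_xb_per_day(years):.0f} XB/day")
```

At a straight doubling, three years out we would be looking at 8XB/day.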

Of course I have been looking at a new HD video camera for my birthday…

Sony_HDR-TG5V_Vanity350

5 thoughts on “An Exabyte-a-day”

  1. Using your family as an example is probably a bit flawed. As is HP’s language: I don’t think anyone’s creating megabytes of content a day, but we’re probably copying and storing many megabytes of content a day that you didn’t look at.

    Let’s look at it from another angle. My logging server at work (for a department at a major university) collects a couple dozen GB of data every day. We receive a couple GB of email, and store-analyze-and-discard a lot more than that in spam. I generally download an album or a movie once a week — that’s another 100 MB for an album, or several GB for a movie. And I’m just one person, and definitely not a power user at home.

    A better way to say it might be that ‘companies and consumers together need to purchase more storage or delete existing information at a rate of an exabyte a day worldwide.’ You need to account for automatically created content like spam, logging, and remotely-stored copies of emails, movies, and other entertainment when you’re talking the way HP is talking about data creation.

    Karl,

    Yes, you are right. We aren’t actually creating an exabyte a day, but we are all having to store an exabyte a day in total. However, as you say, not all of this data is new or unique; much of it is created elsewhere and shows up on our storage as media or email. I hadn’t talked about that, but it probably represents the bulk of the data being created (stored) on my family’s computer storage systems.
    Ray

  2. My company, which has about 100,000 employees, regularly generates about 10MB/employee/day = 1TB/day just in email. (Yes, there are ways to compress this, and I wish we would, but it’s still that amount per employee.) Just the data about our IP traffic (not content) is 1 TB/day. There are many companies larger than mine.

    Think also governmental agencies, that I just bet generate the highest amount of data with billions of sensors (water, nuclear, weather, etc) reporting data on second if not millisecond basis. Also, financial data (all stock indices) is generated and stored on a second, for financial analysis. So, it’s not only what individuals produce/consume but what all the automated instruments, computer servers, etc. produce. I can easily believe this number.

    And, according to http://www.itnews.com.au/News/156033,telescope-network-to-crunch-an-exabyte-of-data-a-day.aspx, just this one telescope will be producing 1XB/day by 2012.

    Donald,

    I hadn’t considered some of the scientific enterprises. CERN is said to be creating 15PB of data each year during operations and that’s just one example. You are quite right in saying that some of these experiments, in the years ahead, will be generating much more than that. And by then an XB/day will not be irrational. Although, it’s still pretty hard for me to get a handle on that much data from one experiment…
    Ray

  3. HI THERE WOW GUSS WHAT I HAVE A HP NETBOOK 110 WITH A 32 GB SSD SOLID STATE FLASH DRIVE SOO IF IT CHURNES OUT 27.5 GB PER DAY WITH OPERATING SYSTEMS AND FORMAT THAT LEVES ME WITH 28 GB OR SO FILLED UP WITH IN 24 HOURS OR 1 DAY TOO BE EXACT

    James,
    That’s interesting but a better question is what do you do after that.
    Ray
