I read a news release the other day about a new space discovery called a “micro-quasar”. The scientists were awaiting analysis of a VLBI (very long baseline interferometry) study involving 20 telescopes to confirm their findings. It was said that analyzing that much data takes a while. So how much data is this?
Another paper, in IEEE Spectrum, described an Earth-sized telescope created in an e-VLBI test done in May of 2008. At that time, each of 7 radio telescopes around the world fed its observations into one supercomputer, which analyzed the data. The article stated that each telescope delivered 1Gb/s of data and that the supercomputer could analyze up to 100TB of data per observation.
For a rough estimate, assume the following:
- A single observation has all telescopes observe one point for a 24-hour period.
- As the world turns, that point moves in and out of view, so at any given time it is visible to only about half of the telescopes situated around the globe.
Consequently, with 16 telescopes the combined array should generate ~70TB per day, that is, per observation. With 20 telescopes that should be closer to ~86TB. If this network of 20 telescopes can be kept busy half the time, it should generate around 15.7PB of data a year.
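The arithmetic behind these estimates can be sketched in a few lines of Python. This is only a back-of-the-envelope reconstruction: the function names are made up for illustration, and the effective rate of roughly 100MB/s per 1Gb/s link (about 10 bits on the wire per byte of data) is an assumption inferred because it reproduces the ~70TB and ~86TB figures, not something the article states.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption (not from the article): a 1Gb/s link delivers roughly
# 100MB/s of data, i.e. about 10 bits on the wire per byte.
BYTES_PER_SECOND_PER_TELESCOPE = 100e6

# From the assumptions above: only about half of the telescopes can see
# the target at any given moment during a 24-hour observation.
VISIBLE_FRACTION = 0.5


def tb_per_observation(telescopes):
    """Terabytes (10^12 bytes) generated by one 24-hour observation."""
    total_bytes = (telescopes * VISIBLE_FRACTION
                   * BYTES_PER_SECOND_PER_TELESCOPE * SECONDS_PER_DAY)
    return total_bytes / 1e12


def pb_per_year(telescopes, busy_fraction=0.5, days_per_year=365):
    """Petabytes per year if the array observes busy_fraction of the days."""
    return tb_per_observation(telescopes) * days_per_year * busy_fraction / 1e3


if __name__ == "__main__":
    for n in (16, 20):
        print(f"{n} telescopes: {tb_per_observation(n):.1f} TB per observation")
    print(f"20 telescopes, busy half the time: {pb_per_year(20):.1f} PB per year")
```

With 8 bits per byte instead (125MB/s per telescope), the same calculation gives about 108TB per observation for 20 telescopes, so the totals above are probably best read as order-of-magnitude estimates.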
Prior to e-VLBI, observation data was sent on tape or other magnetic media to a central repository, and it took weeks to gather the data for one observation. With the new e-VLBI system, however, all this can now be done in real time. According to the paper, getting the data to the supercomputer was a substantial undertaking and required multiple “network providers”. However, storing this data was not discussed.
Storing 15.7PB of data a year must not be much of a technological problem anymore. We’ve previously written on the 1.5PB from CERN and the 7.7PB from smart metering, so another 15.7PB/year doesn’t seem that out of place. I am beginning to think that the exabyte-a-day post was conservative and that today’s data deluge is larger than that. Can a YB (yottabyte) of data be that far away?