(Storage QoW 15-003): SMR disks in GA enterprise storage in 12 months? Yes@.85 probability

Hard Disk by Jeff Kubina (cc) (from Flickr)

(Storage QoW 15-003): Will we see SMR (shingled magnetic recording) disks in GA enterprise storage systems over the next 12 months?

Are there two vendors of SMR?

Yes, both Seagate and HGST have announced and are currently shipping (?) SMR drives. HGST has a 10TB drive, and Seagate has had an 8TB drive on the market since last summer.

One other interesting fact is that SMR will be the common format for all future disk head technologies including HAMR, MAMR, & BPMR (see presentation).

What would storage vendors have to do to support SMR drives?

Because of the nature of SMR disks, writes overlap other tracks so they must be written, at least in part, sequentially (see our original post on Sequential only disks). Another post I did reported on recent work by Garth Gibson at CMU (Shingled Magnetic Recording disks) which showed how multiple bands or zones on an SMR disk could be used some of which could be written randomly and others which could be written sequentially but all could be read randomly. With such an approach you could have a reasonable file system on an SMR device with a metadata partition (randomly writeable) and a data partition (sequentially writeable).

In order to support SMR devices, changes have been requested for the T10 SCSI & T13 ATA command protocols. Such changes would include:

  • SMR devices support a new write cursor for each SMR sequential band.
  • SMR devices support sequential writes within SMR sequential bands at the write cursor.
  • SMR band write cursors can be read, statused and reset to 0. SMR sequential band LBA writes only occur at the band cursor and for each LBA written, the SMR device increments the band cursor by one.
  • SMR devices can report their band map layout.
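
As a rough illustration of the write cursor behavior listed above, here is a minimal Python sketch of a single sequential band. The class and method names are hypothetical and are not the actual T10/T13 commands; the sketch only models the rule that writes land at the band cursor and advance it by one LBA.

```python
class SequentialBand:
    """Toy model of one SMR sequential band (all names are illustrative)."""

    def __init__(self, start_lba, length):
        self.start_lba = start_lba      # first LBA of the band
        self.length = length            # number of LBAs in the band
        self.cursor = 0                 # write cursor, relative to start_lba
        self.blocks = {}                # LBA -> data written so far

    def write(self, lba, data):
        """Accept a write only at the current cursor position."""
        if self.cursor >= self.length:
            raise IOError("band is full")
        if lba != self.start_lba + self.cursor:
            raise IOError("write rejected: not at the band cursor")
        self.blocks[lba] = data
        self.cursor += 1                # one LBA written, cursor moves by one

    def read(self, lba):
        """Reads can be random; unwritten LBAs return None in this model."""
        return self.blocks.get(lba)

    def reset(self):
        """Reset the cursor to 0, discarding the band's contents."""
        self.cursor = 0
        self.blocks.clear()


band = SequentialBand(start_lba=1000, length=4)
band.write(1000, b"a")
band.write(1001, b"b")
try:
    band.write(1003, b"x")              # skips LBA 1002, so it is rejected
except IOError as err:
    rejected = str(err)
```

A host would keep one such cursor per band and consult the band map to decide which band to append to next.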

The presentation refers to multiple approaches to SMR support or SMR drive modes:

  • Restricted SMR devices – where the device will not accept random writes; all writes must occur at a band cursor, and anything else is rejected by the device. In return, performance would be predictable.
  • Host Aware SMR devices – where the host using the SMR devices is aware of SMR characteristics and actively manages the device using write cursors and band maps to write the most data to the device. However, the device will accept random writes and will perform them for the host. This will result in sub-optimal and non-predictable drive performance.
  • Drive managed SMR devices – where the SMR device acts like a randomly accessed disk device but maps random writes to sequential writes internally using virtualization of the drive LBA map, not unlike what SSDs do today. These devices would be backward compatible with today’s disk devices, but drive performance would be bad and non-predictable.

Unclear which of these drive modes are currently shipping, but I believe Restricted SMR device modes are already available and drive manufacturers would be working on Host Aware and Drive managed to help adoption.

So assuming Restricted SMR device mode availability and prototypes of T10/T13 changes are available, then there are significant but known changes for enterprise storage systems to support SMR devices.

Nevertheless, a number of hybrid storage systems already implement Log Structured File (LSF) systems on their backends, which mostly write sequentially to backend devices, so moving to SMR restricted device mode would be easier for these systems.

Unclear how many storage systems have such a back end, but NetApp uses it for WAFL and just about every other hybrid startup has a LSF format for their backend layout. So being conservative, let’s say 50% of enterprise hybrid storage vendors use LSF.

The other 50% would have more of a problem implementing SMR restricted mode devices, but it’s only a matter of time before all will need to go that way. That is assuming they still use disks. So, we are primarily talking about hybrid storage systems.

All major storage vendors support hybrid storage and about 60% of startups support hybrid storage, so adding these together, maybe about 75% of enterprise storage vendors have hybrid.

Using analysis on QoW 15-001, about 60% of enterprise storage vendors will probably ship new hardware versions of their systems over the next 12 months. So of the 13 likely new hardware systems over the next 12 months, 75% have hybrid solutions and 50% have LSF, or ~4.9 new hardware systems will be released over the next 12 months that are hybrid and have LSF backends already.

What are the advantages of SMR?

SMR devices will have higher storage densities and lower cost. Today disk drives are running 6-8TB and the SMR devices run 8-10TB so a 25-30% step up in storage capacity is possible with SMR devices.

New drive support has in the past been relatively easy because command sets/formats haven’t changed much over the past 7 years or so, but SMR is different and will take more effort to support. The fact that all new drives will be SMR over time gives more emphasis to get on the bandwagon as soon as feasible. So, I would give a storage vendor an 80% likelihood of implementing SMR, assuming they have new systems coming out, are already hybrid and are already using LSF.

So taking the ~4.9 systems that are LSF/hybrid/being released and multiplying by 0.8 says ~3.9 systems will introduce SMR devices over the next 12 months.

For non-LSF hybrid systems, the effort seems much harder, so I would give the likelihood of implementing SMR about a 40% chance. So of the ~8.1 systems left that will be introduced in the next year, 75% are hybrid, or ~6.1 systems, and they have a 40% likelihood of implementing SMR, so ~2.4 of these non-LSF systems will probably introduce SMR devices.

There’s one other category that we need to consider and that would be startups in stealth. These could have been designing their hybrid storage for SMR from the get go. In QoW 15-001 analysis I assumed another ~1.8 startup vendors would emerge to GA over the next 12 months. And if we assume that 75% of these were hybrid (1.8 * .75) then there’s ~1.4 startup vendors that could be using SMR technology in their hybrid storage, for a total of (4.9 + 2.4 + 1.4) = 8.7 systems that have a high probability of SMR implementation over the next 12 months in GA enterprise storage products.
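
The chain of estimates above can be replayed in a few lines of Python. This is only a sketch of the post's arithmetic; the variable names are made up for illustration and every input is one of the guesses stated in the text. It also shows the total if the LSF systems are discounted by their 80% likelihood, which the 8.7 figure above does not do.

```python
new_hw_systems = 13        # likely new hardware systems (from the QoW 15-001 analysis)
hybrid_share = 0.75        # share of vendors with hybrid storage
lsf_share = 0.50           # share of hybrid vendors with an LSF backend
p_smr_lsf = 0.80           # likelihood an LSF hybrid system adopts SMR
p_smr_non_lsf = 0.40       # likelihood a non-LSF hybrid system adopts SMR
stealth_startups = 1.8     # startups expected to reach GA

lsf_hybrid = new_hw_systems * hybrid_share * lsf_share   # ~4.9
lsf_with_smr = lsf_hybrid * p_smr_lsf                    # ~3.9
remaining = new_hw_systems - lsf_hybrid                  # ~8.1
non_lsf_hybrid = remaining * hybrid_share                # ~6.1
non_lsf_with_smr = non_lsf_hybrid * p_smr_non_lsf        # ~2.4
stealth_hybrid = stealth_startups * hybrid_share         # 1.35, ~1.4 in the text

# Total as summed in the post (uses the ~4.9 candidate LSF systems).
total_as_in_post = lsf_hybrid + non_lsf_with_smr + stealth_hybrid

# Alternative total using the ~3.9 LSF systems after the 80% likelihood.
total_discounted = lsf_with_smr + non_lsf_with_smr + stealth_hybrid

print(f"as in post: ~{total_as_in_post:.1f}, discounted: ~{total_discounted:.1f}")
```

Either way the count is well above one system, which is what the forecast needs.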


So my forecast of SMR adoption by enterprise storage is Yes for .85 probability (unclear what the probability should be, but it’s highly probable).



Microsoft Exchange database backup performance – chart of the month

Microsoft Exchange 1001-5000 mailboxes, top 10 database backup per server
In last month’s Storage Intelligence newsletter we discussed the latest Exchange storage system performance for 1001 to 5000 mailboxes. One of the charts we updated was the above Exchange database backup on a per server basis. There were two new submissions for this quarter, and both the Dell PowerEdge R730xd (#2 above) and the HP D3600 drive shelf with P441 storage controller (#10) ranked well on this metric.

This ESRP reported metric only measures backup throughput at a server level. However, because these two new submissions only had one server, it’s not as much of a problem here.

The Dell system had a SAS connected JBOD with 14-4TB 7200RPM disks and the HP system had a SAS connected JBOD with 11-6TB 7200RPM disks. The other major difference is that the HP system had 4GB of “flash backed write cache” and the Dell system only had 2GB of  “flash backed cache”.

As far as I can tell, the fact that the Dell storage managed ~2.3GB/sec and the HP storage only managed ~1.1GB/sec is probably due more to their respective drive configurations than to anything else.

RAID 0 vs. RAID 1

One surprising characteristic of the HP setup is that they used RAID 0 while the Dell system used RAID 1. This would offer a significant benefit to the Dell system during heavy read activity, but as I understand it, the database backup activity is run with a standard email stress environment. So in this case, there is a healthy mix of reads/writes going on at the time of the backup activity. The Dell system would have an advantage for reads and a penalty for writes (writing two copies of all data), so Dell’s RAID advantage is probably a wash.

Whether RAID 0 vs. RAID 1 would have made any difference to other ESRP metrics (database transfers per second, read/write/log access latencies, log processing, etc.) is a subject for another post.

Of course, with Exchange DAGs there’s built in database redundancy so maybe RAID 0 is an OK configuration for some customers. Software based redundancy does seem to be Microsoft’s direction, at least since Exchange 2010, so maybe I’m the one that’s out of touch.

Still for such a small configuration I’m not sure I would have gone with RAID 0…


Facebook down to 1.08 PUE and counting for cold storage

Read a recent article in ArsTechnica about Facebook’s cold storage archive and their sustainable data centers (How Facebook puts petabytes of old cat pix on ice in the name of sustainability). In the article there was a statement that Facebook had achieved a 1.08 PUE (Power Usage Effectiveness) for one of these data centers. This means for every 100 Watts used to power up racks, Facebook needed to add 8 Watts for other overhead.

Just last year I wrote a paper for a client where I interviewed the CEO of an outsourced data center provider (DuPont Fabros Technology) whose state of the art new data centers were achieving a PUE of from 1.14 to 1.18. For Facebook to run their cold storage data centers at 1.08 PUE is even better.
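
PUE is total facility power divided by the power delivered to IT equipment, so the overhead is simply (PUE - 1) times the IT load. A small Python sketch, with a made-up helper name, comparing the PUE values mentioned above for a 100 Watt IT load:

```python
def overhead_watts(pue, it_watts):
    """Non-IT power implied by a PUE: total = pue * IT, overhead = total - IT."""
    return (pue - 1.0) * it_watts

# PUE values quoted in the text: Facebook cold storage vs. a state of the art
# outsourced data center.
for pue in (1.08, 1.14, 1.18):
    print(f"PUE {pue}: {overhead_watts(pue, 100):.0f} W overhead per 100 W of IT load")
```

So the difference between 1.08 and 1.14 to 1.18 is roughly half the overhead per Watt of rack power.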

At the moment, Facebook has two cold storage data centers, one at Prineville, OR and the other at Forest City, NC (Forest City achieved the 1.08 PUE). The two cold data storage sites add to the other Facebook data centers that handle everything else in the Facebook universe.

MAID to the rescue

First off these are just cold storage data centers, over an EB of data, but still archive storage, racks and racks of it. How they decide something is cold or hot seems to depend on last use. For example, if a picture has been referenced recently then it’s warm, if not then it’s cold.

Second, they have taken MAID (massive array of idle disks) to a whole new data center level. That is, each 1U (Knox storage tray) shelf has 30 4TB drives and a rack has 16 of these storage trays, holding 1.92PB of data. At any one time, only one drive in each storage tray is powered up. The racks have dual servers and only one power shelf (due to the reduced power requirements).
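
A quick Python check of the rack numbers above, and of how few drives are actually spinning under this scheme. The variable names are just for illustration; the inputs are the figures from the text.

```python
drives_per_tray = 30
tb_per_drive = 4
trays_per_rack = 16
powered_per_tray = 1       # only one drive per tray is powered up at a time

rack_drives = drives_per_tray * trays_per_rack            # 480 drives
rack_capacity_pb = rack_drives * tb_per_drive / 1000      # 1.92 PB
spinning_drives = powered_per_tray * trays_per_rack       # 16 drives
spinning_fraction = spinning_drives / rack_drives         # ~3.3% of the rack

print(f"{rack_capacity_pb} PB per rack, "
      f"{spinning_drives} of {rack_drives} drives powered ({spinning_fraction:.1%})")
```

With only about 1 in 30 drives powered at any moment, a single power shelf per rack looks plausible.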

They also use pre-fetch hints provided by the Facebook application to cache user data. This means they will fetch some images ahead of time, when users are paging through photos in a stream, in order to have them in cache when needed. After the user looks at or passes up a photo, it is jettisoned from cache and the next photo is pre-fetched. When the disks are no longer busy, they are powered down.

Fewer power conversions lower PUE

Another thing Facebook is doing is reducing the number of power conversions that need to happen to power racks. In a typical data center power comes in at 480 Volts AC,  flows through the data center UPS and then is dropped down to 208 Volts AC at the PDU which flows to the rack power supply which is then converted to 12 Volts DC.  Each conversion of electricity generally sucks up power and in the end only 85% of the energy coming in reaches the rack’s servers and storage.

In Facebook’s data centers, 480 Volts AC is channeled directly to the racks, which have an in rack battery backup/UPS, and the rack’s power bus converts the 480 Volt AC to 12 Volt DC or AC directly as needed. By cutting out the data center level UPS and the PDU energy conversion they save lots of energy overhead which can be used to better power the racks.

Free air cooling helps

Facebook data centers like Prineville also make use of “fresh air cooling” that mixes data center air with outside air, which flows through “wetted media” to cool it and is then sent down to cool the racks by convection. This process keeps the rack servers and storage within the proper temperature range, but they probably run hotter than in most data centers this way. How much fresh air is brought in depends on outside temperature, but during most months, it works very well.

This is in contrast to standard data centers that use chillers, fans and pumps to keep the data center air moving, conditioned and cold enough to chill the equipment. All those fans, pumps and chillers can consume a lot of energy.

Renewable energy, too

Lately, Facebook has made obtaining renewable energy to power their data centers a high priority. One new data center close to the Arctic Circle was built there because of hydro-power, another in Iowa and one in Texas were built in locations with wind power.

All of this technology, open sourced

Facebook has open sourced all of its hardware and data center systems. That is, the specifications for all the hardware discussed above and more are available from the Open Compute Organization, including the storage specification(s), open rack specification(s) and data center specification(s) for these data centers.

So if you want to build your own cold storage archive that can achieve 1.08 PUE, just pick up their specs and have at it.


Picture Credits: DataCenterKnowledge.Com


3D NAND, how high can it go?

I was at the Flash Memory Summit a couple of weeks ago and a presenter (from Hynix, I think) got up and talked about how 3D NAND was going to be the way forward for all NAND technology. I always thought we were talking about a handful of layers. But on the slide he had what looked to be a skyscraper block with 20-40 layers of NAND.

Currently shipping 3D NAND

It seems all the major NAND fabs are shipping 30+ layer 3D NAND:

  • Samsung last year said they were shipping 32-layer 3D (V-)NAND.
  • Toshiba announced earlier this year that they had 48-layer 3D NAND.
  • Hynix is shipping 36-layer 3D NAND.
  • Micron-Intel is also shipping 32-layer 3D NAND.

Am I missing anyone?

Samsung also said that they will be shipping a 32GB, 48-layer V-NAND chip later this year. Apparently, Samsung is also working on 64-layer V-NAND in their labs and is getting good results. In an article on Samsung’s website they mentioned the possibility of 100 layers of NAND in a 3D stack.

The other NAND fabs are also probably looking at adding layers to their 3D NAND but aren’t talking as much about it.

Earlier this year on a GreyBeards on Storage Podcast we talked with Jim Handy, Director at Objective Analysis, on what was going on in NAND fabrication. Talking with Jim was fascinating but one thing he said was that with 3D NAND, building a hole with the right depth and width that is also straight enough was a key challenge. At the time I was thinking a couple of layers deep. Boy was I wrong.

How high/deep can 3D NAND go?

On the podcast, Jim said he thought that 3D NAND would run out of gas around 2023. Given current press releases, it seems NAND fabs are adding ~16 layers a year to their 3D-NAND.

So if 32 to 48 layers is today’s 3D-NAND and we can keep adding 16 layers/year through 2023, that’s 8 years * 16 layers, or an additional 128 layers on top of the 32 to 48 layers currently shipping. At that rate we should get to 160 to 176 layer 3D NAND chips. And if 48 layers is 32GB then maybe we could see 100GB+ 3D NAND chips.

This of course assumes that there is no loss in capacity per layer as we increase layers. It also assumes that the industry can continue to add 16 layers/year to 3D-NAND chips.
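
The projection is simple enough to sketch in Python. The names below are illustrative, and the code bakes in the same two assumptions: a constant 16 layers/year and capacity that scales linearly with layer count from the 32GB, 48-layer chip.

```python
start_year, end_year = 2015, 2023
layers_per_year = 16
layers_today = (32, 48)                # range of layer counts shipping today
ref_layers, ref_capacity_gb = 48, 32   # Samsung's announced 48-layer, 32GB chip

added_layers = (end_year - start_year) * layers_per_year           # 128
projected_layers = tuple(n + added_layers for n in layers_today)   # (160, 176)

# Linear scaling: capacity per layer stays what it is on the 48-layer chip.
gb_per_layer = ref_capacity_gb / ref_layers
projected_gb = tuple(n * gb_per_layer for n in projected_layers)

print(projected_layers, [f"{gb:.0f}GB" for gb in projected_gb])
```

Under these assumptions the 2023 chips land a bit over 100GB, in line with the estimate above.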

I suppose there’s one other proviso, that nothing else comes along that is less expensive to fabricate while still providing ever increasing capacity of lightning fast, non-volatile storage (see a recent post on 3D XPoint NVM technology).

Photo Credit(s):

  1. Micron’s press release on 3D NAND, (c) 2015 Micron
  2. Toshiba’s press release as reported by AnandTech, (c) 2015 Toshiba

Next generation NVM, 3D XPoint from Intel + Micron

Earlier this week Intel-Micron announced (see webcast here and here) a new, transistor-less NVM with 1000 times the speed of NAND [~10ns (nanosecond) access times vs. ~10µsec access times for NAND] and 10X the density of DRAM (currently 16Gb/DRAM chip). They call the new technology 3D XPoint™ (cross-point) NVM (non-volatile memory).

In addition to the speed and density advantages, 3D XPoint NVM also doesn’t have the endurance problems associated with today’s NAND. Intel and Micron say that it has 1000 times the endurance of today’s NAND (MLC NAND endurance is ~3000 write (P/E) cycles).

At 10X current DRAM density it’s roughly equivalent to today’s MLC/TLC NAND capacities/chip. And at 1000 times the speed of NAND, it’s roughly equivalent in performance to DDR4 DRAM. Of course, because it’s non-volatile it should take much less power to use than current DRAM technology, as there is no need for power refresh.
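
To put the announced multipliers into absolute numbers, here is a small Python sketch. The baselines are the ones quoted above (10µsec NAND access, ~3000 P/E cycles for MLC NAND, 16Gb DRAM chips); the variable names are only illustrative.

```python
nand_access_ns = 10_000        # ~10 microseconds, expressed in nanoseconds
nand_pe_cycles = 3_000         # MLC NAND write (P/E) cycles
dram_chip_gbit = 16            # current DRAM density per chip

speed_factor = 1_000           # "1000 times the speed of NAND"
endurance_factor = 1_000       # "1000 times the endurance of NAND"
density_factor = 10            # "10X the density of DRAM"

xpoint_access_ns = nand_access_ns / speed_factor        # 10 ns
xpoint_pe_cycles = nand_pe_cycles * endurance_factor    # 3,000,000 cycles
xpoint_chip_gbit = dram_chip_gbit * density_factor      # 160 Gb per chip

print(f"~{xpoint_access_ns:.0f}ns access, ~{xpoint_pe_cycles:,} cycles, "
      f"~{xpoint_chip_gbit}Gb/chip")
```

These are headline figures taken at face value, not measured results.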

We have talked about the end of NAND before (see The end of NAND is here, maybe). If this is truly more scalable than NAND, it seems to me that it does signal the end of NAND. It’s just a matter of time before endurance and/or density growth of NAND hits a wall and then 3D XPoint can do everything NAND can do but better, faster and more reliably.

3D XPoint technology

The technology comes from a dual layer design which is divided into columns, and at the top and bottom of the columns are accessor connections in an orthogonal pattern that together form a grid to access a single bit of memory. This also means that 3D XPoint NVM can be read and written a bit at a time (rather than a “page” at a time with NAND) and doesn’t have to be initialized to 0 to be written like NAND.

The 3D nature of the new NVM comes from the fact that you can build up as many layers as you want of these structures to create more and more NVM cells. The microscopic pillar between the two layers of wiring includes a memory cell and a switch component which allows a bit of data to be accessed (via the switch) and stored/read (memory cell). In the photo above the yellow material is a switch and the green material is a memory cell.

A memory cell operates by using a bulk property change of the material, unlike DRAM (capacitors to hold memory values) or NAND (floating gates of electrons). As such, it uses all of the material to hold a memory value, which should allow 3D XPoint memory cells to scale downwards much better than NAND or DRAM.

Intel and Micron are calling the new 3D XPoint NVM storage AND memory. That is, it is suitable for fast access, non-volatile data storage and non-volatile processor memory.

3D XPoint NVM chips in manufacturing today

First chips with the new technology are being manufactured today at Intel-Micron’s joint manufacturing fab in Idaho. The first chips will supply 128Gb of NVM and use just two layers of 3D XPoint memory.

Intel and Micron will independently produce system products (read SSDs or NVM memory devices) with the new technology during 2016. They mentioned during the webcast that the technology is expected to be attached (as SSDs) to a PCIe bus and use NVMe as an interface to read and write it. Although if it’s used in a memory application, it might be better attached to the processor memory bus.

The expectation is that the 3D XPoint cost/bit will be somewhere in between NAND and DRAM, i.e. more expensive than NAND but less expensive than DRAM. It’s nice to be the only companies in the world with a new, better storage AND memory technology.


Over the last 10 years or so, SSDs (solid state devices) all used NAND technologies of one form or another, but after today SSDs can be made from NAND or 3D XPoint technology.

Some expected uses for the new NVM are in gaming applications (currently storage speed and memory constrained) and for in-memory databases (which are memory size constrained). There was mention on the webcast of edge analytics as well.

Welcome to the dawn of a new age of computer storage AND memory.

Photo Credits: (c) 2015 Intel and Micron, from Intel’s 3D XPoint website

Seagate releases 4TB Backup Plus drive with Microsoft OneDrive cloud storage

Seagate today announced the release of a single platter, 4TB Backup Plus drive. The new (20.5mm) thin device offers USB 3.0 and is targeted for PC backup applications. The drive has a MSRP of $239.99 US and is expected to be available for sale in July.

It comes with 200GB of Microsoft OneDrive cloud storage. Microsoft recently opened up their OneDrive cloud service APIs for developers and other manufacturers. It’s unclear whether the Backup Plus OneDrive offer is available as a coupon/key code or takes advantage of the new OneDrive APIs to offer its services.

Seagate already has a 4TB Backup Plus drive but it is a multi-platter device. This new drive will be considerably slimmer and more suitable for portable applications.

There are other semi-portable 4TB drives on the market but none as sleek as this one and none with a Microsoft OneDrive tie-in.

Now if they just had one that worked with Mac OSX.

Photo Credits: Seagate website



Two dimensional magnetic recording (TDMR)

A head assembly on a Seagate disk drive by Robert Scoble (cc) (from flickr)

I attended a Rocky Mountain IEEE Magnetics Society meeting a couple of weeks ago where Jonathan Coker, HGST’s Chief Architect and an IEEE Magnetics Society Distinguished Lecturer, was discussing HGST’s research into TDMR heads.

It seems that disk track density is getting so high, track pitch is becoming so small, that the magnetic read heads have become wider than the actual data track width.  Because of this, read heads are starting to pick up more inter-track noise and it’s getting more difficult to obtain a decent signal to noise ratio (SNR) off of a high-density disk platter with a single read head.

TDMR read heads can be used to counteract this extraneous noise by using multiple read heads per data track and as such, help to create a better signal to noise ratio during read back.

What are TDMR heads?

TDMR heads are any configuration of multiple read heads used in reading a single data track. There seemed to be two popular configurations of HGST’s TDMR heads:

  • In-series, where one head is directly behind another head. This provides double the signal for the same (relative) amount of random (electronic) noise.
  • In-parallel (side by side), where three heads were configured in-parallel across the data track and the two inter-track bands. That is, one head was configured directly over the data track with portions spanning the inter-track gap to each side, one head was half way across the data track and the next higher track, and a third head was placed half way across the data track and the next lower track.

At first, the in-series configuration seemed to make the most sense to me. You could conceivably average the two signals coming off the heads and be able to filter out the random noise.  However, the “random noise” seemed to be mostly coming from the inter-track zone and this wasn’t as much random electronics noise as random magnetic noise, coming off of the disk platter, between the data tracks.

In-parallel wins the SNR race

So, much of the discussion was on the in-parallel configuration. The researcher had a number of simulated magnetic recordings which were then read by simulated, in parallel, tripartite read heads.  The idea here was that the information read from each of the side band heads that included inter-track noise could be used as noise information to filter the middle head’s data track reading. In this way they could effectively increase the SNR across the three signals, and thus, get a better data signal from the data track.
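
To make the idea concrete, here is a toy Python simulation of using side head readings as a noise reference for the centre head. This is not HGST’s signal processing; the linear mixing model and the coefficients (`SIDE_GAIN`, `LEAK`, the noise levels) are assumptions picked purely for illustration.

```python
import random

SIDE_GAIN = 0.3   # assumed share of the data signal seen by each side head
LEAK = 0.5        # assumed share of each inter-track band seen by the centre head


def snr(gain, readings, data):
    """Signal power over noise power, for +/-1 data bits scaled by `gain`."""
    noise = sum((r - gain * d) ** 2 for r, d in zip(readings, data)) / len(data)
    return gain ** 2 / noise


def simulate(n=5000, seed=7):
    rng = random.Random(seed)
    data = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    upper = [rng.gauss(0.0, 0.5) for _ in range(n)]   # magnetic noise, upper gap
    lower = [rng.gauss(0.0, 0.5) for _ in range(n)]   # magnetic noise, lower gap

    def elec():
        return rng.gauss(0.0, 0.1)                    # small electronic noise per head

    centre = [d + LEAK * (u + l) + elec() for d, u, l in zip(data, upper, lower)]
    side_up = [SIDE_GAIN * d + u + elec() for d, u in zip(data, upper)]
    side_lo = [SIDE_GAIN * d + l + elec() for d, l in zip(data, lower)]

    # Subtract the side head readings, scaled by the leak, from the centre head.
    combined = [c - LEAK * (su + sl) for c, su, sl in zip(centre, side_up, side_lo)]
    combined_gain = 1.0 - 2 * LEAK * SIDE_GAIN        # data signal left after subtraction

    return snr(1.0, centre, data), snr(combined_gain, combined, data)


snr_single, snr_tdmr = simulate()
print(f"single head SNR ~{snr_single:.1f}, three head SNR ~{snr_tdmr:.1f}")
```

In this model the subtraction costs some data signal but removes most of the inter-track noise, so the combined SNR comes out higher than the centre head alone.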

Originally, TDMR was going to be the technology that was needed to get the disk industry to 100Tb/sqin. But what they are finding at HGST and elsewhere is that even today, at “only” ~5Tb/sqin (HGST helium drives), there seems to be an increasing need to help reduce noise coming from read heads.

Disk density increase has been slowing lately but is still on a march to double density every 2 years or so. As such, a 1TB platter today will be a 2TB platter in 2 years and a 4TB platter in 4 years, etc. TDMR heads may be just the thing that gets the industry to that 4TB platter (20Tb/sqin) in 4 years.

The only problem is what’s going to get them to 100Tb/sqin now?



Optical discs for Facebook cold storage

I heard last week that Facebook is implementing Blu Ray libraries for cold storage. Each Blu Ray disc holds ~100GB and they figure they can store 10,000 discs or ~1PB in a rack.

They bundle 12 discs in a cartridge and 36 cartridges in a magazine, placing 24 magazines in a cabinet, with BluRay drives and a robotic arm. The robot arm sits in the middle of the cabinet with the magazines/cartridges located on each side.
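
Multiplying out the cartridge, magazine and cabinet counts confirms the ~10,000 disc, ~1PB figure. A short Python sketch with illustrative names:

```python
discs_per_cartridge = 12
cartridges_per_magazine = 36
magazines_per_cabinet = 24
gb_per_disc = 100

cartridges_per_cabinet = cartridges_per_magazine * magazines_per_cabinet   # 864
discs_per_cabinet = discs_per_cartridge * cartridges_per_cabinet           # 10,368
cabinet_capacity_pb = discs_per_cabinet * gb_per_disc / 1_000_000          # ~1.04 PB

print(f"{discs_per_cabinet:,} discs in {cartridges_per_cabinet} cartridges, "
      f"~{cabinet_capacity_pb:.2f}PB per cabinet")
```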

It’s unclear what Amazon Glacier uses for its storage but a retrieval time of 3-5 hours indicates removable media of some type. I haven’t seen anything on Windows Azure offering a similar service but Google has released Durable Reduced Availability (DRA) storage which could potentially be hosted on removable media as well. I was unable to find any access time specifications for Google DRA.

Why the interest in cold storage?

The article mentioned that Facebook is testing the new technology first on its compliance data. After that Facebook will start using it for cold photo storage. Facebook also said that it will be using different storage technologies for its cold storage repository, mentioning “bad flash” as another alternative.

BluRay supports both rewritable and WORM (write once, read many times) technology. WORM discs cannot be modified, only destroyed, so WORM technology would be very useful for anyone’s compliance data. The rewritable Blu Ray discs might be more effective for cold photo storage; however, the fact that people on Facebook rarely delete photos says WORM would work well here too.

100GB is a pretty small storage bucket these days but for compliance documents, such as email, invoices, contracts, etc. it’s plenty large.

Can Blu Ray optical provide data center cold storage?

Facebook didn’t discuss the specs on the robot arm that they were planning to use but with 10K discs it has a lot of work to do. Tape library robots move a single cartridge in about 11 seconds or so. If the optical robot could do as well (no information to the contrary), one robot arm could support ~7.8K disc moves per day, or ~4K disc swaps at two moves per swap. But that would be enterprise class robotics and 100% duty cycle; more likely 1/2 to 1/4 of this would be considered good for an off the shelf system like this. So maybe 1000 to 2000 disc swaps per day.

If we use 22 seconds per disc swap (two disc moves), a single robot/rack could support a maximum of 100 to 200TB of data writes per day (assuming robot speed was the only bottleneck).  In the video (see about 30 minutes in) the robot didn’t look all that fast as compared to a tape library robot, but maybe I am biased.
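
The robot side of the estimate can be replayed in Python. The 11 second move time is borrowed from tape libraries as above, the 1/4 to 1/2 derating is the guess from the text, and the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
seconds_per_move = 11          # tape library robot figure, assumed for optical
moves_per_swap = 2             # one move to unload, one to load
gb_per_disc = 100

swaps_per_day_max = SECONDS_PER_DAY / (seconds_per_move * moves_per_swap)   # ~3.9K

# Derate to 1/4 .. 1/2 of the flat-out rate for an off the shelf robot.
swaps_low = swaps_per_day_max / 4        # ~1000 swaps/day
swaps_high = swaps_per_day_max / 2       # ~2000 swaps/day

# Each swap delivers one fresh 100GB disc to a drive.
tb_low = swaps_low * gb_per_disc / 1000      # ~100 TB/day
tb_high = swaps_high * gb_per_disc / 1000    # ~200 TB/day

print(f"~{swaps_low:.0f} to ~{swaps_high:.0f} swaps/day, "
      f"~{tb_low:.0f} to ~{tb_high:.0f}TB/day")
```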

Near as I can tell a 12x BluRay drive can write at ~35MB/sec (SATA drive, writing single layer, 25GB disc, we assume this can be sustained for a 4-layer or dual-sided 2-layer 100GB disc). So to be able to write a full 100GB disc would take ~48 minutes and if you add to that the 22 seconds of disc swap time, one SATA drive running 100% flat out could maybe write 30 discs per day or ~3TB/day.

In the video, the BluRay drives appear to be located in an area above the disc magazines along each side. There appear to be two drives per column with 6 columns per side, so a maximum of 24 drives. With 24 drives, one rack could write about 72TB/day or 720 discs per day, which would fit into our 22 seconds per swap. At 72TB/day it’s going to take ~14 days to fill up a cabinet. I could be off on the drive count, they didn’t show the whole cabinet in the video, so it’s possible they have 12 columns per side, 48 drives per cabinet and 144TB/day.
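
The drive side of the estimate works out as follows in Python. The inputs are the ~35MB/sec write speed, the 22 second swap and the 24 drive guess from the text; the names are illustrative and 100% duty cycle is assumed throughout.

```python
SECONDS_PER_DAY = 24 * 60 * 60
write_mb_per_sec = 35
disc_mb = 100 * 1000           # 100GB disc
swap_seconds = 22
drives = 2 * 6 * 2             # 2 drives/column, 6 columns/side, 2 sides
cabinet_tb = 1000              # ~1PB per cabinet

write_seconds = disc_mb / write_mb_per_sec                            # ~48 minutes
discs_per_drive_day = SECONDS_PER_DAY / (write_seconds + swap_seconds)   # ~30
tb_per_drive_day = discs_per_drive_day * disc_mb / 1_000_000             # ~3 TB

discs_per_day = discs_per_drive_day * drives                          # ~720
tb_per_day = tb_per_drive_day * drives                                # ~72 TB
days_to_fill = cabinet_tb / tb_per_day                                # ~14 days

# Time the robot has for each swap at that rate, versus the 22 seconds it needs.
seconds_between_swaps = SECONDS_PER_DAY / discs_per_day               # ~120 seconds

print(f"{write_seconds / 60:.0f} min/disc, ~{discs_per_day:.0f} discs/day, "
      f"~{tb_per_day:.0f}TB/day, ~{days_to_fill:.0f} days to fill")
```

At roughly one swap every two minutes, the drives rather than the robot are the bottleneck with 24 drives.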

All this assumes 100% duty cycle on the drives which is unreasonable for an enterprise class tape drive let alone a consumer class BluRay drive. This is also write speed, I assume that read speed is the same or better. Also, I didn’t see any servers in the cabinet and I assume that something has to be reading, writing and controlling the optical library. So these other servers need to be somewhere close by, but they could easily be located in a separate rack somewhere near to the library.

So it all makes some amount of sense from a system throughput perspective. Given what we know about the drive speed, cartridge capacity and robot capabilities, it’s certainly possible that the system could sustain the disc swaps and data transfer necessary to provide data center cold storage archive.

And the software

But there’s plenty of software that has to surround an optical library to make it useful. Somehow we would want to be able to identify a file as a candidate for cold storage then have it moved to some cold storage disc(s), cataloged, and then deleted from the non-cold storage repository.  Of course, we probably want 2 or more copies to be written, maybe these redundant copies should be written to different facilities or at least different cabinets.  The catalog to the cold storage repository is all important and needs to be available 24X7 so this needs to be redundant/protected, updated with extreme care, and from my perspective on some sort of high-speed storage to handle archives of 3EB.

What about OpenStack? Although there have been some rumblings by Oracle and others to provide tape support in OpenStack, nothing seems to be out yet. However, it’s not much of a stretch to see removable media support in OpenStack, if some large company were to put some effort into it.

Other cold storage alternatives

In the video, Facebook says they currently have 30PB of cold storage at one facility and are already in the process of building another. They said that they should have 150PB of cold storage online shortly and that each cold storage facility is capable of holding 3EB or 3,000PB of cold storage.

A couple of years back at Hitachi in Japan, we were shown a Blu Ray optical disc library using 50GB discs. This was just a prototype but they were getting pretty serious about it then. We also saw an update of this at an analyst meeting at HDS, a year or so later. So there’s at least one storage company working on this technology.

Facebook seems to have decided they were better off developing their own approach. It’s probably more dense/space efficient and maybe even more power efficient but to tell that would take some spec comparisons which aren’t available from Facebook or HDS just yet.

Why not magnetic tape?

I see these large storage repository sizes and wonder if Facebook might not be better off using magnetic tape. It has a much larger capacity and I believe magnetic tape (LTO or enterprise) would supply better volumetric (bytes/in**3) density than the Blu Ray cabinet they showed in the video.

Facebook said that BluRay discs had a 50 year lifetime.  I believe enterprise and LTO tape vendors say their cartridges have a 30 year lifetime. And that might be one consideration driving them to optical.

The reality is that new LTO technology is coming out every 2-3 years or so, and new drives read only 2 generations back and write only the current technology. With that quick a turnover, a data center would probably have to migrate data from old to new tape technology every decade or so before old tape drives go out of warranty.

I have not seen any Blu Ray technology roadmaps so it’s hard to make a comparison, but to date, PC based Blu Ray drives typically can read and write CDs, DVDs, and current Blu Ray discs (which is probably 4 to 5 generations back). So they have a better reputation for backward compatibility over time.

Tape technology roadmaps are so quick because tape competes with disk, which doubles capacity every 18 months or so. I am sure tape drive and media vendors would be happy not to upgrade their technology so fast but then disk storage would take over more and more tape storage applications.

If Blu Ray were to become a data center storage standard, as Facebook seems to want, I believe that Blu Ray technology would fall under similar competitive pressures from both disk and tape to upgrade optical technology at a faster rate. When that happens, it would be interesting to see how quickly optical drives stop supporting the backward compatibility that they currently support.


Photo Credit: [73/366] Grooves by Dwayne Bent [Ed. note, picture of DVD, not Blu Ray disc]