Facebook down to 1.08 PUE and counting for cold storage

I read a recent article in Ars Technica about Facebook’s cold storage archive and their sustainable data centers (How Facebook puts petabytes of old cat pix on ice in the name of sustainability). In the article there was a statement that Facebook had achieved a 1.08 PUE (Power Usage Effectiveness) for one of these data centers. This means that for every 100 Watts used to power the racks, Facebook needed to add only 8 Watts for all other overhead.
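
PUE is just total facility power divided by the power delivered to the IT equipment, so the arithmetic is easy to check. Here’s a minimal sketch (the Wattage figures are purely illustrative, not Facebook’s actual measurements):

```python
def pue(total_facility_watts: float, it_equipment_watts: float) -> float:
    """Power Usage Effectiveness = total facility power / IT equipment power."""
    return total_facility_watts / it_equipment_watts

# 100 W delivered to the racks plus 8 W of cooling/power-distribution overhead
print(pue(total_facility_watts=108.0, it_equipment_watts=100.0))  # -> 1.08
```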

Just last year I wrote a paper for a client where I interviewed the CEO of an outsourced data center provider (DuPont Fabros Technology) whose state of the art new data centers were achieving a PUE of 1.14 to 1.18. For Facebook to run their cold storage data centers at 1.08 PUE is even better.

At the moment, Facebook has two cold storage data centers: one at Prineville, OR and the other at Forest City, NC (Forest City achieved the 1.08 PUE). The two cold storage sites complement the other Facebook data centers that handle everything else in the Facebook universe.

MAID to the rescue

First off, these are just cold storage data centers, holding over an EB of data, but it is still archive storage, racks and racks of it. How they decide whether something is cold or hot seems to depend on last use. For example, if a picture has been referenced recently then it’s warm; if not, it’s cold.
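
Facebook hasn’t published the exact aging policy, but a recency-based classification might look something like the toy sketch below (the 30-day threshold is purely my assumption for illustration):

```python
from datetime import datetime, timedelta

COLD_AFTER = timedelta(days=30)   # hypothetical threshold, not Facebook's actual policy

def is_cold(last_accessed: datetime, now: datetime | None = None) -> bool:
    """Classify an object as cold if it hasn't been referenced recently."""
    now = now or datetime.utcnow()
    return (now - last_accessed) > COLD_AFTER

# A photo viewed yesterday stays warm; one untouched for a year moves to cold storage.
print(is_cold(datetime.utcnow() - timedelta(days=1)))    # False
print(is_cold(datetime.utcnow() - timedelta(days=365)))  # True
```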

Second, they have taken MAID (massive array of idle disks) to a whole new data center level. That is, each Knox (Open Vault) storage tray holds 30 4TB drives and a rack has 16 of these storage trays, holding 1.92PB of data. Only one drive in each storage tray is powered up at any one time. The racks have dual servers and only one power shelf (due to the reduced power requirements).
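
The rack-level arithmetic is straightforward; here’s a quick sketch of the raw capacity and of how few drives are ever spinning at once (drive and tray counts taken from the description above):

```python
DRIVES_PER_TRAY = 30      # per Knox (Open Vault) storage tray
TRAYS_PER_RACK = 16
DRIVE_TB = 4

raw_capacity_pb = DRIVES_PER_TRAY * TRAYS_PER_RACK * DRIVE_TB / 1000.0
print(raw_capacity_pb)                     # 1.92 PB of raw capacity per rack

# MAID-style power management: only one drive per tray spins at any moment,
# so only 16 of the rack's 480 drives are drawing spindle power at a time.
total_drives = DRIVES_PER_TRAY * TRAYS_PER_RACK
spinning_drives = TRAYS_PER_RACK * 1
print(total_drives, spinning_drives)       # 480 16
```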

They also use pre-fetch hints provided by the Facebook application to cache user data. This means they will fetch some images ahead of time, when a user is paging through photos in their stream, in order to have them in cache when needed. After the user looks at or passes up a photo, it is jettisoned from cache and the next photo is pre-fetched. When the disks are no longer busy, they are powered down.
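
Here’s a minimal sketch of that hint-driven read-ahead behavior; the cache structure, class name and look-ahead depth are my own assumptions for illustration, not Facebook’s implementation:

```python
from collections import OrderedDict

class PhotoPrefetcher:
    """Toy read-ahead cache: fetch the next few photos in a stream before they're viewed."""

    def __init__(self, fetch_fn, lookahead=2):
        self.fetch_fn = fetch_fn      # reads a photo from cold storage (may spin up a drive)
        self.lookahead = lookahead    # hypothetical read-ahead depth
        self.cache = OrderedDict()    # photo_id -> photo bytes

    def view(self, stream, index):
        """Return the photo at stream[index] and prefetch the next few in the stream."""
        photo_id = stream[index]
        data = self.cache.pop(photo_id, None) or self.fetch_fn(photo_id)
        # Pre-fetch upcoming photos so they're already cached when the user pages to them.
        for next_id in stream[index + 1 : index + 1 + self.lookahead]:
            if next_id not in self.cache:
                self.cache[next_id] = self.fetch_fn(next_id)
        # Photos already viewed or passed up are jettisoned from the cache.
        for cached_id in list(self.cache):
            if cached_id in stream[: index + 1]:
                del self.cache[cached_id]
        return data
```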

Fewer power conversions lower PUE

Another thing Facebook is doing is reducing the number of power conversions needed to power the racks. In a typical data center, power comes in at 480 Volts AC, flows through the data center UPS, is stepped down to 208 Volts AC at the PDU, and then flows to the rack power supply where it is converted to 12 Volts DC. Each conversion of electricity generally wastes some power, and in the end only about 85% of the energy coming in reaches the rack’s servers and storage.

In Facebook’s data centers, 480 Volts AC is channeled directly to the racks, which have an in-rack battery backup/UPS, and the rack’s power bus converts the 480 Volts AC directly to 12 Volts DC as needed. By cutting out the data center level UPS and the PDU conversion step, they save a lot of energy overhead which can instead be used to power the racks.
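
Back of the envelope, each conversion stage multiplies in its own efficiency, which is why dropping stages matters. A rough sketch follows; the per-stage efficiency figures are my own illustrative assumptions, chosen only to show how roughly 15% of incoming energy can be lost end to end:

```python
from math import prod

def end_to_end_efficiency(stage_efficiencies):
    """Overall efficiency is the product of each conversion stage's efficiency."""
    return prod(stage_efficiencies)

# Conventional chain: 480VAC -> data center UPS -> 208VAC PDU -> rack PSU -> 12VDC
conventional = end_to_end_efficiency([0.94, 0.97, 0.93])   # ~0.85, i.e. ~15% lost
# Facebook-style chain: 480VAC straight to the rack power shelf -> 12VDC
facebook_style = end_to_end_efficiency([0.95])             # a single conversion stage

print(round(conventional, 2), round(facebook_style, 2))    # 0.85 0.95
```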

Free air cooling helps

Facebook data centers like Prineville also make use of “fresh air cooling” that mixes data center air with outside air and passes it through “wetted media” (evaporative cooling); the cooled air is then sent down to cool the racks. This process keeps the rack servers and storage within their proper temperature range, although the equipment probably runs hotter this way than in most data centers. How much fresh air is brought in depends on the outside temperature, but during most months it works very well.
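
Facebook hasn’t published its control logic, but conceptually the economizer decision is simple: if the outside air is cool enough, use it directly; if not, evaporatively cool it first. A hedged sketch, with the temperature threshold entirely my own assumption:

```python
def cooling_mode(outside_temp_c: float, supply_setpoint_c: float = 27.0) -> str:
    """Pick a cooling mode for a free-air-cooled data center (illustrative threshold only)."""
    if outside_temp_c <= supply_setpoint_c:
        return "outside air only"             # mix in filtered outside air as-is
    return "outside air + evaporative"        # pass the air through wetted media first

for temp in (5, 20, 35):
    print(temp, "C ->", cooling_mode(temp))
```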

This is in contrast to standard data centers that use chillers, fans and pumps to keep the data center air moving, conditioned and cold enough to chill the equipment. All those fans, pumps and chillers can consume a lot of energy.

Renewable energy, too

Lately, Facebook has made obtaining renewable energy to power their data centers a high priority. One new data center close to the Arctic Circle was sited there because of hydro-power; others in Iowa and Texas were built in locations with wind power.

All of this technology, open sourced

Facebook has open sourced all of its hardware and data center designs. That is, the specifications for all the hardware discussed above, and more, are available from the Open Compute Project, including the storage specification(s), open rack specification(s) and data center specification(s) for these data centers.

So if you want to build your own cold storage archive that can achieve 1.08 PUE, just pick up their specs and have at it.

Comments?

Picture Credits: DataCenterKnowledge.Com

 

Interesting sessions at SNIA DSI Conference 2015

I attended the SNIA Data Storage Innovation (DSI) Conference in Santa Clara, CA last week and ran into a number of old friends and met a few new ones. While attending the conference, there were a few sessions that seemed to bring the conference to life for me.

Microsoft Software Defined Storage Journey

Jose Barreto, Principal Program Manager – Microsoft, spent a little time on what’s currently shipping with Scale-out File Server, Storage Spaces and other storage components of Windows software defined storage solutions. Essentially, what Microsoft is learning from Azure cloud deployments is slowly but surely being implemented in Windows Server software and other solutions.

Microsoft’s vision is that customers can have their own private cloud storage with partner storage systems (SAN & NAS), with Microsoft SDS (Scale-out File Server with Storage Spaces), with hybrid cloud storage (StorSimple with Azure storage), and with public cloud storage (Azure storage).

Jose also mentioned other recent innovations like the Cloud Platform System, which combines Microsoft software, Dell compute, Force10 networking and JBOD (PowerVault MD3060e) storage in a rack.

Some recent Microsoft SDS innovations include:

  • HDD and SSD storage tiering;
  • Shared volume storage;
  • System Center volume and unified storage management;
  • PowerShell integration;
  • Multi-layer redundancy across nodes, disk enclosures, and disk devices; and
  • Independent scale-out of compute or storage.

Probably a few more I’m missing here but these will suffice.

Then, Jose broke some news on what’s coming next in Windows Server storage offerings:

  • Quality of service (QoS) – Windows Server provides QoS capabilities which allow one to limit IO activity and can be used to specify min and max IOPS or latency at a VM or VHD level. The scale-out storage service will balance IO activity across the cluster to meet this QoS specification. Apparently the balancing algorithm came from Microsoft Research, but Jose didn’t go into great detail on what it did differently other than being “fairer” in applying QoS constraints (a conceptual sketch of per-VHD IOPS capping follows this list).
  • Rolling upgrades – Windows Server now supports a cluster running different versions of software. Now one can take a cluster node down and update its software and re-activate it into the same cluster. Previously, code upgrades had to take a whole cluster down at a time.
  • Synchronous replication – Windows Server now supports synchronous Storage Replicas at the volume level. Previously, Storage Replicas were limited to asynchronous operation.
  • Higher VM storage resiliency – Windows will now pause a VM rather than terminate it during transient storage interruptions. This allows VMs to sustain operations across transient outages. VMs remain in a PausedCritical state until the storage comes back, and then they are restarted automatically.
  • Shared-nothing Storage Spaces – Windows Storage Spaces can now be configured across cluster nodes without shared storage. Previously, Storage Spaces required shared JBOD storage between cluster nodes. This feature removes that configuration constraint and allows JBOD storage to be accessible from only a single node.
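
Microsoft didn’t detail the QoS balancing algorithm, but conceptually a per-VHD IOPS cap can be pictured as a token bucket that refills at the maximum allowed rate. The toy sketch below illustrates that idea only; it is not Microsoft’s implementation or API, and the 500 IOPS cap is a made-up number:

```python
import time

class IopsLimiter:
    """Toy token-bucket limiter capping one VHD at max_iops (conceptual only)."""

    def __init__(self, max_iops: float):
        self.max_iops = max_iops
        self.tokens = max_iops
        self.last_refill = time.monotonic()

    def try_issue_io(self) -> bool:
        """Return True if an IO may be issued now, False if the cap has been reached."""
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, up to one second's worth.
        self.tokens = min(self.max_iops,
                          self.tokens + (now - self.last_refill) * self.max_iops)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

limiter = IopsLimiter(max_iops=500)   # hypothetical 500 IOPS cap on one VHD
allowed = sum(limiter.try_issue_io() for _ in range(1000))
print(allowed)                        # roughly 500 when all are issued within one second
```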

Jose did not name what this “vNext” release was going to be called and didn’t provide a specific time frame other than that it’s coming out shortly.

Archival Disc Technology

Yasumori Hino from Panasonic and Jun Nakano from Sony presented information on a brand new removable media technology for cold storage. Prior to their session there was another one from HDS Federal Corporation on their Blu-ray jukebox, but Yasumori’s and Jun’s session was more noteworthy. The new Archival Disc is the next iteration in optical storage beyond Blu-ray and is targeted at long term archive or “cold” storage.

As a prelude to the Archive Disc discussion Yasumori played a CD that was pressed in 1982 (52nd Street, Billy Joel album) on his current generation laptop to show the inherent downward compatibility in optical disc technology.

In 1980, IBM 3380 disk drives were refrigerator sized, multi $10K devices, and held 2.3GB. As far as I know there aren’t any of these still in operation. And IBM/STK tape was reel to reel and took up a whole rack. There may be a few of those devices still operating these days, but not many. I still have a CD collection (but then I am a GreyBeard 🙂) that I listen to occasionally.

The new Archival Disc includes:

  • Media more resilient to high humidity, high temperature, salt water, EMP and other magnetic disturbances. As proof, a Blu-ray disc was submerged in sea water for 5 weeks and could still be read. Data on Blu-ray and the new Archival Disc is recorded without using electromagnetics, in a very stable oxide recording material layer. They project that the new Archival Disc has a media life of 50 years at 50C and 1000 years at 25C under high humidity conditions.
  • Dual-sided, triple-layered media, which uses land and groove recording to provide 300GB of data storage. Blu-ray also uses a land and groove disc format but only records on the land portion of the disc. Track pitch for Blu-ray is 320nm whereas for the Archival Disc it’s only 225nm.
  • Data transfer speeds of 90MB/sec using two optical heads, one per side. Each head can read/write data at 45MB/sec. They project doubling or quadrupling this data transfer rate by using more pairs of optical heads (a quick fill-time calculation follows this list).
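
Those throughput numbers make the fill-time arithmetic easy; here’s a quick sketch using only the figures quoted above:

```python
DISC_GB = 300               # first generation Archival Disc capacity
HEAD_MB_PER_SEC = 45        # per optical head
HEADS = 2                   # one head per side -> 90MB/sec aggregate

fill_minutes = DISC_GB * 1000 / (HEAD_MB_PER_SEC * HEADS) / 60
print(round(fill_minutes, 1))   # ~55.6 minutes to write a full disc at 90MB/sec

# Doubling the head pairs, as they project, cuts the fill time proportionally.
print(round(DISC_GB * 1000 / (HEAD_MB_PER_SEC * HEADS * 2) / 60, 1))   # ~27.8
```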

They also presented a roadmap for a 2nd generation 500GB and a 3rd generation 1TB Archival Disc using higher linear density and better signal processing technology.

Cold storage is starting to get more interest these days, what with all the power consumption going into today’s data centers and the never-ending data tsunami. Archival Disc and Blu-ray optical storage consume no power at rest and only consume power when mounting/dismounting and reading/writing/spinning. Also, given optical discs’ imperviousness to high temperature and humidity, optical storage could be kept outside of air conditioned data centers.

The Storage Revolution

The final presentation of interest to me was by Andrea Nelson from Intel. Intel has lately been focusing on helping partners and vendors provide more effective storage offerings. These aren’t complete storage solutions but rather storage hardware, components and software developed in collaboration with storage vendors and partners that make it easier for them to offer storage solutions using Intel hardware. One example of this collaboration is IBM’s hardware-assisted Real-time Compression available on the new V7000 and FlashSystem V9000 storage hardware.

As the world turns to software defined storage, Intel wants those solutions to make use of their hardware. (Although, at the show I heard from another new SDS vendor that was planning to use x86 as well as ARM servers.)

Intel has:

  • QuickAssist Acceleration technology – such as hardware-assisted data compression;
  • Storage Acceleration software libraries – open source erasure coding and other low-level, compute-intensive functions (a toy erasure coding example follows this list); and
  • Cache Acceleration software – uses Intel SSDs as a data cache for storage applications.
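
As a concrete, if toy, example of the kind of low-level, compute-intensive routine such acceleration libraries target, here’s single-parity erasure coding done with plain XOR. Real libraries implement much richer Reed-Solomon codes with vectorized instructions; this is only a conceptual sketch, not Intel’s API:

```python
def xor_parity(blocks: list[bytes]) -> bytes:
    """Compute a single parity block by XORing equal-sized data blocks together."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def recover(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing block: XOR the parity with all surviving blocks."""
    return xor_parity(surviving_blocks + [parity])

data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(data)
# Lose the second block and rebuild it from the remaining blocks plus parity.
print(recover([data[0], data[2]], parity))   # b'BBBB'
```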

There wasn’t as much of a technical description of these capabilities as in other DSI sessions, but with the industry moving more and more to SDS, Intel’s got a vested interest in seeing it implemented on their hardware.

~~~~

That’s about it. I sat in on quite a number of other sessions but nothing else stuck out as significant or interesting to me as these three sessions.

Comments?