Surprises in disk reliability from Microsoft’s “free cooled” datacenters

At Usenix ATC’16 last week, there was a “best of the rest” session which repeated selected papers presented at FAST’16 earlier this year. One that caught my interest discussed disk reliability in free cooled data centers at Microsoft (Environmental conditions and disk reliability in free-cooled datacenters, see pp. 53-66).

The paper examines disk reliability for over 1M drives across 9 different Microsoft datacenters, observed over the course of 1.5 to 4 years, and relates it to how each datacenter was cooled.

Facebook down to 1.08 PUE and counting for cold storage

Read a recent article in ArsTechnica about Facebook’s cold storage archive and their sustainable data centers (How Facebook puts petabytes of old cat pix on ice in the name of sustainability). In the article there was a statement that Facebook had achieved a 1.08 PUE (Power Usage Effectiveness) for one of these data centers. This means for every 100 Watts used to power up racks, Facebook needed to add 8 Watts for other overhead.

Just last year I wrote a paper for a client where I interviewed the CEO of an outsourced data center provider (DuPont Fabros Technology) whose state of the art new data centers were achieving a PUE of 1.14 to 1.18. For Facebook to run their cold storage data centers at 1.08 PUE is even better.
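
To put those PUE figures side by side, here is a minimal Python sketch of the arithmetic. The function name overhead_watts is my own, and the only inputs are the PUE values quoted above; it simply applies overhead = IT power × (PUE − 1).

```python
def overhead_watts(pue: float, it_watts: float = 100.0) -> float:
    """Non-IT power (cooling, power conversion, lighting, ...) implied by a PUE.

    PUE = total facility power / IT power, so overhead = IT power * (PUE - 1).
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_watts * (pue - 1.0)


# PUE figures quoted in this post
quoted = [
    ("Facebook cold storage", 1.08),
    ("outsourced provider, low end", 1.14),
    ("outsourced provider, high end", 1.18),
]

for label, pue in quoted:
    print(f"{label}: {overhead_watts(pue):.0f} W of overhead per 100 W of IT load")
```

At the same IT load, the difference between 1.18 and 1.08 is 10 Watts of overhead for every 100 Watts delivered to the racks.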

At the moment, Facebook has two cold storage data centers, one at Prineville, OR and the other at Forest City, NC (Forest City achieved the 1.08 PUE). The two cold data storage sites add to the other Facebook data centers that handle everything else in the Facebook universe.

MAID to the rescue

First off, these are just cold storage data centers: over an EB of data, but still archive storage, racks and racks of it. How they decide something is cold or hot seems to depend on last use. For example, if a picture has been referenced recently then it’s warm, if not then it’s cold.

Second, they have taken MAID (massive array of idle disks) to a whole new data center level. That is, each 1U (Knox storage tray) shelf has 30 4TB drives and a rack has 16 of these storage trays, holding 1.92PB of data. At any one time, only one drive in each storage tray is powered up. The racks have dual servers and only one power shelf (due to the reduced power requirements).
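
The rack numbers above can be checked with a few lines of Python. This is only a sketch of the arithmetic using the figures in this post (decimal units, 1 PB = 1000 TB); the variable names are mine.

```python
DRIVES_PER_TRAY = 30   # drives per Knox storage tray (from the post)
DRIVE_TB = 4           # capacity per drive in TB (from the post)
TRAYS_PER_RACK = 16    # storage trays per rack (from the post)
ACTIVE_PER_TRAY = 1    # only one drive per tray is powered up at a time

drives_per_rack = DRIVES_PER_TRAY * TRAYS_PER_RACK
rack_capacity_pb = drives_per_rack * DRIVE_TB / 1000   # decimal PB
active_drives = ACTIVE_PER_TRAY * TRAYS_PER_RACK
active_fraction = active_drives / drives_per_rack

print(f"drives per rack:  {drives_per_rack}")
print(f"rack capacity:    {rack_capacity_pb:.2f} PB")
print(f"drives spinning:  {active_drives} ({active_fraction:.1%} of the rack)")
```

So at any moment only about one drive in thirty is spinning, which goes a long way to explaining why a single power shelf per rack is enough.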

They also use pre-fetch hints provided by the Facebook application to cache user data. This means they will fetch some images ahead of time, when a user is paging through photos in a stream, in order to have them in cache when needed. After the user looks at or passes up a photo, it is jettisoned from cache and the next photo is pre-fetched. When the disks are no longer busy, they are powered down.

Fewer power conversions lower PUE

Another thing Facebook is doing is reducing the number of power conversions that need to happen to power racks. In a typical data center power comes in at 480 Volts AC, flows through the data center UPS, is stepped down to 208 Volts AC at the PDU, and then flows to the rack power supply, where it is converted to 12 Volts DC. Each conversion of electricity generally sucks up power and in the end only 85% of the energy coming in reaches the rack’s servers and storage.
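
To see why every conversion matters, here is a rough Python sketch. The 85% end-to-end figure is from above; treating the chain as three stages that are all equally efficient is my own simplifying assumption, so the per-stage number is illustrative and not a measurement of any real UPS, PDU or power supply.

```python
import math

END_TO_END = 0.85  # fraction of incoming power reaching the rack (from the post)
STAGES = ["data center UPS", "PDU step-down to 208V AC", "rack supply to 12V DC"]

# Assumption: every stage is equally efficient. Real stages will differ.
per_stage = END_TO_END ** (1 / len(STAGES))


def delivered_fraction(stage_count: int, stage_efficiency: float = per_stage) -> float:
    """Fraction of input power left after `stage_count` conversions in series."""
    return math.prod([stage_efficiency] * stage_count)


print(f"implied per-stage efficiency: {per_stage:.1%}")
for n in (3, 2, 1):
    print(f"{n} conversion(s) in series: {delivered_fraction(n):.1%} delivered")
```

Losses multiply, so under this assumption each stage removed from the chain hands back several percent of the incoming power.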

In Facebook’s data centers, 480 Volts AC is channeled directly to the racks, which have an in-rack battery backup/UPS, and the rack’s power bus converts the 480 Volt AC to 12 Volt DC or AC directly as needed. By cutting out the data center level UPS and the PDU energy conversion they save lots of energy overhead which can be used to better power the racks.

Free air cooling helps

Facebook data centers like Prineville also make use of “fresh air cooling” that mixes data center air with outside air and passes it through “wetted media” to cool it; the cooled air is then sent down to cool the racks by convection. This process keeps the rack servers and storage within the proper temperature range, though they probably run hotter than in most data centers. How much fresh air is brought in depends on outside temperature, but during most months, it works very well.

This is in contrast to standard data centers that use chillers, fans and pumps to keep the data center air moving, conditioned and cold enough to chill the equipment. All those fans, pumps and chillers can consume a lot of energy.

Renewable energy, too

Lately, Facebook has made obtaining renewable energy to power their data centers a high priority. One new data center close to the Arctic Circle was built there because of hydro-power; another in Iowa and one in Texas were built in locations with wind power.

All of this technology, open sourced

Facebook has open sourced all of its hardware and data center systems. That is, the specifications for all the hardware discussed above and more are available from the Open Compute Project, including the storage specification(s), open rack specification(s) and data center specification(s) for these data centers.

So if you want to build your own cold storage archive that can achieve 1.08 PUE, just pick up their specs and have at it.


Picture Credits: DataCenterKnowledge.Com


eBay cools Phoenix data center with hot water from the desert

Microsoft Bing Maps' datacenter by Robert Scoble

Read a report today about how eBay was cooling their new data center outside Phoenix with hot water at a desert-warmed 86F (30C) (see Breaking new ground on data center efficiency).

And to literally top it all off, they are running data center containers on the roof which they claim have a Green Grid PUE™ (Power Usage Effectiveness) of 1.044 in summer with servers at maximum load. Now this doesn’t count some of the transformers and other power conditioning that is needed, but it is still impressive.

The average PUE for the whole data center, 1.35, is not the best in the industry but is considerably better than average. We have talked about green data centers before with a NetApp data center having an expected PUE of 1.2 (see Building a green data center). One secret to these PUEs is running the servers at hotter than normal temperatures.

New data center designed, servers and other equipment selected with PUE in mind

This is a data center consolidation project so they were also able to start with a blank sheet of paper. They started by reducing the number of server types down to two, one for high performance computing and the other for big data analytics (Hadoop cluster). Both sets of servers were selected with power efficiency in mind. Another server capability requested by eBay was the ability to dynamically change server clock speed so it could idle or speed up servers as demand dictated. In this way they could remotely turn down servers to shed power consumption and/or turn them up to peak performance.

The data center cooling was designed with two independent loops: one a traditional air conditioned loop that delivered water at 55F (13C), and the other a hot water loop that delivered water at 86F (30C), using water from a cooling tower exposed to the desert air.

eBay started out thinking they would use the air conditioned loop more often in the summer months and less often in winter. But in the end they found they could get by with just using the hot water loop year round and use the cold water loop for some spot cooling, where necessary.

Data center containers on a hot roof

Also the building was specially built to be able to support up to 12 data center containers on the roof. There were over 4920 servers deployed in three containers currently on the roof, and one container of 1500 servers was lifted from the truck and put in place in 22 minutes. The containers were designed for direct exposure to the desert environment (up to 122F or 50C) and were cooled using adiabatic cooling.

More details are available in the Green Grid report.


I wonder what they do when they have to swap out components, especially in the containers – maybe they only do this in winter;)


Building a green data center

Diversity in the Ecological Soup by jurvetson (cc) (from Flickr)

At NetApp’s Analyst Days last week David Robbins, CTO Information Technology, reported on a new highly efficient Global Dynamic Lab (GDL) data center which they built in Raleigh, North Carolina.  NetApp predicts this new data center  will have a power use effectiveness (PUE) ratio of 1.2.  Most data centers today do well if they can attain a PUE of 2.0.

Recall that PUE is the ratio of all power required by the data center (includes such things as IT power, chillers, fans, UPS, transformers, humidifiers, lights, etc.) over just IT power (for racks, storage, servers, and networking gear). A PUE of 2 says that there is as much power used by IT equipment as is used to power and cool the rest of the data center. An EPA report on Server and Data Center Efficiency said that data centers could reach a PUE of 1.4 if they used state of the art techniques outlined in the report. A PUE of 1.2 is a dramatic improvement in data center power efficiency and cuts non-IT power in half relative to a PUE of 1.4.
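
The definition is easy to play with in Python. The sketch below uses only the PUE values mentioned above; the function names and the 1 kW IT load are my own choices for illustration.

```python
def pue(total_facility_kw: float, it_kw: float) -> float:
    """PUE = all power drawn by the data center / power drawn by IT gear."""
    if it_kw <= 0:
        raise ValueError("IT power must be positive")
    return total_facility_kw / it_kw


def non_it_kw(pue_value: float, it_kw: float = 1.0) -> float:
    """Power spent on everything except IT gear, for a given IT load."""
    return it_kw * (pue_value - 1.0)


for value in (2.0, 1.4, 1.2):
    print(f"PUE {value}: {non_it_kw(value):.1f} kW non-IT per kW of IT, "
          f"{1 / value:.0%} of facility power reaches IT gear")

# Going from a PUE of 1.4 to 1.2 halves the non-IT power for the same IT load.
print(f"non-IT power at 1.2 vs 1.4: {non_it_kw(1.2) / non_it_kw(1.4):.0%}")
```

Against the more typical PUE of 2.0, a PUE of 1.2 leaves only a fifth of the non-IT power.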

There were many innovations used by NetApp to reach the power effectiveness at GDL. The most important ones were:

  • Cooling at higher temperatures which allowed for the use of ambient air
  • Cold-room, warm aisle layout which allowed finer control over cooling delivery to the racks
  • Top-down cooling which used physics to reduce fan load.

GDL was designed to accommodate higher rack power densities coming from today’s technology. GDL supports an average of 12kW per rack and can handle a peak load of 42kW per rack. In addition, GDL uses 52U tall racks which helps reduce data center footprint. Such high powered/high density racks require rethinking data center cooling.

Cooling at higher temperatures

Probably the most significant factor that improved PUE was planning for the use of much warmer air temperatures. By using warmer air, 70-80F/21.1-26.7C, much of the cooling could now be based on ambient air rather than chilled air. NetApp estimates that they can use ambient air 75% of the year in Raleigh, a fairly warm and humid location. As such, GDL chiller use is reduced significantly, which generates significant energy savings from the number 2 power consumer in most data centers.

Also, NetApp is able to use ambient air for partial cooling for much of the rest of the year when used in conjunction with chillers. Air handlers were purchased that could use outside air, chillers or a combination of the two. GDL chillers also operate more efficiently at the higher temperatures, reducing power requirements yet again.

Given that air typically warms by ~20-25F (~11.1-13.9C) as it cools IT equipment, one potential problem is that the warm aisles can exceed 100F/37.8C, which is about the upper limit for human comfort. Fortunately, by detecting lighting use in the warm aisles, GDL can increase cold room equipment cooling to bring temperatures in adjacent warm aisles down to a more comfortable level when humans are present.
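
A quick Python sketch shows how the warm aisle gets that hot. Note that a temperature rise converts with the 5/9 scale factor only, with no 32 degree offset. Adding the rise to the supply air temperature is my own back-of-the-envelope estimate, not a figure from NetApp.

```python
def f_to_c(temp_f: float) -> float:
    """Convert an absolute temperature from Fahrenheit to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def delta_f_to_c(delta_f: float) -> float:
    """Convert a temperature difference (no 32 degree offset) from F to C."""
    return delta_f * 5.0 / 9.0


SUPPLY_F = (70.0, 80.0)  # cold room supply air range (from the post)
RISE_F = (20.0, 25.0)    # temperature rise across IT equipment (from the post)

for supply, rise in zip(SUPPLY_F, RISE_F):
    exhaust = supply + rise  # rough estimate of warm aisle temperature
    print(f"supply {supply:.0f}F + rise {rise:.0f}F ({delta_f_to_c(rise):.1f}C) "
          f"-> warm aisle about {exhaust:.0f}F ({f_to_c(exhaust):.1f}C)")
```

At the top of both ranges the estimate lands a few degrees above 100F, which matches the comfort problem described above.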

One other significant advantage to using warmer temperatures is that warmer air is easier to move than colder air. This provides savings by allowing lower-powered fans to cool the data center.

Cold rooms-warm aisles

GDL built cold rooms at the front side of racks and a relatively open warm aisle on the other side of the racks. Such a design provides uniform cooling from the top to the bottom of a rack. With a more open air design, hot air often accumulates and is trapped at the top of the rack, which requires more cooling to compensate. By sealing the cold room, GDL ensures more even cooling of the rack and thus, more efficient use of cooling.

Another advantage provided by cold rooms and warm aisles is that cooling activity can be regulated by pressure differentials between the two rather than by flow control or spot temperature sensors. Regulating this way allows GDL to reduce air supply to match rack requirements. As such, GDL avoids the excess cooling that is required by more open designs using flow or temperature sensors.
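
I don't know how GDL's controls are actually implemented, but the general idea of regulating on pressure differential can be sketched as a simple feedback step. Everything in this Python sketch is hypothetical: the function name, the 5 Pa target, the gain, and the use of a 0.0 to 1.0 fan speed are illustrative assumptions only.

```python
def next_fan_speed(current_speed: float,
                   measured_dp_pa: float,
                   target_dp_pa: float = 5.0,   # hypothetical setpoint
                   gain: float = 0.02) -> float:  # hypothetical tuning value
    """One incremental feedback step for air handler fan speed.

    measured_dp_pa is cold room pressure minus warm aisle pressure.
    If the racks pull more air than is supplied, the differential drops
    below target and the fans speed up; if the cold room is over-supplied,
    the differential rises and the fans slow down.
    """
    error = target_dp_pa - measured_dp_pa
    # Clamp so the result stays a valid 0.0 to 1.0 speed fraction.
    return min(1.0, max(0.0, current_speed + gain * error))


speed = 0.5
for dp in (10.0, 8.0, 5.0, 2.0):  # made-up pressure readings in pascals
    speed = next_fan_speed(speed, dp)
    print(f"differential {dp:.0f} Pa -> fan speed {speed:.2f}")
```

The point of the sketch is that a single pressure reading tells the air handlers whether the racks are getting as much air as they are pulling, without needing a temperature sensor at every rack.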

Top down cooling

I run into this every day at my office: cool air is dense and flows downward, hot air is light and flows upward. NetApp designed GDL to have air handlers on top of the computer room rather than elsewhere. This eliminates much of the ductwork, which often reduces air flow efficiency and requires increased fan power to compensate. Also by piping the cooling in from above, physics helps get that cold air to the racked equipment that needs it. As for the warm aisles, warm air will naturally rise to the air return above the aisles and can then be vented to the outside, mixed with outside ambient air or chilled before it’s returned to the cold room.

For normal data centers cooled from below, fan power must be increased to move the cool air up to the top of the rack. GDL’s top down cooling reduces fan power requirements substantially compared with under-floor cooling.


There were other approaches which helped GDL reduce power use, such as using hot air for office heating, but these seemed to be the main ones. Much of this was presented at NetApp’s Analyst Days last week. Robbins has written a white paper which goes into much more detail on GDL’s PUE savings and other benefits that accrued to NetApp when they built this data center.

One nice surprise was the capital cost savings generated by using GDL’s power efficient data center design.  This was also detailed in the white paper.  But at the time this post was published the paper was not available.

Now that summer’s here in the north, I think I want a cold room-warm aisle for my office…