Facebook down to 1.08 PUE and counting for cold storage

Read a recent article in ArsTechnica about Facebook’s cold storage archive and their sustainable data centers (How Facebook puts petabytes of old cat pix on ice in the name of sustainability). In the article there was a statement that Facebook had achieved a 1.08 PUE (Power Usage Effectiveness) for one of these data centers. This means that for every 100 Watts used to power the racks, Facebook only needed another 8 Watts for everything else.

Just last year I wrote a paper for a client where I interviewed the CEO of an outsourced data center provider (DuPont Fabros Technology) whose state-of-the-art new data centers were achieving PUEs of 1.14 to 1.18. For Facebook to run their cold storage data centers at a 1.08 PUE is even better.

At the moment, Facebook has two cold storage data centers, one at Prineville, OR and the other at Forest City, NC (Forest City achieved the 1.08 PUE). The two cold storage sites supplement the other Facebook data centers that handle everything else in the Facebook universe.

MAID to the rescue

First off, these are cold storage data centers only, holding over an EB of data, but it's still archive storage, racks and racks of it. Whether something is cold or hot seems to depend on last use. For example, if a picture has been referenced recently then it's warm; if not, it's cold.

Second, they have taken MAID (massive array of idle disks) to a whole new data center level. Each 2U Knox storage tray holds 30 4TB drives and a rack has 16 of these storage trays, holding 1.92PB of data. At any one time, only one drive in each storage tray is powered up. The racks have dual servers and only one power shelf (due to the reduced power requirements).
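
To make the rack arithmetic concrete, here's a back-of-the-envelope sketch in Python. The capacity figures come straight from the rack description above; the per-drive wattages are just my rough assumptions for a nearline drive, not Facebook's numbers.

```python
# Illustrative sketch of the cold storage rack described above.
# Capacity figures come from the post; power numbers are rough assumptions.

TRAYS_PER_RACK = 16
DRIVES_PER_TRAY = 30
DRIVE_TB = 4

ACTIVE_WATTS = 8.0           # assumed draw for a spinning nearline drive
IDLE_SPUN_DOWN_WATTS = 0.5   # assumed draw for a spun-down drive

rack_capacity_tb = TRAYS_PER_RACK * DRIVES_PER_TRAY * DRIVE_TB
print(f"Rack capacity: {rack_capacity_tb / 1000:.2f} PB")   # 1.92 PB

# MAID constraint: only one drive per tray is spinning at any moment.
active_drives = TRAYS_PER_RACK * 1
idle_drives = TRAYS_PER_RACK * (DRIVES_PER_TRAY - 1)

maid_watts = active_drives * ACTIVE_WATTS + idle_drives * IDLE_SPUN_DOWN_WATTS
all_on_watts = TRAYS_PER_RACK * DRIVES_PER_TRAY * ACTIVE_WATTS

print(f"Drive power, MAID style:   {maid_watts:.0f} W per rack")
print(f"Drive power, all spinning: {all_on_watts:.0f} W per rack")
```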

They also use pre-fetch hints provided by the Facebook application to cache user data. This means they will fetch some images ahead of time, when a user is paging through photos in their stream, so they are already in cache when needed. After the user looks at or passes up a photo, it is jettisoned from cache and the next photo is pre-fetched. When the disks are no longer busy, they are powered down.
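
The prefetch-and-evict pattern is simple enough to sketch. The cache class, lookahead depth and the fetch_from_cold_storage placeholder below are hypothetical illustrations, not Facebook's code.

```python
from collections import OrderedDict

def fetch_from_cold_storage(photo_id):
    """Placeholder for the expensive read that spins a drive up."""
    return f"<bytes of {photo_id}>"

class PrefetchCache:
    """Toy model of the hinted prefetch described above."""
    def __init__(self, lookahead=3):
        self.lookahead = lookahead
        self.cache = OrderedDict()

    def view(self, stream, index):
        # Prefetch the next few photos in the stream so the user never
        # has to wait on a spun-down drive.
        for photo_id in stream[index:index + self.lookahead]:
            if photo_id not in self.cache:
                self.cache[photo_id] = fetch_from_cold_storage(photo_id)
        current = stream[index]
        data = self.cache[current]
        # Once viewed (or passed up), the photo is jettisoned from cache.
        self.cache.pop(current, None)
        return data

stream = ["p1", "p2", "p3", "p4", "p5"]
cache = PrefetchCache()
for i in range(len(stream)):
    cache.view(stream, i)
```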

Fewer power conversions lower PUE

Another thing Facebook is doing is reducing the number of power conversions needed to power the racks. In a typical data center, power comes in at 480 Volts AC, flows through the data center UPS, and is then dropped down to 208 Volts AC at the PDU, which feeds the rack power supply where it is converted to 12 Volts DC. Each conversion sucks up some power and, in the end, only about 85% of the energy coming in reaches the rack's servers and storage.

In Facebook's data centers, 480 Volts AC is channeled directly to the racks, which have an in-rack battery backup/UPS, and the rack's power bus converts the 480 Volts AC directly to 12 Volts DC (or AC) as needed. By cutting out the data center level UPS and the PDU energy conversion they save a lot of energy overhead, which can instead be used to power the racks.
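
A quick back-of-the-envelope comparison shows why this matters. The per-stage efficiencies below are my assumptions, chosen so the traditional chain lands near the 85% figure mentioned above; they are not measured values.

```python
# Back-of-the-envelope comparison of the two power delivery chains.
# Per-stage efficiencies are assumptions, not measured values.

def chain_efficiency(stages):
    eff = 1.0
    for _name, e in stages:
        eff *= e
    return eff

traditional = [
    ("central UPS (double conversion)", 0.92),
    ("480VAC -> 208VAC PDU transformer", 0.97),
    ("rack power supply 208VAC -> 12VDC", 0.95),
]

direct_to_rack = [
    # battery backup sits off the critical path, so no UPS conversion loss
    ("in-rack 480VAC -> 12VDC power shelf", 0.95),
]

print(f"Traditional chain:   {chain_efficiency(traditional):.0%} reaches the IT gear")
print(f"Direct 480V to rack: {chain_efficiency(direct_to_rack):.0%} reaches the IT gear")
```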

Free air cooling helps

Facebook data centers like Prineville also make use of “fresh air cooling” that mixes data center air with outside air, flows it through “wetted media” to cool it, and then sends it down to cool the racks by convection. This process keeps the rack servers and storage within the proper temperature range, though they probably run hotter than most data centers this way. How much fresh air is brought in depends on the outside temperature, but during most months it works very well.

This is in contrast to standard data centers that use chillers, fans and pumps to keep the data center air moving, conditioned and cold enough to chill the equipment. All those fans, pumps and chillers can consume a lot of energy.

Renewable energy, too

Lately, Facebook has made obtaining renewable energy to power their data centers a high priority. One new data center was built close to the Arctic Circle because of the hydro-power available there; others in Iowa and Texas were built in locations with wind power.

All of this technology, open sourced

Facebook has open sourced all of its hardware and data center designs. That is, the specifications for all the hardware discussed above, and more, are available from the Open Compute Project, including the storage specification(s), open rack specification(s) and data center specification(s) for these data centers.

So if you want to build your own cold storage archive that can achieve 1.08 PUE, just pick up their specs and have at it.

Comments?

Picture Credits: DataCenterKnowledge.Com

 

eBay cools Phoenix data center with hot water from the desert

Microsoft Bing Maps' datacenter by Robert Scoble

Read a report today about how eBay was cooling their new data center outside Phoenix with hot water at desert-warmed temperatures of 86F (30C) (see Breaking new ground on data center efficiency).

And to literally top it all off, they are running data center containers on the roof which they claim have a Green Grid PUE™ (Power Usage Effectiveness) of 1.044 in summer with servers at maximum load. Now this doesn't count some of the transformers and other power conditioning that is needed, but it is impressive nevertheless.

The PUE for the whole data center, 1.35, is not the best in the industry but is considerably better than average. We have talked about green data centers before, with a NetApp data center having an expected PUE of 1.2 (see Building a green data center). One secret to these PUEs is running the servers at hotter than normal temperatures.

New data center designed, servers and other equipment selected with PUE in mind

This is a data center consolidation project, so they were also able to start with a blank sheet of paper. They started by reducing the number of server types down to two: one for high performance computing and the other for big data analytics (a Hadoop cluster). Both sets of servers were selected with power efficiency in mind. Another server capability requested by eBay was the ability to dynamically change server clock speed, so servers could be idled or sped up as demand dictated. In this way they could turn servers down, shedding power consumption, and/or turn servers up to peak performance, remotely.
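
As a rough illustration of what remote clock turn-up/turn-down can look like on Linux servers, here is a minimal sketch using the standard cpufreq sysfs interface. This is not eBay's tooling, and it assumes root access and a cpufreq-capable kernel; the frequency values are arbitrary.

```python
# Minimal sketch: cap or restore CPU clocks via the Linux cpufreq sysfs
# interface. Not eBay's actual mechanism; assumes root and a cpufreq driver.
import glob

def set_max_freq_khz(khz):
    """Cap every core's clock at the given frequency (in kHz)."""
    for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_max_freq"):
        with open(path, "w") as f:
            f.write(str(khz))

# Shed power during low demand, e.g. cap cores at 1.2 GHz ...
# set_max_freq_khz(1_200_000)
# ... and restore peak performance when demand returns, e.g. 3.0 GHz.
# set_max_freq_khz(3_000_000)
```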

The data center cooling was designed with two independent loops: a traditional air conditioned loop that delivered water at 55F (13C), and a hot water loop that delivered water at 86F (30C) from a cooling tower exposed to the desert air.

eBay started out thinking they would use the air conditioned loop more often in the summer months and less often in winter. But in the end they found they could get by with just using the hot water loop year round and use the cold water loop for some spot cooling, where necessary.

Data center containers on a hot roof

Also, the building was specially built to support up to 12 data center containers on the roof. There are over 4,920 servers deployed in the three containers currently on the roof, and one container of 1,500 servers was lifted from the truck and in place in 22 minutes. The containers were designed for direct exposure to the desert environment (up to 122F or 50C) and were cooled using adiabatic cooling.

More details are available in the Green Grid report.

~~~~~~

I wonder what they do when they have to swap out components, especially in the containers – maybe they only do this in winter;)

Comments?

SNIA illuminates storage power efficiency

Untitled by johnwilson1969 (cc) (from Flickr)

At SNW a couple of weeks back, SNIA announced their green storage initiative's new SNIA Emerald Program and the first public draft release of their storage power efficiency test specification. Up until now, other than SPC and some pronouncements from the EPA, there hasn't been much standardization activity on how to measure storage power efficiency.

SNIA’s Storage Power Efficiency Specification

As such, SNIA felt there was a need for an industry standard on how to measure storage power use. SNIA's specification supplies a taxonomy that can be used to define and categorize various storage systems. This extensive taxonomy should minimize problems like comparing consumer storage power use against data center storage power use. The specification also identifies storage attributes, such as deduplication, thin provisioning and other capacity optimization features, that can impact power efficiency.

In addition, the specification has two appendices:

  • Appendix A specifies the valid power and environmental meters that are to be used to measure power efficiency of the system under test.
  • Appendix B specifies the benchmark tool that is used to drive the system under test while its power efficiency is being measured.

Essentially, there are two approved benchmark drivers used to drive IOs in the online storage category, Iometer and vdbench, both of which are freely available. Iometer has been employed for quite a while now in vendor benchmarking activity. In contrast, vdbench is a relative newcomer, but I have worked with its author, Henk Vandenbergh, over many years now and he is a consummate performance analyst. I look forward to seeing how Henk's vdbench matures over time.

Given the spec’s taxonomy and the fact that it lists online, near-online, removable media, virtual media and adjunct storage device categories with multiple sub-categories for each, we will focus only on the online family of storage and save the rest for later.

SPC energy efficiency measures

As my readers should recall, the Storage Performance Council (SPC) also has benchmarks that measure energy use with their SPC-1/E and SPC-1C/E reports (see our SPC-1 IOPS per Watt post).  The interesting part about SPC-1/E results is that there are definite IOPS levels where storage power use undergoes significant transitions.

One can examine an SPC-1/E Executive Summary report and see power use at various IO intensity levels, i.e., 100%, 95%, 90%, 85%, 80%, 50%, 10% and 0% (or idle), for a storage subsystem under test. SPC summarizes these detailed power measurements by defining profiles for “Low”, “Medium” and “Heavy” storage system use. But the devil's often in the details, and having all the above measurements allows one to calculate whatever activity profile works best for you.
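
For example, given the per-level power numbers from a report, it's easy to roll your own profile as a weighted average. The wattages and duty cycle below are placeholders for illustration, not taken from any actual SPC-1/E submission.

```python
# Compute a custom weighted power profile from SPC-1/E style measurements.
# Wattages and hours are placeholders, not from any actual submission.

power_watts = {   # power draw measured at each IO intensity level
    1.00: 520, 0.95: 512, 0.90: 505, 0.85: 498,
    0.80: 490, 0.50: 450, 0.10: 410, 0.00: 395,
}

# Hypothetical duty cycle: hours per day spent at each intensity level.
my_profile_hours = {0.80: 8, 0.50: 10, 0.10: 4, 0.00: 2}

avg_watts = sum(power_watts[level] * hrs
                for level, hrs in my_profile_hours.items()) / 24
print(f"Average draw for this activity profile: {avg_watts:.0f} W")
```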

Unfortunately, only a few SPC-1/E reports have been submitted to date and it has yet to take off.

SNIA alternative power efficiency metrics

Enter SNIA’s Emerald program, which is supposed to be an easier and quicker way to measure storage power use. In addition to the specification, SNIA has established a website (see above) to hold SNIA approved storage power efficiency results and a certification program for auditors who can verify that vendor power efficiency testing meets all specification requirements.

What’s missing from the present SNIA power efficiency test specification are the following:

  • More strict IOPS level definitions – the specification refers to IO intensity but doesn't provide an adequate definition from my perspective. It says that subsystem response time cannot exceed 30msec and uses this to define 100% IO intensity for the workloads. However, given this definition, it could apply to random read, random write, or mixed workloads, and there is no separate specification for sequential or random (and/or mixed) workloads. This could be tightened up.
  • More IO intensity levels measured – the specification calls for power measurements at an IO intensity of 100% for all workloads and 25% for 70:30 R:W workloads for online storage. However, we would also be interested in seeing 80% and 10%. From a user perspective, 80% probably represents a heavy sustainable IO workload and 10% looks like a complete cache hit workload. We would only measure these levels for the “Mixed workload” so as to minimize effort.
  • More write activity in “Mixed workloads” – the specification defines mixed workload as 70% read and 30% write random IO activity.  Given today’s O/S propensity to buffer read data, it would seem more prudent to use a 50:50 Read to Write mix.

Probably other items need more work as well, such as defining a standardized reporting format containing a detailed description of the HW and SW of the system under test, the benchmark driver HW and SW, a table reporting all power efficiency metrics, and inclusion of the full benchmark report including input parameter specifications and all outputs, etc., but these are nits.

Finally, SNIA’s specification goes into much detail about capacity optimization testing, which includes things like compression, deduplication, thin provisioning, delta-snapshotting, etc., with the intent of measuring storage system power use when these capabilities are in use. It is a significant and complex undertaking to define how each of these storage features will be configured and used during power measurement testing. Although SNIA should be commended for their efforts here, this seems too much to take on at the start. We suggest that capacity optimization testing definitions be deferred to a later release and that the focus for now be on the more standard storage power efficiency measurements.

—-

I critique specifications at my peril. Being wrong in the past has caused me to redouble efforts to ensure a correct interpretation of any specification. However, if there's something I have misconstrued or missed here that is worthy of note, please feel free to comment.

Building a green data center

Diversity in the Ecological Soup by jurvetson (cc) (from Flickr)

At NetApp’s Analyst Days last week, David Robbins, CTO Information Technology, reported on a new highly efficient Global Dynamic Lab (GDL) data center which they built in Raleigh, North Carolina. NetApp predicts this new data center will have a power usage effectiveness (PUE) ratio of 1.2. Most data centers today do well if they can attain a PUE of 2.0.

Recall that PUE is the ratio of all power required by the data center (which includes such things as IT power, chillers, fans, UPS, transformers, humidifiers, lights, etc.) to IT power alone (for racks, storage, servers, and networking gear). A PUE of 2 says that as much power is used to power and cool the rest of the data center as is used by the IT equipment itself. An EPA report on Server and Data Center Efficiency said that data centers could reach a PUE of 1.4 if they used the state of the art techniques outlined in the report. A PUE of 1.2 is a dramatic improvement in data center power efficiency, cutting the non-IT power overhead in half even compared to that state of the art level.
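
The PUE arithmetic itself is simple. Here's a quick sketch of what those ratios mean in terms of non-IT overhead per watt delivered to IT gear.

```python
# PUE = total facility power / IT equipment power.
def overhead_per_it_watt(pue):
    """Non-IT watts needed for every watt delivered to IT gear."""
    return pue - 1.0

for label, pue in [("typical data center", 2.0),
                   ("EPA state of the art", 1.4),
                   ("NetApp GDL target", 1.2)]:
    print(f"{label}: PUE {pue} -> {overhead_per_it_watt(pue):.1f} W overhead per IT watt")
```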

There were many innovations used by NetApp to reach the power effectiveness at GDL. The most important ones were:

  • Cooling at higher temperatures which allowed for the use of ambient air
  • Cold-room, warm aisle layout which allowed finer control over cooling delivery to the racks
  • Top-down cooling which used physics to reduce fan load.

GDL was designed to accommodate the higher rack power densities coming from today's technology. GDL supports an average of 12kW per rack and can handle a peak load of 42kW per rack. In addition, GDL uses 52U tall racks, which helps reduce data center footprint. Such high powered, high density racks require rethinking data center cooling.

Cooling at higher temperatures

Probably the most significant factor that improved PUE was planning for the use of much warmer air temperatures. By using warmer air, 70-80F (21.1-26.7C), much of the cooling could now be based on ambient air rather than chilled air. NetApp estimates that they can use ambient air 75% of the year in Raleigh, a fairly warm and humid location. As such, GDL chiller use is reduced significantly, which generates substantial energy savings from the number 2 power consumer in most data centers.

Also, NetApp is able to use ambient air for partial cooling for much of the rest of the year, in conjunction with chillers. Air handlers were purchased that could use outside air, chillers or a combination of the two. GDL chillers also operate more efficiently at the higher temperatures, reducing power requirements yet again.

Given the ~20-25F (11-14C) temperature rise across typical IT equipment, one potential problem is that the warm aisles can exceed 100F (37.8C), which is about the upper limit for human comfort. Fortunately, by detecting lighting use in the hot aisles, GDL can increase cold room cooling to bring temperatures in adjacent hot aisles down to a more comfortable level when humans are present.

One other significant advantage to using warmer temperatures is that warmer air is easier to move than colder air. This provides savings by allowing lower-powered fans to cool the data center.

Cold rooms-warm aisles

GDL built cold rooms on the front side of the racks and a relatively open warm aisle on the other side. Such a design provides uniform cooling from the top to the bottom of a rack. With a more open air design, hot air often accumulates and is trapped at the top of the rack, which requires more cooling to compensate. By sealing the cold room, GDL ensures more even cooling of the rack and thus more efficient use of cooling.

Another advantage provided by cold rooms-warm aisles is that cooling activity can be regulated by the pressure differential between the two sides rather than by flow control or spot temperature sensors. Such regulation allows GDL to match the air supply to rack requirements. As such, GDL avoids the excess cooling required by more open designs that use flow or temperature sensors.
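
NetApp hasn't published GDL's actual control logic, but the idea can be sketched as a simple proportional loop on the cold room/warm aisle pressure difference. The setpoint, gain and fan limits below are arbitrary illustration values, not NetApp's.

```python
# Toy proportional controller: hold the cold room at a slight positive
# pressure relative to the warm aisle. Setpoint, gain and limits are
# arbitrary illustration values.

SETPOINT_PA = 5.0      # desired cold-room overpressure, pascals
GAIN = 2.0             # % fan speed change per pascal of error

def next_fan_speed(current_speed_pct, measured_dp_pa):
    error = SETPOINT_PA - measured_dp_pa
    new_speed = current_speed_pct + GAIN * error
    return max(20.0, min(100.0, new_speed))   # clamp to fan's working range

speed = 50.0
for dp in [3.0, 4.2, 5.1, 5.6]:   # simulated pressure readings
    speed = next_fan_speed(speed, dp)
    print(f"dP={dp:.1f} Pa -> fan {speed:.0f}%")
```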

Top down cooling

I run into this every day at my office: cool air is dense and flows downward, hot air is light and flows upward. NetApp designed GDL to have the air handlers on top of the computer room rather than elsewhere. This eliminates much of the ductwork, which often reduces air flow efficiency and requires increased fan power to compensate. Also, by piping the cooling in from above, physics helps get that cold air to the racked equipment that needs it. As for the hot aisles, warm air naturally rises to the air return above the aisles and can then be vented outside, mixed with outside ambient air, or chilled before it's returned to the cold room.

For normal data centers cooled from below, fan power must be increased to move the cool air up to the top of the rack. GDL's top down cooling reduces the fan power requirements substantially compared with below-the-floor cooling.

—-

There were other approaches that helped GDL reduce power use, such as using hot air for office heating, but these seemed to be the main ones. Much of this was presented at NetApp's Analyst Days last week. Robbins has written a white paper which goes into much more detail on GDL's PUE savings and other benefits that accrued to NetApp when they built this data center.

One nice surprise was the capital cost savings generated by GDL's power efficient data center design. This was also detailed in the white paper, but at the time this post was published the paper was not yet available.

Now that summer’s here in the north, I think I want a cold room-warm aisle for my office…