Western Digital at SFD15: ActiveScale object storage

Phill Bullinger and his staff from Western Digital presented at Storage Field Day 15 (SFD15) on a number of their enterprise products including Tegile and IntelliFlash but the one that caught my interest was their ActiveScale object store acquired from Amplidata back in 2015.

ActiveScale is an onprem, object storage system that provides cloud-like  economics for customer data.

ActiveScale Hardware

ActiveScale systems can both scale up and scale out within a single site. ActiveScale systems have both  storage and system nodes. Storage nodes perform erasure coding and System nodes are control points and metadata managers for the object store.

ActiveScale comes in two appliance configurations that contain both storage and system nodes and storage required.  The two appliances are:

  • ActiveScale P100 is a 7U 720TB pod system and A full rack of P100s can read 8GB/sec and can have 17-9s data availability. The P100 can scale up to 2.1PB in a single rack and up to 18PB in the same namespace. The P100 is a higher performing solution with better performing storage and system nodes
  • ActiveScale X100 is a 42U rack scale solution that holds up to 588 12TB drives or 5.8PB per rack. The X100 can scale up to 9 racks or 52PB in the same namespace. The X100 is a denser configuration with only 6 storage nodes and as such, has a better $/GB than the P100 above.

As WDC is both the supplier of the ActiveScale appliance and a supplier of disk storage they can be fairly aggressive with pricing on appliance systems.

Data integrity in ActiveScale

They make a point of saying that ActiveScale object metadata and data are stored separately. By separating data and metadata, they claim to be  more resilient to system failures. Object metadata is 3 way replicated, in a replicated database, residing in system nodes. Other object systems often store metadata and object data in the same way.

Object data can be erasure coded. That is, object data is chunked, erasure coding protected and then spread across multiple disk drives for data protection. ActiveScale erasure coding is called BitSpread. With BitSpread customers identify the number of disk drives to spread object data across and the number of drive failures the system should recover from without data loss.

A typical BitSpread configuration splits object data into 18 chunks and spreads these chunks across storage columns. A storage column is from 6-18 storage nodes. There’s no pre-allocated space in BitSpread. Object data chunks are allocated to disk storage based on current capacity and performance of the system, within redundancy constraints.

In addition, ActiveScale has a background task called BitDynamics that scans  erasure coded chunks and does a mathematical health check on the object data. If a chunk is bad, the object data chunk can be recovered and re-erasure coded back to proper health.

WDC performance testing shows that BitDynamics has 0 performance degradation when performing re-erasure coding. Indeed, they took out 98 drives in an ActiveScale cluster and BitDynamics re-coded all that data onto other disk drives and detected no performance impact. No indication how long  re-encoding 98 disk drives of data took nor the % of object store capacity utilization at the time of the test but presumably there’s a report someplace to back this up

Unlike many public cloud based object storage systems, ActiveScale is strongly consistent. That is object puts (writes) are not responded back to the entity doing the put,  until the object metadata and object data are properly and safely recorded in the object store.

ActiveScale also supports 3 site erasure coding. GeoSpread is their approach to erasure coding across sites. In this case, object metadata is replicated across 3 system nodes across the sites. Object data and erasure coded information is split into 20 chunks which are then spread across the three sites.  This way if any one site goes down, the other two sites have sufficient metadata, object data chunks and erasure coded information to reconstruct the data.

ActiveScale 5.2 now supports asynch replication. That is any one ActiveScale cluster can replicate to any other ActiveScale cluster located continent distances away.

Unclear how GeoSpread and asynch replication would interact together, but my guess is that each of the 3 GeoSpread sites could be asynchronously replicated to 3 other sites for maximum redundancy.

Both GeoSpread and ActiveScale replication impact performance,  depending on how far the sites are from one another and the speed and bandwidth of the links between sites.

ActiveScale markets

ActiveScale’s biggest market is media and entertainment (M&E), mostly used for media archive or tape replacement/augmentation. WDC showed one customer case study for the Montreaux Jazz Festival, which migrated 49 years of performance videos up to ActiveScale and can now stream any performance, on request, without delay. Montreax media is GeoSpread across 3 sites in France. Another option is to perform transcoding on the object media in realtime and stream the transcoded media.

Another large market is Bio/Life Sciences. Medical & biological scanners are transitioning to higher resolution scans which take more data space. And this sort of medical information needs to be kept a long time

Data analytics on ActiveScale

One other emerging market is data analytics. With the new S3A (S3 adapter), Hadoop clusters can now support object storage as a 2nd tier. One problem with data analytics is that they have lots of data and storing it in triplicate, costs an awful lot.

In big data world, datasets can get very large very quickly. Indeed PB sizes data sets aren’t that unusual. And with triple replication (in native HDFS). When HDFS runs out of space you have to delete data. Before S3A, the only way you could increase storage you had to scale out (with compute and storage and networking) in order to add capacity.

Using Hadoop’s S3A, ActiveScale’s can provide cold archive for data analytics.  From a Hadoop user/application perspective, S3A ActiveScale storage looks like just another directory under HDFS (Hadoop Data File System). You can run MapReduce or other Hadoop application directly against object buckets. But a more realistic approach is to move inactive or cold data from an disk resident HDFS directory to a S3A directory

HDFS and MapReduce are tightly coupled and were designed to have data close to where computation happens. So,  as long as the active data or working set data is on HDFS disk storage or directly in memory the rest of the (inactive) data could all be placed on S3A object storage. Inactive data is normally historical data no longer being actively analyzed while newer data would be actively analyzed. Older, inactive data can be manually or automatically archived off to S3A. With HIVE you can partition your database to have active data in HDFS disk storage and inactive data in S3A.

Another approach is if the active, working set data can all fit directly in memory then the data can reside on S3A object storage. This way the data is read from S3A storage into memory, analyzed there and output be done back to object store or HDFS disk. Because the data is only read (loaded) once, there’s only a minimal performance penalty to use S3A storage.

Western Digital is an active contributor to Hadoop S3A and have recently added performance improvements to S3A, such as better caching, partial object reading, and core XML performance tuning options.

If your interested in learning more about Western Digital ActiveScale, check out the videos referenced earlier and their website.

Also you may be interested in these other posts on the WD sessions at SFD15:

The A is for Active, The S is for Scale by Dan Firth (@PenguinPunk)


Disk capacity growing out-of-sight

A head assembly on a Seagate disk drive by Robert Scoble (cc) (from flickr)
A head assembly on a Seagate disk drive by Robert Scoble (cc) (from flickr)

Last week, Hitachi Global Storage Division(acquired by Western Digital, closing in 4Q2011) and Seagate announced some higher capacity disk drives for desk top applications over the past week.

Most of us in the industry have become somewhat jaded with respect to new capacity offerings. But last weeks announcements may give one pause.

Hitachi announced that they are shipping over 1TB/disk platter using 3.5″ platters shipping with 569Gb/sqin technology.  In the past 4-6 platter disk drives were available in shipped disk drives using full height, 3.5″ drives.  Given the platter capacity available now, 4-6TB drives are certainly feasible or just around the corner. Both Seagate and Samsung beat HGST to 1TB platter capacities which they announced in May of this year and began shipping in drives in June.

Speaking of 4TB drives, Seagate announced a new 4TB desktop external disk drive.  I couldn’t locate any information about the number of platters, or Gb/sqin of their technology, but 4 platters are certainly feasible and as a result, a 4TB disk drive is available today.

I don’t know about you, but 4TB disk drives for a desktop seem about as much as I could ever use. But when looking seriously at my desktop environment my CAGR for storage (revealed as fully compressed TAR files) is ~61% year over year.  At that rate, I will need a 4TB drive for backup purposes in about 7 years and if I assume a 2X compression rate then a 4TB desktop drive will be needed in ~3.5 years, (darn music, movies, photos, …).  And we are not heavy digital media consumers, others that shoot and edit their own video probably use orders of magnitude more storage.

Hard to believe, but given current trends inevitable,  a 4TB disk drive will become a necessity for us within the next 4 years.







When will disks become extinct?

A head assembly on a Seagate disk drive by Robert Scoble (cc) (from flickr)
A head assembly on a Seagate disk drive by Robert Scoble (cc) (from flickr)

Yesterday, it was announced that Hitachi General Storage Technologies (HGST) is being sold to Western Digital for $4.3B and after that there was much discussion in the tweeterverse about the end of enterprise disk as we know it.  Also, last week I was at a dinner at an analyst meeting with Hitachi, where the conversation turned to when disks will no longer be available. This discussion was between Mr. Takashi Oeda of Hitachi RSD, Mr. John Webster of Evaluator group and myself.

Why SSDs will replace disks

John was of the opinion that disks would stop being economically viable in about 5 years time and will no longer be shipping in volume, mainly due to energy costs.  Oeda-san said that Hitachi had predicted that NAND pricing on a $/GB basis would cross over (become less expensive than) 15Krpm disk pricing sometime around 2013.  Later he said that NAND pricing had not come down as fast as projected and that it was going to take longer than anticipated.  Note that Oeda-san mentioned density price cross over for only 15Krpm disk not 7200rpm disk.  In all honesty, he said SATA disk would take longer, but he did not predict when

I think both arguments are flawed:

  • Energy costs for disk drives drop on a Watts/GB basis every time disk density increases. So the energy it takes to run a 600GB drive today will likely be able to run a 1.2TB drive tomorrow.  I don’t think energy costs are going to be the main factor to drives disks out of the enterprise.
  • Density costs for NAND storage are certainly declining but cost/GB is not the only factor in technology adoption. Disk storage has cost more than tape capacity since the ’50s, yet they continue to coexist in the enterprise. I contend that disks will remain viable for at least the next 15-20 years over SSDs, primarily because disks have unique functional advantages which are vital to enterprise storage.

Most analysts would say I am wrong, but I disagree. I believe disks will continue to play an important role in the storage hierarchy of future enterprise data centers.

NAND/SSD flaws from an enterprise storage perspective

All costs aside, NAND based SSDs have serious disadvantages when it comes to:

  • Data retention – the problem with NAND data cells is that they can only be written so many times before they fail.  And as NAND cells become smaller, this rate seems to be going the wrong way, i.e,  today’s NAND technology can support 100K writes before failure but tomorrow’s NAND technology may only support 15K writes before failure.  This is not a beneficial trend if one is going to depend on NAND technology for the storage of tomorrow.
  • Sequential access – although NAND SSDs perform much better than disk when it comes to random reads and less so, random writes, the performance advantage of sequential access is not that dramatic.  NAND sequential access can be sped up by deploying multiple parallel channels but it starts looking like internal forms of wide striping across multiple disk drives.
  • Unbalanced performance – with NAND technology, reads operate quicker than writes. Sometimes 10X faster.  Such unbalanced performance can make dealing with this technology more difficult and less advantageous than disk drives of today with much more balanced performance.

None of these problems will halt SSD use in the enterprise. They can all be dealt with through more complexity in the SSD or in the storage controller managing the SSDs, e.g., wear leveling to try to prolong data retention, multi-data channels for sequential access, etc. But all this additional complexity increases SSD cost, and time to market.

SSD vendors would respond with yes it’s more complex, but such complexity is a one time charge, mostly a one time delay, and once done, incremental costs are minimal. And when you come down to it, today’s disk drives are not that simple either with defect skipping, fault handling, etc.

So why won’t disk drives go away soon.  I think other major concern in NAND/SSD ascendancy is the fact that the bulk NAND market is moving away from SLC (single level cell or bit/cell) NAND to MLC (multi-level cell) NAND due to it’s cost advantage.  When SLC NAND is no longer the main technology being manufactured, it’s price will not drop as fast and it’s availability will become more limited.

Some vendors also counter this trend by incorporating MLC technology into enterprise SSDs. However, all the problems discussed earlier become an order of magnitude more severe with MLC NAND. For example, rather than 100K write operations to failure with SLC NAND today, it’s more like 10K write operations to failure on current MLC NAND.  The fact that you get 2 to 3 times more storage per cell with MLC doesn’t help that much when one gets 10X less writes per cell. And the next generation of MLC is 10X worse, maybe getting on the order of 1000 writes/cell prior to failure.  Similar issues occur for write performance, MLC writes are much slower than SLC writes.

So yes, raw NAND may become cheaper than 15Krpm Disks on a $/GB basis someday but the complexity to deal with such technology is also going up at an alarming rate.

Why disks will persist

Now something similar can be said for disk density, what with the transition to thermally assisted recording heads/media and the rise of bit-patterned media.  All of which are making disk drives more complex with each generation that comes out.  So what allows disks to persist long after $/GB is cheaper for NAND than disk:

  • Current infrastructure supports disk technology well in enterprise storage. Disks have been around so long, that storage controllers and server applications have all been designed around them.  This legacy provides an advantage that will be difficult and time consuming to overcome. All this will delay NAND/SSD adoption in the enterprise for some time, at least until this infrastructural bias towards disk is neutralized.
  • Disk technology is not standing still.  It’s essentially a race to see who will win the next generations storage.  There is enough of an eco-system around disk that will keep pushing media, heads and mechanisms ever forward into higher densities, better throughput, and more economical storage.

However, any infrastructural advantage can be overcome in time.  What will make this go away even quicker is the existance of a significant advantage over current disk technology in one or more dimensions. Cheaper and faster storage can make this a reality.

Moreover, as for the ecosystem discussion, arguably the NAND ecosystem is even larger than disk.  I don’t have the figures but if one includes SSD drive producers as well as NAND semiconductor manufacturers the amount of capital investment in R&D is at least the size of disk technology if not orders of magnitude larger.

Disks will go extinct someday

So will disks become extinct, yes someday undoubtedly, but when is harder to nail down. Earlier in my career there was talk of super-paramagnetic effect that would limit how much data could be stored on a disk. Advances in heads and media moved that limit out of the way. However, there will come a time where it becomes impossible (or more likely too expensive) to increase magnetic recording density.

I was at a meeting a few years back where a magnetic head researcher predicted that such an end point to disk density increase would come in 25 years time for disk and 30 years for tape.  When this occurs disk density increase will stand still and then it’s a certainty that some other technology will take over.  Because as we all know data storage requirements will never stop increasing.

I think the other major unknown is other, non-NAND semiconductor storage technologies still under research.  They have the potential for  unlimited data retention, balanced performance and sequential performance orders of magnitude faster than disk and can become a much more functional equivalent of disk storage.  Such technologies are not commercially available today in sufficient densities and cost to even threaten NAND let alone disk devices.


So when do disks go extinct.  I would say in 15 to 20 years time we may see the last disks in enterprise storage.  That would give disks an almost an 80 year dominance over storage technology.

But in any event I don’t see disks going away anytime soon in enterprise storage.


WD’s new SiliconEdge Blue SSD data write spec

Western Digital's Silicon Edge Blue SSD SATA drive (from their website)
Western Digital's SiliconEdge Blue SSD SATA drive (from their website)

Western Digital (WD) announced their first SSD drive for the desktop/laptop market space today.  Their drive offers the typical256, 128, and 64GB capacity points over a SATA interface.  Performance looks ok at 5K random read or write IO/s with sustained transfers at 250 and 140MB/s for read and write respectively.  But what caught my eye was a new specification I hadn’t seen before indicating Maximum GB written per day of 17.5, 35 and 70GB/d for their drives using WD’s Operational Lifespan – LifeEST(tm) definition.

I couldn’t find anywhere that said which NAND technology was used in the device but it likely uses MLC NAND.  In a prior posting we discussed a Toshiba study that said a “typical” laptop user writes about 2.4GB/d and a “heavy” laptop user writes about 9.2GB/d.  This data would indicate that WD’s new 64GB drive can handle almost 2X the defined “heavy” user workload for laptops and their other drives would handle it just fine.  A data write rate for desktop work, as far as I can tell, has not been published, but presumably it would be greater than laptop users.

From my perspective more information on the drives underlying NAND technology, on what a LifeEST specification actually means, and a specification as to how much NAND storage was actually present would be nice, but these are all personal nits.  All that aside, I applaud WD for standing up and saying what data write rate their drives can support.  This needs to be a standard part of any SSD specification sheet and I look forward to seeing more information like this coming from other vendors as well.