Western Digital at SFD15: ActiveScale object storage

Phill Bullinger and his staff from Western Digital presented at Storage Field Day 15 (SFD15) on a number of their enterprise products including Tegile and IntelliFlash but the one that caught my interest was their ActiveScale object store acquired from Amplidata back in 2015.

ActiveScale is an onprem, object storage system that provides cloud-like  economics for customer data.

ActiveScale Hardware

ActiveScale systems can both scale up and scale out within a single site. ActiveScale systems have both  storage and system nodes. Storage nodes perform erasure coding and System nodes are control points and metadata managers for the object store.

ActiveScale comes in two appliance configurations that contain both storage and system nodes and storage required.  The two appliances are:

  • ActiveScale P100 is a 7U 720TB pod system and A full rack of P100s can read 8GB/sec and can have 17-9s data availability. The P100 can scale up to 2.1PB in a single rack and up to 18PB in the same namespace. The P100 is a higher performing solution with better performing storage and system nodes
  • ActiveScale X100 is a 42U rack scale solution that holds up to 588 12TB drives or 5.8PB per rack. The X100 can scale up to 9 racks or 52PB in the same namespace. The X100 is a denser configuration with only 6 storage nodes and as such, has a better $/GB than the P100 above.

As WDC is both the supplier of the ActiveScale appliance and a supplier of disk storage they can be fairly aggressive with pricing on appliance systems.

Data integrity in ActiveScale

They make a point of saying that ActiveScale object metadata and data are stored separately. By separating data and metadata, they claim to be  more resilient to system failures. Object metadata is 3 way replicated, in a replicated database, residing in system nodes. Other object systems often store metadata and object data in the same way.

Object data can be erasure coded. That is, object data is chunked, erasure coding protected and then spread across multiple disk drives for data protection. ActiveScale erasure coding is called BitSpread. With BitSpread customers identify the number of disk drives to spread object data across and the number of drive failures the system should recover from without data loss.

A typical BitSpread configuration splits object data into 18 chunks and spreads these chunks across storage columns. A storage column is from 6-18 storage nodes. There’s no pre-allocated space in BitSpread. Object data chunks are allocated to disk storage based on current capacity and performance of the system, within redundancy constraints.

In addition, ActiveScale has a background task called BitDynamics that scans  erasure coded chunks and does a mathematical health check on the object data. If a chunk is bad, the object data chunk can be recovered and re-erasure coded back to proper health.

WDC performance testing shows that BitDynamics has 0 performance degradation when performing re-erasure coding. Indeed, they took out 98 drives in an ActiveScale cluster and BitDynamics re-coded all that data onto other disk drives and detected no performance impact. No indication how long  re-encoding 98 disk drives of data took nor the % of object store capacity utilization at the time of the test but presumably there’s a report someplace to back this up

Unlike many public cloud based object storage systems, ActiveScale is strongly consistent. That is object puts (writes) are not responded back to the entity doing the put,  until the object metadata and object data are properly and safely recorded in the object store.

ActiveScale also supports 3 site erasure coding. GeoSpread is their approach to erasure coding across sites. In this case, object metadata is replicated across 3 system nodes across the sites. Object data and erasure coded information is split into 20 chunks which are then spread across the three sites.  This way if any one site goes down, the other two sites have sufficient metadata, object data chunks and erasure coded information to reconstruct the data.

ActiveScale 5.2 now supports asynch replication. That is any one ActiveScale cluster can replicate to any other ActiveScale cluster located continent distances away.

Unclear how GeoSpread and asynch replication would interact together, but my guess is that each of the 3 GeoSpread sites could be asynchronously replicated to 3 other sites for maximum redundancy.

Both GeoSpread and ActiveScale replication impact performance,  depending on how far the sites are from one another and the speed and bandwidth of the links between sites.

ActiveScale markets

ActiveScale’s biggest market is media and entertainment (M&E), mostly used for media archive or tape replacement/augmentation. WDC showed one customer case study for the Montreaux Jazz Festival, which migrated 49 years of performance videos up to ActiveScale and can now stream any performance, on request, without delay. Montreax media is GeoSpread across 3 sites in France. Another option is to perform transcoding on the object media in realtime and stream the transcoded media.

Another large market is Bio/Life Sciences. Medical & biological scanners are transitioning to higher resolution scans which take more data space. And this sort of medical information needs to be kept a long time

Data analytics on ActiveScale

One other emerging market is data analytics. With the new S3A (S3 adapter), Hadoop clusters can now support object storage as a 2nd tier. One problem with data analytics is that they have lots of data and storing it in triplicate, costs an awful lot.

In big data world, datasets can get very large very quickly. Indeed PB sizes data sets aren’t that unusual. And with triple replication (in native HDFS). When HDFS runs out of space you have to delete data. Before S3A, the only way you could increase storage you had to scale out (with compute and storage and networking) in order to add capacity.

Using Hadoop’s S3A, ActiveScale’s can provide cold archive for data analytics.  From a Hadoop user/application perspective, S3A ActiveScale storage looks like just another directory under HDFS (Hadoop Data File System). You can run MapReduce or other Hadoop application directly against object buckets. But a more realistic approach is to move inactive or cold data from an disk resident HDFS directory to a S3A directory

HDFS and MapReduce are tightly coupled and were designed to have data close to where computation happens. So,  as long as the active data or working set data is on HDFS disk storage or directly in memory the rest of the (inactive) data could all be placed on S3A object storage. Inactive data is normally historical data no longer being actively analyzed while newer data would be actively analyzed. Older, inactive data can be manually or automatically archived off to S3A. With HIVE you can partition your database to have active data in HDFS disk storage and inactive data in S3A.

Another approach is if the active, working set data can all fit directly in memory then the data can reside on S3A object storage. This way the data is read from S3A storage into memory, analyzed there and output be done back to object store or HDFS disk. Because the data is only read (loaded) once, there’s only a minimal performance penalty to use S3A storage.

Western Digital is an active contributor to Hadoop S3A and have recently added performance improvements to S3A, such as better caching, partial object reading, and core XML performance tuning options.

~~~~
If your interested in learning more about Western Digital ActiveScale, check out the videos referenced earlier and their website.

Also you may be interested in these other posts on the WD sessions at SFD15:

The A is for Active, The S is for Scale by Dan Firth (@PenguinPunk)

Comments?

A knowledge ark, the Arch project

Read an article last week on the Arch Mission Foundation project, which is a non-profit, organization that intends “to continuously preserve and disseminate human knowledge throughout time and space”.

The way I read this is they want to capture, preserve  and replicate all mankind’s knowledge onto (semi-)permanent media and store this information  at various locations around the globe and wherever we may go.

Interesting way to go about doing this. There are plenty of questions and considerations to capturing all of mankind’s knowledge.

Google’s way

 Google has electronically scanned every book in a number of library partners to help provide a searchable database of literature, check out the Google Books Library Project.

There’s over 40 library partners around the globe and the intent of the project was to digitize their collections. The library partners can then provide access to their digital copies. Google will provide full access to books in the public domain and will provide search results for all the rest, with pointers as to where the books can be found in libraries, purchased and otherwise obtained.

Google Books can be searched at Google Books. Last I heard they had digitized over 30M books from their library partners, which is pretty impressive since the Library of Congress has around 37M books. Google Books is starting to scan magazines as well.

Arch’s way

The intent is to create Arch’s (pronounced Ark’s) that can last billions of years. The organization is funding R&D into long lived storage technologies.

Some of these technologies include:

  • 5D laser optical data storage in quartz, I wrote about this before (see my 5D storage … post). Essentially, they are able to record two-tone scans of documents in transparent quartz that can last eons. Data is recorded in 5 dimensions, size of dot, polarity of dot  and 3 layers of dot locations through the media. 5D media lasts for 1000s of years.
  • Nickel ion-beam atomic scale storage, couldn’t find much on this online but we suppose this technology uses ion-beams to etch a nickel surface with nano-scale information.
  • Molecular storage on DNA molecules, I wrote about this before as well (see my DNA as storage… post) but there’s been plenty of research on this more recently. A group from Padua, IT  shows the way forward to use bacteria as a read/write head for DNA storage and there are claims that a gram of DNA could hold a ZB (zettabyte, 10**21 bytes) of data. For some reason Microsoft has been very active in researching this technology and plan to add it to Azure someday.
  • Durable space based flash drives, couldn’t find anything on this technology but assume this is some variant of NAND storage optimized for long duration.  Current NAND loses charge over time. Alternatively, this could be a version of other NVM storage, such as, MRAM, 3DX, ReRAM, Graphene Flash, and  Memristor all of which I have written about
  • Long duration DVD technology, this is sort of old school but there exists archive class WORM DVDs out and available on the market today, (see my post on M[illeniata]-Disc…).
  • Quantum information storage, current quantum memory lifetimes don’t much over exceed 180 seconds, but this is storage not memory. Couldn’t find much else on this, but it might be referring to permanent data storage with light.
M-Disc (c) 2011 Millenniata (from their website)
M-Disc (c) 2011 Millenniata (from their website)

They seem technology agnostic but want something that will last forever.

But what knowledge do they plan to store

In Arch’s FAQ they talk about open data sets like Wikipedia and the Internet Archive. But they have an interesting perspective on which knowledge to save. From an advanced future civilization perspective, they are probably not as interested in our science and technology but rather more interested in our history, art and culture.

They believe that science and technology should be roughly the same in every advanced civilization. But history, art and culture are going to be vastly different across different civilizations. As such, history, art and culture are uniquely valuable to some future version of ourselves or any other advanced scientific civilization.

~~~~

Arch intends to have multiple libraries positioned on the Earth, on the Moon and Mars over time. And they are actively looking for donations and participation (see link above).

Although, I agree that culture, art and history will be most beneficial to any advanced civilization. But there’s always a small but distinct probability that we may not continue to exist as an advanced scientific civilization. In that case, I would think, science and technology would also be needed to boot strap civilization.

To the Wikipedia, I would add GitHub, probably Google Books, and PLOS as well as any other publicly available scientific or humanities journals that available.

And don’t get me started on what format to record the data with. Needless to say, out-dated formats are going to be a major concern for anything but a 2D scan of information after about ten years or so.

In any case, humanity and universanity needs something like this.

Photo Credit(s): The Arch Mission Foundation web page

Google Books Library search on Republic results

“Five dimensional glass disks …” from The Verge

M-disk web page