We talked with Exablox a month or so ago at Storage Field Day 10 (SFD10) and they discussed their unique storage solution and some new software functionality. If you’re not familiar with Exablox, they sell a OneBlox appliance with drive slots, but no data drives.
The OneBlox appliance provides a Linux based, scale-out, distributed object storage software with a file system in front of it. They support SMB and NFS access protocols and have inline deduplication, data compression and continuous snapshot capabilities. You supply the (SATA or SAS) drives, a bring your own drive (BYOD) storage offering.
Their OneSystem management solution is available on a subscription basis, which usually runs in the cloud as a web accessed service offering used to monitor and manage your Exablox cluster(s). However, for those customers that want it, OneSystem is also available as a Docker Container, where you can run it on any Docker compatible system. Continue reading “Exablox, bring your own disk storage”
There’s been an ongoing debate in the analyst community about the advantages of software only innovation vs. hardware-software innovation (see Commodity hardware loses again and Commodity hardware always loses posts). Here is another example where two separate companies have turned to hardware innovation to take storage innovation to the next level.
These two arrays seem to be going after opposite ends of the storage market: the 5U DSSD D5 is going after both structured and unstructured data that needs ultra high speed IO access times (<100µsec), and the 4U FlashBlade is going after more general purpose unstructured data. And yet the two have many similarities, at least superficially. Continue reading “A tale of two AFAs: EMC DSSD D5 & Pure Storage FlashBlade”
Rubrik has been around since January 2014 and just GA’d in April of last year. They recently presented at TechFieldDay 10 (TFD10, videos here) with Chris Wahl, Technical Evangelist, Arvin “Nitro” Nithrakashyap, Co-Founder and Bipul Sinha, Co-Founder, in attendance.
Springpath presented at SFD7 and has a new Software Defined Storage (SDS) that attempts to provide the richness of enterprise storage in a SDS solution running on commodity hardware. I would encourage you to watch the SFD7 video stream if you want to learn more about them.
Their core storage architecture is called HALO which stands for Hardware Agnostic Log-structured Object store. We have discussed log-structured file systems before. They are essentially a sequential file that can be randomly accessed (read) but are sequentially written. Springpath HALO was written from scratch, operates in user space and unlike many SDS solutions, has no dependencies on Linux file systems.
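To make the "sequentially written, randomly read" idea concrete, here is a toy sketch of a log-structured object store. It is only an illustration of the general technique, not Springpath's HALO code; the `LogStore` class, its in-memory log and its index layout are all my own assumptions.

```python
import io


class LogStore:
    """Toy append-only log: writes are sequential, reads are random via an index."""

    def __init__(self):
        self._log = io.BytesIO()  # stands in for the on-media log (assumption)
        self._index = {}          # key -> (offset, length) of the latest version

    def put(self, key, value: bytes) -> None:
        offset = self._log.seek(0, io.SEEK_END)  # always append at the tail
        self._log.write(value)
        self._index[key] = (offset, len(value))  # the older version becomes garbage

    def get(self, key) -> bytes:
        offset, length = self._index[key]
        self._log.seek(offset)                   # random-access read
        return self._log.read(length)


store = LogStore()
store.put("a", b"first")
store.put("b", b"second")
store.put("a", b"updated")  # an overwrite appends; nothing is rewritten in place
```

Note that the overwritten copy of "a" is still sitting in the log, which is why real log-structured systems need some form of garbage collection.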
HALO supports both data deduplication and compression to reduce storage footprint. The other unusual feature is that they support both blade servers and standalone (rack) servers as storage/compute nodes.
Tiers of storage
Each storage node can optionally have SSDs as a persistent cache, holding write data and metadata log. Storage nodes can also hold disk drives used as a persistent final tier of storage. For blade servers, with limited drive slots, one can configure blades as part of a caching tier by using SSDs or PCIe Flash.
All data is written to the (replicated) caching tier before the host is signaled the operation is complete. Write data is destaged from the caching tier to capacity tier over time, as the caching tier fills up. Data reduction (compression/deduplication) is done at destage.
The caching tier also holds read cached data that is frequently read. The caching tier also has a non-persistent segment in server RAM.
Write data is distributed across caching nodes via a hashing mechanism which allocates portions of an address space across nodes. But during cache destage, the data can be independently spread and replicated across any capacity node, based on node free space available. This is made possible by their file system meta-data information.
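As a rough illustration of hash-based placement in the caching tier, the sketch below maps a block address to a primary cache node and its replicas. Springpath did not describe their actual hash function or replica selection, so the SHA-256 hash, the modulo mapping, the "next nodes in the list" replica choice and the node names are all assumptions on my part.

```python
import hashlib


def cache_nodes_for(block_address: int, nodes: list, replicas: int = 3) -> list:
    """Pick the caching-tier nodes that hold a block (hypothetical scheme)."""
    digest = hashlib.sha256(str(block_address).encode()).digest()
    # The hash carves the address space into portions, one portion per node.
    first = int.from_bytes(digest[:8], "big") % len(nodes)
    count = min(replicas, len(nodes))
    # Replicas simply go to the following nodes in the list (assumption).
    return [nodes[(first + i) % len(nodes)] for i in range(count)]


cache_tier = ["cache-1", "cache-2", "cache-3", "cache-4"]  # made-up node names
placement = cache_nodes_for(0x1F400, cache_tier)
```

The destage side works differently: because placement in the capacity tier is recorded in file system meta-data rather than computed from a hash, data can land on whichever capacity nodes have free space.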
The capacity tier is split up into data and meta-data partitions. Meta-data is also present in the caching tier. Data is deduplicated and compressed at destage, but when read back into cache it’s de-compressed only. Both capacity tier and caching tier nodes can have different capacities.
HALO has some specific optimizations for flash writing which includes always writing a full SSD/NAND page and using TRIM commands to free up flash pages that are no longer being used.
HALO SDS packaging under different Hypervisors
In Linux & OpenStack environments they run the whole storage stack in Docker containers primarily for image management/deployment, including rolling upgrade management.
In VMware and Hyper-V, Springpath runs as a VM and uses direct path IO to access the storage. For VMware, Springpath looks like an NFSv3 datastore with VAAI and VVOL support. In Hyper-V, Springpath’s SDS is an SMB storage device.
For KVM it’s NFS storage; for OpenStack one can use NFS or their Cinder plugin for volume support.
The nice thing about Springpath is you can build a cluster of storage nodes that consists of VMware, Hyper-V and bare metal Linux nodes and that supports all of them. (Does this mean it’s multi protocol, supporting SMB for Hyper-V, NFSv3 for VMware?)
Springpath supports (mostly) file, block (via Cinder driver) and object access protocols. The backend caching and capacity tiers both use a log-structured file structure internally to stripe data across all the capacity and caching nodes. Data compression works very well with log-structured file systems.
All customer data is supported internally as objects. HALO has a write-log which is spread across their caching tier and a capacity-log which is spread across the capacity tier.
Data is automatically re-balanced across nodes when new nodes are added or old nodes deleted from the cluster.
Data is protected via replication. The system uses a minimum of 3 SSD nodes and 3 drive (capacity) nodes but these can reside on the same servers to be fully operational. However, the replication factor can be configured to be less than 3 if you’re willing to live with the potential loss of data.
Their system supports both snapshots (2**64 times/object) and storage clones for test dev and backup requirements.
Springpath seems to have quite a lot of functionality for an SDS, although native FC & iSCSI support is lacking. For a file based SDS for hypervisors, it seems to have a lot of the bases covered.
They were imaging slices of a mouse brain with an electron microscope, in slices one millimeter square, at a micron in depth, representing just a thousand cubic microns per image. Such a scan of the full mouse brain would require 450,000 TB (0.45 EB, exabyte=10**18 bytes) of storage for the images.
Getting an equivalent resolution image of a single human brain would require 1.3 billion TB (or 1.3 ZB, zettabyte=10**21 bytes). They went on to say that the world’s digital storage was just 2.7 billion TB (or 2.7 ZB), which is where they came up with the “… nearly half the world’s digital storage capacity.”
So how much digital storage is there in the world today
Setting aside the need for such a detailed map for the moment. Let’s talk about the world’s digital storage.
Tape – I don’t have much information about the enterprise tape capacity currently available in IBM TS1120/TS1130 or Oracle T10000C/B/A, but a relatively recent article indicated that the 225 millionth LTO cartridge was shipped sometime in 3Q13, which represented 90,000 PB (or 90 EB, exabyte=10**18 bytes) of storage capacity.
Disk – Although I couldn’t find a reasonable estimate of installed disk capacity, IDC reported that 2012 disk capacity shipments were 20EB and through 3Q13 there had been 24.3EB shipped. It’s probably safe to assume that capacity shipments were ~8.3EB or more in 4Q13 so we have shipped ~32.5EB of disk capacity in 2013. One estimate of worldwide disk storage capacity (also provided by IDC) is that we are doubling worldwide disk storage capacity every two years so one estimate of installed disk capacity as of the end of 4Q13 is something on the order of 113.6EB of disk storage.
I won’t delve into optical storage as that’s even more difficult to get a handle on, but my guess is it’s not quite to the level of LTO digital storage, so maybe another 90EB there for a total of ~0.3ZB of digital storage in disks, LTO tape and optical.
However, back in February of 2010, researchers reported in Science that the world’s information storage capacity was 2.0 ZB of storage. Also, last October IDC reported that the US alone had a digital storage capacity of 2.6 ZB and that the US had somewhere between 24 and 40% of the world’s storage. Let’s use 33%, for simplicity’s sake; this would put the world’s digital capacity at around 7.8ZB of storage according to IDC.
Thankfully, a human brain scan at the resolutions above would take only a sixth of the world’s digital storage based on my estimates.
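For anyone who wants to check my math, here is the back-of-the-envelope arithmetic from above as a short script. All the inputs are the figures quoted in this post (including my optical guess and the 33% US share I picked); the variable names are just mine.

```python
EB = 10**18  # exabyte, in bytes
ZB = 10**21  # zettabyte, in bytes

# My bottom-up tally of installed capacity.
lto_tape = 90 * EB
disk = 113.6 * EB
optical = 90 * EB            # a guess, as noted above
my_tally = lto_tape + disk + optical

# IDC-based estimate: US capacity divided by the US share of world storage.
us_capacity = 2.6 * ZB
us_share = 0.33              # picked from the 24% to 40% range
idc_world = us_capacity / us_share

# A human brain scan as a fraction of the IDC-based world capacity.
human_brain = 1.3 * ZB
brain_fraction = human_brain / idc_world

print(f"my tally:        {my_tally / ZB:.2f} ZB")
print(f"IDC-based world: {idc_world / ZB:.2f} ZB")
print(f"brain fraction:  {brain_fraction:.3f}")
```

The brain fraction comes out at about one sixth against the IDC-based number; against my own bottom-up tally, the same scan would be several times larger than everything I could count.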
But, we really need to talk about data reduction techniques
I think we need to start discussing some form of data reduction, data compression/fractal compression or even graphical encoding. For example, with appropriate software and compute power the neural scans could be encoded at appropriate levels of detail into a graphical representation. Hopefully, this should be many orders of magnitude less storage intensive, so maybe only 1/600th to 1/60,000th of all the world’s digital storage.
Another approach might be to use a form of fractal compression similar to that done in motion pictures/photographic images. Perhaps, I am being naive but it seems to me that there ought to be some form of fractal encoding of neural branching. Most of nature’s branching structures have an underlying fractal basis and I see nothing in neural anatomy that would show me it’s any different.
Of course, I am not a neural biologist, but I am a storage expert and there’s got to be a way to reduce this data load somehow.
Read a recent article (actually a series of charts and text) in MIT Technology Review called Bases to Bytes, which discusses how the cost of having your DNA sequenced is dropping faster than Moore’s law and how storing a person’s DNA data now takes ~100GB.
Apparently Nature magazine says ~30,000 genomes have been sequenced (not counting biotech sequenced genomes), representing ~3PB of data.
Why it takes 100GB
At the moment DNA sequencing does not use compression, deduplication or any other storage efficiency tools to reduce this capacity footprint. The 3.2 billion DNA base pairs would each take a minimum of 2 bits to store, which should be ~800MB, but for some reason more information about each base is saved (for future needs?) and they often re-sequence the DNA multiple times just to be sure (replicas?). All this seems to add up to needing 100GB of data for a typical DNA sequencing output.
How they go from 0.8GB to 100GB with more info on each base pair and multiple copies or 125X the original data requirement is beyond me.
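Here is that arithmetic spelled out, using only the numbers quoted above (the variable names are mine):

```python
GB = 10**9
PB = 10**15

base_pairs = 3_200_000_000   # ~3.2 billion base pairs in a human genome
bits_per_base = 2            # four possible bases fit in 2 bits

raw_bytes = base_pairs * bits_per_base / 8   # minimum encoding, in bytes
typical_output = 100 * GB                    # typical sequencing output
blowup = typical_output / raw_bytes          # how much bigger than the minimum

genomes_sequenced = 30_000
total_bytes = genomes_sequenced * typical_output

print(f"minimum encoding:  {raw_bytes / GB:.1f} GB")
print(f"blow-up factor:    {blowup:.0f}X")
print(f"all genomes so far: {total_bytes / PB:.0f} PB")
```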
However, we have written about DNA informatics before (see our Dits, codons & chromozones – the storage of life post). In that post I estimated that human DNA would need ~64GB of storage, almost right on. (Although there was a math error somewhere in that analysis. Let’s see, 1B codons each with 64 possibilities [needing 6 bits] should require 6Bbits or ~750MB of storage, close enough).
Dedupe to the rescue
But in my view some deduplication should help. It’s not clear if it’s at the codon level or at some higher organizational level (chromosome, protein, ?) but a “codon-differential” deduplication algorithm might just do the trick and take DNA capacity requirements down to size. In fact, with all the replication in junk DNA, it starts to look more and more like backup sets already.
I am sure any of my Deduplication friends in the industry such as EMC Data Domain, HP StoreOnce, NetApp, SEPATON, and others would be happy to give it some thought if adequate funding were to follow. But with this much storage at stake, some of them may take it on just to go after the storage requirements.
Gosh, with a 50:1 deduplication ratio, maybe we could get a human DNA sequence down to 2GB. Then it would only take 14EB to sequence the world’s 7B population today.
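The same estimate as a couple of lines of script. The 50:1 ratio is my hopeful guess from above, not a measured result on genomic data.

```python
GB = 10**9
EB = 10**18

sequence_output = 100 * GB      # typical sequencing output per person
dedupe_ratio = 50               # hypothetical 50:1 deduplication ratio
world_population = 7 * 10**9

per_person = sequence_output / dedupe_ratio
world_total = per_person * world_population

print(f"per person: {per_person / GB:.0f} GB")
print(f"world:      {world_total / EB:.0f} EB")
```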
Now if we could just sequence the human microbiome with metagenomic analysis of the microbiological communities of organisms that live upon, within and around all of us. Then we might have the answer to everything biologically we wanted to know about some person.
What we could do with all this information is another matter.
The original study (see LIDAR at Angamuco) cited in the piece above was a result of the Legacies of Resilience project sponsored by Colorado State University (CSU) and goes into some detail about the data processing and archeological use of the LIDAR maps.
LIDAR sends a laser pulse from an airplane/satellite to the ground and measures how long it takes to reflect back to the receiver. With that information and “some” data processing, these measurements can be converted to an X, Y, & Z coordinate system or detailed map of the ground.
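The core of that measurement is simple time-of-flight math: the pulse travels to the ground and back, so the range is half the round trip time multiplied by the speed of light. A minimal sketch follows; the 1,000m flying height is a made-up example, not a figure from the study, and real processing also needs the aircraft's position and the beam angle to get X, Y and Z.

```python
SPEED_OF_LIGHT = 299_792_458.0  # meters per second, in vacuum


def range_from_round_trip(seconds: float) -> float:
    """Distance to the reflecting surface for a measured round trip time."""
    return SPEED_OF_LIGHT * seconds / 2


# Hypothetical example: a pulse fired straight down from 1,000m above ground.
round_trip = 2 * 1000 / SPEED_OF_LIGHT   # roughly 6.7 microseconds
distance = range_from_round_trip(round_trip)
```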
The archeologists in the study used LIDAR to create a detailed map of the empire’s main city at a resolution of +/- 0.25m (~10in). They mapped ~207 square kilometers (80 square miles) at this level of detail. In 4 days of airplane LIDAR mapping, they were able to gather more information about the area than they were able to accumulate over 25 years of field work. Seems like digital archeology was just born.
So how much data?
I wanted to find out just how much data this was but neither the article nor the study told me anything about the size of the LIDAR map. However, assuming this is a flat area, which it wasn’t, and assuming the +/-0.25m resolution represents a point every 625sqcm, then the area being mapped above should represent a minimum of ~3.3 billion points of a LIDAR point cloud.
Given the above I estimate the 207sqkm LIDAR grid point cloud represents a minimum of ~172GB of data. There are LIDAR compression tools available, but even at 50% reduction, it’s still ~86GB for 207sqkm.
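The point count is easy to reproduce. The bytes per point figure below is not something I stated above; it is simply what my ~172GB estimate implies when divided by the number of points.

```python
GB = 10**9

area_sqkm = 207
spacing_m = 0.25                          # one point per 0.25m x 0.25m cell

cell_sqm = spacing_m ** 2                 # 0.0625 sqm, i.e. 625 sqcm
points = area_sqkm * 1_000_000 / cell_sqm # sqkm -> sqm, then cells

estimate_bytes = 172 * GB
implied_bytes_per_point = estimate_bytes / points

print(f"points:          {points / 1e9:.2f} billion")
print(f"bytes per point: {implied_bytes_per_point:.0f} (implied)")
```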
My understanding is that the raw LIDAR data would be even bigger than this and the study applied a number of filters against the LIDAR map data to extract different types of features which of course would take even more space. And that’s just one ancient city complex.
With all the above, the size of LIDAR raw data, grid point fields, and multiple filtered views is approaching significance (in storage terms). Moving and processing all this data must also be a problem. As evidence, the flights for the LIDAR runs over Angamuco, Mexico occurred in January 2011 and they were able to analyze the data sometime that summer, ~6 months later. Seems a bit long from my perspective; maybe the data processing/analysis could use some help.
Indiana Jones meets Hadoop
That was the main subject of the second paper mentioned above, done by researchers at the San Diego Supercomputer Center (SDSC). They essentially did a benchmark comparing MapReduce/Hadoop running on a relatively small cluster of 4 to 8 commodity nodes against an HPC cluster (running 28 Sun x4600M2 servers, using 8 processor, quad core nodes, with anywhere from 256GB to 512GB [only on 8 nodes] of DRAM) running a C++ implementation of the algorithm.
The results of their benchmarks were that the HPC cluster beat the Hadoop cluster only when all of the LIDAR data could fit in memory (on a DRAM per core basis); after that the Hadoop cluster performed just as well in elapsed wall clock time. Of course, from a cost perspective the Hadoop cluster was much more economical.
The 8-node, Hadoop cluster was able to “grid” a 150M LIDAR derived point cloud at the 0.25m resolution in just a bit over 10 minutes. Now this processing step is just one of the many steps in LIDAR data analysis but it’s probably indicative of similar activity occurring earlier and later down the (data) line.
Let’s see: at 172GB per 207sqkm, and with the earth’s surface at 510Msqkm, a similar resolution LIDAR grid point cloud of the entire earth’s surface would be about 0.4EB (Exabyte, 10**18 bytes). It’s just great to be in the storage business.
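Scaling the Angamuco estimate up to the whole planet is a one-liner. It assumes the same point density everywhere (oceans included), which is obviously a simplification, and it lands a little over 0.4EB.

```python
GB = 10**9
EB = 10**18

survey_bytes = 172 * GB          # my estimate for the Angamuco grid
survey_sqkm = 207
earth_surface_sqkm = 510_000_000

earth_bytes = survey_bytes * earth_surface_sqkm / survey_sqkm

print(f"whole earth at 0.25m: {earth_bytes / EB:.2f} EB")
```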
[long post 945 wds] HP held their (annual?) HP Tech Days in Fort Collins, Colorado this last week. We had presentations from a number of HP product managers and got to meet a number of new and old bloggers there.
Craig Nunes, VP of Marketing, HP Storage, got up and led off the day’s discussion talking about recent results. HP disk storage is up 11% for the quarter, 3PAR is showing triple digit growth (QoQ maybe YoY?) and channel sales are growing by 10%. HP storage is gaining market share, growing 3% for the quarter. Also, HP is #2 in shipped backup appliances (1H11). The current focus for HP storage is in three areas:
Invest in established platforms, MSA and EVA (with 100K customers)
Invest in converged storage aimed at new data centers, 3PAR, Lefthand, IBRIX and StoreOnce.
Invest in converged systems knocking down barriers between servers, storage and networking with Virtual Systems.
Craig spent most of his time talking about converged storage. HP’s converged storage includes:
built in autonomic storage automating operations with one pane of glass and an orchestration layer on top to oversee everything.
scale out storage providing simpler ways to grow storage.
built on standardized platforms using off the shelf server platform technology
Craig ended up discussing HP’s Virtual System, their response to VCE’s Vblock, NetApp’s FlexPod and Dell’s vStart Bundle. HP’s Virtual System was announced earlier last year and has been doing well in the market.
Brad Katz, Product Manager got up next and talked about Lefthand storage solutions. Lefthand’s portfolio now ranges from the Virtual Storage Appliance (VSA) all the way up to a P4800 SAN storage blade with P4300 and P4500 rackmountable storage systems between those two. Lefthand systems provide a clustered, scale-out IP/SAN and NAS storage. Cluster data is striped across all disks in all storage nodes.
The VSA runs as a virtual machine and utilizes any ESX (direct or SAN attached) storage. The P4800 operates as a storage blade in an HP blade server and uses storage in the blade system. The two rackmount systems P4300 and P4500 connect to SAS attached, external disk shelves.
Steve Johnson and Mat Jacoby talked next about the StoreOnce deduplicating backup appliance product line. StoreOnce is an HP R&D Labs home grown, deduplication technology which provides balanced ingest-restore rates and memory efficient deduplication. The current product line spans D2D25xx, D2D41xx, D2D43xx and the recently announced, B6200 backup storage blade.
StoreOnce uses variable block chunking with a 4K chunk size and a sparse index that saves on server memory, both of which lead to great deduplication rates. Most deduplication functionality is memory intensive, making it hard to scale without increasing memory or using different dedupe engines across a product line. StoreOnce’s sparse indexing fixes that issue and, as such, HP can use the same deduplication engine across their entire product line.
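For readers who haven't seen variable block deduplication before, here is a generic sketch of content-defined chunking with a plain full index. This is not HP's StoreOnce algorithm and it does not implement sparse indexing; the window size, the BLAKE2 hash, the size limits and all the names are my own assumptions, chosen so the average chunk lands near 4K.

```python
import hashlib
import random


def chunk(data: bytes, window=16, mask=0xFFF, min_size=64, max_size=16384):
    """Cut where a hash of the trailing `window` bytes has its low 12 bits zero."""
    chunks, start = [], 0
    for i in range(len(data)):
        size = i + 1 - start
        if size < min_size:
            continue  # never cut before the minimum chunk size
        tail = data[i + 1 - window:i + 1]
        h = int.from_bytes(hashlib.blake2b(tail, digest_size=4).digest(), "big")
        if (h & mask) == 0 or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # whatever is left becomes the last chunk
    return chunks


def dedupe(chunks, index):
    """Store each unique chunk once; return the fingerprints as the recipe."""
    recipe = []
    for c in chunks:
        fingerprint = hashlib.sha256(c).hexdigest()
        index.setdefault(fingerprint, c)  # full in-memory index (not sparse)
        recipe.append(fingerprint)
    return recipe


rng = random.Random(0)
sample = bytes(rng.getrandbits(8) for _ in range(50_000))  # synthetic test data
index = {}
recipe = dedupe(chunk(sample), index)
restored = b"".join(index[f] for f in recipe)
```

Because cut points depend on the data itself rather than on fixed offsets, an insert near the start of a backup stream tends to shift only nearby chunk boundaries. The full index here is exactly the memory hog that a sparse index is meant to avoid.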
Jim Richardson or JR, a 3PAR SE from the start, got up and discussed 3PAR. Early on, 3PAR brought to the market three characteristics that differentiated it from other enterprise storage products:
Multi-tenancy – today’s cloud service providers and just about anyone running enterprise storage need to support mixed workloads on shared storage. 3PAR’s ASIC allows data to be placed on any storage node and be serviced at direct access speeds to better support these multi-application environments.
Thin provisioning – although certainly not the first to support thin provisioning (Iceberg was the first), 3PAR did much to popularize it. Once again the ASIC provides automated support for thin provisioning.
Autonomic functionality – optimization of storage performance across nodes and tiers of storage was also helped by their ASIC’s ability to transfer data without involving processor interaction. Also, 3PAR tried to take the drudgery out of administration by automatically wide striping and making provisioning easier.
Jim Hankins and Chris Duffy came up next and talked about the X9000 IBRIX storage system. Ibrix has intrinsic scale out NAS support and provides automatic failover across dual processing nodes called couplets. The B6200 backup system (see above) is based on Ibrix technology. Ibrix supports a 15PB single name space that is segmented across cluster couplets. Ibrix also comes in a gateway configuration using shared SAN storage behind it.
Robert Thompson got up and talked about the X5000 Windows Server WSS based NAS product. It is the industry’s first two node file system with active/active clustering in a box. As the product runs Windows Server, one can run anti-virus or other server applications directly on the storage, and the system is customer maintainable. Robert pulled out every replaceable unit in the system. Apparently the E5000, HP Storage’s Exchange Appliance, is also based on the same hardware. The two servers in the storage system are clustered together using MSCS.
In the afternoon we went on a lab tour and got to see some of HP’s storage and data center cooling technology on display.
On the second day, Mike Koponen got up and discussed HP’s Virtual System (or Vblock competitor) and Aboubacar Diare gave some of his opinions on VMware VAAI & VASA integration from his testing perspective. Finally, Calvin Zito wrapped up the two day event and everyone (except me and a few others) went on a brewery tour.
All in all, we had a good time with HP. Too bad, I didn’t get to go on the New Belgium Brewery tour, perhaps next time.