In this episode, the Greybeards discuss the year in storage and naturally kick off with the consolidation trend in the industry and the big one last year, the DELL-EMC acquisition. How the high margin EMC storage business is going to work inside a low margin company like Dell is the subject of much speculation. That, and which of the combined companies' storage products will make it through the transition, make for interesting discussions. And finally, what exactly Dell's long term strategy is remains another open question.
We next turn to the coming of age of object storage. A couple of years ago, object storage was being introduced to a wider market but few wanted to code to RESTful interfaces. Nowadays, that seems to be less of a concern, and the fact that one can have onsite, offsite and cloud based object storage repositories, from open source to proprietary solutions and everything in between, is making object storage a much more appealing option for enterprise IT.
Finally, we discuss the new Tier 0. What with NVMe SSDs and the emergence of NVMe over Fabric coming out last year, Tier 0 has never looked so promising. You may recall that Tier 0 was hot about 5 years ago, with TMS, Violin and others coming out with lightning fast storage IO. But with DELL-EMC DSSD; startups (E8 Storage, Mangstor, Apeiron Data Systems, and others); NVDIMMs, Crossbar, and Everspin coming out with denser offerings; and other SCM technologies (Micron, HPE, IBM, others?) on the horizon, Tier 0 has become red hot again.
Sorry about the occasional airplane noise and other audio anomalies. The podcast runs over 47 minutes. Howard and I could talk for hours on what’s happening in the storage industry. Listen to the podcast to learn more.
In this episode, we talk with Matt Starr (@StarrFiles), CTO of Spectra Logic, the deep storage experts. Matt has been around a long time and Ray's shared many a meal with Matt as they're both in NW Denver. Howard has a minor quibble with Spectra Logic over the use of his company's name (DeepStorage) in their product line but he's also known Matt for a while now.
Matt and Spectra Logic have a number of customers with multi-PB to over an EB of data to store, and how to take care of these ever expanding storage stashes is an ongoing concern. One of the solutions Spectra Logic offers is Black Pearl Deep Storage, which provides an object storage (RESTful) interface front end to a storage tiering/archive backend that uses flash, (spin-down) disk, (LTFS) tape libraries and the (AWS) cloud as backend storage.
Major portions of the Black Pearl are open sourced and available on GitHub. I see several (DS3) SDKs for Java, Python, C, and others. Open sourcing the product provides an easy way for client customization. In fact, one customer was using Ceph and modified their Ceph backup client to send a copy of data off to the Pearl.
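As a rough, hypothetical sketch of what such a client-side customization might look like, here's the shape of a RESTful object PUT. The bucket name, object key and header set are illustrative only, not Spectra Logic's actual DS3 API:

```python
import hashlib

# Hypothetical sketch of a client sending an object copy to an object
# storage endpoint over a RESTful interface. Path and headers below are
# illustrative, not the actual DS3 API.
def make_put(bucket: str, key: str, data: bytes):
    """Build the pieces of a RESTful object PUT: method, URL path, headers, body."""
    path = f"/{bucket}/{key}"
    headers = {
        "Content-Length": str(len(data)),
        # Content checksum so the server can verify the object on ingest
        "Content-MD5": hashlib.md5(data).hexdigest(),
    }
    return "PUT", path, headers, data

method, path, headers, body = make_put("backups", "ceph/pool1/obj42", b"payload")
print(method, path, headers["Content-Length"])
```

A real client would send this over HTTPS with an SDK or HTTP library, plus whatever authentication headers the service requires.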
We talk a bit about the Black Pearl's data integrity. It uses a checksum computed over the object at creation time, which is then verified any time the object is retrieved, copied, moved or migrated, and can be validated periodically (scrubbed) even when it has not been touched.
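The create-time checksum idea can be sketched in a few lines (our toy illustration, not Spectra Logic's code):

```python
import hashlib

# Toy sketch of create-time checksums with periodic scrubbing.
store = {}  # object name -> (data, sha256 digest recorded at creation)

def put_object(name: str, data: bytes):
    """Record the object along with a checksum computed at creation time."""
    store[name] = (data, hashlib.sha256(data).hexdigest())

def verify(name: str) -> bool:
    """Recompute the checksum and compare with the one recorded at creation."""
    data, digest = store[name]
    return hashlib.sha256(data).hexdigest() == digest

put_object("obj1", b"archive me")
print(verify("obj1"))  # object unchanged since creation -> True
# Simulate silent corruption, then "scrub" all objects for mismatches
store["obj1"] = (b"archive mE", store["obj1"][1])
print([name for name in store if not verify(name)])  # flags the bad object
```

The same check runs on every retrieve, copy, move or migrate, and a background scrub walks untouched objects on a schedule.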
Super Computing’s interesting (storage) problems
Matt just returned from the SC16 (Super Computing Conference 2016) in Salt Lake City last month. At the conference there were plenty of multi-PB customers that were looking for better storage alternatives.
One customer Matt mentioned was the Square Kilometer Array, the world's largest radio telescope, which will be transmitting 700TB/hour, over 1EB per year. All that data has to land somewhere and, for this quantity (>EB) of data, tape becomes a necessary choice.
Matt likened Spectra’s archive solutions to warehouses vs. factories. For the factory floor, you need responsive (AFA or hybrid) primary storage but for the warehouse, you just want cheap, bulk storage (capacity).
The podcast runs long, over 51 minutes, and reveals a different world from the GreyBeards' everyday enterprise environments. Specifically, customers that have extra large data repositories and how they manage to survive the data deluge. Matt's an articulate spokesperson for Spectra Logic and their archive solutions and we could have talked about >EB data repositories for hours. Listen to the podcast to learn more.
Matt Starr’s tenure with Spectra Logic spans 24 years and includes experience in service, hardware design, software development, operating systems, electronic design and management. As CTO, he is responsible for helping define the company’s product vision, and serves as the executive representative for the voice of the market. He leads Spectra’s efforts in high-performance computing, private cloud and other vertical markets.
Matt served as the lead engineering architect for the design and production of Spectra’s TSeries tape library family. Spectra Logic has secured more than 50 patents under Matt’s direction, establishing the company as the innovative technology leader in the data storage industry. He holds a BS in electrical engineering from the University of Colorado at Colorado Springs.
In this episode, we talk with Rob Peglar (@PeglarR), Senior VP and CTO of Symbolic IO, a computationally defined storage vendor. Rob has been around almost as long as the GreyBeards (~40 years) and most recently was with Micron and prior to that, EMC Isilon. Rob is also on the board of SNIA.
Symbolic IO emerged from stealth earlier this year and intends to ship products by late this year/early next. Rob joined Symbolic IO in July of 2016.
What’s computational storage?
It’s all about symbolic representation of bits. Symbolic IO has come up with a way to encode bit streams into unique symbols that offer significant savings in memory space, beyond standard data compression techniques.
All that would be just fine if it sat at the end of a storage interface, and we would probably just call it a new form of data reduction. But Symbolic IO also incorporates persistent memory (NV-DIMMs, and in the future 3D XPoint, ReRAM and others) and provides this symbolic data inside a server, directly through its processor data cache, in (decoded) raw data form.
Symbolic IO provides a translation layer between persistent memory and processor cache that decodes the symbolic representation of the data in persistent memory for data reads on the way into data cache and encodes the symbolic representation of the raw data for data writes on the way out of cache to persistent memory.
Rob says that the mathematics are there to show that Symbolic IO’s data reduction is significant and that the decode/encode functionality can be done in a matter of a few clock cycles per cache (line) access on modern (Intel) processors.
The system continually monitors the data it sees to determine what the optimum encoding should be and can change its symbolic table to provide more memory savings for new data written to persistent memory.
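Symbolic IO's actual encoding is proprietary, but the general idea of mapping frequently seen patterns to short symbols, with a table rebuilt as the observed data changes, can be illustrated with a toy dictionary coder (our conceptual sketch only):

```python
from collections import Counter

# Conceptual illustration only: maps fixed-width byte patterns to small
# integer symbols, giving the most frequent patterns the smallest symbols.
def build_table(data: bytes, width: int = 4):
    chunks = [data[i:i + width] for i in range(0, len(data), width)]
    # Most frequent chunks get the lowest symbol numbers
    return {c: i for i, (c, _) in enumerate(Counter(chunks).most_common())}

def encode(data: bytes, table, width: int = 4):
    return [table[data[i:i + width]] for i in range(0, len(data), width)]

def decode(symbols, table):
    rev = {v: k for k, v in table.items()}
    return b"".join(rev[s] for s in symbols)

data = b"ABCDABCDXYZWABCD"
table = build_table(data)   # rebuilt whenever the data mix shifts
syms = encode(data, table)
print(syms, decode(syms, table) == data)
```

The real system presumably does far more (variable-width patterns, hardware-assisted decode in a few clock cycles), but the table-driven encode/decode round trip is the core notion.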
All this reminds the GreyBeards of Huffman encoding algorithms for data compression (which one of us helped deploy on a previous [unnamed] storage product). Huffman encoding transforms fixed length (8-bit ASCII) characters into variable length bit streams.
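For reference, here's a compact Huffman coder showing how frequent characters end up with short bit strings and rare ones with longer strings:

```python
import heapq
from collections import Counter

# Classic Huffman coding: build a tree by repeatedly merging the two
# least-frequent nodes, then read codes off the root-to-leaf paths.
def huffman_codes(text: str):
    # [frequency, tie-breaker, payload]; payload is a char or a (left, right) pair
    heap = [[freq, i, ch] for i, (ch, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        heapq.heappush(heap, [lo[0] + hi[0], i, (lo, hi)])
        i += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node[2], str):
            codes[node[2]] = prefix or "0"   # single-symbol edge case
        else:
            walk(node[2][0], prefix + "0")
            walk(node[2][1], prefix + "1")
    walk(heap[0], "")
    return codes

codes = huffman_codes("aaaabbc")
print(sorted(codes.items()))  # 'a' (most frequent) gets the shortest code
```

The variable length output is what made Huffman a staple of early storage compression, and it's the same frequency-to-code-length intuition behind any symbol-table scheme.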
Symbolic IO will offer three products:
IRIS™ Compute, which provides a persistent memory storage, accessed using something like the Linux pmem library and includes Symbolic StoreModules™ (persistent memory hardware);
IRIS Vault, an appliance with its own IRIS-infused Linux (Symbolic's SymCE™) OS plus Symbolic IO StoreModules, that can run any Linux application, without change, accessing persistent memory. It offers full data security, next generation snapshot-/clone-like capabilities with BLINK™ full storage backups, and enhanced physical security with the removable IRIS Advanced EYE ASIC; and
IRIS Store, which extends the IRIS Vault and IRIS Compute above with more tiers of storage, using Symbolic IO StoreModules as Tier1, PCIe (flash) storage as Tier 2 and external SSD storage as Tier 3 storage.
For more information on Symbolic IO's three products, we encourage you to read their website (linked above).
The podcast runs long, over 47 minutes, and was wide ranging, discussing some of the history of processor/memory/information technologies. It was very easy to talk with Rob and both Howard and I have known Rob for years, across multiple vendors & organizations. Listen to the podcast to learn more.
Rob Peglar is the Senior Vice President and Chief Technology Officer of Symbolic IO. Rob is a seasoned technology executive with 39 years of data storage, network and compute-related experience, is a published author and is active on many industry boards, providing insight and guidance. He brings a vast knowledge of strategy and industry trends to Symbolic IO. Rob is also on the Board of Directors for the Storage Networking Industry Association (SNIA) and an advisor for the Flash Memory Summit. His role at Symbolic IO will include working with the management team to help drive the future product portfolio, executive-level forecasting and customer/partner interaction from early-stage negotiations through implementation and deployment.
Prior to joining Symbolic IO, Rob was the Vice President, Advanced Storage at Micron Technology, where he led next-generation technology and architecture enablement efforts of Micron’s Storage Business Unit, driving storage solution development with strategic customers and partners. Previously he was the CTO, Americas for EMC where he led the entire CTO functions for the Americas. He has also held senior level positions at Xiotech Corporation, StorageTek and ETA Systems.
Rob’s extensive experience in data management, analytics, high-performance computing, non-volatile memory, distributed cluster architectures, filesystems, I/O performance optimization, cloud storage and replication and archiving, networking, virtualization makes him a sought after industry expert and board member. He was named an EMC Elect in 2014, 2015 and 2016. He was one of 25 senior executives worldwide selected for the CRN ‘Storage Superstars’ Award in 2010.
In this episode, we talk with Donna Dillenberger (@DonnaExplorer), IBM Fellow, on IBM's work with blockchain technology. Ray was at the IBM Edge conference last month where Donna and others presented on what blockchain technology could do for financial services and asset provenance. Ray wrote a post on blockchains at IBM after the conference.
Blockchain is the technology behind Bitcoin, the cryptocurrency, but it has the potential to revolutionize a lot of other activities.
What does blockchain have to do with storage? Probably not that much, but as it’s an up and coming technology with great prospects, the GreyBeards thought it worthwhile to find out more.
Blockchain is essentially a software protocol to establish trust where there is none. At another level, it is a programmatic way to maintain a shared ledger of information, without compromise.
The funny thing about ledgers, and record keeping in general, is that they are everywhere. From the first records of written language, to double entry accounting, to today's tracking of financial transactions, ledgers do it all.
Blockchain is just an updated, software protocol version of good ledger keeping.
What’s so special about blockchain ledgers is that they can be maintained correctly and consistently even with entities/persons/servers that are trying to cheat the system.
The classic illustration is the Byzantine Generals Problem: a group of Byzantine armies surrounds a castle; some want to attack while others want to retreat, and they would all like to coordinate their actions. But some Byzantine generals are traitors and will selectively tell some generals to attack while telling others to retreat, in an attempt to disrupt any coordinated action.
Generalizing the problem: when there are a number of independent entities, how does one reach consensus such that no single entity can cheat the system? Computer science calls an algorithm that solves this a Byzantine Fault Tolerance (BFT) algorithm.
Algorithmic consensus in blockchain
With the Bitcoin blockchain (Donna calls this blockchain V1.0), consensus is achieved by "proof of work", a computational problem whose solution is difficult to produce but easy to verify.
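A hashcash-style sketch makes the asymmetry concrete: finding a nonce whose block hash has n leading zeros takes many attempts, while checking a claimed nonce takes a single hash (toy illustration, not Bitcoin's actual parameters):

```python
import hashlib

# Hashcash-style proof of work: search for a nonce that makes the
# block's SHA-256 hash start with `difficulty` zero hex digits.
def mine(block: str, difficulty: int = 4) -> int:
    nonce = 0
    while not hashlib.sha256(f"{block}{nonce}".encode()).hexdigest().startswith("0" * difficulty):
        nonce += 1   # producing the proof takes many tries on average
    return nonce

def check(block: str, nonce: int, difficulty: int = 4) -> bool:
    # Verifying the proof takes exactly one hash
    return hashlib.sha256(f"{block}{nonce}".encode()).hexdigest().startswith("0" * difficulty)

nonce = mine("send 5 BTC to Alice")
print(check("send 5 BTC to Alice", nonce))  # True
```

Bitcoin uses double SHA-256 and adjusts the difficulty so the network as a whole finds one proof roughly every ten minutes; the toy difficulty here is just enough to show the work/verify asymmetry.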
But proof of work is not the only way to achieve algorithmic consensus for blockchains. Hyperledger, an open source blockchain project, has a pluggable form of consensus. So different Hyperledger blockchains can support different forms of consensus.
Currently, Hyperledger supports a BFT algorithm, which says that 2/3rds + 1 of the nodes must agree on a hash value (digitally signed current transaction data plus historical info) to reach consensus.
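The 2/3rds + 1 agreement rule itself is simple to express (a toy sketch, not Hyperledger's implementation):

```python
# Toy sketch of a 2/3 + 1 quorum rule: a hash value reaches consensus
# only when more than two-thirds of the nodes report the same value.
def consensus(votes):
    """votes: hash values reported by each node; returns the agreed value or None."""
    needed = (2 * len(votes)) // 3 + 1
    for value in set(votes):
        if votes.count(value) >= needed:
            return value
    return None   # no quorum; the network cannot commit this block

print(consensus(["h1", "h1", "h1", "h2"]))  # 3 of 4 nodes agree -> 'h1'
print(consensus(["h1", "h1", "h2", "h2"]))  # split vote -> None
```

The threshold is what lets the ledger stay consistent even when up to roughly a third of the nodes are faulty or actively lying.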
It turns out that Hyperledger blockchains record transaction history and other metadata in a key-value store, currently RocksDB.
Other current blockchains
At IBM Edge, Donna discussed an IBM supply chain blockchain where suppliers and consumers record sending, receipt and other movement of parts around IBM's worldwide supply chain. It uses a Hyperledger blockchain.
The Everledger blockchain is being used to supply diamond provenance/pedigree validation. Each diamond is encoded with a digital barcode as it is mined, and as the diamond is processed, cut and sent to wholesalers/retailers, each of those transactions is maintained in the blockchain. One can easily validate the origin, clarity, color, carat and cut of a diamond by examining its transaction history on the blockchain.
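The tamper evidence behind such provenance chains comes from hash chaining: each entry's hash covers the previous entry's hash, so altering any historical record breaks every later link. A toy sketch (ours, not Everledger's implementation):

```python
import hashlib
import json

# Toy hash-chained ledger: each entry's hash covers its data plus the
# previous entry's hash, so tampering with history is detectable.
def add_entry(chain, data: dict):
    prev = chain[-1]["hash"] if chain else "0" * 64
    record = {"data": data, "prev": prev}
    record["hash"] = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
    chain.append(record)

def valid(chain) -> bool:
    for i, rec in enumerate(chain):
        body = {"data": rec["data"], "prev": rec["prev"]}
        if rec["hash"] != hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest():
            return False   # entry was altered after being recorded
        if i and rec["prev"] != chain[i - 1]["hash"]:
            return False   # link to the previous entry is broken
    return True

chain = []
add_entry(chain, {"event": "mined", "id": "D123"})
add_entry(chain, {"event": "cut", "id": "D123"})
print(valid(chain))                    # True
chain[0]["data"]["event"] = "stolen"   # tamper with history
print(valid(chain))                    # False
```

A real blockchain adds digital signatures and distributes copies of the chain across many nodes, so a cheater would have to rewrite history on a consensus quorum of them simultaneously.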
IBM Blockchain activities
IBM wrote the Hyperledger code from scratch to run on z/Linux but their financial services customers wanted it open sourced. So IBM donated it to the Linux Foundation and sponsored the Hyperledger project. It's currently the fastest growing Linux Foundation open source project. You can run Hyperledger apps on any Linux system.
IBM z/Linux has some unique security characteristics useful for financial services and other critical organizations/industries: for instance, secure application signing/verification before code can run, data at rest/in-flight encryption with secured keys and crypto code, and a secure cloud in which the hardware runs.
IBM also offers professional services to help customers create and host their own Hyperledger apps. Moreover, IBM is sponsoring Hyperledger hackathons to add features as well as other Hyperledger community events.
The podcast runs long, over 50 minutes, and introduces blockchain technology, where it can be used, and what IBM is doing with it. Howard and I could have talked with Donna for hours on the topic but we had to stop sometime. Listen to the podcast to learn more.
Donna Dillenberger is an IBM Fellow at IBM’s Watson Research Center. She has redesigned many enterprise applications for greater scalability and availability. She has worked on analytic models for financial, insurance, retail and healthcare industries.
In 2005, she became IBM’s Chief Technology Officer of IT Optimization. In 2006, she became an Adjunct Professor at Columbia University’s Graduate School of Engineering. She is a Master Inventor and is currently working on cognitive analytics and blockchain.
In this episode, we talk with Andy Banta (@andybanta), Storage Janitor (Principal Virt. Architect), NetApp SolidFire. Andy's been involved in Virtual Volumes (VVOLs) and other VMware API implementations at SolidFire and worked at VMware and other storage/system vendor companies before that.
Howard and I were at VMworld2016 late last month and we thought Andy would be a good person to discuss what went on there this year.
No VVOLs & VSAN news at the show
Although we all thought there'd be another release of VVOLs and VSAN announced at the show, VMware instead announced Cloud Foundation and Cross-Cloud Services. If anything, the show was a bit mum about VMware Virtual Volumes (VVOLs) and Virtual SAN™ (VSAN) this year compared to last.
On the other hand, Andy's and other VVOL technical sessions were busy at the conference. One of them ended up standing room only and was repeated at the show, due to demand. Customer interest in VVOLs seems to be peaking.
Our discussion begins with why VVOLs was sidelined this year. One reason was that VMware and their ecosystem focused on Hyper Converged Infrastructure (HCI) this year, and HCI doesn't use storage arrays or VVOLs.
Howard and I suspected that, with VMware's ecosystem growing ever larger, validation and regression testing is starting to consume more resources. But Andy suggested that's not the issue, as VMware uses self-certification, where vendors run tests that VMware supplies to show they meet API requirements. VMware does bring in a handful of vendor solutions (5 for VVOLs) for reference architectures and to ensure the APIs meet (major) vendor requirements, but after that, it's all self-certification.
Another possibility was that the DELL-EMC acquisition (closed 9/6) could be a distraction. But Andy said VMware has been and will continue to operate as an independent company, and the fact that EMC owned ~84% of the stock never impacted VMware's development before. So DELL's acquisition shouldn't either.
Finally, we suggested that executive churn at VMware could be the problem. But Andy debunked that too, saying the pace of executive transitions hasn't really accelerated over the years.
After all that, we concluded that just maybe the schedule had slipped, and perhaps we will see something new for VVOLs and VMware APIs for Storage Awareness (VASA) at VMworld2016 Europe in Barcelona.
Cloud Foundation and Cross-Cloud Services
What VMware did announce was VMware Cloud Foundation and Cross-Cloud Services. This seems to signal a shift in philosophy to be more accommodating to the public cloud rather than just competing with them.
VMware Cloud Foundation is a repackaging of VMware Software Defined Data Center (SDDC), NSX®, VSAN and vSphere® into a single bundle that customers can use to spin up a private cloud with ease.
VMware Cross-Cloud Services is a set of targeted software for public cloud deployment to ease management and migration of services. They showed how NSX could be deployed over your cloud instances to control IP addresses and provide micro-segmentation services, and how other software allows data to be easily migrated between the public cloud and VMware private cloud implementations. Cross-Cloud Services was tech previewed at the show and Ray wrote a post describing it in more detail (please see VMworld2016 Day 1 Cloud Foundation & Cross-Cloud Services post).
Howard talked about how difficult it can be to move workloads to the cloud and back again. Most enterprise application data is just too large to transfer quickly and too complex to be a simple file transfer. And then there are legal matters, data governance, compliance and regulatory regimes that have to be adhered to, which can make it almost impossible to use public cloud services.
On the other hand, Andy talked about work they had done at SolidFire to use the cloud in development. They moved some testing to the cloud to spin up 1000s of (SolidFire simulation) instances to try to catch an infrequent bug (occurring once every 10K runs). They just couldn't do this in their lab. In the end, they were able to catch and debug the problem much more effectively using public cloud services.
Howard mentioned that he was also using AWS as an IO trace repository for benchmark development work he is doing. AWS S3 as a data repository has been a great solution for his team, as anyone can upload their data that way. By the way, he is looking for a data scientist to help analyze this data, if anyone's interested.
In general, workloads are becoming more transient these days. Public cloud services are encouraging this movement but Docker and micro services are also having an impact.
One can even see this sort of trend in VMware VVOLs, which can be another way to enable more transient workloads. VVOLs can be created and destroyed a lot quicker than vdisks in the past. In fact, some storage vendors are starting to look at VVOLs as transient storage and are improving their storage and metadata garbage collection accordingly.
Earlier this year Howard, Andy and I were all at a NetApp SolidFire analyst event in Boulder. At that time, SolidFire said that they had implemented VVOLs so well they considered theirs "VVOLs done right". I asked Andy what was different about SolidFire's VVOL implementation. One thing they did was completely separate the protocol endpoints from the storage side. Another was to provide QoS at the VM level that could be applied to a single VM or to 1000s of VMs.
Andy also said that SolidFire had implemented a bunch of scripts to automate VVOL policy changes across 1000s of objects. SolidFire wanted to make use of these scripts for their own VVOL implementation, but as they could apply to any vendor's implementation of VVOLs, they decided to open source them.
The podcast runs over 42 minutes and covers a broad discussion of the VMware ecosystem, the goings on at VMworld and SolidFire’s VVOL implementation. Listen to the podcast to learn more.
Andy is currently a Storage Janitor acting as a Principal Virtualization Architect at NetApp SolidFire, focusing on VMware integration and Virtual Volumes. Andy was a part of the Virtual Volumes development team at SolidFire.
Prior to SolidFire, he was the iSCSI Tech Lead at VMware, as well as being on the engineering teams at DataGravity and Sun Microsystems.
Andy has presented at numerous VMworlds, as well as several VMUGs and other industry conferences. Outside of work, he enjoys racing cars, hiking and wine. Find him on Twitter at @andybanta.