39: Greybeards talk deep storage/archive with Matt Starr, CTO Spectra Logic

In this episode, we talk with Matt Starr (@StarrFiles), CTO of Spectra Logic, the deep storage experts. Matt has been around a long time, and Ray’s shared many a meal with Matt as we’re both in NW Denver. Howard has a minor quibble with Spectra Logic over the use of his company’s name (DeepStorage) in their product line, but he’s also known Matt for a while now.

The Pearl

Matt and Spectra Logic have a number of customers with data repositories ranging from multiple PB to over an EB, and taking care of these ever-expanding storage stashes is an ongoing concern. One of the solutions Spectra Logic offers is the Black Pearl Deep Storage, which provides a RESTful object storage interface in front of a storage tiering/archive backend that uses flash, (spin-down) disk, (LTFS) tape (libraries) and the (AWS) cloud as backend storage.
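The tiering idea behind this kind of front end can be pictured as a placement policy: clients talk to one object interface, and the gateway decides which backend(s) hold each object. Below is a minimal Python sketch of that idea, not Spectra Logic’s logic. The age thresholds, the tier names, the two-copy archive placement and the `place` function are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds for illustration; real tiering policies are configurable
# and are not described in the episode.
TIER_RULES = [
    (timedelta(days=7), ["flash"]),
    (timedelta(days=90), ["spin-down disk"]),
]
# Hypothetical archive placement: one copy on LTFS tape and one in the cloud.
ARCHIVE_TIERS = ["LTFS tape", "cloud"]


def place(last_access: datetime, now: datetime) -> list:
    """Choose backend tier(s) for an object from the time since it was last accessed."""
    age = now - last_access
    for limit, tiers in TIER_RULES:
        if age <= limit:
            return list(tiers)
    return list(ARCHIVE_TIERS)


now = datetime(2016, 12, 1)
print(place(now - timedelta(days=1), now))    # ['flash']
print(place(now - timedelta(days=30), now))   # ['spin-down disk']
print(place(now - timedelta(days=400), now))  # ['LTFS tape', 'cloud']
```

Because the client only ever sees the object interface, a policy like this can change (or new backends can be added) without touching client code.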

Major portions of the Black Pearl are open sourced and available on GitHub. I see several (DS3) SDKs there for Java, Python, C, and others. Open sourcing the product makes client customization easy. In fact, one customer using Ceph modified their Ceph backup client to send a copy of data off to the Pearl.

We talk a bit about the Black Pearl’s data integrity. It computes a checksum over each object at creation time, which is then verified any time the object is retrieved, copied, moved or migrated. Objects can also be validated periodically (scrubbed), even when they have not been touched.
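That integrity scheme (checksum at creation, re-verified on every read, copy or migration, plus periodic scrubbing) can be sketched in a few lines of Python. This is a toy model, not the Black Pearl’s implementation or the DS3 API: the choice of SHA-256, the `ToyObjectStore` class and its methods are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


def checksum(data: bytes) -> str:
    # Hypothetical choice of algorithm: SHA-256 over the whole object.
    return hashlib.sha256(data).hexdigest()


class IntegrityError(Exception):
    pass


@dataclass
class StoredObject:
    data: bytes
    digest: str  # computed once, when the object is created


@dataclass
class ToyObjectStore:
    # Hypothetical in-memory stand-in for backend tiers (tier -> name -> object).
    tiers: dict = field(default_factory=lambda: {"disk": {}, "tape": {}})

    def put(self, tier: str, name: str, data: bytes) -> None:
        self.tiers[tier][name] = StoredObject(data, checksum(data))

    def _verify(self, obj: StoredObject) -> bytes:
        # Re-compute and compare against the creation-time checksum.
        if checksum(obj.data) != obj.digest:
            raise IntegrityError("checksum mismatch")
        return obj.data

    def get(self, tier: str, name: str) -> bytes:
        return self._verify(self.tiers[tier][name])

    def migrate(self, src: str, dst: str, name: str) -> None:
        # Verify before moving; the original digest travels with the object.
        self._verify(self.tiers[src][name])
        self.tiers[dst][name] = self.tiers[src].pop(name)

    def scrub(self) -> list:
        # Periodic pass over every object, including ones nobody has read.
        return [(tier, name)
                for tier, objects in self.tiers.items()
                for name, obj in objects.items()
                if checksum(obj.data) != obj.digest]


store = ToyObjectStore()
store.put("disk", "scan-001", b"telescope data")
store.migrate("disk", "tape", "scan-001")
assert store.get("tape", "scan-001") == b"telescope data"

# Simulate silent corruption on tape; the scrub catches it without a read.
store.tiers["tape"]["scan-001"].data = b"telescope dat4"
print(store.scrub())  # [('tape', 'scan-001')]
```

The key design point is that the digest is computed once and carried with the object, so every later check compares against what was originally written rather than against a copy that may already be damaged.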

Super Computing’s interesting (storage) problems

Matt had just returned from SC16 (Super Computing Conference 2016) in Salt Lake City last month, where there were plenty of multi-PB customers looking for better storage alternatives.

One customer Matt mentioned was the Square Kilometre Array (SKA), which will be the world’s largest radio telescope and will be transmitting 700TB/hour, over 1EB per year. All that data has to land somewhere, and for this quantity (>1EB) of data, tape becomes a necessary choice.
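As a sanity check on those figures, the arithmetic below converts the quoted 700TB/hour into a yearly volume and a sustained bandwidth. It assumes transmission runs around the clock all year and uses decimal units (1EB = 10^6 TB). The real duty cycle isn’t given in the episode, so treat the result as an upper bound that is comfortably “over 1EB per year”.

```python
TB_PER_HOUR = 700              # rate quoted for the Square Kilometre Array
HOURS_PER_YEAR = 24 * 365      # assumption: continuous, year-round transmission

tb_per_year = TB_PER_HOUR * HOURS_PER_YEAR
eb_per_year = tb_per_year / 1_000_000          # decimal units: 1 EB = 10**6 TB
gb_per_second = TB_PER_HOUR * 1_000 / 3_600    # 1 TB = 1,000 GB

print(f"{tb_per_year:,} TB/year = {eb_per_year:.2f} EB/year")
print(f"sustained ingest: {gb_per_second:.0f} GB/s")
```

Under these assumptions that works out to about 6.13EB/year, or roughly 194GB/s that some storage tier has to absorb continuously.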

Matt likened Spectra’s archive solutions to warehouses vs. factories. For the factory floor, you need responsive (AFA or hybrid) primary storage, but for the warehouse, you just want cheap, bulk storage (capacity).

The podcast runs long, over 51 minutes, and reveals a world quite different from the GreyBeards’ everyday enterprise environments: customers with extra-large data repositories and how they manage to survive the data deluge. Matt’s an articulate spokesperson for Spectra Logic and their archive solutions, and we could have talked about >1EB data repositories for hours. Listen to the podcast to learn more.

Matt Starr, CTO, Spectra Logic

Matt Starr’s tenure with Spectra Logic spans 24 years and includes experience in service, hardware design, software development, operating systems, electronic design and management. As CTO, he is responsible for helping define the company’s product vision, and serves as the executive representative for the voice of the market. He leads Spectra’s efforts in high-performance computing, private cloud and other vertical markets.

Matt served as the lead engineering architect for the design and production of Spectra’s TSeries tape library family. Spectra Logic has secured more than 50 patents under Matt’s direction, establishing the company as the innovative technology leader in the data storage industry. He holds a BS in electrical engineering from the University of Colorado at Colorado Springs.

33: GreyBeards talk HPC storage with Frederic Van Haren, founder HighFens & former Sr. Director of HPC at Nuance

In episode 33 we talk with Frederic Van Haren (@fvha), founder of HighFens, Inc. (@HighFens), a new HPC consultancy, and former Senior Director of HPC at Nuance Communications. Howard and I got a chance to talk with Frederic at a recent HPE storage deep dive event. I met up with him again during SFD10, where he was speaking on behalf of Kaminario, and he was also at the HPE Discover conference last week.

Nuance is the backend speech recognition engine for a number of popular service offerings. Nuance looks very similar to many other hyper-scale customers and, we feel, may ultimately be the way of the future for all IT over the coming decades. Nuance’s data storage journey during Frederic’s tenure with the company holds many lessons for all of us in the storage industry.

Nuance currently has ~6PB usable (~16PB raw) of speech wave files as well as countless text and other files, all inside IBM Spectrum Scale (GPFS). They have both lots of big files and lots of small files. These days, Spectrum Scale is processing 2-3M files/second. They have doubled capacity in each of the last 9 years and today handle a billion new files a month. GPFS stripes data across storage and provides data protection, migration, snapshotting and storage tiering across a diverse mix of storage. At the end of the podcast we discussed some open source alternatives to Spectrum Scale, but at the time Nuance started down this path, GPFS was the only thing they found that could do the job. This proved to be a great solution: they have completely swapped out the underlying storage at least 3 times, and their users were none the wiser.
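Two of those growth numbers are easier to grasp as arithmetic. Doubling every year for 9 years is a 2^9 = 512x increase, and a billion new files a month averages out to a few hundred file creations per second, around the clock. The sketch assumes exact yearly doubling and a 30-day month.

```python
YEARS_OF_DOUBLING = 9
growth_factor = 2 ** YEARS_OF_DOUBLING   # capacity multiple after 9 doublings

NEW_FILES_PER_MONTH = 1_000_000_000
SECONDS_PER_MONTH = 30 * 24 * 3600       # assumption: 30-day month
avg_create_rate = NEW_FILES_PER_MONTH / SECONDS_PER_MONTH

print(f"capacity growth over {YEARS_OF_DOUBLING} years: {growth_factor}x")
print(f"average new-file rate: {avg_create_rate:.0f} files/s")
```

That is about 386 new files every second on average, on top of a namespace that has grown 512-fold.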

The first storage that Frederic talked about was Coraid (no longer in business) and their ATA over Ethernet storage solution. This used SuperMicro shelves with 24 SATA drives each, and they bought 40 shelves. Over time this grew to 1000s of SATA drives. It was easily scalable but hard to manage, as it was pretty dumb storage. In fact, they had to deploy video cameras, focused on the drive shelves, to detect when drives failed!

Over time, Nuance came to the realization that they needed something more manageable and brought in HPE MSA storage to replace their Coraid storage. The MSA was a great solution for them. It had 96 SAS drives, supported both faster “SCRATCH” storage using fast 300GB/15K RPM SAS drives and slower “STATIC” storage using 760GB/7.2K RPM SATA drives, and was much more manageable than the Coraid solution.

Although the MSA storage worked great, after a while Nuance’s sprawling FC environment, which was doubling yearly, caused them to rethink their storage once again. This led them to swap out all their HPE MSA storage for HPE 3PAR, to consolidate their FC network and storage footprint.

For metadata, Nuance uses a 76-node Hadoop cluster for sophisticated search queries, as doing an ls on the GPFS file system would take days. Their file metadata is essentially a textual, row-by-row database, and they run queries over the Hadoop cluster to determine things like which files contain American English, spoken by females, recorded at 8kHz. Not sure when, but eventually Nuance deployed HPE Vertica SQL on Hadoop as their metadata engine, which dropped the average query time from 12 minutes to 73 seconds (!!).
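Conceptually, each such query is a filter over per-file metadata rows, roughly a SQL WHERE clause on language, speaker gender and sample rate. The Python sketch below is purely illustrative: the `AudioFileMeta` schema, its field names, the paths and the `select` helper are hypothetical, since Nuance’s actual Hadoop/Vertica schema isn’t described in the episode. It also computes the speedup implied by the drop from 12 minutes to 73 seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioFileMeta:
    # Hypothetical schema: one row per file, as in a textual metadata table.
    path: str
    language: str
    speaker_gender: str
    sample_rate_hz: int


def select(rows, language, gender, sample_rate_hz):
    """Return paths matching all three criteria (a full scan, like a batch query)."""
    return [r.path for r in rows
            if r.language == language
            and r.speaker_gender == gender
            and r.sample_rate_hz == sample_rate_hz]


rows = [
    AudioFileMeta("/gpfs/wav/0001.wav", "en-US", "female", 8000),
    AudioFileMeta("/gpfs/wav/0002.wav", "en-US", "male", 8000),
    AudioFileMeta("/gpfs/wav/0003.wav", "en-GB", "female", 8000),
    AudioFileMeta("/gpfs/wav/0004.wav", "en-US", "female", 16000),
]

print(select(rows, "en-US", "female", 8000))   # ['/gpfs/wav/0001.wav']

# Reported average query time: 12 minutes before Vertica, 73 seconds after.
speedup = (12 * 60) / 73
print(f"speedup ~ {speedup:.1f}x")
```

Keeping this metadata in a separate query engine is what avoids walking the file system itself; the reported numbers work out to roughly a 9.9x improvement.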

Because of its extreme growth and openness to storage innovation, Nuance had become a favorite for storage startups and major vendors to run Proofs of Concept (PoCs) of new storage offerings. One PoC Nuance did was for Kaminario storage. A standard rule of thumb says each CPU core requires a certain number of IOPS, so as CPU core counts increase, you need to supply more IOPS. They went with Kaminario for their test-dev environment and more performance-intensive storage. Nuance appreciates Kaminario’s reliability, high availability and highly predictable performance. (See the SFD10 video feed for Frederic’s session.)
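The core-to-IOPS rule of thumb is a simple linear model. The episode doesn’t give the per-core figure, so the `IOPS_PER_CORE` value and the core counts below are hypothetical; the point is only that doubling cores doubles the IOPS the storage has to deliver.

```python
def required_iops(cores: int, iops_per_core: int) -> int:
    """Rule-of-thumb sizing: storage IOPS demand grows linearly with CPU cores."""
    return cores * iops_per_core


# Hypothetical figures for illustration only; none are from the episode.
IOPS_PER_CORE = 50
current = required_iops(cores=2_000, iops_per_core=IOPS_PER_CORE)
after_growth = required_iops(cores=4_000, iops_per_core=IOPS_PER_CORE)

print(current, after_growth)  # 100000 200000
```

In an environment whose compute doubles yearly, this is why storage performance, not just capacity, has to keep pace.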

We talked a bit about how speech recognition’s Hidden Markov Model (HMM) statistical approach was heavily dependent on CPU cores. Historically, if you wanted to do a recognition task, you assigned it to one core and waited until it was done, a serial process dependent on the number of CPU cores you had available. This turned out to be quite a problem, as you had to scale CPU cores if you wanted to do more concurrent speech recognition activities. Then came GPUs, and you could do speech recognition work on GPU cores. With the new GPU cards, instead of a server having ~16 CPU cores, you could have a server with multiple graphics cards having 3000 GPU cores. This scaled a lot more easily. Machine learning and deep neural nets have the potential to parallelize this further, so that it will scale even better.

In the end, HPC trials, tribulations and ways of doing business are starting to become mainstream. I was recently talking to one vendor who said most HPC groups start out in isolation to support one application, but over time they either subsume corporate IT, get absorbed into corporate IT, or continue as a standalone group (while waiting for one of the other two to happen).

The podcast runs ~41 minutes and covers a lot of ground about how one HPC organization’s storage environment evolved over time, what drove some of that evolution and the tools they chose to master it. Listen to the podcast to learn more.

Frederic Van Haren, founder HighFens, Inc.

Frederic Van Haren is the Chief Technology Officer @Highfens and known for his insights in the HPC and storage industry. He has over 20 years of experience in high tech, providing technical leadership and strategic direction in the Telecom and Speech markets. Frederic spent the last decade at Nuance Communications building large HPC environments from the ground up. He is frequently invited to speak at events to provide his insights on the HPC and storage markets. He has played leading roles as President of a variety of technology user groups promoting the use of innovative technology. As an engineer, he enjoys working with the engineering teams of technology vendors, providing feedback on new and upcoming products.

Frederic lives in Massachusetts, USA, but grew up in the northern part of Belgium, where he received his Master’s in Electrical Engineering, Electronics and Automation.

Greybeards talk car videos, storage and IT trends with Marc Farley

In our 30th episode, we talk with third-time guest star Marc Farley (@GoFarley), formerly with Datera and Tegile. Marc has recently gone on sabbatical, and we wanted to talk to him about what was keeping him busy and what was going on in the storage/IT industry these days.

Marc is currently curating a car comedy vlog called theridecast.com. Apparently people, at least in California, are making comedy videos in their cars. They can be quite hilarious; check out this episode of Comedians in Cars Getting Coffee.

Meanwhile, the storage biz is getting battered by a number of trends: shrinking IT budgets, vendor proliferation, migration to the cloud, and flash becoming old hat. Marc makes multiple points as to why the storage market is undergoing such a major transition these days:

  • Death to tech refresh, long live the cloud – yes, the cloud does upgrade hardware, but planned storage system obsolescence doesn’t happen in the cloud anymore. Cloud providers are buying new SSDs, disks, white box servers, memory, etc., but not enterprise class storage, server or networking hardware.
  • AFA is boring, but selling – every vendor’s got one, two or sometimes three, and they all know how to provide flash storage services. Customers pay extra for AFA, whether they need to or not, because they are swapping out old, expensive enterprise class storage for AFAs that often cost less but still provide better performance.
  • Tail IO latency is becoming more important, but it’s not well understood – when IO response times go from 100µsec to 10msec, it hurts. It doesn’t matter if it happens every 1,000 or 10,000 IOs; customers want less performance variability, which is a main reason they move to AFA in the first place. But not all AFAs deliver the same tail latency, and SSD controller/system architecture makes a big difference.
  • Hybrid storage survives, but only if you go big – hybrid storage economics make sense only for large, diverse data repositories that mix user directories, non-performance-sensitive apps, and other structured and unstructured data in one data store.
  • Greenfield apps & secondary storage are moving to the cloud, but migrating current apps to the cloud is difficult – for new app development and archive storage, moving to or starting in the cloud is a no-brainer. Transitioning running enterprise class apps to the cloud is tough to do, requires multiple skill sets and may never be successful. Hybrid (cloud-on premises) enterprise class apps are too arduous to even contemplate.
  • Realtime analytics is emerging, but data needs to be on flash – yes, MapReduce is a batch activity which can use lots of slow disk, but there’s more to analytics than MR, and to do log analysis in anything approaching realtime, one needs flash performance.
  • Optical’s persistence is great, but who leaves data on the same technology for 20 years – with magnetic and electronic storage densities going up every couple of years, who could afford to keep data on 20-year-old optical technology? Imagine using microfiche to keep PBs of data today: inconceivable.
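The tail-latency point is easy to miss if you only look at averages. The sketch below uses a hypothetical workload of 100,000 IOs at 100µsec where 1 in every 1,000 takes 10msec, and reports nearest-rank percentiles alongside the mean. The workload mix and the `percentile` helper are illustrative assumptions, not measurements from any array.

```python
import math


def percentile(sorted_us: list, p: float) -> int:
    """Nearest-rank percentile (0 < p <= 100) of an ascending list."""
    rank = math.ceil(p * len(sorted_us) / 100)
    return sorted_us[rank - 1]


# Hypothetical workload: 100,000 IOs at 100 us, with every 1,000th IO taking 10 ms.
latencies_us = sorted(10_000 if i % 1_000 == 999 else 100 for i in range(100_000))
mean_us = sum(latencies_us) / len(latencies_us)

for p in (50, 99, 99.99):
    print(f"p{p}: {percentile(latencies_us, p)} us")
print(f"mean: {mean_us:.1f} us")
```

With these numbers the median and p99 both report 100µsec and the mean barely moves (109.9µsec), yet p99.99 is a full 10,000µsec, the 100x jump described above.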

As for IT in general, one limit on IT activity will be the lack of skilled engineers, specifically full-stack engineers and data scientists.

We ended our discussion on the economics of Samsung 3D NAND and Intel-Micron (IM) 3D XPoint non-volatile memories. New semiconductor technologies like these are always long-term investments. Today, Samsung is probably losing money on each 3D TLC NAND SSD it sells, but over time, as fab yields improve, it should become cheap enough to make a profit. Similarly, 3D XPoint may be costly to produce early on, but as IM perfects their fab processes, the technology should become inexpensive enough to make oodles of $s for them. And there are more technology changes to come.

The podcast runs just over 40 minutes and covers a lot of ground. Marc’s been in IT almost as long as the GreyBeards and has a unique perspective on what’s happening today, having been with so many diverse vendors, both major ones and (minor) startups, throughout his tenure in the industry. Listen to the podcast to learn more.

Marc Farley


Marc is a storage greybeard who has worked for many storage companies and is currently on sabbatical. He has written three books on storage: his most recent, Rethinking Enterprise Storage: A Hybrid Cloud Model, and his previous books, Building Storage Networks and Storage Networking Fundamentals.

In addition to writing books, he has been a blogger and podcaster on storage topics while working for EqualLogic, Dell, 3PAR, HP, StorSimple, Microsoft, and others.

When he is not working, Marc likes to ride bicycles, listen to music, spend time with his family and dote on his cats. Of course there’s that car video curation…

GreyBeards talk EMCWorld2015 news with Chad Sakac, Pres. EMC System Eng.

In this podcast we discuss some of the more interesting storage announcements to come out of EMCWorld2015 last week with Chad Sakac (@sakacc on Twitter and VirtualGeek blogger), President, EMC Global Systems Engineering. Chad was up on the “big stage” at EMCWorld helping to introduce much of the news we found thought provoking.

Chad said he was growing out his greybeard for the podcast, but we had to shut off the video to record the talk. Still, from the picture below, there’s no doubt he has a beard growing.

EMCWorld2015 in Las Vegas had over 14,000 participants and is EMC’s premier customer event. As such, there is always a lot of interesting news revealed at the show, and this year’s event was no exception. I listed about a dozen topics to discuss with Chad but had to cut it down to just four major areas to fit into a reasonable time.

Chad discussed many of these topics at length across multiple posts on his VirtualGeek blog, and Ray reviewed some EMCWorld2015 news over two posts on his RayOnStorage blog as well.

In the podcast, Howard, Ray and Chad discuss EMC’s new rack-scale flash storage, the DSSD; their new VxRack hyper-converged system; the new XtremIO 4.0; and their new free & frictionless delivery model for Emerging Technology Division software-defined solutions.

I would have to say the DSSD drew the most interest from the analyst community but the new VxRack and the Emerging Technology Division’s move to open sourcing ViPR Controller caught many of us by surprise.

Near the end of the call, Ray’s Internet service dropped out, so Howard and Chad were kind enough to end the session by themselves. Thanks to my co-host for picking up the ball after I fell off, and my apologies for going missing at the end.

This month’s episode runs long, just under an hour, and that’s after we cut about 5 minutes of discussion on the problems in open sourcing proprietary products. Chad can talk for hours on this stuff, at pretty much any level of technical detail we could possibly want. We’ll probably need to invite him back someday to discuss more.

Sorry this podcast is so late but we had to wait for EMCWorld2015 to be over. Hopefully, next month we will be back on schedule.

We hope you enjoy the podcast.

Chad Sakac, President Global EMC Systems Engineering

Chad Sakac leads EMC’s technology, architecture and strategy team across the world. He is a global thought leader and evangelist, with a background and skill set in IT strategy, innovation, disruption and organizational change.  He is intimately involved in driving EMC’s technology roadmap, acquisition strategy and R&D direction.

As a leading mind in IT, Chad is the author of one of the top 20 virtualization blogs, “VirtualGeek”. He holds Electrical Engineering and Computer Science degrees from the University of Western Ontario, Canada.