Tag Archives: Data-aware storage

GreyBeards talk data-aware, scale-out file systems with Peter Godman, Co-founder & CEO, Qumulo

In this podcast we discuss Qumulo’s data-aware, scale-out file system storage with Peter Godman, Co-founder & CEO of Qumulo. Peter has been involved in scale-out storage for a while now, coming from (EMC) Isilon before starting Qumulo. This time, though, he’s adding data-awareness to scale-out storage. Presently, Qumulo is vertically focused on the HPC and media/entertainment market spaces.

Qumulo is the first storage vendor we have heard of that implements their software with Agile development techniques. This allows them to release new functionality to the field every two weeks – talk about rapidly turning out software. We believe this is pretty risky and Ray talks more about Agile development for storage in his Storage on Agile blog post.

But Qumulo mostly sees itself as data-aware NAS, using POSIX metadata and a neat, internally designed and developed database to store, index and retrieve file system metadata. Qumulo’s proprietary database provides much faster responses to metadata queries, such as which files have changed since the last backup, how much storage space a specific owner consumes, or which inclusion/exclusion lists would split the file system into 100 partitions. The database is not a relational or conventional database, but almost old-school, indexed data structures tailored to provide quick answers to the queries of most interest to customers and their application environments. In a scale-out NAS environment like Qumulo’s, with potentially billions of files, you just don’t have time to walk an inode tree to get these sorts of answers anymore.
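To make the idea of answering metadata queries from an index rather than a tree walk concrete, here is a minimal Python sketch. It is purely illustrative and is not Qumulo’s design: the class name `MetadataIndex`, its methods, and the in-memory structures are all hypothetical. It keeps a running byte total per owner and a list sorted by modification time, so both example queries are answered without visiting every file.

```python
import bisect
from collections import defaultdict


class MetadataIndex:
    """Toy in-memory metadata index (hypothetical, not Qumulo's design)."""

    def __init__(self):
        self._files = {}                          # path -> (owner, size, mtime)
        self._bytes_by_owner = defaultdict(int)   # owner -> total bytes
        self._by_mtime = []                       # sorted list of (mtime, path)

    def upsert(self, path, owner, size, mtime):
        """Record a create or modify, keeping both indexes current."""
        old = self._files.get(path)
        if old is not None:
            old_owner, old_size, old_mtime = old
            # Back out the previous contribution of this file.
            self._bytes_by_owner[old_owner] -= old_size
            i = bisect.bisect_left(self._by_mtime, (old_mtime, path))
            del self._by_mtime[i]
        self._files[path] = (owner, size, mtime)
        self._bytes_by_owner[owner] += size
        bisect.insort(self._by_mtime, (mtime, path))

    def bytes_owned_by(self, owner):
        """Space consumed by one owner, read from the running total."""
        return self._bytes_by_owner.get(owner, 0)

    def changed_since(self, timestamp):
        """Paths modified at or after timestamp, oldest first."""
        # (timestamp,) sorts before every (timestamp, path) entry.
        i = bisect.bisect_left(self._by_mtime, (timestamp,))
        return [path for _, path in self._by_mtime[i:]]


idx = MetadataIndex()
idx.upsert("/a", "alice", 100, 10)
idx.upsert("/b", "bob", 50, 20)
idx.upsert("/c", "alice", 25, 30)
idx.upsert("/a", "alice", 150, 40)   # /a is rewritten later and grows
print(idx.bytes_owned_by("alice"))
print(idx.changed_since(30))
```

The trade-off in this sketch is that every write pays a small cost to update the indexes so that queries stay cheap. A real system would use on-disk, tree-structured indexes rather than a Python list, whose deletions are linear in its length.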

Qumulo supplies both hardware and software to its customers but also offers a software-only or software defined storage (SDS) version for those few customers that want it. SDS versions can help potential customers perform proofs of concept (PoCs) using VMs.

In their system nodes, Qumulo uses SSDs and disks. SSDs provide a sort of NVM that holds recently written data but can also be used for reading data. Behind the SSDs are 8TB disks. Today, Qumulo provides mirrored storage that’s widely spread or dispersed across all the storage in their system. With this wide striping of data, the rebuild time for an (8TB) disk failure is ~1:20 for a single QC204 (204TB) system node and halves every time you double the number of nodes.
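The scaling claim above can be put into a few lines of Python. This is a back-of-the-envelope sketch under two assumptions that are not in the podcast notes: that “~1:20” means roughly 1 hour 20 minutes (80 minutes), and that “halves every time you double the number of nodes” generalizes to rebuild time being inversely proportional to node count. The function name `rebuild_minutes` is hypothetical.

```python
def rebuild_minutes(nodes, single_node_minutes=80.0):
    """Estimated rebuild time, assuming it scales as 1 / nodes.

    single_node_minutes defaults to 80, an assumed reading of "~1:20"
    as 1 hour 20 minutes for a single node.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return single_node_minutes / nodes


for n in (1, 2, 4, 8):
    print(n, "node(s):", rebuild_minutes(n), "minutes")
```

Under these assumptions the rebuild work is shared across every node in the system, which is why wide striping shortens the rebuild window as the cluster grows.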

It was refreshing to hear a startup vendor clearly answer what they have and don’t have implemented in their current system. Some startups try to obfuscate or talk around the lack of functionality, but Peter’s answers were always clear and (sometimes too) concise on what’s in and not in current Qumulo functionality.

This month’s edition runs just over 47 minutes and gets pretty technical in places, but mostly stays at a high functional level. We hope you enjoy the podcast.

Peter Godman, Co-founder & CEO Qumulo

Peter brings 20 years of industry systems experience to Qumulo. As VP of Engineering and CEO of Corensic, Peter brought the world’s first thin-hypervisor based product to market. As Director of Software Engineering at Isilon, he led development of several major releases of Isilon’s award-winning OneFS distributed file system and was inventor of 18 patented technologies. Peter studied math and computer science at MIT.

GreyBeards talk data-aware storage with Paula Long & Dave Siles, CEO&CTO DataGravity

In this podcast we discuss data-aware storage with Paula Long, CEO/Co-Founder and Dave Siles, CTO of DataGravity. Paula comes from EqualLogic and Dave from Veeam so they both have a lot of history in and around the storage industry, almost qualifying them as grey hairs :/

Data-aware storage is a new paradigm in storage that combines primary (block and file) storage, file and data analytics and text indexing. Just to top it off, they also add data protection to a separate storage partition. Their system is VM aware and is able to crack open VMDKs to find out what’s inside. With all their file and data analytics, DataGravity is able to supply data leakage detection and a much better understanding of what data is actually being stored on the system.

Paula believes that, in 5 years or so, this new approach to storage will become common. Their system also supports targeted data deduplication and compression, and provides self-service restore and a “Google-like” rich search experience on their data-aware storage.

DataGravity was designed for the mid-market but is being pulled upmarket by workgroups as department-level storage for F500 companies. They find that once the system is installed, they usually uncover some exposure and then other departments take notice. They’re also discovering an awful lot of dormant data, and moving this off primary storage can save quite a lot.

DataGravity has a 2U controller with a 24-disk drive shelf but has SSDs inside the controllers. They use spinning disks for a majority of the data storage.

DataGravity has an interesting twist on the active-passive, standard dual controller/HA approach to storage, which you will have to listen to the podcast to truly understand.

This month’s episode runs a bit over 44 minutes and wanders over a lot of high ground but dips into technical waters occasionally.

Paula Long, CEO & Co-founder, DataGravity

Paula brings over 30 years of experience to DataGravity in delivering meaningful and game changing high-tech innovation. Prior to DataGravity, Paula served as vice president of product development at Heartland Robotics. In 2001 Paula co-founded storage provider EqualLogic, resetting the bar on how customers managed and purchased data storage. EqualLogic was acquired by Dell for $1.4 billion in 2008 and Paula remained at Dell as vice president of storage until 2010. Previous to EqualLogic, she served in several engineering management positions at Allaire Corporation and oversaw all aspects of the ClusterCATS product line while at Bright Tiger Technologies.

Her executive and technical leadership has been extensively recognized, including the New Hampshire High Tech Council Entrepreneur of the Year award, the Ernst & Young 2008 Northeast Regional “Entrepreneur of the Year” award, and selection as a national finalist for the same award. Her technical awards span systems designs and enterprise software including the EqualLogic and ClusterCATS product lines. She is a graduate of Westfield State College.

Paula is also active in the startup community. Outside of high tech, she works with charities creating equality for professional women and girls, as well as with organizations enabling literacy for all children, regardless of economic status.

Dave Siles, CTO DataGravity

With more than 20 years in operations and leadership roles with growth companies, David serves as chief technology officer of DataGravity, responsible for leading the technical strategic vision for the company while guiding our product management teams and research and development efforts to better serve the needs of organizations looking for more from their data storage.

Prior to becoming CTO, David served as vice president of worldwide field operations at DataGravity. Previously, David was a member of the senior leadership team at Veeam Software, a leading data protection software provider for virtualized and cloud environments.

David also served as CTO and VP of professional services for systems integrator Hipskind TSG. He also served as CTO for Kane County, Ill., and has held technology leadership roles with various organizations. A graduate of DeVry University, he is a frequent speaker at top tier technology shows and is a recognized expert in virtualization.