76: GreyBeards talk backup content, GDPR and cyber security with Jim McGann, VP Mkt & Bus. Dev., Index Engines

In this episode we talk about indexing old backups, GDPR and CyberSense, a new approach to cyber security, with Jim McGann, VP Marketing and Business Development, Index Engines.

Jim’s an old industry hand who’s been around backups, e-discovery and security almost since the beginning. Index Engines’ solution to cyber security, CyberSense, is also offered by Dell EMC, and Jim presented at a TFDx event hosted by Dell EMC this past October (see the Dell EMC-Index Engines TFDx session on CyberSense).

It seems Howard’s been using Index Engines for a long time but keeping them a trade secret. In one of his prior consulting engagements he used Index Engines technology to locate a multi-million dollar email for one customer.

Universal backup data scan and indexing tool

Index Engines has a long history as a tool to index and understand old backup tapes and files. Index Engines did all the work to understand the format and content of NetBackup, Dell EMC Networker, IBM TSM (now Spectrum Protect), Microsoft Exchange backups, database vendor backups and other backup files. Using this knowledge they are able to read just about anyone’s backup tapes or files and tell customers what’s on them.

But it’s not just a backup catalog tool; Index Engines can also crack open backup files and index the content of the data. In this way customers can search backup data with Google-like search terms. This is used day in and day out for e-discovery and the occasional consulting engagement.

Index Engines technology is also useful for companies complying with GDPR and similar legislation. When any user can request that information about them be purged from corporate data, being able to scan, index and search backups is a great feature.

In addition to backup file scanning, Index Engines has a multi-PB indexing solution which can be used to perform the same Google-like searching on a data center’s file storage. Once again, Index Engines has done the development work to implement their own highly parallelized metadata and content search engine, which is demonstrably faster than any open source search solution (such as Lucene) available today.

CyberSense

All that’s old news; what Jim presented at the TFDx event was their new CyberSense solution. CyberSense was designed to help organizations detect and head off ransomware, cyber assaults and other data corruption attacks.

CyberSense computes a data entropy (randomness) score, as well as ~39 other characteristics, for every file in backups or online in a customer’s data center. It then uses that information to detect when a cyber attack is taking place and to determine the extent of the corruption. With current and previous entropy and other characteristics for every data file, CyberSense can flag files that look like they have been corrupted and warn customers that a cyber attack is in progress before it corrupts all of their data files.
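CyberSense’s actual scoring isn’t described beyond “entropy (randomness)”, so what follows is only a minimal sketch of one standard measure, Shannon entropy over byte frequencies, plus a hypothetical `looks_encrypted` heuristic with made-up thresholds. It shows why encrypted output stands out when current and previous scores are compared: ordinary text scores well below the 8 bits/byte maximum, while random-looking data scores close to it.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)  # iterating bytes yields ints 0..255
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_encrypted(previous: bytes, current: bytes,
                    jump: float = 2.0, floor: float = 7.5) -> bool:
    """Hypothetical heuristic (thresholds are assumptions): flag a file whose
    entropy rose by at least `jump` bits/byte and is now near the random maximum."""
    before = shannon_entropy(previous)
    after = shannon_entropy(current)
    return after >= floor and (after - before) >= jump


if __name__ == "__main__":
    text = b"Quarterly report: revenue up, costs down. " * 200
    ciphertext_like = os.urandom(len(text))  # random bytes stand in for encrypted data
    print(f"text:   {shannon_entropy(text):.2f} bits/byte")
    print(f"random: {shannon_entropy(ciphertext_like):.2f} bits/byte")
    print("flagged:", looks_encrypted(text, ciphertext_like))
```

A real detector would weigh entropy alongside the other characteristics and allow for files that are legitimately high-entropy, such as compressed archives or JPEGs.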

One typical corruption is to change file extensions. CyberSense cracks open file contents, determines whether a file is an office or other standard document type, and then checks whether its extension matches its content. Another common corruption is to encrypt files. Encrypted files have markedly higher entropy than the originals, so CyberSense can detect them automatically.
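How CyberSense identifies document types isn’t detailed, but a common technique is to compare a file’s leading “magic” bytes against known format signatures. This is a minimal sketch under that assumption; the `MAGIC_SIGNATURES` table and `extension_matches_content` function are hypothetical and cover only a handful of formats.

```python
import os
from typing import Dict, FrozenSet, Optional

# Hypothetical signature table mapping leading "magic" bytes to the extensions they imply.
# Office Open XML files (.docx/.xlsx/.pptx) are ZIP containers, so they share the ZIP
# signature; legacy .doc/.xls/.ppt files use the OLE2 compound-file signature.
MAGIC_SIGNATURES: Dict[bytes, FrozenSet[str]] = {
    b"%PDF-": frozenset({".pdf"}),
    b"PK\x03\x04": frozenset({".zip", ".docx", ".xlsx", ".pptx"}),
    b"\x89PNG\r\n\x1a\n": frozenset({".png"}),
    b"\xff\xd8\xff": frozenset({".jpg", ".jpeg"}),
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": frozenset({".doc", ".xls", ".ppt"}),
}


def extension_matches_content(filename: str, header: bytes) -> Optional[bool]:
    """Return True/False if the header is a known format, or None if it is unrecognized."""
    ext = os.path.splitext(filename)[1].lower()
    for magic, extensions in MAGIC_SIGNATURES.items():
        if header.startswith(magic):
            return ext in extensions
    return None  # unknown content: no verdict either way


if __name__ == "__main__":
    print(extension_matches_content("budget.xlsx", b"PK\x03\x04..."))         # True
    print(extension_matches_content("budget.xlsx.locked", b"PK\x03\x04..."))  # False
```

Returning `None` for unrecognized content keeps this check separate from the entropy signal: a file that has been encrypted in place usually matches no signature at all.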

When CyberSense detects an anomaly, it can determine who last accessed the file and what executable was used to modify it. In this way CyberSense can provide forensics on the who, what, when and where of a corrupted file, so that IT can shut down the corruption activity before it’s gone too far.

CyberSense can be configured to periodically scan files online, as well as to examine backup data (offline) during or after it’s backed up. Their partnership with Dell EMC does just that with Data Domain and Dell EMC backup software.

Index Engines’ proprietary indexing has been optimized for parallel execution and reduced index size. Jim mentioned that their content indexes average about 5% of the full storage capacity and that they can index content at about 1 TB/hour.
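Those two figures are enough for a back-of-the-envelope sizing estimate. This sketch assumes the ~5% ratio and ~1 TB/hour rate scale linearly and that the rate applies to a single indexing stream; the podcast doesn’t say whether it is per node or per deployment, and `index_sizing` is a hypothetical helper. At those rates, a 500 TB estate would need an index of roughly 25 TB and about 500 hours (around 21 days) for an initial full index.

```python
from typing import Tuple


def index_sizing(capacity_tb: float, index_ratio: float = 0.05,
                 rate_tb_per_hour: float = 1.0) -> Tuple[float, float]:
    """Rough estimate of (index size in TB, indexing time in hours).

    Assumes the ~5% index-to-capacity ratio and ~1 TB/hour rate quoted in the
    podcast apply linearly, with a single indexing stream and no scale-out.
    """
    if capacity_tb < 0 or rate_tb_per_hour <= 0:
        raise ValueError("capacity must be >= 0 and rate must be > 0")
    index_tb = capacity_tb * index_ratio
    hours = capacity_tb / rate_tb_per_hour
    return index_tb, hours


if __name__ == "__main__":
    size_tb, hours = index_sizing(500)
    print(f"index ~{size_tb:.0f} TB, full scan ~{hours:.0f} h ({hours / 24:.1f} days)")
```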

Index Engines is a software-only offering, but they also offer services for customers that want a turnkey solution. They are also available through a number of partners, Dell EMC being one.

The podcast runs ~44 minutes. Jim’s been around backups, storage and indexing forever, and seems to have a good grasp of data compliance regimes and the security threats impacting customers across the world today. Listen to our podcast to learn more.

Jim McGann, VP Marketing and Business Development, Index Engines

Jim has extensive experience with eDiscovery and Information Management in the Fortune 2000 sector. Before joining Index Engines in 2004, he worked for leading software firms, including Information Builders and the French-based engineering software provider Dassault Systemes.

In recent years he has worked for technology-based start-ups that provided financial services and information management solutions. Prior to Index Engines, Jim was responsible for the business development of Scopeware at Mirror Worlds Technologies, the knowledge management software firm founded by Dr. David Gelernter of Yale University. Jim graduated from Villanova University with a degree in Mechanical Engineering.

Jim is a frequent writer and speaker on the topics of big data, backup tape remediation, electronic discovery and records management.

41: Greybeards talk time shifting storage with Jacob Cherian, VP Product Management and Strategy, Reduxio

In this episode, we talk with Jacob Cherian (@JacCherian), VP of Product Management and Product Strategy at Reduxio. They have produced a unique product that merges some characteristics of CDP storage with the best of today’s hybrid and deduplicating storage into a new primary storage system. We first saw Reduxio at VMworld a couple of years back, and this is the first chance we have had to talk with them.

Backdating data

Many of us have had the need to go back to previous versions of files, volumes and storage, but few systems provide an easy way to do this. Reduxio is the first storage system that makes it effortless.

Reduxio’s storage system splits an IO write operation into data and meta-data. The IO meta-data includes the volume/LUN id, offset into the volume, and data length. The data is chunked, compressed, hashed, and then sent to the NVRAM cache. The IO meta-data and a system-wide time stamp, together with the data chunk hash(es), are sent to a separate key-value (K-V) meta-data store.
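A minimal sketch of that write path, assuming 8 KiB chunks, zlib compression and SHA-256 hashing (none of which are specified in the podcast), with plain dictionaries standing in for the NVRAM cache and the K-V meta-data store. `TimeShiftWritePath` is a hypothetical name. Keying the meta-data by (volume, offset, timestamp) is what later makes point-in-time lookups possible, and storing chunks by hash gives deduplication for free.

```python
import hashlib
import time
import zlib
from typing import Dict, Optional, Tuple

CHUNK_SIZE = 8 * 1024  # hypothetical chunk size; the actual Reduxio chunk size isn't stated


class TimeShiftWritePath:
    """Toy model of splitting each write into data chunks and time-stamped meta-data."""

    def __init__(self) -> None:
        # chunk_hash -> compressed chunk (stands in for the NVRAM cache / flash tiers)
        self.chunks: Dict[str, bytes] = {}
        # (volume, offset, timestamp) -> chunk_hash (stands in for the K-V meta-data store)
        self.metadata: Dict[Tuple[str, int, int], str] = {}

    def write(self, volume: str, offset: int, data: bytes,
              timestamp: Optional[int] = None) -> None:
        ts = int(time.time()) if timestamp is None else timestamp
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Deduplication: a chunk whose hash is already stored is not stored again.
            if digest not in self.chunks:
                self.chunks[digest] = zlib.compress(chunk)
            # A second write to the same offset within the same second replaces the
            # earlier entry, matching one-second time granularity.
            self.metadata[(volume, offset + i, ts)] = digest

    def read_chunk(self, digest: str) -> bytes:
        return zlib.decompress(self.chunks[digest])
```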

What Reduxio supplies is an easy way to take any data volume back to any second in its past. Yes, there are limits as to how far back one can go with a data volume: for example, keeping every second for the last 8 hours, every hour for the last week, every week for the last month, every month for the last year, etc., all of which can be set at volume configuration time. But all this does is tell Reduxio when to discard old data.
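A tiered retention policy like the one in that example can be expressed as a list of (maximum age, granularity) pairs. This is a hedged sketch, not Reduxio’s configuration format: `POLICY` and `restore_point_available` are hypothetical, a “month” is simplified to 30 days, and restore points are assumed to be aligned to multiples of each granularity since the epoch.

```python
from typing import List, Tuple

HOUR = 3600
DAY = 24 * HOUR

# Hypothetical tiers mirroring the example policy: (max_age_seconds, granularity_seconds).
# "Month" is simplified to 30 days; restore points are aligned to epoch multiples.
POLICY: List[Tuple[int, int]] = [
    (8 * HOUR, 1),          # every second for the last 8 hours
    (7 * DAY, HOUR),        # every hour for the last week
    (30 * DAY, 7 * DAY),    # every week for the last month
    (365 * DAY, 30 * DAY),  # every month for the last year
]


def restore_point_available(point_ts: int, now: int,
                            policy: List[Tuple[int, int]] = POLICY) -> bool:
    """Return True if the volume can still be rolled back to second `point_ts`."""
    age = now - point_ts
    if age < 0:
        return False  # the future is not a restore point
    for max_age, granularity in policy:
        if age <= max_age:
            return point_ts % granularity == 0
    return False  # older than the longest tier: the data may already be discarded
```

Everything the policy rejects is what the system is free to garbage collect; the policy itself never affects how data is written.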

With all this in place, restoring a volume to some instant in its past is simply a query to the meta-data K-V store with the appropriate time stamp. The query returns the hashes and other IO meta-data for all of the volume’s data chunks, in sequence, as they were at that point in time. With that information the system can easily reconstruct the volume as it was at that moment.
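A minimal sketch of that point-in-time query, assuming a per-volume index that keeps, for each offset, time-ordered lists of timestamps and chunk hashes. `VolumeHistory`, `record` and `as_of` are hypothetical names; the real K-V store layout isn’t described in the podcast. For each offset, a binary search finds the last version written at or before the requested second.

```python
import bisect
from collections import defaultdict
from typing import Dict, List


class VolumeHistory:
    """Hypothetical per-volume meta-data index: offset -> time-ordered versions."""

    def __init__(self) -> None:
        self._times: Dict[int, List[int]] = defaultdict(list)   # sorted timestamps
        self._hashes: Dict[int, List[str]] = defaultdict(list)  # hashes, parallel to _times

    def record(self, offset: int, timestamp: int, chunk_hash: str) -> None:
        times = self._times[offset]
        i = bisect.bisect_right(times, timestamp)  # keep each offset's versions in time order
        times.insert(i, timestamp)
        self._hashes[offset].insert(i, chunk_hash)

    def as_of(self, timestamp: int) -> Dict[int, str]:
        """Return {offset: chunk_hash} describing the volume as it was at `timestamp`."""
        view: Dict[int, str] = {}
        for offset in sorted(self._times):
            # Index of the first version newer than `timestamp`; the one before it wins.
            i = bisect.bisect_right(self._times[offset], timestamp)
            if i:
                view[offset] = self._hashes[offset][i - 1]
        return view
```

Offsets written only after the requested second are simply absent from the result, which is how a volume looked before that data existed.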

By keeping the data separate from the meta-data tag, time stamp and hash(es), Reduxio can reconstruct the data at any point in the past (to one-second granularity) for which the data is still available to the system.

Performance

In the past, this sort of time-shifting storage functionality was limited to separate CDP backup appliances. What Reduxio has done is integrate all this functionality into a deduplicating, compressing, auto-tiering primary storage system. So every IO involves chunking, deduplicating and compressing the data and splitting the meta-data, time stamps and hashes from the data chunks. There is no IO performance penalty for doing any of this; it’s all part of the normal IO path of the Reduxio primary storage system.

However, there is some garbage collection activity that needs to go on in order to deal with data that’s no longer needed. Reduxio does this mostly in real time, as the data actually expires.
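The podcast doesn’t say how Reduxio’s garbage collection works. One common approach for deduplicated stores is reference counting, sketched here with a hypothetical `ChunkGC` class: each live meta-data record holds a reference to its chunk, and a chunk is freed only when the last record pointing at it expires, since deduplication means other offsets or points in time may still share it.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical meta-data record: (volume, offset, timestamp, chunk_hash)
Record = Tuple[str, int, int, str]


class ChunkGC:
    """Reference-counting sketch: free a chunk once no live record references it."""

    def __init__(self) -> None:
        self.refcounts: Counter = Counter()  # chunk_hash -> number of live records
        self.chunks: Dict[str, bytes] = {}   # chunk_hash -> stored chunk

    def add(self, record: Record, data: bytes) -> None:
        chunk_hash = record[3]
        self.chunks.setdefault(chunk_hash, data)  # deduplicated: stored once
        self.refcounts[chunk_hash] += 1

    def expire(self, records: List[Record]) -> Set[str]:
        """Release expired records (each assumed to have been added earlier);
        return the hashes of chunks that were actually freed."""
        freed: Set[str] = set()
        for _, _, _, chunk_hash in records:
            self.refcounts[chunk_hash] -= 1
            if self.refcounts[chunk_hash] == 0:
                del self.refcounts[chunk_hash]
                del self.chunks[chunk_hash]
                freed.add(chunk_hash)
        return freed
```

Because the work happens as each record expires, this style of collection naturally runs incrementally rather than as a large periodic sweep.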

Deduplication, compression and all the other characteristics of the storage system that enable its time shifting capabilities cannot be turned off.

Auto storage tiering

Reduxio has optimized its auto-tiering beyond what is normally done in other hybrid storage systems. Data is chunked, moved to cache and ultimately destaged to flash. Hot vs. cold data is analyzed in real time, not at some later time as with other hybrid storage systems. Also, when data is deemed cold and needs to be moved to disk, Reduxio takes another step: it analyzes its meta-data K-V store and other information to see what other data was referenced at the same time as this data. This way it can attempt to demote a “group” of data chunks that will likely all be referenced together, so that when one chunk of the “group” is referenced, the rest can be promoted to flash/cache at the same time.

Their auto-tiering grouping algorithm is used every time they demote data, and every time they promote data to a faster tier they can record any data that is referenced together. This way, the next time they demote data chunks, the group definition can be further refined.
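The grouping logic isn’t spelled out beyond “data referenced together”, so this is a hedged sketch of one way to do it: a union-find structure (hypothetical `CoAccessGroups`) that chains chunks whose consecutive accesses fall within a small time window. On demotion, the tiering code would move `group_of(cold_chunk)` together, and feeding later access logs (for example, from promotions) into `observe` refines the groups.

```python
from typing import Dict, Iterable, Set, Tuple


class CoAccessGroups:
    """Hypothetical grouping of chunks referenced close together in time (union-find)."""

    def __init__(self, window_seconds: int = 1) -> None:
        self.window = window_seconds
        self.parent: Dict[str, str] = {}

    def _find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def _union(self, a: str, b: str) -> None:
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self.parent[rb] = ra

    def observe(self, accesses: Iterable[Tuple[int, str]]) -> None:
        """Chain chunks whose time-adjacent accesses are within `window` seconds."""
        ordered = sorted(accesses)
        for (t1, c1), (t2, c2) in zip(ordered, ordered[1:]):
            self._find(c1)  # register the chunk even if it ends up alone
            if t2 - t1 <= self.window:
                self._union(c1, c2)
        if ordered:
            self._find(ordered[-1][1])

    def group_of(self, chunk: str) -> Set[str]:
        """All chunks that should be demoted (and later promoted) with `chunk`."""
        root = self._find(chunk)
        return {c for c in list(self.parent) if self._find(c) == root}
```

This sketch only ever merges groups; a fuller version would also age out stale associations so that groups don’t grow without bound.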

Reduxio storage system

Reduxio provides a hybrid (disk-SSD) iSCSI primary storage system that holds 40TB of storage today. With an average compression-dedupe ratio (over their 2PB of field data) of >4:1, 40TB should equate to over 160TB of usable data storage. Some of that usable storage would be used for current volume data and some for historical data.

There was a Slack discussion the other week on what to do about ransomware. It seems to me that Reduxio, with its time-traveling storage, could be an effective protection against any ransomware.

The podcast runs ~41 minutes. Although snapshots have been around for a long time (one of the Greybeards worked on a snapshotting storage system back in the early 90s), Reduxio has taken the idea to new heights. Listen to the podcast to learn more.

Jacob Cherian, VP Product Management and Product Strategy, Reduxio

Jacob is responsible for Reduxio’s product vision and strategy. Jacob has overall ownership of defining Reduxio’s product portfolio and roadmap.

Prior to joining Reduxio, Jacob spent 14 years at Dell in the Enterprise Storage Group leading product development and architectural initiatives for host storage, NAS, SAN, RAID and other data center infrastructure. As a member of Dell’s storage architecture council he was responsible for developing Dell’s strategy for unstructured data management, and drove its implementation through organic development efforts and technology acquisitions such as Ocarina Networks and Exanet. In his last role as a Dell expatriate in Israel he oversaw Dell’s FluidFS development.

Jacob started his career in Dell as a development engineer for various SAN, NAS and host-side solutions, then served as the Architect and Technologist for Dell’s MD series of external storage arrays.

Jacob was named a Dell Inventor of the Year in 2005, and holds 30 patents and has 20 patents pending in the areas of storage and networking. He holds a Bachelor of Science (B.S.) in Electrical Engineering from the Cochin University of Science and Technology, a Master of Science (M.S.) in Computer Science from Oklahoma State University, and a Master of Business Administration (MBA) from the Kellogg School of Management, Northwestern University.