112: GreyBeards annual year end wrap-up with Keith & Matt

It’s the end of the year, so time for our regular year end wrap up discussion with the GreyBeards. 2020 has been an interesting year to say the least. It started out just fine, then COVID19 showed up and threw a wrench in everyone’s plans and as the year closes, we were just starting to see some semblance of the new normal, when one of the largest security breaches in years shows up. Whew, almost glad that’s over and onto 2021.

As always the GreyBeards had a great discussion on these and other topics to highlight the year just past. The talk was wide ranging and hard to characterize but I did my best below. Listen to the podcast to learn more.

COVID19s impact on the enterprise

It will probably take some time before we learn the true, long term impacts of COVID19 on IT but one major change has to be the massive Work From Home (WFH) transition that took place overnight.

While WFH can be more productive for some, the lack of face2face interaction can be challenging for others. The fact that many of the GreyBeards have been working from home for decades now, left us a bit oblivious to how jarring this transition can be for newcomers.

There’s definitely some psychological changes that need to occur to be productive at WFH. Organization skills become even more important. Structured interactions (read conference calls, zoom/webex and other forms of communication become much more important. And then there’s security.

Turns out VMware and others have been touting VDI solutions for the past decade or so to better support remote work and at the same time providing corporate levels of security for remote work. While occasionally this doesn’t work quite as well as expected, it’s certainly much much better than having end users access corporate data without any security around that data or worse yet, the “bring your own device”. All these VDI solutions had a field day when WFH happened.

Many workers found they could be more productive at WFH, due the less distractions, no commute time and more flexible hours. What happens when COVID19 is vanquished to all these current WFHers is anyone’s guess.

We thought there might be less need for large office campuses/buildings. But there’s something to be said for more collaboration and random interactions through face2face meetings that can only occur in an office setting with workers present at the same time. Some organizations will take to this new way of work while others will try to dial WFH back to non-existent. Where your organization fits on this spectrum and why, will be telling across a number of dimensions.

The rise of ARM

There’s been a slow but steady improvement in ARM processors over the last almost half century. Nowadays it’s starting to make a place for itself in the enterprise. ARH has always been the goto microprocessor for low power solutions (like smartphones) but nowadays they are being deployed in the cloud and even the enterprise. These can be used as server processors but even outside servers, ARM cores are showing up in hardware accelerators as the brains behind SmartNICs, DPUs, SPUs, etc.

Keith made mention AWS 2nd generation Graviton 64-bit ARM processor EC2 instances. And yes there’s significant cost ( & power) savings that can be had using AWS Graviton ARM instances. So the cloud is starting to adopt them. Somewhere over the past couple of years I heard that VMware was porting ESX to work on ARM cores.

But apparently, it’s not just as simple as dropping an ARM multi-core processor into a server and recompiling your code and away you go. Applications need a certain amount of optimization to run effectively on ARM processors. And the speed up between non-optimized and optimized versions of an application running on ARM cores is significant.

As for SmartNICs and DPUs, these are data networking hardware accelerators that provide real time processing capabilities needed to keep up with higher speed networking, 100GbE and beyond. These DPUs perform deep packet inspection, data compression, encryption and other services all at wire speeds.. Yes you could devote 1 or more X86 cores to do this, but it’s much cheaper (and more effective) to do this outside the CPU core. Moreover, performing this activity at the network entry point to the server means that much of this data doesn’t have to be transferred back and forth through server memory. So not only does it save CPU core cycles but also memory size and memory & PCIe bus bandwidth. We published a recent podcast with Kevin Deierling, NVIDIA Networking discussing DPUs if you want to learn more.

Pat made mention at (virtual) VMworld their plans to port ESX to the DPU. Keith followed up on this and asked some other exec’s at VMware about this and they said VMware will more likely support DPUs as just another hardware accelerator in their cluster. In either case, CPU cycles should be freed up and this should help VMware use X86 cores more efficiently. And perhaps this will help them engage in more CPU constrained environments such as Telcom.

Then there’s computational storage. We have been watching this technology for a couple of years now and it’s seeing some success in being deployed to public cloud environments. They seem to be being used to provide outboard data compression. It’s unclear whether these systems depend on ARM processing or not but my bet is that they do. To learn more about computational storage check out these podcasts, FMS2020 wrap up with Jim Handy and our talk with Scott Shadley on NGD’s computational storage.

System security

At yearend, we are learning of a massive security breach throughout US government IT facilities. All based on what is believed to be a Russian hack to a software package that is embedded in a popular networking tool software solution, SolarWinds. They are calling this a software supply chain hack. Although we are mainly hearing about government agencies being hacked, SolarWinds is also pervasive in the enterprise as well.

There have been many hardware supply chain hacks in the past, where a board supplier used chips or logic that weren’t properly vetted. Over time, hardware suppliers have started to scrutinize their supply chains better and have reduced this risk.

And the US government have been lobbying for the industry to use a security chip with a backdoor or to supply back doors to smartphone encryption capabilities. Luckily, so far, none of these have been implemented by industry.

What Russia has shown us is that this particular hack is not limited to the hardware sphere. Software supply chain risk can’t be ignored anymore.

This means that any software application supplier will need to secure their supply chain or bring it all in house. Which may mean that costs for these packages will go up. It’s possible that using a pure open source supply chain may reduce this risk as well. At least that’s the promise of open source.

We said 2020 was an interesting year and it’s going out with a bang.

Matt Leib (@MBLeib), one of our co-hosts, has been blogging in the storage space for over 10 years, with work experience both on the engineering and presales/product marketing.. His blog is at Virtually Tied to My Desktop and he’s on LinkedIN.

Keith Townsend (@CTOAdvisor) is a IT thought leader who has written articles for many industry publications, interviewed many industry heavyweights, worked with Silicon Valley startups, and engineered cloud infrastructure for large government organizations. Keith is the co-founder of The CTO Advisor, blogs at Virtualized Geek, and can be found on LinkedIN.

111: GreyBeards talk data analytics with Matthew Tyrer, Sr. Mgr. Solutions Mkt & Competitive Intelligence, Commvault

Sponsored by:

I’ve known Matthew Tyrer, Senior Manager Solutions Marketing and Competitive Intelligence, Commvault for quite awhile now and he’s always been knowledgeable about the problems the enterprise has in supporting and backing up large file data repositories. But lately he’s been focused on Commvault Activate their data analytics solution.

We had a great talk with Matthew. He was easy to talk to and knew a lot about how data analytics can ease the operational burden of the enterprise growing file data environments. .Remind me not to have two Matthew’s on the same program ever again. Listen to the podcast to learn more.

Matthew mentioned that their Activate was built on the Commvault platform software stack, which has had a rich and long history of development and customer deployments. It seems that Activate data analytics had been an early part of the platform but recently was split out as a separate solution.

One capability that Activate has that many other data analytics solutions do not, is the ability to examine both online data as well as data in backups. Most analytics solution can do one or the other, only a few do both. But if a solution only has access to online or backup data, they are missing half the story.

In addition, Activate can operate across multiple data centers as well as across multiple public cloud environments to provide analytics for an enterprise’s file data where it may reside.

Given the proliferation of file data these days, data analytics has become a necessity to most large IT shops. In the past, an admin could track some data over time but with the volumes of file data today, this is no longer tenable. At PB or more of file data, located in on prem data centers as well as across multiple clouds, there’s just too much file data to keep track of manually anymore.

Activate also indexes file content to provide more visibility and tracking of the different types of data under management in the enterprise. This is in addition to the extensive metadata that is collected and analyzed so it can better understand data access rights, copies and physical locations around the enterprise.

Activate can help organizations govern their data flows in support of industry as well as government data compliance requirements. Activate Data Governance, one of the three Activate solutions, is focused exclusively on providing enterprises the tools needed to manage any and all data that exists under compliance regulation environments.

Mat Leib had worked in eDiscovery before and it had always been a pain to extract “legally relevant” data from online and backup repositories. With the Activate eDiscovery solution and Activate’s content indexing of all file data, legal can perform their own relevant data searches to create eDiscovery data sets in support of litigation activities. Self service legal extracts like this vastly reduces the admin time and cost needed for eDiscovery.

The Activate File Space Optimization solution was deployed in one environment that had ~20PB of data online. By using File Space Optimization, the customer was able to cut 20PB down to 10PB. Any customer could benefit from such a reduction but customers doing data migration would see even more benefit.

At the end of the podcast, Matthew mentioned some videos that show Activate solution use cases.

Matthew Tyrer, Senior Solutions Marketing and Competitive Intelligence

Having worked at Commvault for over twelve years, after 8 years as a Sales Engineer Matt took that technical knowledge and transitioned to marketing where he is currently serving as a Senior Manager in Commvault’s Solution Marketing team. He is also heavily involved in Competitive Intelligence initiatives, and actively participates in field enablement programs.

He brings over 20 years’ experience in the IT industry, including within the fields of data and information management, cloud, data governance, enterprise storage, disaster recovery, and ultimately both implementing and supporting those projects and endeavours for public and private sector clients across Canada and around the globe.

Matt’s passion, deep product knowledge, and broad field experiences have enabled him to translate Commvault technology and vision such that their value is easily understood in the market and amongst client and partner families.

A self-described geek-dad, Matt is an avid boardgame enthusiast, firmly believes that Han shot first, and enjoys tormenting his girls with bad dad jokes.

110: GreyBeards talk FMS2020 wrap up with Jim Handy, General Director of Objective Analysis

This months it’s back to storage and our annual wrap-up on the Flash Memory Summit Conference with Jim Handy, General Director of Objective Analysis. Jim’s been on our show 5 times before and is a well known expert on NAND and SSDs (as well as DRAM and memory systems). Jim also blogs at TheSSDGuy.com and TheMemoryGuy.com just in case you want to learn more.

FMS went virtual this year and had many interesting topics including how computational storage is making headway in the cloud, 3D QLC is hitting the enterprise with PLC on the way, and for a first at FMS, a talk on DNA storage (for more information on this, see our podcast with CatalogDNA). Jim’s always interesting to talk with to help us understand where the NAND-SSD industry is headed. Listen to the podcast to learn more.

Jim mentioned that the major NAND vendors are all increasing the number of layers for their 3D NAND, and it continues to scale well. Most vendors are currently shipping ~100 layer NAND, with Micron doing more than that. And vendor roadmaps are looking at the possibility of 200 layers or more. Jim doesn’t think anyone knows how high it can go.

Another advantage of 3D NAND is it can be used to make bigger bit cells and thus have better endurance. From Jim’s perspective more electrons per cell means a better more resilient bit cell.

Many vendors in the nascent persistent memory industry were all hoping that NAND would stop scaling at some point and they would be able to pick up the slack. But NAND manufacturers found 3D and scaling hasn’t stopped at all. This has relegated most persistent memory vendors to a small niche market with the exception of Intel (and Micron).

Jim said that Intel is losing money on Optane every year, ~$5B so far. But Intel knows that chip profitability is tied to economies of scale, volumes matter. With enough volume, Optane will become cheap enough to manufacture that they will make buckets of money from it.

Interestingly, Jim said that DRAM scaling is slowing down. That means there may be an even bigger market for something close to DRAM access speeds, but with increased density and lower cost. Optane seems to fit that description very well.

Jim also mentioned that computational storage is starting to see some traction with public cloud vendors. Computational storage adds generic compute power to inside an SSD which can be used to perform storage intensive functions out at the SSD rather than transferring data into the CPU for processing. This makes sense where a lot of data would need to be transferred back and forth to an SSD and where computational cycles are just as cheap out on the SSD as in the server. For example, for data compression, search, and video transcoding, computational storage can make a lot of sense. (See our podcast with NGD systems for more informaiton).

In contrast, Open-Channel SSDs are making dumb SSDs, or SSDs without any flash translation layer or other smarts needed to make NAND work as persistent storage bin the enterprise. There’s a small group of system providers that want to perform all this functionality at a global scale (or across multiple SSDs) rather than at the local, SSD drive level.

Another topic that hit it’s stride this year at FMS2020 was Zoned Name Spaces (ZNS). ZNS partitions an SSD into separately addressable segments, to allow higher performing sequential (write) access within those zones. As SSD capacity has increased, IO activity has sky-rocketed and this has led to an “IO blender” effect. Within an IO blender, it’s impossible to understand which IO is following a sequential pattern and which is not. ZNS is intended to solve that probplem

With ZNS SSDs, IOs doing sequential access can have their own partition and that way the SSD can understand its sequential IO and act accordingly. It turns out that sequential writes to NAND can perform much, much faster than random writes.

ZNS was invented for SMR (shingled magnetic recording) disks, because these overwrote portions of other tracks (like roof shingles, tracks on SMR disks overlap). We had heard about ZNS at FMS2019 but had thought this just a better way to share access to a single SSD, by carving it up into logical (mini-)volumes. Jim said that was also a benefit but the major advantage is being able to understand sequential IO and write to the SSD more effectively.

We talked some on the economics of NAND flash, disk and tape as storage media. Jim and I see this continuing a trend that’s been going on for years, where NAND storage cost $/GB ~10X more than disk capacity, and disk storage costs $/GB ~10X more than tape capacity. All three technologies continue their relentless pursuit of increasing capacity but it’s almost like train tracks, all three $/GB curves following one another into the future.

On the other hand, high RPM disk seems to have died, and been replaced with SSDs. Disk manufacturers have seen unit declines but the # GB they are shipping continues to increase. Contrary to a number of AFA system providers, disk is not dead and is unlikely to die anytime soon.

Finally, we discussed DNA storage and it’s coming entry into the storage market. It’s all a question of price of the drive and media technology, size of the mechanism (drive?) and read and write access times. At the moment all these are coming down but are not yet competitive with tape. But given DNA technology trends, there doesn’t appear to be any physical barrier that’s going to stop it from becoming yet another storage technology in the enterprise, most likely at a 10X $/GB cost advantage over tape…

Jim Handy, General Director, Objective Analysis

Jim Handy of Objective Analysis has over 35 years in the electronics industry including 20 years as a leading semiconductor and SSD industry analyst. Early in his career he held marketing and design positions at leading semiconductor suppliers including Intel, National Semiconductor, and Infineon.

A frequent presenter at trade shows, Mr. Handy is known for his technical depth, accurate forecasts, widespread industry presence and volume of publication.

He has written hundreds of market reports, articles for trade journals, and white papers, and is frequently interviewed and quoted in the electronics trade press and other media. 

He posts blogs at www.TheMemoryGuy.com, and www.TheSSDguy.com

109: GreyBeards talk SmartNICs & DPUs with Kevin Deierling, Head of Marketing at NVIDIA Networking

We decided to take a short break (of sorts) from storage to talk about something equally important to the enterprise, networking. At (virtual) VMworld a month or so ago, Pat made mention of developing support for SmartNIC-DPUs and even porting vSphere to run on top of a DPU. So we thought it best to go to the source of this technology and talk with Kevin Deierling (TechSeerKD), Head of Marketing at NVIDIA Networking who are the ones supplying these SmartNICs to VMware and others in the industry.

Kevin is always a pleasure to talk with and comes with a wealth of expertise and understanding of the technology underlying data centers today. The GreyBeards found our discussion to be very educational on what a SmartNIC or DPU can do and why VMware and others would be driving to rapidly adopt the technology. Listen to the podcast to learn more.

NVIDIA’s recent acquisition of Mellanox brought them Mellanox’s NIC, switch and router technology. And while Mellanox, and now NVIDIA have some pretty impressive switches and routers, what interested the GreyBeards was their SmartNIC technology.

Essentially, SmartNICS provide acceleration and offload of data handling needs required to move data around an enterprise network. These offload services include at a minimum, encryption/decryption, packet pacing (delivering gadzillion video streams at the right speed to insure proper playback by all), compression, firewalls, NVMeoF/RoCE, TCP/IP, GPU direct storage (GDS) transfers, VLAN micro-segmentation, scaling, and anything else that requires real time processing to perform at line speeds.

For those who haven’t heard of it, GDS transfers data from storage directly into GPU memory and from GPU memory directly to storage without any CPU cycles or server memory involvement, other than to set up the transfer. This extends NVMeoF RDMA tech to/from storage and server memory, to GPUs. That is, GDS offers a RDMA like path between storage and GPU memory. GPU to/from server memory direct interface already exists over the PCIe bus.

But even with all the offloads and accelerators above, they can also offer an additional a secure enclave outside the TPM in the CPU, to better isolate security sensitive functionality for a data center. (See DPU below).

Kevin mentioned multiple times that the new unit of computation is no longer a server but rather is now a data center. When you have public cloud, private cloud and other systems that all serve up virtual CPUs, NICs, GPUs and storage, what’s really being supplied to a user is a virtual data center. Cloud providers can carve up their hardware and serve it to you any way you want or need it. Virtual data centers can provide a multitude of VMs and any infrastructure that customers need to use to run their workloads.

Kevin mentioned by using SmartNics, IT or cloud providers can return 30% of the processor cycles (that were being spent doing networking work on CPUs) back to workloads that run on CPUs. Any data center can effectively obtain 30% more CPU cycles and increased networking speed and performance just by deploying SmartNICs throughout all the servers in their environment.

SmartNICs are an outgrowth of Mellanox technology embedded in their HPC InfiniBAND and high end Ethernet switches/routers. Mellanox had been well known for their support of NVMeoF/RoCE to supply high IOPs/low-latency IO activity for NVMe storage over Ethernet and before that their InfiniBAND RDMA technologies.

As Mellanox came out with their 2nd Gen SmartNIC they began to call their solution a “DPU” (data processing unit), which they see forming part of a “holy trinity” underpinning the new data center which has CPUs, GPUs and now DPUs. But a DPU is more than just a SmartNIC.

All NVIDIA SmartNICs and DPUs are based on Mellanox’s BlueField cards and chip technology. Their DPU uses BlueField2 (gen 2 technology) chips, which has a multi-core ARM engine inside of it and memory which can be used to perform computational processing in addition to the onboard offload/acceleration capabilities.

Besides adding VMware support for SmartNICs, PatG also mentioned that they were porting vSphere (ESX) to run on top of NVIDIA Networking DPUs. This would move the core VMware’s hypervisor functionality from running on CPUs, to running on DPUs. This of course would free up most if not all VMware Hypervisor CPU cycles for use by customer workloads.

During our discussion with Kevin, we talked a lot about the coming of AI-ML-DL workloads, which will require ever more bandwidth, ever lower latencies and ever more compute power. NVIDIA was a significant early enabler of the AI-ML-DL with their CUDA API that allowed a GPU to be used to perform DL network training and inferencing. As such, CUDA became an industry wide phenomenon allowing industry wide GPUs to be used as DL compute engines.

NVIDIA plans to do the same with their SmartNICs and DPUs. NVIDIA Networking is releasing the DOCA (Data center On a Chip Architecture) SDK and API. DOCA provides the API to use the BlueField2 chips and cards which are the central techonology behind their DPU. They have also announced a roadmap to continue enhancing DOCA, as they have done with CUDA, over the foreseeable future, to add more bandwidth, speed and functionality to DPUs.

It turns out the real problem which forced Mellanox and now NVIDIA to create SmartNics was the need to support the extremely low latencies required for NVMeoF and GDS IO.

It wasn’t clear that the public cloud providers were using SmartNICS but Kevin said it’s been sort of a widely known secret that they have been using the tech. The public clouds (AWS, Azure, Alibaba) have been deploying SmartNICS in their environments for some time now. Always on the lookout for any technology that frees up compute resources to be deployed for cloud users, it appears that public cloud providers were early adopters of SmartNICS.

Kevin Deierling, Head of Marketing NVIDIA Networking

Kevin is an entrepreneur, innovator, and technology executive with a proven track record of creating profitable businesses in highly competitive markets.

Kevin has been a founder or senior executive at five startups that have achieved positive outcomes (3 IPOs, 2 acquisitions). Combining both technical and business expertise, he has variously served as the chief officer of technology, architecture, and marketing of these companies where he led the development of strategy and products across a broad range of disciplines including: networking, security, cloud, Big Data, machine learning, virtualization, storage, smart energy, bio-sensors, and DNA sequencing.


Kevin has over 25 patents in the fields of networking, wireless, security, error correction, video compression, smart energy, bio-electronics, and DNA sequencing technologies.

When not driving new technology, he finds time for fly-fishing, cycling, bee keeping, & organic farming.

This image has an empty alt attribute; its file name is Subscribe_on_iTunes_Badge_US-UK_110x40_0824.png
This image has an empty alt attribute; its file name is play_prism_hlock_2x-300x64.png
This image has an empty alt attribute; its file name is Spotify_Logo_CMYK_Black-1024x307.png


108: GreyBeards talk DNA storage with David Turek, CTO, Catalog DNA

The Greybeards get off the beaten (enterprise) path this month, to see what lies ahead with a discussion on DNA storage. David Turek, CTO, Catalog DNA (@CatalogDNA) is a long time IBMer that had been focused on HPC systems at IBM but left and went to Catalog DNA to pursue the commercialization of DNA storage, an “emerging” technology. CatalogDNA is a company out of Boston that had recently closed a round of funding and are focused on bringing DNA storage out into the world of IT.

David was a pleasure to talk and has lot’s of knowledge on HPC and enterprise data center solutions. He also has a good grasp of what it will take to bring DNA storage to market. Keith has had some prior experience with DNA technologies in BioPharma so could talk in more detail about the technology and its ecosystem. [We’re trying out a new format, let us know what you think; The Eds.]

Ray has written about DNA storage in his RayOnStorage Blog, most recently in April of this year and May of last year. It’s been an ongoing blog topic of his for almost a decade now. When Ray was interviewed about the technology he thought it interesting but had serious obstacles with read and write latencies and throughput as well as the size of the storage device.

Well CatalogDNA seems to have got a good handle on write throughput and are seriously working on the rest.

However, DNA storage’- volumetric density was always of exceptional. Early on in the podcast, David mentioned that DNA storage was 6 orders of magnitude (1 million times) more dense in bytes/mm**3 than magnetic tape today. An LTO8 tape device stores 12TB (uncompressed) in a tape cartridge, 14.2 in**3 (230.3 cm**3) or roughly 845GB/in**3 (52GB/cm**3). One million times this, would be 12EB in the same volume.

The challenge with LTO8, disk or SSD storage today is at some point you have to move the data from one device to a more modern device. This could be every 3-5 years (for disk or SSD) or 25-30 years for tape. In either case, at some point IT would need to incur the cost and time to move the data. Not much of a problem for 100TB or so but when you start talking PB or EB of data, it can be a never ending task.

DNA storage

David mentioned Catalog uses “synthetic DNA” in their storage. This means the DNA it uses is designed to be incompatible with natural DNA such that it wouldn’t work in a cell. It has stops or other biological mechanisms to inhibit it’s use in nature. Yes it uses the same sugars, backbones, and other chemistry of biologically active DNA, but it has been specifically modified to inhibit its use by normal cellular machinery. 

DNA storage has a number of unique capabilities :

  • It can be made to last forever, by being dried out (dessicated) and encased in a crystal and takes 0 power/energy to be stored for eons.
  • It can be cheaply and easily replicated, almost an infinite number of times, for only the cost of chemical feedstock, chemical interactions and energy. Yes, this may take time but the process scales up nicely. One could make 2 copies in first cycle, 4 in the 2nd, 8 in the 3rd, etc and by doing this it would only take 20 cycles to create a million copies. If each cycle takes 10 minutes, in 3:20, you could have a million copies of 1EB of data.
  • It can be easily searched for target information. This involves fabricating a DNA search molecule and inserting it into the storage solution. Once there it would match up with the DNA segment that held your key. And of course, the search molecule and the data could be replicated to speed up any search process.
  • We already mentioned the extreme density advantage above.

Speed of DNA storage access

David said they can already write Catalog DNA storage in MB/sec.

The process they use to write is like a conveyer belt which starts off with a polyethylene sheet (web actually). Somewhere, the digital data comes in, is chunked and transformed into DNA strand (25-50 base pairs) molecules or dots. The polyethylene sheet rolls into a machine that uses multiple 3D print heads to deposit dots (the DNA strand data chunks) at web points. This machine/process deposits 100K or more of these dots onto the web. The sheet then moves to the next stage where the DNA molecules are scraped off and drained into a solution. Then a wet process occurs which uses chemistry to make the DNA more readable and enables the separate DNA molecules to connect into a data strand. Then this data strand goes into another process where it gets reduced in volume and so that it is more stable.

If needed, one can add another step that dries out or desiccates the data strand into even a smaller volume which can then be embedded into a crystalline structure which could last for centuries.

David compared the DNA molecules (data chunks) to Legos, only they are the same pieces in a million different colors Each piece represents some segment of data bits/bytes. Using chemistry and proprietary IP each separate DNA molecule self organizes (connects) into a data strand, representing the information you want to store.

Reading DNA involves, off the shelf, DNA sequencers. The one Catalog currently uses is the Oxford NanoPore device, but there are others. David didn’t say how fast they could read DNA data. But current DNA reading devices destroy the data. So making replicas of the data would be required to read it.

David said their current write device is L shaped with one leg about 14’ (4.3m) long and the other about 12’ (3.7m) long with each leg being about 3’ (0.9m) wide.

Searching EB of data in minutes?!

DNA strands can be searched (matched) using a search molecule and inserting this into the storage solution (that holds the data strands). Such a molecule will find a place in the data that has a matching (DNA) data element and I believe attach itself to the data strand.

For example, lets say you had recorded all of a country’s emails for a month or so and you wanted to search them for the words, “bomb”, “terrorist”, “kill”, etc. One could create a set of search molecules, replicate them any number of times (depending on how quickly you wanted to search the data and how many matches you expected), and insert them into a data pool with multiple data strands that stored the email traffic.

After some time, you’d come back and your search would be done. You’d need to then extract the search hits, and read out the portion of the data strands (emails) that matched. I’m guessing extraction would involve some sort of (wet) chemical process or filtration.

State of Catalog DNA storage

David mentioned that as a publicity stunt they wrote the whole Wikipedia onto Catalog DNA storage. The whole Wikipedia fit into a cylinder about the height of a big knuckle on your hand and in a width smaller than a finger. The size of the whole Wikipedia, with complete edit history is 10TB uncompressed and if they stored all the edit versions plus its media such as images, videos, audio and other graphics, that would add another 23TB (as of end of 2014), so ~33TB uncompressed.

David believes in 18 months they could have a WORM (write once, read many times) data storage solution that could be deployed in customer data centers which would supply immense data repositories in relatively small solution containers.

CatalogDNA is currently in a number of PoCs with major corporations (not labs or universities) to show how DNA storage technology can be used to solve problems.

David believes that at some point they will be able to make compute engines entirely of DNA. At that point, one could have a combined compute and storage (HCI-like) DNA server using the same technology in a solution. And as mentioned previously, one could replicate from one DNA server & storage to a million DNA servers & storage in just 20 cycles. How’s that for scale out.


David Turek, CTO Catalog DNA

Dave Turek is Catalog’s Chief Technology Officer. He comes to Catalog from IBM where he held numerous executive positions in High Performance Computing and emerging technologies.

He was the development executive for the IBM SP program which produced the first commercially successful massively parallel system; he started IBM’s Linux Cluster business; launched an early offering in Cloud computing called Deep Computing Capacity on Demand; produced the Roadrunner system, the world’s first petascale computer; and was responsible for IBM’s exascale strategy which led to the deployment of the Summit and Sierra systems at Oak Ridge and Lawrence Livermore National Laboratories respectively.

David has been invited to testify to Congress on numerous occasions regarding the future of computing in the US and has helped establish technical collaborations with universities, businesses, and government agencies around the world.

This image has an empty alt attribute; its file name is Subscribe_on_iTunes_Badge_US-UK_110x40_0824.png
This image has an empty alt attribute; its file name is play_prism_hlock_2x-300x64.png
This image has an empty alt attribute; its file name is Spotify_Logo_CMYK_Black-1024x307.png