It’s been a couple of months since we last talked with a startup, so the GreyBeards thought it was time. We reached out to Charles Fan (@CharlesFan14), CEO and Co-Founder of MemVerge to find out about their big memory solution or as Charles likes to call it, “software defined (big) memory”. Although neither Matt or I had ever talked with Charles before, he’s been just about everywhere in the storage industry throughout his career.
If you have been following my RayOnStorage blog you will have seen a post (Need memory, Intel’s Optane DC PM to the rescue) last year on Intel’s new Persistent Memory solutions using 3D XPoint, called Optane DC PM (data center, persistent memory) . At the announcement Intel made available a couple of ways customers could use Optane DC PM (PMem).
Optane DC PM primer
Native Optane DC PM access modes include:
- A Memory Mode, which has Pmem emulating a large volatile memory space and uses a defined ratio of DRAM to PMem as a cache to access the Optane DC PM memory behind it.
- An Application Direct (AppDirect) Mode which supports two sub-modes: a storage device mode that uses Pmem to emulate a persistent, 4KB block storage device; and a byte addressable, persistent memory address space mode that uses Pmem to emulate a large, non-volatile memory space . AppDirect memory content persists across boots, power failures and other system crashes.
Native PMem modes are selectected in the BIOS and are deployed at Boot time. Optane DC PM on a server can be split up into any of the three modes. And currently with Optane DC PM (Gen 1), a single server can have up to 6TB of DC PM which will go up to 8TB with Optane DC PM Gen 2 coming out later this year.
MemVerge Memory Machine
MemVerge has written a “software defined memory” service called the Memory Machine, that sits above the Intel Optane DC PM in server(s) and provides application access AND data services for PMem. .
Charles likens their Memory Machine to what VMware did for CPU cores, ie. they provide memory virtualization. This, Charles believes will bring on the age of Big Memory applications. He feels that PMem, with Memory Machine on top of it, will eliminate the need for high performance, tier 0 storage. Tier 0 storage is ~$10B market today, which he sees shifting from networked storage to PMem solutions.
Memory Machine Data Services
One of the data services that the Memory Machine offers is a Pmem snapshot service. PMem thick or thin snapshots can be taken any (infinite) number of times (for thick snapshots storage space availability may limit their number) and can be taken up to once per minute. PMem thin snapshots take little time to accomplish and are very PMem space efficient but thick snapshots are a PMem to PMem copy of data, which will take longer to accomplish and will take double the memory of the original PMem being snapshot.
One significant use case for Pmem snapshots is for checkpoint crash recovery. Charles mentioned many securities and financial analysis firms use KDB as streaming data base service to monitor/analyze market activity and provide automated trading and other market services. These firms are always trying to gain an advantage through speed and reduced latency and as a result have moved their time sensitive processing to use in memory data structures/databases.
However, because checkpointing for crash recovery takes time, they usually checkpoint in memory databases only once a day (after market close) and maintain a log of database transactions on SSD. If there’s a system crash, they reload the last checkpoint and re-play all the transaction logs since that checkpoint to bring their in memory database back to the point of crash. Due to the number of transactions these firms do, this sort of crash recoverys can take hours.
With Memory Machine, these customers can take in memory checkpoints every minute and in the event of a crash, only have to re-play a minutes worth of transaction logs which could be done in no time to get back up
Other environments do similar checkpoint crash recoveries all of which could also take advantage of PMem snapshots to take more frequent checkpoints. Charles mentioned Rendering farms on the podcast but long scientific simulations (HPC) and others use checkpoints for crash recovery.
Another data (or application) service offered by Memory Machine is application cloning. Most in memory applications are single threaded. meaning they can only take advantage of a single CPU core (thread). In order to speed up processing, customers must shard (split up) or copy their database and application onto other servers/CPU/cores to provide more processing power. Memory Machine can use its thick or thin snapshots to clone applications in seconds.
Charles also mentioned that Memory Machine offers PMem dynamic reconfiguration. That is instead of having to make BIOS changes and re-boot server(s) to re-allocate PMem across different applications, Memory Machine is allocated 100% of the PMem at boot time but then, on demand, anytime its operating, operators using MemVerge’s GUI/CLI can carve Pmem up into any number of application memory spaces. That is as application demand for in memory data changes, operations can use the Memory Machine to re-allocate PMem to keep up.
Memory Machine also supports PMem clustering or scaling across servers. With the current 6TB (and soon 8TB) per server PMem limit, some customer applications still run out of memory. Memory Machine is able to cluster or aggregate PMem across up to 32 servers to support a single larger, PMem address space of 192TB (Gen 1) or 256TB (Gen 2) DC PM. The Memory Machine uses an RDMA (RoCE Ethernet or InfiniBand) cluster interconnect which adds ~1 microsecond of overhead to access PMem in another server. This comes with PMem automatic data tiering using DRAM, local (on the server) PMem and remote (across cluster interconnect) PMem.
Charles mentioned another data service provided by Memory Machine is (Synch or Asynch) replication. One use case for replication is to create a Pub-Sub service for market data.
Charles believes that in memory databases and data processing workloads are just starting to become popular these days. Besides KDB and rendering, other data processing such as AI training/inferencing, Reddis applications, and other database systems are able to take advantage of in memory, large data structures to speed up their data processing
MemVerge’s EAP (early access program) opened up recently (5/19/2020). Charles suggested anyone using large, in memory data processing, take a look at what the Memory Machine can do and contact them to sign up.
The podcast runs ~45 minutes. Charles was very articulate as well as knowledgeable about the technology and its applications. He was great to talk tech with. Matt and I had a fun time talking Optane DC PM and Memory Machine functionality/applications with him. Listen to the podcast to learn more.
Charles Fan, CEO & Co-founder, MemVerge
Charles Fan is co-founder and CEO of MemVerge. Prior to MemVerge, Charles was a SVP/GM at VMware, founding the storage business unit that developed the Virtual SAN product.
Charles also worked at EMC and was the founder of the EMC China R&D Center. Charles joined EMC via the acquisition of Rainfinity, where he was a co-founder and CTO.
Charles received his Ph.D. and M.S. in Electrical Engineering from the California Institute of Technology, and his B.E. in Electrical Engineering from the Cooper Union.