OFA DNNs, cutting the carbon out of AI

Read an article (Reducing the carbon footprint of AI… in Science Daily) the other day about a new approach to reducing the energy demands for AI deep neural net (DNN) training and inferencing. The article was reporting on a similar piece in MIT News but both were discussing a technique original outlined in a ICLR 2020 (Int. Conf. on Learning Representations) paper, Once-for-all: Train one network & specialize it for efficient deployment.

The problem stems from the amount of energy it takes to train a DNN and use it for inferencing. In most cases, training and (more importantly) inferencing can take place on many different computational environments, from IOT devices, to cars, to HPC super clusters and everything in between. In order to create DNN inferencing algorithms for use in all these environments, one would have to train a different DNN for each. Moreover, if you’re doing image recognition applications, resolution levels matter. Resolution levels would represent a whole set of more required DNNs that would need to be trained.

The authors of the paper suggest there’s a better approach. Train one large OFA (once-for-all) DNN, that covers the finest resolution and largest neural net required in such a way that smaller, sub-nets could be extracted and deployed for less weighty computational and lower resolution deployments.

The authors contend the OFA approach takes less overall computation (and energy) to create and deploy than training multiple times for each possible resolution and deployment environment. It does take more energy to train than training a few (4-7 judging by the chart) DNNs, but that can be amortized over a vastly larger set of deployments.

OFA DNN explained

Essentially the approach is to train one large (OFA) DNN, with sub-nets that can be used by themselves. The OFA DNN sub-nets have been optimized for different deployment dimensions such as DNN model width, depth and kernel size as well as resolution levels.

While DNN width is purely the number of numeric weights in each layer, and DNN depth is the number of layers, Kernel size is not as well known. Kernels were introduced in convolutional neural networks (CovNets) to identify the number of features that are to be recognized. For example, in human faces these could be mouths, noses, eyes, etc. All these dimensions + resolution levels are used to identify all possible deployment options for an OFA DNN.

OFA secrets

One key to the OFA success is that any model (sub-network) selected actually shares the weights of all of its larger brethren. That way all the (sub-network) models can be represented by the same DNN and just selecting the dimensions of interest for your application. If you were to create each and every DNN, the number would be on the order of 10**19 DNNs for the example cited in the paper with depth using {2,3,4) layers, width using {3,4,6} and kernel sizes over 25 different resolution levels.

In order to do something like OFA, one would need to train for different objectives (once for each different resolution, depth, width and kernel size). But rather than doing that, OFA uses an approach which attempts to shrink all dimensions at the same time and then fine tunes that subsets NN weights for accuracy. They call this approach progressive shrinking.

Progressive shrinking, training for different dimensions

Essentially they train first with the largest value for each dimension (the complete DNN) and then in subsequent training epochs reduce one or more dimensions required for the various deployments and just train that subset. But these subsequent training passes always use the pre-trained larger DNN weights. As they gradually pick off and train for every possible deployment dimension, the process modifies just those weights in that configuration. This way the weights of the largest DNN are optimized for all the smaller dimensions required. And as a result, one can extract a (defined) subnet with the dimensions needed for your inferencing deployments.

They use a couple of tricks when training the subsets. For example, when training for smaller kernel sizes, they use the center most kernels and transform their weights using a transformation matrix to improve accuracy with less kernels. When training for smaller depths, they use the first layers in the DNN and ignore any layers lower in the model. Training for smaller widths, they sort each layer for the highest weights, thus ensuring they retain those parameters that provide the most sensitivity.

It’s sort of like multiple video encodings in a single file. Rather than having a separate file for every video encoding format (Mpeg 2, Mpeg 4, HVEC, etc.), you have one file, with all encoding formats embedded within it. If for example you needed Mpeg-4, one could just extract those elements of the video file representing that encoding level

OFA DNN results

In order to do OFA, one must identify, ahead of time, all the potential inferencing deployments (depth, width, kernel sizes) and resolution levels to support. But in the end, you have a one size fits all trained DNN whose sub-nets can be selected and deployed for any of the pre-specified deployments.

The authors have shown (see table and figure above) that OFA beats (in energy consumed and accuracy level) other State of the Art (SOTA) and Neural (network) Architectural Search (NAS) approaches to training multiple DNNs.

The report goes on to discuss how OFA could be optimized to support different latency (inferencing response time) requirements as well as diverse hardware architectures (CPU, GPU, FPGA, etc.).

~~~~

When I first heard of OFA DNN, I thought we were on the road to artificial general intelligence but this is much more specialized than that. It’s unclear to me how many AI DNNs have enough different deployment environments to warrant the use of OFA but with the proliferation of AI DNNs for IoT, automobiles, robots, etc. their will come a time soon where OFA DNNs and its competition will become much more important.

Comments

Photo Credit(s):

DNA storage using nicks

Read an article the other day in Scientific American (“Punch card” DNA …) which was reporting on a Nature Magazine Article (DNA punch cards for storing data… ). The articles discussed a new approach to storing (and encoding) data into DNA sequences.

We have talked about DNA storage over the years (most recently, see our Random access DNA object storage post) so it’s been under study for almost a decade.

In prior research on DNA storage, scientists encoded data directly into the nucleotides used to store genetic information. As you may recall, there are two complementary nucleotides A-T (adenine-thymine) and G-C (guanine-cytosine) that constitute the genetic code in a DNA strand. One could use one of these pairs to encode a 1 bit and the other for a 0 bit and just lay them out along a DNA strand.

The main problem with nucleotide encoding of data in DNA is that it’s slow to write and read and very error prone (storing data in DNA separate nucleotides is a lossy data storage). Researchers have now come up with a better way.

Using DNA nicks to store bits

One could encode information in DNA by utilizing the topology of a DNA strand. Each DNA strand is actually made up of a sugar phosphate back bone with a nucleotide (A, C, G or T) hanging off of it, and then a hydrogen bond to its nucleotide complement (T, G, C or A, respectively) which is attached to another sugar phosphate backbone.

It appears that one can deform the sugar phosphate back bone at certain positions and retain an intact DNA strand. It’s in this deformation that the researchers are encoding bits and they call this a “DNA nick”.

Writing DNA nick data

The researchers have taken a standard DNA strand (E-coli), and identified unique sites on it that they can nick to encode data. They have identified multiple (mostly unique) sites for nick data along this DNA, the scientists call “registers” but we would call sectors or segments. Each DNA sector can contain a certain amount of nick data, say 5 to 10 bits. The selected DNA strand has enough unique sectors to record 80 bits (10 bytes) of data. Not quite a punch card (80 bytes of data) but it’s early yet.

Each register or sector is made up of 450 base (nucleotide) pairs. As DNA has two separate strands connected together, the researchers can increase DNA nick storage density by writing both strands creating a sort of two sided punch card. They use this other or alternate (“anti-sense”) side of the DNA strand nicks for the value “2”. We would have thought they would have used the absent of a nick in this alternate strand as being “3” but they seem to just use it as another way to indicate “0” .

The researchers found an enzyme they could use to nick a specific position on a DNA strand called the PfAgo (Pyrococcus furiosus Argonaute) enzyme. The enzyme can de designed to nick distinct locations and register (sectors) along the DNA strand. They designed 1024 (2**10) versions of this enzyme to create all possible 10 bit data patterns for each sector on the DNA strand.

Writing DNA nick data is done via adding the proper enzyme combinations to a solution with the DNA strand. All sector writes are done in parallel and it takes about 40 minutes.

Also the same PfAgo enzyme sequence is able to write (nick) multiple DNA strands without additional effort. So we can replicate the data as many times as there are DNA strands in the solution, or replicating the DNA nick data for disaster recovery.

Reading DNA nick data

Reading the DNA nick data is a bit more complicated.

In Figure 1 the read process starts by by denaturing (splitting dual strands into single strands dsDNA) and then splitting the single strands (ssDNA) up based on register or sector length which are then sequenced. The specific register (sector) sequences are identified in the sequence data and can then be read/decoded and placed in the 80 bit string. The current read process is destructive of the DNA strand (read once).

There was no information on the read time but my guess is it takes hours to perform. Another (faster) approach uses a “two-dimensional (2D) solid-state nanopore membrane” that can read the nick information directly from a DNA string without dsDNA-ssDNA steps. Also this approach is non-destructive, so the same DNA strand could be read multiple times.

Other storage characteristics of nicked DNA

Given the register nature of the nicked DNA data organization, it appears that data can be read and written randomly, rather than sequentially. So nicked DNA storage is by definition, a random access device.

Although not discussed in the paper, it appears as if the DNA nicked data can be modified. That is the same DNA string could have its data be modified (written multiple times).

The researcher claim that nicked DNA storage is so reliable that there is no need for error correction. I’m skeptical but it does appear to be more reliable than previous generations of DNA storage encoding. However, there is a possibility that during destructive read out we could lose a register or two. Yes one would know that the register bits are lost which is good. But some level of ECC could be used to reconstruct any lost register bits, with some reduction in data density.

The one significant advantage of DNA storage has always been its exceptional data density or bits stored per volume. Nicked storage reduces this volumetric density significantly, i.e, 10 bits per 450 (+ some additional DNA base pairs required for register spacing) base pairs or so nicked DNA storage reduces DNA storage volumetric density by at least a factor of 45X. Current DNA storage is capable of storing 215M GB per gram or 215 PB/gram. Reducing this by let’s say 100X, would still be a significant storage density at ~2PB/gram.

Comments?

Picture Credit(s):

Random access, DNA object storage system

Read a couple of articles this week Inching closer to a DNA-based file system in ArsTechnica and DNA storage gets random access in IEEE Spectrum. Both of these seem to be citing an article in Nature, Random access in large-scale DNA storage (paywall).

We’ve known for some time now that we can encode data into DNA strings (see my DNA as storage … and Genomic informatics takes off posts).

However, accessing DNA data has been sequential and reading and writing DNA data has been glacial. Researchers have started to attack the sequentiality of DNA data access. The prize, DNA can store 215PB of data in one gram and DNA data can conceivably last millions of years.

Researchers at Microsoft and the University of Washington have come up with a solution to the sequential access limitation. They have used polymerase chain reaction (PCR) primers as a unique identifier for files. They can construct a complementary PCR primer that can be used to extract just DNA segments that match this primer and amplify (replicate) all DNA sequences matching this primer tag that exist in the cell.

DNA data format

The researchers used a Reed-Solomon (R-S) erasure coding mechanism for data protection and encode the DNA data into many DNA strings, each with multiple (metadata) tags on them. One of tags is the PCR primer tag header, another tag indicates the position of the DNA data segment in the file and an end of data tag that is the same PCR primer tag.

The PCR primer tag was used as sort of a file address. They could configure a complementary PCR tag to match the primer tag of the file they wanted to access and then use the PCR process to replicate (amplify) only those DNA segments that matched the searched for primer tag.

Apparently the researchers chunk file data into a block of 150 base pairs. As there are 2 complementary base pairs, I assume one bit to one base pair mapping. As such, 150 base pairs or bits of data per segment means ~18 bytes of data per segment. Presumably this is to allow for more efficient/effective encoding of data into DNA strings.

DNA strings don’t work well with replicated sequences of base pairs, such as all zeros. So the researchers created a random sequence of 150 base pairs and XOR the file DNA data with this random sequence to determine the actual DNA sequence to use to encode the data. Reading the DNA data back they need to XOR the data segment with the random string again to reconstruct the actual file data segment.

Not clear how PCR replicated DNA segments are isolated and where they are originally decoded (with a read head). But presumably once you have thousands to millions of copies of a DNA segment,  it’s pretty straightforward to decode them.

Once decoded and XORed, they use the R-S erasure coding scheme to ensure that the all the DNA data segments represent the actual data that was encoded in them. They can then use the position of the DNA data segment tag to indicate how to put the file data back together again.

What’s missing?

I am assuming the cellular data storage system has multiple distinct cells of data, which are clustered together into some sort of organism.

Each cell in the cellular data storage system would hold unique file data and could be extracted and a file read out individually from the cell and then the cell could be placed back in the organism. Cells of data could be replicated within an organism or to other organisms.

To be a true storage system, I would think we need to add:

  • DNA data parity – inside each DNA data segment, every eighth base pair would be a parity for the eight preceding base pairs, used to indicate when a particular base pair in eight has mutated.
  • DNA data segment (block) and file checksums –  standard data checksums, used to verify and correct for double and triple base pair (bit) corruption in DNA data segments and in the whole file.
  • Cell directory – used to indicate the unique Cell ID of the cell, a file [name] to PCR primer tag mapping table, a version of DNA file metadata tags, a version of the DNA file XOR string, a DNA file data R-S version/level, the DNA file length or number of DNA data segments, the DNA data creation data time stamp, the DNA last access date-time stamp,and DNA data modification data-time stamp (these last two could be omited)
  • Organism directory – used to indicate unique organism ID, organism metadata version number, organism unique cell count,  unique cell ID to file list mapping, cell ID creation data-time stamp and cell ID replication count.

The problem with an organism cell-ID file list is that this could be quite long. It might be better to somehow indicate a range or list of ranges of PCR primer tags that are in the cell-ID. I can see other alternatives using a segmented organism directory or indirect organism cell to file lists b-tree, which could hold file name lists to cell-ID mapping.

It’s unclear whether DNA data storage should support a multi-level hierarchy, like file system  directories structures or a flat hierarchy like object storage data, which just has buckets of objects data. Considering the cellular structure of DNA data it appears to me more like buckets and the glacial access seems to be more useful to archive systems. So I would lean to a flat hierarchy and an object storage structure.

Is DNA data is WORM or modifiable? Given the effort required to encode and create DNA data segment storage, it would seem it’s more WORM like than modifiable storage.

How will the DNA data storage system persist or be kept alive, if that’s the right word for it. There must be some standard internal cell mechanisms to maintain its existence. Perhaps, the researchers have just inserted file data DNA into a standard cell as sort of junk DNA.

If this were the case, you’d almost want to create a separate, data  nucleus inside a cell, that would just hold file data and wouldn’t interfere with normal cellular operations.

But doesn’t the PCR primer tag approach lend itself better to a  key-value store data base?

Photo Credit(s): Cell structure National Cancer Institute

Prentice Hall textbook

Guide to Open VMS file applications

Unix Inodes CSE410 Washington.edu

Key Value Databases, Wikipedia By ClescopOwn work, CC BY-SA 4.0, Link

Magnonics for configurable electronics

Read an article today in ScienceDaily on [a] New way to write magnetic info … that discusses research done at Imperial College Of London that used a magnetic force microscope (small magnetic probe) to write magnetic fields onto a dense array of nanowires.

Frustrated metamaterials needed

The original research is written up in a Nature article Realization of ground state in artificial kagome spin ice via topological defect driven magnetic writing  (paywall). Unclear what that means but the paper abstract discusses geometrically frustrated magnetic metamaterials.  This is where the physical size or geometrical properties of the materials at the nanometer scale restricts or limits the magnetic states that material can exhibit.

Magnetic storage deals with magnetic material but there are a number of unique interactions of magnetic material when in close (nm) proximity to one another and the way nanowire geometrically frustrated magnetic metamaterials can be magnetized to different magnetic moments which can be exploited for other uses.  These interactions and magnetic moments can be combined to provide electronic circuitry and data storage.

I believe the research provides a proof point that such materials can be written, in close proximity to one another using a magnetic force microscope.

Why it’s important

The key is the potential to create  magnonic circuitry based on the pattern of moments writen into an array of nanowires. In doing so, one can fabricate any electrical circuit. It’s almost like photolithography but without fabs, chemicals, or laser scanners.

At first I thought this could be a denser storage device, but the potential is much greater if electronic circuitry could be constructed without having to fabricate semiconductors. It would seem ideal for testing out circuitry before manufacturing. And ultimately if it could be scaled up, the manufacture/fabrication of electronic circuitry itself could be done using these techniques.

Speed, endurance, write limits?

There was no information in the public article about the speed of writing the “frustrated magnetic metamaterials”. But an atomic force microscope can scan 150×150 micrometers in several minutes. If we assume that a typical chip size today is 150×150 mm, then this would take 1E6 times several minutes, or ~2K days. With multiple scanning force microscopes operating concurrently we could cut this down by a factor of 10 or 100 and maybe someday 1000. 2 days to write any electronic circuit on the order of todays 23nm devices with nanowires and magnetic force microscopes would be a significant advance

Also there was no mention of endurance, write limits or other characteristics we have learned to love with Flash storage. But the assumption is that it can be written multiple times and that the pattern stays around for some amount of time.

How magnetics generate electronic circuits

Neither Wikipedia page, the public article or the paywall articles’ abstract describes how Magnonics can supply electronic circuitry. However both the abstract and the public article discuss applications for this new technology in hardware based neural networks using arrays of densely packed nanowires.

Presumably, by writing different magnetic patterns in these nanowire metamaterials, such patterns can be used to simulate hardware connected neurons. This means that the magnetic information can be overwritten because it can be trained. Also, such magnetic circuits can be constructed to: a) can create different path for electrons to flow through the material; b) can restrict or enhance this electronic flow, and c) can integrate across a number of inputs and determine how electronic flow will proceed from a simulated neuron.

If magnonics can do all that,  it’s very similar to electronic gates today in CPU, GPUs and other electronic circuitry. Maybe it cannot simulate every gate or electronic device that’s found in todays CPUs but it’s a step in the right direction. And magnonics is relatively new. Silicon transistors are over 70 years old and the integrated circuit is almost 60 years old. So in time, magnonics could very well become the next generation of chip technology.

Writing speed is a problem. Maybe if they spun the nanowire array around the magnetic force microscope…

Comments?

Photo Credits:  Real space observation of emergent magnetic monopoles … Nature article

Realization of ground state in artificial kagome spin ice via topological defect driven magnetic writing, Nature article

 

Disk rulz, at least for now

Last week WDC announced their next generation technology for hard drives, MAMR or Microwave Assisted Magnetic Recording. This is in contrast to HAMR, Heat (laser) Assisted Magnetic Recording. Both techniques add energy so that data can be written as smaller bits on a track.

Disk density drivers

Current hard drive technology uses PMR or Perpendicular Magnetic Recording with or without SMR (Shingled Magnetic Recording) and TDMR (Two Dimensional Magnetic Recording), both of which we have discussed before in prior posts.

The problem with PMR-SMR-TDMR is that the max achievable disk density is starting to flat line and approaching the “WriteAbility limit” of the head-media combination.

That is even with TDMR, SMR and PMR heads, the highest density that can be achieved is ~1.1Tb/sq.in. The Writeability limit for the current PMR head-media technology is ~1.4Tb/sq.in. As a result most disk density increases over the past years has been accomplished by adding platters-heads to hard drives.

MAMR and HAMR both seem able to get disk drives to >4.0Tb/sq.in. densities by adding energy to the magnetic recording process, which allows the drive to record more data in the same (grain) area.

There are two factors which drive disk drive density (Tb/sq.in.): Bits per inch (BPI) and Tracks per inch (TPI). Both SMR and TDMR were techniques to add more TPI.

I believe MAMR and HAMR increase BPI beyond whats available today by writing data on smaller magnetic grain sizes (pitch in chart) and thus more bits in the same area. At 7nm grain sizes or below PMR becomes unstable, but HAMR and MAMR can record on grain sizes of 4.5nm which would equate to >4.5Tb/sq.in.

HAMR hurdles

It turns out that HAMR as it uses heat to add energy, heats the media drives to much higher temperatures than what’s normal for a disk drive, something like 400C-700C.  Normal operating temperatures for disk drives is  ~50C.  HAMR heat levels will play havoc with drive reliability. The view from WDC is that HAMR has 100X worse reliability than MAMR.

In order to generate that much heat, HAMR needs a laser to expose the area to be written. Of course the laser has to be in the head to be effective. Having to add a laser and optics will increase the cost of the head, increase the steps to manufacture the head, and require new suppliers/sourcing organizations to supply the componentry.

HAMR also requires a different media substrate. Unclear why, but HAMR seems to require a glass substrate, the magnetic media (many layers) is  deposited ontop of the glass substrate. This requires a new media manufacturing line, probably new suppliers and getting glass to disk drive (flatness-bumpiness, rotational integrity, vibrational integrity) specifications will take time.

Probably more than a half dozen more issues with having laser light inside a hard disk drive but suffice it to say that HAMR was going to be a very difficult transition to perform right and continue to provide today’s drive reliability levels.

MAMR merits

MAMR uses microwaves to add energy to the spot being recorded. The microwaves are generated by a Spin Torque Oscilator, (STO), which is a solid state device, compatible with CMOS fabrication techniques. This means that the MAMR head assembly (PMR & STO) can be fabricated on current head lines and within current head mechanisms.

MAMR doesn’t add heat to the recording area, it uses microwaves to add energy. As such, there’s no temperature change in MAMR recording which means the reliability of MAMR disk drives should be about the same as todays disk drives.

MAMR uses todays aluminum substrates. So, current media manufacturing lines and suppliers can be used and media specifications shouldn’t have to change much (?) to support MAMR.

MAMR has just about the same max recording density as HAMR, so there’s no other benefit to going to HAMR, if MAMR works as expected.

WDC’s technology timeline

WDC says they will have sample MAMR drives out next year and production drives out in 2019. They also predict an enterprise 40TB MAMR drive by 2025. They have high confidence in this schedule because MAMR’s compatabilitiy with  current drive media and head manufacturing processes.

WDC discussed their IP position on HAMR and MAMR. They have 400+ issued HAMR patents with another 100+ pending and 75 issued MAMR patents with 46 more pending. Quantity doesn’t necessarily equate to quality, but their current IP position on both MAMR and HAMR looks solid.

WDC believes that by 2020, ~90% of enterprise data will be stored on hard drives. However, this is predicated on achieving a continuing, 10X cost differential between disk drives and (QLC 3D) flash.

What comes after MAMR is subject of much speculation. I’ve written on one alternative which uses liquid Nitrogen temperatures with molecular magnets, I called CAMR (cold assisted magnetic recording) but it’s way to early to tell.

And we have yet to hear from the other big disk drive leader, Seagate. It will be interesting to hear whether they follow WDC’s lead to MAMR, stick with HAMR, or go off in a different direction.

Comments?

 

Photo Credit(s): WDC presentation

Research reveals ~liquid nitrogen temperature molecular magnets with 100X denser storage


Must be on a materials science binge these days. I read another article this week in Phys.org on “Major leap towards data storage at the molecular level” reporting on a Nature article “Molecular magnetic hysteresis at 60K“, where researchers from University of Manchester, led by Dr David Mills and Dr Nicholas Chilton from the School of Chemistry, have come up with a new material that provides molecular level magnetics at almost liquid nitrogen temperatures.

Previously, molecular magnets only operated at from 4 to 14K (degrees Kelvin) from research done over the last 25 years or so, but this new  research shows similar effects operating at ~60K or close to liquid nitrogen temperatures. Nitrogen freezes at 63K and boils at ~77K, and I would guess, is liquid somewhere between those temperatures.

What new material

The new material, “hexa-tert-butyldysprosocenium complex—[Dy(Cpttt)2][B(C6F5)4], with Cpttt = {C5H2tBu3-1,2,4} and tBu = C(CH3)3“, dysprosocenium for short was designed (?) by the researchers at Manchester and was shown to exhibit magnetism at the molecular level at 60K.

The storage effect is hysteresis, which is a materials ability to remember the last (magnetic/electrical/?) field it was exposed to and the magnetic field is measured in oersteds.

The researchers claim the new material provides magnetic hysteresis at a sweep level of 22 oersteds. Not sure what “sweep level of 22 oersteds” means but I assume a molecule of the material is magnetized with a field strength of 22 oersteds and retains this magnetic field over time.

Reports of disk’s death, have been greatly exaggerated

While there seems to be no end in sight for the densities of flash storage these days with 3D NAND (see my 3D NAND, how high can it go post or listen to our GBoS FMS2017 wrap-up with Jim Handy podcast), the disk industry lives on.

Disk industry researchers have been investigating HAMR, ([laser] heat assisted magnetic recording, see my Disk density hits new record … post) for some time now to increase disk storage density. But to my knowledge HAMR has not come out in any generally available disk device on the market yet. HAMR was supposed to provide the next big increase in disk storage densities.

Maybe they should be looking at CAMMR, or cold assisted magnetic molecular recording (heard it here, 1st).

According to Dr Chilton using the new material at 60K in a disk device would increase capacity by 100X. Western Digital just announced a 20TB MyBook Duo disk system for desktop storage and backup. With this new material, at 100X current densities, we could have 2PB Mybook Duo storage system on your desktop.

That should keep my ever increasing video-photo-music library in fine shape and everything else backed up for a little while longer.

Comments?

Photo Credit(s): Molecular magnetic hysteresis at 60K, Nature article

 

Intel’s Optane (3D Xpoint) SSD specs in the wild

Read an article the other day in Ars Technica (Specs for 1st Intel 3DX SSD…) about a preview of the Intel Octane specs for their 375GB 3D Xpoint (3DX) flash card. The device is NVMe compliant, PCIe Gen3 add in card, that’s in a half height, half length, low profile form factor.

Intel’s Optane SSD vs. the competition

A couple of items from the Intel Optane spec sheet of interest to me as a storage guru:

  • 30 Drive writes per day/12.3 PBW (written) – 3DX, at launch, had advertised that it would have 1000 times the endurance of (2D-MLC?) NAND. Current flash cards (see Samsung SSD PRO NVMe 256GB Flash card specs) offer about 200TBW (for 256GB card) or 400TBW (for 512GB card). The Samsung PRO is based on 3D (V-)NAND, so its endurance is much better than  2D-MLC at these densities. That being said, the Octane drive is still ~40X the write endurance of the PRO 950. Not quite 1000 but certainly significantly better.
  • Sequential (bandwidth) performance (R/W) of 2400/2000 MB/sec – 3DX advertised 1000 times the performance of (2D-MLC,  non-NVMe?) NAND. Current 3D (V-)NAND cards (see Samsung SSD PRO above) above offers (R/W) 2200/900 MB/sec for an NVMe device. The Optane’s read bandwidth is a slight improvement but the write bandwidth is a 2.2X improvement over current competitive devices.
  • Random 4KB IOPs performance (R/W) of 550K/500K – Similar to the previous bulleted item, 3DX advertised 1000 times the performance of (2D-MLC,  non-NVMe?) NAND. Current 3D (V-)NAND cards like the Samsung SSD PRO offer Random 4KB IOPs performance  (R/W) of 270K/85K IOPS (@4 threads). Optane’s read random 4KB IOPs performance is 2X the PRO 950 but its write performance is ~5.9X better.
  • IO latency of <10 µsec. – 3DX advertised 10X better latency than the current (2D-MLC, non-NVMe) flash drives. According to storage review (Samsung 950 Pro M.2), the Samsung PRO 950 had a latency of ~22 µsec. Optane has at least 2X better latency than the current competition.
  • Density 375GB/HH-HL-LP – 3DX advertised 1000X the density of (then current DRAM). Today Micron offers a 4GiB DDR4/288 pin DIMM which is probably 1/2 the size of the HH flash drive. So maybe in the same space this could be 8GiB. This says that the Optane is about 100X denser than today’s DRAM.

Please note, when 3DX was launched, ~2 years ago, the then current NAND technology was 2D-MLC and NVMe was just a dream. So comparing launch claims against today’s current 3D-NAND, NVMe drives is not a fair comparison.

Nevertheless, the Optane SSD performs considerably better than current competitive NVMe drives and has significantly better endurance than current 3D (V-)NAND flash drives. All of which is a great step in the right direction.

What about DRAM replacement?

At launch, 3DX was also touted as a higher density, potential replacement for DRAM. But so far we haven’t seen any specs for what 3DX NVM looks like on a memory bus. It has much better density than DRAM, but we would need to see 3DX memory access times under 50ns to have a future as a DRAM replacement. Optane’s NVMe SSD at 10 µsec. is about 200X too slow, but then again it’s not a memory device configuration nor is it attached to a memory bus.

Comments?

Photo Credit(s):  Intel Optane Spec sheet from Ars Technica Article,  DDR4 DRAM from Wikimedia user:Dsimic

5D storage for humanity’s archive

5D data storage.jpg_SIA_JPG_fit_to_width_INLINEA group of researchers at the University of Southhampton in the UK have  invented a new type of optical recording, based on femto-second laser pulses and silica/quartz media that can store up to 300TB per (1″ diameter) disc platter with thermal stability at up to 1000°C or a media life of up to 13.8B years at room temperature (190°C?). The claim is that the memory device could outlive humanity and maybe the universe.

The new media/recording technique was used recently to create copies of text files (Holy Bible, pictured above). Other significant humanitarian, political and scientific treatise have also been stored on the new media. The new device has been nicknamed “Superman Memory Crystal”, due to the memory glass (quartz) likeness to Superman’s memory crystals.

We have written before on long term archives(See Super Long Term Archive and Today’s data and the 1000 year archive posts) but this one beats them all by many orders of magnitude.
Continue reading “5D storage for humanity’s archive”