Data banks, data deposits & data withdrawals in the data economy – part 1

Big data visualization, Facebook friend connections
Facebook friend carrousel by antjeverena (cc) (from flickr)

Read an interesting article this week in The Atlantic, Why Technology Favors Tyranny by Yuval Noah Harari, about the inevitable future of technology and how the use of data will drive it.

At the end of the article Harari talks about the need to take back ownership of our data in order to gain some control over the tech giants that currently control our data.

In part 3, Harari discusses the coming AI revolution and its impact on humanity. Yes, there will still be jobs, but early on fewer jobs for unskilled labor and, over time, fewer jobs for skilled labor.

Yet, our data continues to be valuable. AI neural net (NN) accuracy increases as a function of the amount of data used to train it. As a result, whoever has the most data creates the best AI NN. This means our data has value and can be used over and over again to train other AI NNs. This all sounds like data is just another form of capital, at least for AI NN training.

If only we could own our data, then there would still be value from people’s (digital) exertions (labor), regardless of how much AI has taken over the reins of production or reduced the need for human work.

Safe by cjc4454 (cc) (from flickr)

What we need is data (savings) banks. These banks would hold people’s data, gathered from social media likes/dislikes, cell phone metadata, app/web history, search history, credit history, purchase history, photo/video streams, email streams, lab work, X-rays, wearables info, etc. Probably many more categories need to be identified, but ultimately ALL the digital data we generate today would need to be owned by people and deposited in their digital bank accounts.

Data deposits?

Social media companies, telecoms, search companies, financial services app companies, internet providers, etc., anywhere you do business, should supply a copy of the digital data they gather about a person back to that person’s data bank account.

There are many technical problems to overcome here, but it could be as simple as an object storage bucket assigned to each person, into which each digital business deposits (XML versions of) the digital data it creates for everyone that uses its service. Businesses would do this as compensation for using our data in their business activities.
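
To make the deposit mechanics concrete, here’s a minimal sketch of what a per-person deposit might look like against an S3-compatible object store (via boto3). The bucket and key names are hypothetical, and I’ve used JSON rather than XML just to keep the example short.

```python
# Minimal sketch of a data deposit into a person's object storage bucket.
# Assumes an S3-compatible store via boto3; bucket and key names are hypothetical.
import datetime
import json
import boto3

s3 = boto3.client("s3")

def deposit(person_bucket: str, source: str, record: dict) -> str:
    """Write one digital-data record from a business into the person's data bank bucket."""
    key = f"{source}/{datetime.datetime.utcnow().isoformat()}.json"
    s3.put_object(
        Bucket=person_bucket,   # one bucket per person (their data bank account)
        Key=key,                # namespaced by the depositing business
        Body=json.dumps(record).encode("utf-8"),
        ContentType="application/json",
    )
    return key

# Example: a search company depositing a day's search history
deposit("data-bank-person-1234", "search-co", {"queries": ["skyrmion storage", "IDO"]})
```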

How to change data ownership?

Today, we all sign user agreements which essentially give a company the rights to our data in perpetuity. That needs to change. I see a few ways this change could come about:

  1. Countries could enact laws to ensure personal data ownership resides in the person generating it and enforce periodic distribution of this data.
  2. Market dynamics could impel data distribution, e.g., if some search firm supplied data back to us, we would be more likely to use them.
  3. Societal changes: as AI becomes more important to profit-making activities and reduces the need for human work, and as data continues to be an important factor in AI success, data ownership becomes essential to retaining the value of human labor in society.

Probably, all of the above and maybe more would be required to change the ownership structure of data.

How to profit from data?

Technical entities needing data to train AI NNs could solicit data contributions through an Initial Data Offering (IDO). IDOs would specify the types of data required and the proportion of AI NN ownership they would cede to all data providers. Data providers would be apportioned ownership based on the percentage ceded and the number of IDO data subscribers.
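
As a back-of-envelope illustration, here’s a hedged sketch of how that apportionment might be computed, assuming the simplest possible rule: the ceded percentage is split evenly across all IDO data subscribers.

```python
# Sketch of IDO ownership apportionment under an assumed even-split rule.
def ido_ownership_share(ceded_fraction: float, num_data_subscribers: int) -> float:
    """Fraction of the AI NN each data provider would own."""
    return ceded_fraction / num_data_subscribers

# Example: an IDO cedes 30% of NN ownership and attracts 10,000 data subscribers,
# so each data provider ends up owning 0.003% of the resulting AI NN.
print(f"{ido_ownership_share(0.30, 10_000):.4%}")
```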

perspective by anomalous4 (cc) (from Flickr)

Data banks would extract the data requested by the IDO and supply it to the IDO entity for use. For IDOs, just like ICOs or IPOs, some would fail and others would succeed. But the data used in them would represent an ownership share, sort of like a stock (data) certificate in the AI NN.

Data bank responsibilities

Data banks would have various responsibilities and would need to collect fees to perform them. For example, data banks would be responsible for:

  1. Protecting data deposits – to ensure data deposits are never lost, are never accessed without permission, and are always trackable as to how they are used.
  2. Performing data deposits – to verify that data is deposited from proper digital entities, to validate that data deposits are in a usable form, and to properly store the data in a customer’s object storage bucket.
  3. Performing data withdrawals – upon customer request, to extract all the appropriate data requested by an IDO, anonymize it, secure it, package it and send it to the IDO originator (a sketch of this step follows this list).
  4. Reconciling data accounts – to track data transactions, data banks would supply a monthly statement that identifies all data deposits and data withdrawals, data revenues and data expenses/fees.
  5. Enforcing data withdrawal types – data withdrawals can have many different characteristics, such as exclusivity, expiration, geographic bounds, etc. Data banks would need to enforce withdrawal characteristics, at least to the extent they can.
  6. Auditing data transactions – to ensure that data is used properly, a consortium of data banks or possibly data accountancies would need to audit AI training data sets to verify that only data that has been properly withdrawn is used in training the NN.
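
Here’s a minimal sketch of the withdrawal step in item 3: anonymize the requested records and package them for the IDO originator. The field names, the salted-hash scheme and the JSON packaging are my own assumptions, not an established standard.

```python
# Minimal sketch of a data withdrawal: anonymize, package and hand off to an IDO originator.
# Field names and the salting scheme are illustrative assumptions.
import hashlib
import json

DIRECT_IDENTIFIERS = {"name", "email", "phone", "address"}

def anonymize(record: dict, salt: str) -> dict:
    """Strip direct identifiers and replace the person ID with a salted hash."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "person_id" in cleaned:
        cleaned["person_id"] = hashlib.sha256(
            (salt + str(cleaned["person_id"])).encode()
        ).hexdigest()
    return cleaned

def package_withdrawal(records: list[dict], ido_id: str, salt: str) -> bytes:
    """Bundle the anonymized records for shipment to the IDO originator."""
    payload = {"ido": ido_id, "records": [anonymize(r, salt) for r in records]}
    return json.dumps(payload).encode("utf-8")   # would be encrypted before sending
```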

AI NN, tools and framework responsibilities

In order for personal data ownership to work well, AI NNs, tools and frameworks used today would need to change to account for data ownership.

  1. Generate, maintain and supply immutable data ownership digests – data ownership digests would be a sort of stock registry for the data used in training the AI NN. They would need to be a part of any AI NN and be viewable by proper data authorities (a minimal sketch follows this list).
  2. Track data use – any and all data used in AI NN training should be traceable so that proper data ownership can be guaranteed.
  3. Identify AI NN revenues – NN revenues would need to be isolated, identified and accounted for so that data owners could be rewarded.
  4. Identify AI NN data expenses – NN data costs would need to somehow be isolated, identified and accounted for so that data expenses could be properly deducted from data owner awards.
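
As a minimal sketch of what an immutable data ownership digest could look like, consider a hash-chained ledger in which each entry ties one training-data item to its owner and withdrawal. The entry fields below are illustrative assumptions, not a defined format.

```python
# Sketch of an immutable data ownership digest: a hash-chained ledger of the
# data deposits used to train an AI NN. Entry fields are illustrative assumptions.
import hashlib
import json

def digest_entry(prev_hash: str, owner_id: str, data_ref: str, withdrawal_id: str) -> dict:
    """Append-only record tying a training-data item to its owner and withdrawal."""
    body = {"prev": prev_hash, "owner": owner_id, "data": data_ref, "withdrawal": withdrawal_id}
    body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return body

# Each entry's hash covers the previous entry's hash, so any tampering with
# earlier ownership records is detectable by data auditors.
ledger = [digest_entry("0" * 64, "person-1234", "bucket/search-co/2018-09-30.json", "ido-42")]
ledger.append(digest_entry(ledger[-1]["hash"], "person-5678", "bucket/telco/2018-09-30.json", "ido-42"))
```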

At some point there’s a need for almost a data profit and loss statement, as well as a data balance sheet, at an AI NN level. The information supplied above should make auditing data ownership, use and rewards much more feasible. But it all starts with identifying data ownership and the data used in training the AI.

~~~~

There are a thousand more questions that come to mind. For example

  • Who owns earth-sensing satellite, IoT sensor, weather sensor, car sensor, etc. data? Everyone in the world (or country) being monitored is laboring to create the environment sensed by these devices. Shouldn’t this sensor data be apportioned to the people of the world or country where these sensors operate?
  • Who pays data bank fees? The generators/extractors of the data could pay, in addition to providing data deposits, for the privilege of using our data. I could also see the people paying. Having the company pay would give them an incentive to make the data load as efficient and complete as possible. Having the people pay would induce them to use their data more productively.
  • What’s a decent data expiration period? Given application time frames these days, 7-15 years would make sense. But what happens to the AI NN when data expires? Some way would need to be created to extract data from a NN, or the AI NN would need to cease being used and a new one would need to be created with new data.
  • Can data deposits be rented/sold to data aggregators? Sort of like an AI VC partnership, only using data deposits rather than money to fund AI startups.
  • What happens to data deposits when a person dies? Can one inherit a data deposit, and would a data deposit inheritance be taxable as part of an estate transfer?

In the end, as data is required to train better AI, ownership of our data makes us all capitalists (datalists) in the creation of new AI NNs and the subsequent advancement of society. And that’s a good thing.

Comments?

 

 

AI processing at the edge

Read a couple of articles over the past few weeks (TechCrunch: Google is making a fast, specialized TPU chip for edge devices … and IEEE Spectrum: Two startups use processing in flash for AI at the edge) about chips for AI at the IoT edge.

The two startups, Syntiant and Mythic, are moving to analog only or analog-digital solutions to provide AI processing needed at the edge while Google is taking their TPU technology to the edge.  We have written about Google’s TPU before (see: TPU and hardware vs. software  innovation (round 3) post).

The major challenge in AI processing at the edge is power consumption. Both  startups attack the power problem by using flash and other analog circuitry to provide power efficient compute.

Google attacked the power problem with their original TPU by reducing computational precision from 32-bit floating point to 8-bit integers. By reducing transistor counts per operation, they lowered power requirements proportionally.

AI today is based on neural networks (NNs) that connect simulated neurons via simulated synapses, with weights attached to indicate whether to boost or decrease the signal being transmitted. AI learning is done by setting those weights and creating the connections between simulated neurons and synapses. So learning is setting weights and establishing connections. Actual inference (using AI to do something) is a process of exciting the input simulated neurons/synapses and letting the signal flow through the NN, with each weight being used to determine the output(s).

AI with standard compute

The problem with doing AI learning or inferencing on normal CPUs or even GPU CUDA cores is that the NN does thousands if not millions of multiplication-accumulation actions at each simulated synapse-neuron connection. Doing all these multiplication-accumulations takes power. CPUs and GPUs can do these sorts of operations on 32- or 64-bit integers or even floating point numbers, but it still takes power.
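
To see where all those operations come from, here’s a tiny NumPy sketch of one NN layer: every simulated synapse multiplies its input by a weight, and the products are accumulated per neuron.

```python
# The multiply-accumulate work behind one layer of a NN: every simulated synapse
# multiplies its input by a weight and the results are accumulated per neuron.
import numpy as np

inputs = np.random.rand(1024)           # activations from the previous layer
weights = np.random.rand(512, 1024)     # one row of weights per neuron in this layer

# 512 neurons x 1024 synapses = ~524k multiply-accumulates for this single layer
outputs = weights @ inputs              # a matrix-vector product is just MACs in bulk
```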

AI processing power

AI processing power is measured in trillions of (multiply-accumulate) operations per second per watt (TOPS/W). Mythic believes it can perform 4 TOPS/W and Syntiant says it can do 20 TOPS/W. In comparison, the NVIDIA Volta V100 can do about 0.4 TOPS/W (according to the article), although comparing Syntiant/Mythic TOPS to NVIDIA TOPS is a little like comparing apples to oranges.

A current Intel Xeon Platinum 8180M (2.5GHz, 28 cores, 205 W) can probably do (assuming one multiplication-accumulation per hertz per core) about 2.5 billion X 28 cores = 70 billion ops/second ÷ 205 W, or roughly 0.3 GOPS/W (source: Platinum 8180M data sheet).

As for Google’s TPU TOPS/W, TPU2 is rated at 45 TFLOPS/chip and the best guess for power consumption is between 160W and 200W, let’s say 180W. At that power level, TPU2 should hit ~0.25 TFLOPS/W. TPU3 is coming out with 8X the performance, but it uses water cooling (read: LOTS MORE POWER).

Nonetheless, it appears that Mythic and Syntiant are one to two orders of magnitude better than the best that NVIDIA and TPU2 can do today and many orders of magnitude better than Intel X86.

Improving TOPS/W

Using NAND flash as an analog memory to read, write and hold NN weights is an easy way to reduce power consumption. Combine that with analog circuitry that can do multiplication and addition with those flash values and you have an AI NN processor. This way you reduce the need to hold weights in memory and do compute in registers, by collapsing both compute and memory into the same componentry.
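
As a conceptual model (not either vendor’s actual design), the analog trick is to store weights as cell conductances and apply inputs as voltages, so Ohm’s and Kirchhoff’s laws do the multiply-accumulate in the wiring rather than in digital logic:

```python
# Conceptual model of in-flash analog compute (not either vendor's actual design):
# weights stored as cell conductances G, inputs applied as voltages V, and
# Kirchhoff's current law sums the per-cell currents I = G*V on each bit line.
import numpy as np

G = np.array([[1.0e-6, 2.0e-6, 0.5e-6],   # conductances (siemens) = stored weights
              [0.8e-6, 1.5e-6, 2.2e-6]])
V = np.array([0.2, 0.5, 0.1])             # input voltages = activations

bit_line_currents = G @ V                 # the multiply-accumulate happens in the wires
print(bit_line_currents)                  # one accumulated current per output neuron
```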

The major difference between Syntiant and Mythic seems to be the amount of analog circuitry they use. Mythic seems to relegate the analog circuitry to an accelerator, while Syntiant makes more extensive use of analog circuitry throughout their chip. That’s probably why it can perform 5X the TOPS/W of Mythic’s IPU.

IBM and others have been working on neuromorphic chips some of which are analog based and others which are all digital based. We’ve written extensively on IBM and some on MIT’s approaches (for the latest on IBM see: More power efficient deep learning through IBM and PCM, and for MIT see: MIT builds an analog synapse chip) and follow the links there to learn more.

~~~~

Special purpose AI hardware is emerging from the labs and finally reaching reality. IBM R&D has been playing with it for a long time. Google is working on TPU3, so there’s no stopping them. And startups are seeing an opening and are taking everyone on. Stay tuned, we’re in for a good long ride before someone rises above the crowd and becomes the next chip giant.

Comments?

 

Photo Credit(s): TechCrunch  Google is making a fast, specialized TPU chip for edge devices … article

Introduction to Digital Design Verification at Mythic, Medium.com Article

Images from Google Cloud Platform Blog on the TPU

Two startups use processing in flash for AI at the edge, IEEE Spectrum article courtesy of Mythic

Skyrmion and chiral bobber solitons for racetrack storage

Read an article this week in Science Daily (Magnetic skyrmions: Not the only one of their class; …) about new magnetic structures that could lend themselves to creating a new type of moving, non-volatile storage.  (There’s more information in the press release and the Nature paper [DOI: 10.1038/s41565-018-0093-3], behind a paywall).

Skyrmions and chiral bobbers are both considered magnetic solitons, types of magnetic structures only tens of nm wide that can move around, in sort of a racetrack configuration.

Delay line memories

Early in computing history, there was a type of memory called delay line memory, which used various mechanisms (mercury, magnetoresistance, capacitors, etc.) arranged along a circular line, such as a wire, and had moving pulses of memory that raced around it.

One problem with delay line memory was that it was accessed sequentially, unlike core memory, which could be accessed randomly. When using delay lines to change a bit, one had to wait until the bit came under the read/write head. It usually took microseconds for a bit to rotate around the memory line, and delay line memories had a capacity of a few thousand bits (256-512 bytes) per line, in today’s vernacular.

Delay lines predate computers and had been used for decades to delay any electronic or acoustic signal before retransmission.

A new racetrack

Solitons are being investigated for use in a new form of delay line memory, called racetrack memory. Skyrmions had been discovered a while ago, but the existence of chiral bobbers was only theoretical until researchers observed them in the lab.

Previously, the thought was that one would encode digital data with only skyrmions and spaces. But the discovery of chiral bobbers and the fact that they can co-exist with skyrmions, means that chiral bobbers and skyrmions can be used together in a racetrack fashion to record digital data.  And the fact that both can move or migrate through a material makes them ideal for racetrack storage.

It’s unclear whether chiral bobbers and skyrmions have only two states or more, but the more the merrier for storage. I am assuming that bit density or reliability is increased by having chiral bobbers in the chain rather than spaces.

Unlike disk devices, with both rotating media and moving read-write heads, the motion of skyrmion-chiral bobber racetrack storage is controlled by a very weak pulse of current and requires no moving/mechanical parts prone to wear and tear. Moreover, as solid state devices, racetrack memories are not sensitive to induced/organic vibration or shock. So, theoretically, these devices should have higher reliability than disk devices.

There was no information comparing the new racetrack memory reliability to NAND or 3D XPoint/PCM SSDs, but there may be some advantage here as well. I suppose one would need to understand how to miniaturize the read-erase-write head to the right form factor for nm-scale racetracks to understand how it compares.

And I didn’t see anything describing how long it takes to rotate through bits on a skyrmion-chiral bobber racetrack. Of course, this would depend on the number of bits on a racetrack, but some indication of how long it takes one bit to move one position on the racetrack would be helpful to see what its rotational latency might be.
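
As a back-of-envelope illustration, here’s the kind of rotational latency calculation I’d like to be able to do; the bits-per-track and per-shift numbers below are pure guesses, not figures from the paper.

```python
# Back-of-envelope rotational latency for a racetrack, using hypothetical numbers:
# neither bits_per_track nor shift_time_ns comes from the paper.
bits_per_track = 1024          # assumed track capacity
shift_time_ns = 1.0            # assumed time to move the solitons one position

worst_case_ns = bits_per_track * shift_time_ns        # bit just passed the head
average_ns = worst_case_ns / 2                        # expected wait for uniform access
print(f"average latency ≈ {average_ns / 1000:.2f} µs")  # ≈ 0.51 µs with these guesses
```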

~~~~

At the moment, reading and writing skyrmions and the newly discovered chiral bobbers takes a lot of advanced equipment and is only done in major labs. As such, I don’t see a skyrmion-chiral bobber racetrack storage device arriving on my desktop anytime soon. But it’s comforting to know that there’s a long way to go before we run out of magnetic storage options, even if they end up on a chip rather than on magnetic media. Even if we don’t ever come up with an economical way to produce it.

I wonder if you could synchronize rotational timing across a number of racetrack devices; that way you could read/erase/write a whole byte, word, double word, etc. at a time, rather than a single bit.

Comments?

Photo Credit(s): From Experimental observation of chiral magnetic bobbers in B20 Type FeGe paper

From Experimental observation of chiral magnetic bobbers in B20 Type FeGe paper

From Timeline of computer history Magnetoresistive delay lines

From Experimental observation of chiral magnetic bobbers in B20 Type FeGe paper

MIT’s new Navion chip for better Nano drone navigation

Read an article this week in Science Daily (Chip upgrade helps bee-sized drones navigate) about a recent chip created by MIT, called Navion, that reduces the size and power consumption of the electronics used in drone navigation. The chip is also documented on MIT’s Navion project homepage and in a technical paper describing the new VIO (Visual-Inertial Odometry) Navion chip.

The Navion chip can perform inertial measurement at 52 kHz as well as process video streams of 752×480 stereo images at 171 frames per second, in a 20 sq. mm package consuming only 24mW of power. The chip was fabricated on a 65nm CMOS process line.
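
A quick bit of arithmetic on those published numbers (my calculation, not from the paper) shows just how little energy Navion spends per stereo frame processed:

```python
# Quick arithmetic on the published numbers: energy per processed stereo frame.
power_w = 0.024          # 24 mW
frame_rate = 171         # stereo frames per second
energy_per_frame_j = power_w / frame_rate
print(f"{energy_per_frame_j * 1e3:.3f} mJ per frame")   # ≈ 0.140 mJ
```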

Navion is the result of a collaborative design process which optimized electronics required to perform  drone navigation processing. By placing all the memory required for inertial measurement and image analysis and all the processing hardware on the same chip, they have substantially reduced power consumption and space requirements for drone navigation.

Navion architecture

Navion uses a state of the art, non-linear factor graph optimization algorithm to navigate in space.  It doesn’t sound like  DL neural net image recognition but more like a statistical/probabilistic approach to image mapping and place estimation. The chip uses image compression, two stage memory, and sparse linear solver memory to reduce image processing memory requirements from 3.5MB to less than 1MB.

The chip has three inputs, two images (right & left) plus IMU (inertial measurement unit) sensor data, and one (complex) output, its estimate of the current state of where it is on the map.

Navion processing creates and maintains a 3D map using stereo images and provides navigational support to move through that space. According to the paper, the Navion chip updates the state(s) and sparse 3D map at a KF (Kalman filter) rate of between 16 and 90 fps. Navion also offers configuration options to maximize accuracy, throughput or energy efficiency.

Navion compares well to other navigation electronics

The table shows comparisons of the Navion chip against other traditional navigational systems that use Xeon, ARM or FPGA chips. As far as I can tell it’s either much better or at least on a par with these other larger, more complex, power hungry systems.

Nano drones are coming to our space, sooner than anyone expects.

Comments?

Photo credit(s): System overview from Navion project page (c) 2018 MIT;

Picture of chip with layout  from Navion project page (c) 2018 MIT;

Navion: A Fully Integrated Energy-Efficient Visual-Inertial Odometry Accelerator for Autonomous Navigation of Nano Drones (c) 2018 MIT

Stanford Data Lab students hit the ground running…

Read an article (Students confront the messiness of data) today about Stanford’s Data Lab and how their students are trained to clean up and analyze real world data.

The Data Lab teaches two courses: the Data Challenge Lab course and the Data Impact Lab course. The Challenge Lab is an introductory course in data gathering, cleanup and analysis. The Impact Lab is where advanced students tackle real world, high impact problems through data analysis.

Data Challenge Lab

Their Data Challenge Lab course is a 10-week course with no prerequisites that teaches students how to analyze real world data to solve problems.

There are no lectures. You’re given project datasets and the tools to manipulate, visualize and analyze the data. Your goal is to master the tools, clean up the data and gather insights from the data. Professors are there to provide one-on-one help so you can step through the data provided and understand how to use the tools.

In the information provided on their website there were no references and no information about the specific tools used in the Data Challenge Lab to manipulate, visualize and analyze the data. From an outsider’s viewpoint, it would be great to have a list of references or even websites describing the tools being used and maybe the datasets that are accessed.

Data Impact Lab

The Data Impact lab course is an independent study course, whose only pre-req is the Data Challenge Lab.

Here students are formed into interdisciplinary teams with practitioner partners to tackle ongoing, real world problems with their new data analysis capabilities.

There is no set time frame for the course and it is a non-credit activity. But here students help to solve real world problems.

Current projects in the Impact lab include:

  • The California Poverty Project  to create an interactive map of poverty in California to supply geographic guidance to aid agencies helping the poor
  • The Zambia Malaria Project to create an interactive map of malarial infestation to help NGOs and other agencies target remediation activity.

Previous Impact Lab projects include: the Poverty Alleviation Project to provide a multi-dimensional index of poverty status for areas in Kenya so that NGOs can use these maps to target randomized experiments in poverty eradication and the Data Journalism Project to bring data analysis tools to breaking stories and other journalistic endeavors.

~~~~

Courses like these should be much more widely available. It’s almost the analog to the scientific method, only for the 21st century.

Science has gotten to a point, these days, where data analysis is a core discipline that everyone should know how to do. Maybe it doesn’t have to involve Hadoop but rudimentary data analysis, manipulation, and visualization needs to be in everyone’s tool box.

Data 101 anyone?

Photo Credit(s): Big_Data_Prob | KamiPhuc;

Southbound traffic speeds on Masonic avenue on different dates | Eric Fisher;

Unlucky Haiti (1981-2010) | Jer Thorp;

Bristol Cycling Level by Wards 2011 | Sam Saunders

Information flows everywhere – part 1

Read an article today from Scientific American (Sewage is helping cities flush out the opioid crisis) about how using chemical analysis of wastewater can be used to assess the extent of the opioid crisis in their city.

Wastewater information highway

There’s a lab at ASU (Arizona State University) that chemically analyzes samples of wastewater to determine the amount of drugs that a city’s population excretes. They can provide a near real-time assessment of the proportion of drugs in city sewage and thereby, in a city’s population.

The problem with public drug use surveys and hospital data gathering is that they take time. Moreover, their results typically come long after drugs have become a serious problem in a city’s population.

Wastewater sample drug analysis can be done in a matter of days and can be redone as often as needed. Such data could be used to track intervention activities and see if they have a real impact (positive or negative) on drug use in a population.

Neighborhood health

In addition, by sampling sewage at a neighborhood level, one can gain an assessment of drug problems at any sub-division of a city that’s needed.

The above article talks about an MIT program with Cary, NC (from Biobot.io) that is designing robots to traverse sewer pipes and analyze wastewater chemical makeup in real time, reporting this back to ground stations around the city.

With such an approach, one could almost zero in (depending on sewer pipe networks) on any neighborhood in a city, target specific interventions at that level and measure impact in (digestion-delayed) real time. Doing so, cities, or states for that matter, could experiment with different interventions on a neighborhood-by-neighborhood basis and gain statistical evidence on drug problem intervention effectiveness.

But, you can analyze wastewater for any number of variables, such as viruses, bacteria, enzymes, etc. Any of which can lead to a better understanding of a population’s health.

~~~~

Two things I want to leave you with:

First, public health has had a major impact on human health and has doubled our lifespan in 200 years. All modern cities have water treatment plants today to ensure water quality and have thereby reduced the incidence of cholera and other waterborne epidemics in their cities. Wastewater analysis has the potential for significant improvements in population health monitoring. Just like water treatment, wastewater analysis will someday become common public health practice in modern cities throughout the world.

Second, I was at a conference this week (Pure//Accelerate 2018) where a slide claimed there was no cold data anymore. This was in reference to the fact that re-analyzing old, cold data can often lead to insights and process improvements that were not obvious at first glance.

But it’s not just data anymore. Any activity done by man needs to be analyzed for (inherent & invisible) information flows that could be extracted to make the world a better place.


Random access, DNA object storage system

Read a couple of articles this week: Inching closer to a DNA-based file system in ArsTechnica and DNA storage gets random access in IEEE Spectrum. Both of these seem to be citing an article in Nature, Random access in large-scale DNA storage (paywall).

We’ve known for some time now that we can encode data into DNA strings (see my DNA as storage … and Genomic informatics takes off posts).

However, accessing DNA data has been sequential, and reading and writing DNA data has been glacial. Researchers have started to attack the sequentiality of DNA data access. The prize: DNA can store 215PB of data in one gram, and DNA data can conceivably last millions of years.

Researchers at Microsoft and the University of Washington have come up with a solution to the sequential access limitation. They have used polymerase chain reaction (PCR) primers as unique identifiers for files. They can construct a complementary PCR primer that can be used to extract just the DNA segments that match this primer and amplify (replicate) all DNA sequences carrying this primer tag that exist in the cell.

DNA data format

The researchers used a Reed-Solomon (R-S) erasure coding mechanism for data protection and encoded the DNA data into many DNA strings, each with multiple (metadata) tags on them. One of the tags is the PCR primer tag header, another tag indicates the position of the DNA data segment in the file, and an end-of-data tag repeats the same PCR primer tag.

The PCR primer tag was used as sort of a file address. They could configure a complementary PCR tag to match the primer tag of the file they wanted to access and then use the PCR process to replicate (amplify) only those DNA segments that matched the searched for primer tag.
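
In software terms, the PCR primer tag works like a file address: amplification selects only the segments whose tag matches. Here’s a rough, hypothetical analogy (the tags and payloads below are made up):

```python
# Software analogy of PCR-based random access: keep only the DNA segments whose
# primer tag matches the "file address" being looked up. Tags/payloads are made up.
segments = [
    {"primer": "ACGTACGT", "pos": 0, "payload": "file-A segment 0"},
    {"primer": "TTGGCCAA", "pos": 0, "payload": "file-B segment 0"},
    {"primer": "ACGTACGT", "pos": 1, "payload": "file-A segment 1"},
]

def read_file(primer_tag: str) -> list[dict]:
    """Return the matching file's segments, ordered by their position tag."""
    return sorted((s for s in segments if s["primer"] == primer_tag), key=lambda s: s["pos"])

file_a = read_file("ACGTACGT")   # PCR would amplify exactly these segments
```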

Apparently the researchers chunk file data into a block of 150 base pairs. As there are 2 complementary base pairs, I assume one bit to one base pair mapping. As such, 150 base pairs or bits of data per segment means ~18 bytes of data per segment. Presumably this is to allow for more efficient/effective encoding of data into DNA strings.

DNA strings don’t work well with replicated sequences of base pairs, such as all zeros. So the researchers created a random sequence of 150 base pairs and XOR the file DNA data with this random sequence to determine the actual DNA sequence to use to encode the data. Reading the DNA data back they need to XOR the data segment with the random string again to reconstruct the actual file data segment.
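
Here’s a small sketch of that XOR randomization step: the same fixed random mask scrambles the data before it’s mapped to bases and unscrambles it on read. The 150-bit mask below is just a stand-in for whatever sequence the researchers actually used.

```python
# Sketch of the XOR randomization step: a fixed random mask both scrambles data
# before encoding (avoiding long runs of identical bases) and unscrambles it on read.
import secrets

SEGMENT_BITS = 150
mask = secrets.randbits(SEGMENT_BITS)     # stand-in for the fixed 150-bit random sequence

def scramble(segment_bits: int) -> int:
    return segment_bits ^ mask            # applied before mapping bits to bases

def unscramble(encoded_bits: int) -> int:
    return encoded_bits ^ mask            # XOR again on read to recover the data

data = 0                                  # worst case: all zeros
assert unscramble(scramble(data)) == data # the round trip recovers the original
```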

Not clear how PCR-replicated DNA segments are isolated and where they are originally decoded (with a read head). But presumably, once you have thousands to millions of copies of a DNA segment, it’s pretty straightforward to decode them.

Once decoded and XORed, they use the R-S erasure coding scheme to ensure that all the DNA data segments represent the actual data that was encoded in them. They can then use the position tag of each DNA data segment to indicate how to put the file data back together again.

What’s missing?

I am assuming the cellular data storage system has multiple distinct cells of data, which are clustered together into some sort of organism.

Each cell in the cellular data storage system would hold unique file data and could be extracted and a file read out individually from the cell and then the cell could be placed back in the organism. Cells of data could be replicated within an organism or to other organisms.

To be a true storage system, I would think we need to add:

  • DNA data parity – inside each DNA data segment, every eighth base pair would be a parity for the eight preceding base pairs, used to indicate when one of those base pairs has mutated.
  • DNA data segment (block) and file checksums – standard data checksums, used to verify and correct for double and triple base pair (bit) corruption in DNA data segments and in the whole file.
  • Cell directory – used to indicate the unique cell ID of the cell, a file [name] to PCR primer tag mapping table, a version of DNA file metadata tags, a version of the DNA file XOR string, a DNA file data R-S version/level, the DNA file length or number of DNA data segments, the DNA data creation date-time stamp, the DNA last access date-time stamp, and the DNA data modification date-time stamp (these last two could be omitted). A sketch of one possible layout follows this list.
  • Organism directory – used to indicate the unique organism ID, organism metadata version number, organism unique cell count, unique cell ID to file list mapping, cell ID creation date-time stamp and cell ID replication count.
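
Here’s a sketch of one possible layout for the proposed cell directory, expressed as plain data structures; all of the field names are my own guesses at what such metadata would need to hold.

```python
# One possible shape for the proposed cell directory, as plain data structures.
# All field names are guesses at what the metadata would need to hold.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class DNAFileEntry:
    name: str
    pcr_primer_tag: str            # the file "address" used for PCR-based reads
    metadata_tag_version: int
    xor_mask_version: int
    rs_version: str                # Reed-Solomon version/level used for the file
    segment_count: int             # DNA file length in data segments
    created: datetime
    last_accessed: Optional[datetime] = None   # optional, per the list above
    last_modified: Optional[datetime] = None

@dataclass
class CellDirectory:
    cell_id: str                                             # unique ID of this cell
    files: list[DNAFileEntry] = field(default_factory=list)  # file name to primer tag mapping
```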

The problem with an organism cell-ID file list is that it could be quite long. It might be better to somehow indicate a range or list of ranges of PCR primer tags that are in the cell-ID. I can see other alternatives, using a segmented organism directory or an indirect organism cell-to-file-list B-tree, which could map file name lists to cell IDs.

It’s unclear whether DNA data storage should support a multi-level hierarchy, like file system directory structures, or a flat hierarchy, like object storage, which just has buckets of object data. Considering the cellular structure of DNA data, it appears to me more like buckets, and the glacial access seems more suited to archive systems. So I would lean toward a flat hierarchy and an object storage structure.

Is DNA data WORM or modifiable? Given the effort required to encode and create DNA data segment storage, it would seem it’s more WORM-like than modifiable storage.

How will the DNA data storage system persist or be kept alive, if that’s the right word for it? There must be some standard internal cell mechanisms to maintain its existence. Perhaps the researchers have just inserted file data DNA into a standard cell as a sort of junk DNA.

If this were the case, you’d almost want to create a separate data nucleus inside a cell that would just hold file data and wouldn’t interfere with normal cellular operations.

But doesn’t the PCR primer tag approach lend itself better to a key-value store database?

Photo Credit(s): Cell structure National Cancer Institute

Prentice Hall textbook

Guide to Open VMS file applications

Unix Inodes CSE410 Washington.edu

Key Value Databases, Wikipedia, by Clescop (own work), CC BY-SA 4.0