Dits, codons & chromosomes – the storage of life

All is One, the I-ching and Genome case by TheAlieness (cc) (from flickr)

I was thinking the other day that DNA could easily be construed as information storage for life.  For example, DNA uses four distinct nucleotide bases (A, C, G & T; RNA swaps U in for T) as its basic information unit.  I would call these units of DNA information Dits (for DNA digITs), and as such, DNA uses a base-4 number system.
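To make the base-4 idea concrete, here is a minimal Python sketch that reads a string of Dits as a base-4 number and converts it back. The A=0, C=1, G=2, T=3 digit assignment is my own arbitrary assumption; any one-to-one mapping works, and each Dit carries exactly 2 bits.

```python
# Hypothetical digit assignment: any one-to-one mapping of the four bases works.
DIT_VALUE = {"A": 0, "C": 1, "G": 2, "T": 3}
DIT_SYMBOLS = "ACGT"


def dits_to_int(seq: str) -> int:
    """Read a DNA string as a base-4 number, most significant Dit first."""
    value = 0
    for base in seq.upper():
        value = value * 4 + DIT_VALUE[base]
    return value


def int_to_dits(value: int, length: int) -> str:
    """Convert a number back into a fixed-length string of Dits."""
    dits = []
    for _ in range(length):
        value, digit = divmod(value, 4)
        dits.append(DIT_SYMBOLS[digit])
    return "".join(reversed(dits))


if __name__ == "__main__":
    print(dits_to_int("GATTACA"))                  # → 9156
    print(int_to_dits(dits_to_int("GATTACA"), 7))  # → GATTACA
```

Seven Dits hold 14 bits, so every 7-Dit string maps to a number below 4**7 = 16,384.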

Next in data storage parlance comes the analogue for the binary byte that holds 8 bits.  In the case of DNA the term to use is the codon, a three-nucleotide (or 3-Dit) unit which codes for one of the 20 amino acids used in life, not unlike how a byte of data defines an ASCII character.  With 64 (4×4×4) possible codons, there is room for redundant amino acid encoding (several codons map to the same amino acid) and to encode for other mechanisms beyond just amino acids (see chart above for amino-acid codon encoding).  I envision something akin to ASCII non-character codes such as STX (AUG, the start codon, which also codes for methionine) and ETX (the stop codons UAA, UAG & UGA), which for DNA would define non-amino acid encoding codons.  These are written in mRNA notation; in the DNA itself, T takes the place of U.
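Here is a small sketch of the codon "character set", assuming the standard genetic code (the table used by the nuclear genes of most organisms). It builds all 64 codons, maps each to a one-letter amino acid code or `*` for stop, and translates an mRNA string from its first AUG to the first stop codon, much like reading text between STX and ETX. The `translate` helper is hypothetical, written for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order with the
# first base varying slowest. "*" marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
STOP = "*"


def translate(mrna: str) -> str:
    """Translate from the first AUG (start) up to, not including, the first stop codon."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == STOP:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(len(CODON_TABLE))                # → 64
    print(translate("GGAUGUUUGGCUAAUUU"))  # → MFG
```

Only 20 distinct amino acid letters appear among the 61 non-stop codons, which is the encoding overlap mentioned above.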

DNA is stored in two strands, each one the complementary image of the other (A pairs with T, C with G).  In data storage terminology we would consider this a form of data protection somewhat similar to RAID1, since either strand can be rebuilt from the other.  Perhaps we should call this -RAID1 as it's complementary storage.
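The -RAID1 idea can be sketched in a few lines of Python. Because the two strands run in opposite directions, reading the partner strand in its own direction gives the reverse complement; applying the operation twice returns the original, which is what makes a rebuild possible. The function name is mine, not standard terminology.

```python
# Watson-Crick pairing: A <-> T, C <-> G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complementary_strand(strand: str) -> str:
    """Return the partner strand read in its own direction (the reverse complement)."""
    return strand.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    mirror = complementary_strand("GATTACA")
    print(mirror)                        # → TGTAATC
    print(complementary_strand(mirror))  # → GATTACA (the "rebuilt" strand)
```

Unlike a RAID1 mirror, the second copy is never byte-for-byte identical; it is a deterministic transform of the first.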

DNA chromosomes seem to exist primarily as a means to read out codons.  It seems the chromosome's two strands are locally separated, read sequentially, transcribed into intermediate mRNA, and then these mRNA copies, with the help of ribosomes and enzymes, are translated into the proteins of life.  Chromosomes would correspond to data blocks in standard IT terminology, as each holds a long run of data that is read sequentially, gene by gene.  However, they are variable in length and seem to carry with them some historical locality of reference information, but this is only my perception.  mRNA might be considered a storage cache for DNA data; indeed, a single mRNA molecule is typically translated many times, often by several ribosomes at once, before it is degraded.

The cell, or rather the cell nucleus, could be construed as an information (data) storage device where DNA blocks or chromosomes are held.  However, as with bits, Dits have multiple forms of storage devices.  For example, it turns out that DNA can exist outside of the cell nucleus in the form of mitochondrial DNA.  I like to think of mitochondrial DNA as similar to storage device firmware, as it encodes some of the proteins the cell needs to supply its energy.

The similarity to data storage starts to break down at this point.  DNA is mostly WORM (Write-Once-Read-Many times) tape-like media and is not readily changed except through mutation/evolution (although recent experiments to construct artificial DNA belie this).  As such, the DNA in an organism's cells is mostly exact copies of the same DNA, and DNA across organisms of the same species differs only by minor individualizing changes.  Across species, much DNA is shared: human DNA has a high (94%) proportion of similarity to chimp DNA and a lower percentage to other mammalian DNA.
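A percent-similarity figure is, at its simplest, the share of matching positions between two aligned sequences. The sketch below assumes the sequences are already aligned and of equal length; real genome comparisons must first align them and decide how to count insertions and deletions, which is one reason published human/chimp figures vary with the method used.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two pre-aligned, equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    print(percent_identity("GATTACAGAT", "GATTGCAGAT"))  # → 90.0
```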

For DNA, I see nothing like storage subsystems that hold multiple storage devices with different (data) information on them.  Seed banks might qualify for plant DNA, but these seem a somewhat artificial construct for life storage subsystems.  However, as I watch the dandelion puffs pass by my back porch, there seems to be some rough semblance of cloud storage going on: they look omnipresent and ephemeral, but with active propagation (or replication), not unlike the cloud storage that exists today.  Perhaps my environmentalist friends would call the ecosystem a life storage subsystem, as it retains multiple DNA instances or species.

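Science tells us that human DNA has ~3B (3×10**9) base pairs, or ~1B codons.  To put this into data storage perspective, at 2 bits per Dit that's ~6×10**9 bits, so human DNA holds only ~750MB of data (~1.5GB if you count both sets of chromosomes in each cell).  Density wise, each base pair adds ~0.34nm of length, so human DNA aligned end to end stretches about 1m (chromosome 1 alone is ~8.5cm long).  At that length it's about 5.9 million bits per mm, or over 400 times the ~13,250 bits per mm linear density of a single LTO-4 tape track, and a bit under 400 times that of LTO-5.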
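The capacity and density arithmetic can be checked with a few lines of Python. The inputs are rounded assumptions, not measurements: 3×10**9 base pairs, 2 bits per Dit, ~0.34nm per base pair, and ~13,250 bits per mm for one LTO-4 track.

```python
# Assumed, rounded inputs.
BASE_PAIRS = 3e9           # haploid human genome
BITS_PER_DIT = 2           # log2(4) bits per base
NM_PER_BASE_PAIR = 0.34    # rise per base pair in B-form DNA
LTO4_BITS_PER_MM = 13_250  # assumed LTO-4 linear density per track

bits = BASE_PAIRS * BITS_PER_DIT
megabytes = bits / 8 / 1e6
length_mm = BASE_PAIRS * NM_PER_BASE_PAIR / 1e6  # 1 mm = 1e6 nm
bits_per_mm = bits / length_mm                   # = 2 / 0.34 nm, whatever the genome size

print(f"capacity: {megabytes:,.0f} MB")
print(f"length:   {length_mm / 1000:.2f} m")
print(f"density:  {bits_per_mm:,.0f} bits/mm")
print(f"vs LTO-4: {bits_per_mm / LTO4_BITS_PER_MM:,.0f}x")
```

Note that the linear density depends only on bits per base pair and the spacing between base pairs, not on how long the genome is.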

It's fairly amazing to me that something as marvelous as a human being can be constructed from so little data.  I now have an unrestrained urge to copy my DNA so I can back it up offline, to some other non-life media.  But it's not clear what I could do with it other than that, and restore seems somewhat problematic at best…
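For the offline backup itself, here is a minimal sketch, assuming an uppercase A/C/G/T string. It packs four Dits into each byte, stores the Dit count in a 4-byte header, and keeps a SHA-256 digest so a restore can at least detect corruption. The `backup`/`restore` helpers are hypothetical; restoring into anything living remains, as noted, problematic.

```python
import hashlib
from typing import Tuple

DITS = "ACGT"
DIT_TO_BITS = {base: i for i, base in enumerate(DITS)}  # 2 bits per Dit


def pack(seq: str) -> bytes:
    """Pack four Dits per byte behind a 4-byte big-endian Dit count."""
    out = bytearray(len(seq).to_bytes(4, "big"))
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | DIT_TO_BITS[base]
        byte <<= 2 * (4 - len(chunk))  # zero-pad a short final chunk
        out.append(byte)
    return bytes(out)


def unpack(blob: bytes) -> str:
    """Reverse pack(): expand each byte into four Dits, then trim the padding."""
    count = int.from_bytes(blob[:4], "big")
    dits = []
    for byte in blob[4:]:
        for shift in (6, 4, 2, 0):
            dits.append(DITS[(byte >> shift) & 0b11])
    return "".join(dits[:count])


def backup(seq: str) -> Tuple[bytes, str]:
    """Return the packed image plus a SHA-256 digest to verify it later."""
    blob = pack(seq)
    return blob, hashlib.sha256(blob).hexdigest()


def restore(blob: bytes, digest: str) -> str:
    """Verify the digest, then unpack; refuse to restore a corrupted image."""
    if hashlib.sha256(blob).hexdigest() != digest:
        raise ValueError("backup image is corrupt")
    return unpack(blob)


if __name__ == "__main__":
    image, checksum = backup("GATTACA")
    print(len(image))                # → 6 (4-byte header + 2 data bytes)
    print(restore(image, checksum))  # → GATTACA
```

At four Dits per byte, a ~3×10**9 base pair genome packs into roughly 750MB, and the 4-byte header can count up to about 4.29×10**9 Dits.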