Data-at-rest security

Safe by cjc4454 (cc) (from flickr)

We have discussed securing data in the cloud before, but we have not discussed IT data security in general.  I count at least six different places where one can secure IT data-at-rest today.  In most cases, one has some sort of system to provide encryption/decryption services and some way for that system to generate, store, and securely retrieve encryption keys.  All these systems use symmetric key cryptography, where the same key is used for both encryption and decryption.   Approaches to IT data-at-rest security include data encryption performed as follows:

  • Drive level
  • Subsystem-based
  • Network-based
  • Appliance-based
  • HBA-based
  • Host-based.
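
To make the symmetric-key point concrete, here is a minimal sketch in Python. It is a toy: the keystream construction (SHA-256 over key, nonce and a counter) and the names `keystream` and `xor_cipher` are my own assumptions for illustration, not what any of the products below use, and it should not be used to protect real data. Real systems would typically use a vetted cipher such as AES. The only point is that one and the same key both encrypts and decrypts.

```python
import hashlib
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from key and nonce (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:length])


def xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))


key = secrets.token_bytes(32)    # the single shared (symmetric) key
nonce = secrets.token_bytes(16)  # per-message value, stored alongside the ciphertext
plaintext = b"block 42 of LUN 7"

ciphertext = xor_cipher(key, nonce, plaintext)   # encrypt with the key
recovered = xor_cipher(key, nonce, ciphertext)   # decrypt with the SAME key
wrong = xor_cipher(secrets.token_bytes(32), nonce, ciphertext)  # a different key

print(recovered == plaintext)  # the same key recovers the data
print(wrong == plaintext)      # a different key does not
```

Whichever of the six places does the encryption, the key management question is the same: whoever holds that one key can read the data, and whoever loses it cannot.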

Drive level encryption

For tape transports, drive level encryption has been around since LTO-4, and before that with other proprietary tape formats. For disk, data encryption capabilities have been around for a long time in the consumer space and have lately been introduced into enterprise storage as well.

Encryption key management is critical to securing any drive level encryption.  Key management can be supplied either externally by some sort of standalone key management software/appliance or internally from the tape library or disk subsystem controller itself.

The reasons for tape drive encryption are fairly substantial: tapes in transit can be lost or stolen. Similarly, disks can be replaced or stolen from enterprise storage subsystems and as such are subject to the same security concerns as tape volumes.  As drive encryption is typically performed by special purpose hardware, it can operate with almost no overhead and thus has little impact on storage performance.

Disk subsystem-based encryption

Although there are only a few current implementations of this capability, data encryption/decryption could easily be done entirely at the subsystem level, with key management available either externally or internally to the subsystem.  Most likely this would be a software cryptographic solution, but hardware could also be supplied to encrypt/decrypt data.  With a software implementation, the impact on storage performance (especially on read back) might be considerable.

A couple of years ago, EMC, HDS and others added “secure data erasure” for disks or subsystems going out of service.  However, this does nothing for operating data-at-rest security.

Network-based encryption

Both Cisco and Brocade offer data security services in the SAN or storage network facilities.  Such capabilities will encrypt and decrypt data going to or from LUNs and/or tape drives.  Key management can be supplied externally as well as internally to the networking equipment.  Both Cisco and Brocade SAN encryption services are hardware encryption solutions and as such, operate at line speed with high throughput.

Appliance-based encryption

In the past, a number of companies offered appliance or standalone hardware based encryption, which places the data security appliance within the data path somewhere between the host and its storage devices.  Such solutions have been falling behind or have recently been replaced by network based encryption solutions, but they still have a significant installed base.   Key management can be supplied internal to the appliance or externally.  All appliance based encryption solutions use dedicated hardware for encryption/decryption of data.

HBA-based encryption

Last month EMC announced a new capability for their CLARiiON storage which operates in conjunction with Emulex HBAs to offer hardware HBA-based encryption for data.  This solution is interesting in that it’s an almost host based hardware solution and should have little to no impact on storage performance.  Key management is supplied external to the HBA.

Host-based encryption

Host encryption has been available in the consumer and enterprise space for a number of years.  Such services have seen much success with laptop data.  Host based services are available from operating system vendors or special purpose applications.  In the consumer space, products such as PGP (recently purchased by Symantec) have been available for over a decade; similar capabilities exist in the enterprise space via special purpose “secure” file systems and other applications.  Most host based cryptographic systems use software based algorithms, although hardware host-based services are available in the mainframe System z environment via cryptographic co-processors and in the latest versions of Intel’s advanced processors with their instruction set extensions for AES encryption support.

Other data-at-rest security considerations

From a performance perspective, hardware encryption can have the least impact but it’s very expensive.  In addition, drive level encryption is probably the most scalable, as the more drives you have, the more encryption throughput can be supported.  Next come the appliance or network based encryption solutions, which can be scaled by purchasing more appliances or encryption blades/switches.

In contrast, software based services perform the worst but are easiest to deploy.  Most consumer O/Ss support data encryption with a simple configuration change.  Software solutions are the least expensive as well because there is no hardware to purchase.  Software based solutions can also be scaled, but only by adding more servers/subsystems.

In any event, key management cannot be overlooked for any data-at-rest security solution.  Given the strength of modern day encryption algorithms, the loss of a data key is equivalent to the loss of all data encrypted with that key.  So when considering key management, one should look for support of key archives, redundant key managers, key hierarchies and other advanced characteristics that make key access continuously available and disaster proof.
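
As an illustration of two of those characteristics, here is a minimal sketch of a two-level key hierarchy with redundant key managers. Everything in it is hypothetical: the key managers are plain dictionaries, `toy_cipher` is an insecure XOR stand-in for a real cipher (it omits nonces entirely), and the key ID `volume-001` is made up. It only shows the shape of the idea: a data key wrapped by a master key, with the wrapped copy held in more than one place.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Insecure XOR stand-in for a real cipher; applying it twice restores the data."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Key hierarchy: a master key (KEK) protects the per-volume data key (DEK).
kek = secrets.token_bytes(32)
dek = secrets.token_bytes(32)
wrapped_dek = toy_cipher(kek, dek)

# Redundant key managers (hypothetical, modeled as dicts) each hold the wrapped DEK.
key_manager_a = {"volume-001": wrapped_dek}
key_manager_b = dict(key_manager_a)

ciphertext = toy_cipher(dek, b"payroll records")


def fetch_wrapped_key(key_id: str) -> bytes:
    """Return the wrapped key from the first key manager that still has it."""
    for manager in (key_manager_a, key_manager_b):
        if key_id in manager:
            return manager[key_id]
    raise KeyError(f"{key_id}: key lost, data encrypted with it is unrecoverable")


key_manager_a.clear()  # simulate losing the primary key manager

restored_dek = toy_cipher(kek, fetch_wrapped_key("volume-001"))
restored = toy_cipher(restored_dek, ciphertext)
print(restored)
```

If the second key manager were cleared as well, `fetch_wrapped_key` would raise and the ciphertext would stay ciphertext, which is the loss-of-key scenario described above.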

Data security is certainly feasible with any of these solutions, but performance, availability and ease of management must be understood before seriously considering any data-at-rest security regimen.

Dits, codons & chromosomes – the storage of life

All is One, the I-ching and Genome case by TheAlieness (cc) (from flickr)

I was thinking the other day that DNA could easily be construed as information storage for life.  For example, DNA uses 4 distinct nucleotide bases (A, C, G & T, with U taking the place of T in RNA) as its basic information unit.  I would call these units of DNA information Dits (for DNA digITs) and as such, DNA uses a base-4 number system.

Next in data storage parlance comes the analogue for the binary byte that holds 8 bits.  In the case of DNA the term to use is the codon, a three-nucleotide (or 3-Dit) unit which codes for one of the 20 amino acids used in life, not unlike how a byte of data defines an ASCII character.  With 64 possibilities in a codon, there is some room for amino acid encoding overlap and to encode for other mechanisms beyond just amino acids (see chart above for amino-acid codon encoding).  I envision something akin to ASCII non-character codes such as STX (DNA-AUG), ETX (DNA-UAA, -UAG & -UGA), etc., which for DNA would be the codons that act as control signals rather than simply encoding amino acids.
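
The base-4 and codon ideas are easy to sketch in code. The following is an illustration only: the digit assignment (A=0, C=1, G=2, U=3), the function names and the sample sequence are my own arbitrary choices, and real translation is more involved (for one thing, AUG also codes for an amino acid, methionine, as well as marking the start).

```python
from itertools import product

BASES = "ACGU"  # alphabet used for the codons in the text; DNA itself has T where RNA has U
DIT_VALUE = {base: digit for digit, base in enumerate(BASES)}  # arbitrary base-4 digits

START = "AUG"                  # the "STX" of the analogy
STOPS = {"UAA", "UAG", "UGA"}  # the "ETX" codes of the analogy


def codon_to_int(codon: str) -> int:
    """Read a 3-Dit codon as a 3-digit base-4 number (0..63)."""
    value = 0
    for base in codon:
        value = value * 4 + DIT_VALUE[base]
    return value


def split_codons(sequence: str) -> list[str]:
    """Chop a sequence into consecutive 3-Dit codons, dropping any partial tail."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


def payload(sequence: str) -> list[str]:
    """Return the codons between the first start codon and the next stop codon."""
    codons = split_codons(sequence)
    if START not in codons:
        return []
    body = []
    for codon in codons[codons.index(START) + 1:]:
        if codon in STOPS:
            break
        body.append(codon)
    return body


all_codons = ["".join(triple) for triple in product(BASES, repeat=3)]
print(len(all_codons))                 # 64 possible codons (4**3)
print(codon_to_int("AUG"))             # 14 under this digit assignment
print(payload("GGCAUGGCUUGGUAAGGC"))   # ['GCU', 'UGG']
```

Three Dits at 2 bits each make a codon a 6-bit unit, so it is really closer to a six-bit character code than to an 8-bit byte.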

DNA is stored in two strands, each one a complementary image of the other.  In data storage terminology we would consider this a form of data protection somewhat similar to RAID1. Perhaps we should call this -RAID1 as it’s complementary storage.
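
Here is a minimal sketch of that complementary mirroring, assuming the DNA alphabet A, C, G, T with the usual pairing (A with T, C with G). The `repair` function and the `?` marker for an unreadable position are hypothetical devices for the storage analogy, and the sketch ignores the fact that the two real strands run in opposite directions.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Build the complementary 'mirror' of a strand, base by base."""
    return "".join(COMPLEMENT[base] for base in strand)


def repair(primary: str, mirror: str) -> str:
    """Rebuild unreadable positions ('?') in primary from the complementary mirror."""
    return "".join(
        COMPLEMENT[m] if p == "?" else p
        for p, m in zip(primary, mirror)
    )


primary = "ATGGCCTGA"
mirror = complement(primary)
print(mirror)                   # TACCGGACT

damaged = "ATG??CTGA"           # two positions lost on the primary copy
print(repair(damaged, mirror))  # ATGGCCTGA
```

As with RAID1, either copy is enough to rebuild the other; the only twist is that the rebuild has to complement each Dit instead of copying it.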

DNA chromosomes seem to exist primarily as a means to read out codons.  It seems the chromosomes are split, read sequentially and duplicated into intermediate mRNA, and then these intermediate mRNA forms are converted, with the help of enzymes, into the proteins of life.  Chromosomes would correspond to data blocks in standard IT terminology as they are read as a single unit and read sequentially.  However, they are variable in length and seem to carry with them some historical locality of reference information, but this is only my perception.  mRNA might be considered as a storage cache for DNA data, although it’s unclear whether mRNA is read multiple times or used just once.

The cell, or rather the cell nucleus, could be construed as an information (data) storage device where DNA blocks or chromosomes are held.  However, for Dits as for bits, there are multiple forms of storage devices.  For example, it turns out that DNA can exist outside of the cell nucleus in the form of mitochondrial DNA.  I like to think of mitochondrial DNA as similar to storage device firmware, as it encodes the proteins needed to supply energy to the cell.

The similarity to data storage starts to break down at this point.  DNA is mostly WORM (Write-Once-Read-Many times) tape-like media and is not readily changed except through mutation/evolution (although recent experiments to construct artificial DNA belie this fact).  As such, DNA is mostly exact copies of other DNA within an organism or across organisms within the same species (except for minor individualization changes).  Across species, DNA is readily copied and we find that human DNA has a high (94%) proportion of similarity to chimp DNA and a lower percentage to other mammalian DNA.

For DNA, I see nothing like storage subsystems that hold multiple storage devices with different (data) information on them.  Perhaps seed banks might qualify for plant DNA but these seem a somewhat artificial construct for life storage subsystems.  However, as I watch the dandelion puffs pass by my back porch there seems to be some rough semblance of cloud storage going on as they look omnipresent, ephemeral, but with active propagation (or replication), not unlike the cloud storage that exists today.  Perhaps my environmentalist friends would call the ecosystem a life storage subsystem as it retains multiple DNA instances or species.

Science tells us that human DNA has ~3B (3×10**9) base pairs or ~1B codons.  To put this into data storage perspective, human DNA holds ~64GB of data.  Density wise, human DNA aligned end to end stands ~8.5cm long and at that length it’s about 620 million bits per mm or over 45,000 times the density of an LTO-4 tape and roughly half that for LTO-5 tape.
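
As a rough check on the arithmetic, here is a minimal sketch that assumes each Dit carries exactly 2 bits and counts only one strand of the ~3B base pairs. Both assumptions are mine. Under them the raw count comes to well under a gigabyte, so the ~64GB figure quoted above evidently rests on further assumptions that this sketch does not capture.

```python
# Assumptions (mine, for illustration): 2 bits per Dit, one strand counted once.
BASE_PAIRS = 3 * 10**9   # ~3B base pairs, from the text
BITS_PER_DIT = 2         # base-4 digit = log2(4) bits
DITS_PER_CODON = 3

codons = BASE_PAIRS // DITS_PER_CODON   # 1,000,000,000 codons
raw_bits = BASE_PAIRS * BITS_PER_DIT    # 6,000,000,000 bits
raw_bytes = raw_bits // 8               # 750,000,000 bytes

print(f"codons:    {codons:,}")
print(f"raw bits:  {raw_bits:,}")
print(f"raw bytes: {raw_bytes:,} (~{raw_bytes / 10**6:.0f} MB)")
```

The codon count matches the ~1B in the text; the byte count is the part that depends on how much per-base information one chooses to count.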

It’s fairly amazing to me that something as marvelous as a human being can be constructed using only 64GB of data.  I now have an unrestrained urge to want to copy my DNA so I can back it up offline, to some other non-life media.  But it’s not clear what I could do with it other than that and restore seems somewhat problematic at best…