Learning to live with lattices or say goodbye to security

safe 'n green by Robert S. Donovan (cc) (from flickr)

Read an article the other day in Quanta Magazine: A tricky path to quantum encryption, about the problems that will occur in current public key cryptology (PKC) schemes when quantum computing emerges over the next five to 30 years.  With advances in quantum computing, our current PKC scheme, which depends on the difficulty of factoring large numbers, will be readily crackable. At that time, all current encrypted traffic, used by banks, the NSA, the internet, etc. will no longer be secure.

NSA, NIST, & ETSI looking at the problem

So there’s a search on for quantum-resistant cryptology (see this release from ETSI [European Telecommunications Standards Institute], this presentation from NIST [{USA} National Institute of Standards & Technology], and this report from Schneier on Security on NSA’s [{USA} National Security Agency] Plans for Post-Quantum world). There are a number of alternatives being examined by all these groups but the most promising at the moment depends on multi-dimensional (100s of dimensions) mathematical lattices.

Lattices?

According to Wikipedia, a lattice is a regular, repeating grid of points in space, easiest to picture in 3 dimensions. Apparently, for security reasons, they had to increase the number of dimensions significantly beyond 3.

A secret is somehow inscribed in a route (vector) through this 500-dimensional lattice between two points: an original point (the public key) in the lattice and another arbitrary point, somewhere nearby in the lattice. The problem, in a cryptographic sense, is that finding such a route in a 500-dimensional lattice is a difficult task when you only have one of the points.
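To make the idea a little more concrete, here is a minimal sketch of a toy 2-dimensional lattice in Python. It is not any of the schemes NIST, ETSI or the NSA are evaluating; the basis vectors, the secret coefficients and the helper names (lattice_point, closest_lattice_point) are all made up for illustration. It hides a lattice point behind a small offset and then recovers it by brute-force search, which is only feasible because there are just 2 dimensions.

```python
from itertools import product

def lattice_point(basis, coeffs):
    """Return the lattice point sum(c_i * b_i) for integer coefficients c_i."""
    dim = len(basis[0])
    return tuple(sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(dim))

def closest_lattice_point(basis, target, search=5):
    """Brute-force the lattice point nearest to target.

    Tries every coefficient combination in [-search, search]. That is
    (2*search + 1) ** dimensions candidates: 121 for this 2-D toy, but
    11 ** 500 if the same search were attempted in 500 dimensions.
    """
    best, best_d2 = None, float("inf")
    for coeffs in product(range(-search, search + 1), repeat=len(basis)):
        point = lattice_point(basis, coeffs)
        d2 = sum((p - t) ** 2 for p, t in zip(point, target))
        if d2 < best_d2:
            best, best_d2 = point, d2
    return best

# Hypothetical toy values, chosen only for illustration.
basis = [(2, 1), (1, 3)]
secret = lattice_point(basis, (3, -2))           # a point on the lattice
nearby = (secret[0] + 0.3, secret[1] - 0.4)      # a point just off the lattice
recovered = closest_lattice_point(basis, nearby)
```

The search recovers the hidden point here, but the number of candidates grows exponentially with the number of dimensions, which is the intuition behind using lattices with hundreds of dimensions.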

But can it be efficient for digital computers of today to use?

So the various security groups have been working on devising efficient algorithms for multi-dimensional public key encryption over the past decade or so. But they have run into a problem.

Originally, the (public) keys for a 500-dimensional lattice PKC were on the order of MBs, so they have been restricting the lattice computations to utilize smaller keys and in effect reducing the complexity of the underlying lattice. But in the process they have now reduced the security of the lattice PKC scheme. So they are having to go back to longer keys, more complex lattices and trying to ascertain which approach leaves communications secure but is efficient enough to implement by digital computers and communications links of today.

Quantum computing

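The problem is that quantum computers provide a much faster way to perform certain calculations, like factoring a number. Shor’s algorithm lets a quantum computer factor large numbers in polynomial time, whereas the best known methods on today’s digital computers take far longer than that. (The often-quoted square-root speedup applies to brute-force search, via Grover’s algorithm, rather than to factoring.)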

It’s possible that similar quantum computing calculations for lattice routes between points could also be sped up.  So even when we all move to lattice-based PKC, it’s still possible for quantum computers to crack the code; hopefully, it just takes longer.

So the mathematics behind PKC will need to change over the next 5 years or so as quantum computing becomes more of a reality. The hope is that this change will keep our communications secure, at least until the next revolution in computing comes along, or quantum computing becomes even faster than that envisioned today.

Comments?

Peak code, absurd

Read a post the other day that said we would soon reach Peak Code (see ROUGH TYPE Peak Code? post). In his post, Nick Carr discussed an NBER paper (see Robots Are Us: Some Economics of Human Replacement, purchase required). The paper implied we will shortly reach peak code because of the proliferation of software reuse and durability, which will lead to less of a need for software engineers/coders.

Peak code refers to a maximum amount of code produced in a year that will be reached at some point; afterwards, code production will decline.

Software durability, hogwash

Let’s dispense with the foolish first – durability. Having been a software engineer, and having managed/supervised massive (>1MLoC) engineering developments over my 30 years in the industry, I can say that code is anything but durable. Fragile yes, durable no.

Code fixes beget other bugs, often more substantial than the original. System performance is under constant stress, lest the competition take your market share. Enhancements are a never ending software curse.

Furthermore, hardware changes constantly, as components go obsolete, new processors come online, IO changes, etc. One might think new hardware would be easy  to accommodate. But you would be sadly mistaken.

New processors typically come with added enhancements beyond speed or memory size that need to be coded for. New IO busses often require significant code “improvements” to use effectively. New hardware today is moving to more cores, which makes software optimization even more difficult.

On all the projects I was on, code counts never decreased. This was mostly due to enhancements, bug fixes, hardware changes and performance improvements.

Software’s essential difference is that it is unbounded by any physical reality. Yes it has to fit in memory, yes it must execute instructions, yes it performs IO with physical devices/memory/busses. But these are just transient limitations, not physical boundaries. They all go away or change after the next generation hardware comes out, every 18 months or so.

So software grows to accommodate any change, any fix, any enhancement that can be dreamed up by man, woman or beast. Software is inherently not durable and is subject to too many changes, which most often leads to increased fragility, not durability.

Software reuse, maybe

I am on less firm footing here. Code reuse is wonderful for functionality that has been done before. If adequate documentation exists, if interfaces are understandable, if you don’t mind including all the tag-along software needed to reuse the code, then reuse is certainly viable.

But reusing a software component often requires integration work, adding or modifying code to work with the module. Yes, there may be less code to generate and, potentially, validate/test. But you still have to use the new function somewhere.

And Linux, OpenStack, Hadoop, et al, are readily reusable for organizations that need OS, cloud services or big data. But these things don’t operate in a vacuum. Somebody needs to code a Linux application that views, adds, changes or deletes data somewhere.  Somebody needs to write that cloud service offering which runs under OpenStack that services and moves data across the network. Somebody needs to code up MapReduce, MapR or Spark modules to take unstructured data and do something with it.

Yes there are open source applications, cloud services, and MapReduce packages for standardized activities. But these are the easy, already done parts and seldom suffice in and of themselves for what needs to be done next. Often, even using these as is requires some modifications to run on your data, your applications, and in your environment.

So, does software reuse diminish requirements for new coding? Yes. Does software reuse eliminate the need for new code? Definitely not.

Coding Automation, yes

Coding automation could diminish the need for new software engineers/coders. However, this would  be equivalent to human level artificial intelligence and would eliminate the need for coders/software engineers, if and when it becomes available. But if anything this would lead to a proliferation of ever more (automated) code, not less. So it’s not peak code as much as peak coders. Hopefully, I won’t see this transpire anytime soon.

So as far as I’m concerned peak code is never going to happen, and when peak coders does happen, if ever, we will have worse problems to contend with (see my post on Existential Threats).

Comments?

Photo Credit(s): PDX Creative Coders by Bill Automata 

Existential threats – ASI part 1

Not sure why but lately I have been hearing a lot about existential events. These are events that threaten the existence of humanity itself.

Massive Solar Storm

A couple of days ago I read about the Carrington Event which was a massive geomagnetic solar storm in 1859. Apparently it wreaked havoc with the communications infrastructure of the time (telegraphs). Researchers have apparently been able to discover other similar events in earth’s history by analyzing ice cores from Greenland which indicate that events of this magnitude occur once every 500 years and smaller events typically occur multiple times/century.

Unclear to me what a solar storm of the magnitude of the Carrington Event would do to the world as we know it today, but we are much more dependent on electronic communications, radio, electric power, etc. If such an event were to take out 50% of our electro-magnetic infrastructure, such as by frying power transformers, radio transceivers, magnetic storage/motors/turbines, etc., civilization as we know it would be brought back to the mid 1800s but with a 21st century population.

This would last until we could rebuild all the lost infrastructure, at tremendous cost. During this time we would be dependent on animal-human-water power, paper-optical based communications/storage, and animal-wind transport.

It appears that any optical based communication/computer systems would remain intact but powering them would be problematic without working transformers and generators.

One article (couldn’t locate this) stated that the odds of another Carrington Event happening are 12% by 2022. But the ice core research seems to indicate that it should be higher than this. By my reckoning, it’s been 155 years since the last event, which means we are ~1/3rd of the way through the next 500 years, so I would expect the probability of a similar event happening to be ~1/3 at this point and rising slightly every year until it happens again.
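As a rough cross-check on that reckoning, here is a small sketch that works the numbers two ways: the simple elapsed fraction used above, and a constant-rate (Poisson) model. The Poisson model is an assumption added here for comparison only; it is not taken from the ice core research, and the names are illustrative.

```python
import math

MEAN_INTERVAL_YEARS = 500   # "once every 500 years" from the ice core work
YEARS_SINCE_LAST = 155      # years since 1859, per the reckoning above

# Reckoning used in the text: fraction of the 500-year interval already elapsed.
elapsed_fraction = YEARS_SINCE_LAST / MEAN_INTERVAL_YEARS

def chance_within(years, mean_interval=MEAN_INTERVAL_YEARS):
    """Chance of at least one event within `years`, ASSUMING events arrive
    independently at a constant average rate (a Poisson process)."""
    return 1.0 - math.exp(-years / mean_interval)

p_over_155_years = chance_within(YEARS_SINCE_LAST)   # a bit over a quarter
p_over_next_decade = chance_within(10)               # roughly 2%
```

Under the constant-rate assumption the chance for any given decade stays the same no matter how long it has been since the last storm, so the two approaches answer slightly different questions. Which one is closer to reality depends on how solar storms actually cluster, which this sketch does not attempt to model.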

Superintelligence

I picked up a book called Superintelligence: Paths, Dangers, Strategies by Nick Bostrom last week and started reading it last night. It’s about the dangers of AI gaining the ability to improve itself and after that becoming not just equivalent to Human Level Intelligence (HMLI) but greatly exceeding HMLI at a super-HMLI level (Superintelligent). This means some Superintelligent entity that would have more intelligence than our current population of humans today, by many orders of magnitude.

Bostrom discusses the take off processes that would lead to Superintelligence and some of the ways we could hope to control it. But his belief is that trying to install any of these controls after it has reached HMLI would be fruitless.

I haven’t finished the book but what I have read so far, has certainly scared me.

Bostrom presents three scenarios for a Superintelligence take off: slow take off, fast take off and medium take off. He believes that in a slow take off scenario there may be many opportunities to control the emerging Superintelligence. In a moderate or medium take off, we would know that something is wrong but would have only some limited opportunity to control it. In the fast take off (literally 18 months from HMLI to Superintelligence in one scenario Bostrom presents), the likelihood of controlling it after it starts is non-existent.

The latter half of Bostrom’s book discusses potential control mechanisms and other ways to moderate the impacts of superintelligence.  So far I don’t see much hope for mankind in the controls he has proposed. But I am only half way through the book and hope to see more substantial mechanisms in the 2nd half.

In the end, any Superintelligence could substantially alter the resources of the world and the impact this would have on humanity is essentially unpredictable. But by looking at recent history, one can see how other species have fared as humanity has altered the resources of the earth. Humanity’s rise has led to massive species die offs, for any species that happened to lie in the way of human progress.

The first part of Bostrom’s book discusses some estimates as to when the world will reach AI with HMLI. Most experts believe that we will see HMLI like this with a 90% probability by the year 2075 and a 50% probability by the year 2050. As for the duration of take off to superintelligence, the expert opinions are mixed and he believes that they highly underestimate the speed of take off.

Humanity’s risks

The search for extraterrestrial intelligence has so far found nothing. One of the parameters for the odds of a successful search was the number of inhabitable planets in the universe. But another parameter is the ability of a technological civilization to survive long enough to be noticed – the likelihood of a civilization surviving any existential risk that comes up.

Superintelligence and massive solar storms represent just two such risks but there are a multitude of others that can be identified today, and tomorrow’s technological advances will no doubt give rise to more.

Existential risks like these are ever-present and appear to be growing as our technological prowess grows. My only problem is that today the study of existential risks seems at best ad hoc and at worst outright disregarded.

I believe the best policy is to recognize known existential risks and have some intelligent debate on how probable they are and how we could potentially check them. There really needs to be some systematic study of existential risks around the world, bringing academics and technologists together to understand and to mitigate them. The threats to humanity are real. We can continue to ignore them, study a few that gain human interest, or actively seek out and mitigate all of them we can.

Comments?

Photo Credit(s): C3-class Solar Flare Erupts on Sept. 8, 2010 [Detail] by NASA Goddard’s space flight center photo stream

Two dimensional magnetic recording (TDMR)

A head assembly on a Seagate disk drive by Robert Scoble (cc) (from flickr)

I attended a Rocky Mountain IEEE Magnetics Society meeting a couple of weeks ago where Jonathan Coker, HGST’s Chief Architect and an IEEE Magnetics Society Distinguished Lecturer was discussing HGST’s research into TDMR heads.

It seems that disk track density is getting so high, track pitch is becoming so small, that the magnetic read heads have become wider than the actual data track width.  Because of this, read heads are starting to pick up more inter-track noise and it’s getting more difficult to obtain a decent signal to noise ratio (SNR) off of a high-density disk platter with a single read head.

TDMR read heads can be used to counteract this extraneous noise by using multiple read heads per data track and as such, help to create a better signal to noise ratio during read back.

What are TDMR heads?

TDMR heads are any configuration of multiple read heads used in reading a single data track. There seemed to be two popular configurations of HGST’s TDMR heads:

  • In-series, where one head is directly behind another head. This provides double the signal for the same (relative) amount of random (electronic) noise.
  • In-parallel (side by side), where three heads were configured in-parallel across the data track and the two inter-track bands. That is, one head was configured directly over the data track with portions spanning the inter-track gap to each side, one head was half way across the data track and the next higher track, and a third head was placed half way across the data track and the next lower track.

At first, the in-series configuration seemed to make the most sense to me. You could conceivably average the two signals coming off the heads and be able to filter out the random noise.  However, the “random noise” seemed to be mostly coming from the inter-track zone and this wasn’t as much random electronics noise as random magnetic noise, coming off of the disk platter, between the data tracks.

In-parallel wins the SNR race

So, much of the discussion was on the in-parallel configuration. The researcher had a number of simulated magnetic recordings which were then read by simulated, in parallel, tripartite read heads.  The idea here was that the information read from each of the side band heads that included inter-track noise could be used as noise information to filter the middle head’s data track reading. In this way they could effectively increase the SNR across the three signals, and thus, get a better data signal from the data track.
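The noise-cancelling idea can be sketched with a toy simulation. This is not HGST's simulation or signal processing; it is a deliberately idealised model with made-up parameters (coupling, side_noise) in which the centre head picks up the data plus leakage from both inter-track bands, the two side heads see only that inter-track noise plus a little electronic noise of their own, and the leakage coupling is assumed to be known.

```python
import random

def snr(signal, observed):
    """Signal power divided by the power of (observed - signal)."""
    n = len(signal)
    signal_power = sum(s * s for s in signal) / n
    noise_power = sum((o - s) ** 2 for s, o in zip(signal, observed)) / n
    return signal_power / noise_power

def simulate(n=5000, coupling=0.5, side_noise=0.1, seed=42):
    """Compare a single centre head against centre plus two side heads.

    All parameters are hypothetical; real heads straddle the data track
    too, and the coupling would have to be estimated rather than known.
    """
    rng = random.Random(seed)
    data = [rng.choice((-1.0, 1.0)) for _ in range(n)]    # recorded bits
    upper = [rng.gauss(0.0, 1.0) for _ in range(n)]       # inter-track noise above
    lower = [rng.gauss(0.0, 1.0) for _ in range(n)]       # inter-track noise below

    # Centre head: data plus leakage from both inter-track bands.
    center = [d + coupling * (u + l) for d, u, l in zip(data, upper, lower)]

    # Side heads: inter-track noise plus their own small electronic noise.
    side_up = [u + rng.gauss(0.0, side_noise) for u in upper]
    side_down = [l + rng.gauss(0.0, side_noise) for l in lower]

    # Use the side readings as a noise reference and subtract them out.
    cleaned = [c - coupling * (su + sd)
               for c, su, sd in zip(center, side_up, side_down)]
    return snr(data, center), snr(data, cleaned)

single_head_snr, three_head_snr = simulate()
```

In this toy model the three-head reading comes out with a much higher SNR than the centre head alone, because the only noise left after subtraction is the side heads' own electronic noise.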

Originally, TDMR was going to be the technology that was needed to get the disk industry to 100Tb/sqin. But, what they are finding at HGST and elsewhere, is even today, at “only” ~5Tb/sqin (HGST helium drives), there seems to be an increasing need to help reduce noise coming from read heads.

Disk density increase has been slowing lately but is still on a march to double density every 2 years or so. As such, a 1TB platter today will be a 2TB platter in 2 years and a 4TB platter in 4 years, etc. TDMR heads may be just the thing that gets the industry to that 4TB platter (20Tb/sqin) in 4 years.
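The doubling arithmetic can be written out as a short sketch. It assumes the 2-year doubling period and the ~5Tb/sqin starting point quoted above hold steady, which is a simplification; the function names are just for illustration.

```python
import math

DOUBLING_PERIOD_YEARS = 2.0   # "double density every 2 years or so"

def projected(value_today, years, doubling_period=DOUBLING_PERIOD_YEARS):
    """Value after `years` if it keeps doubling every `doubling_period`."""
    return value_today * 2 ** (years / doubling_period)

def years_to_reach(value_today, target, doubling_period=DOUBLING_PERIOD_YEARS):
    """Years needed to grow from `value_today` to `target` at that rate."""
    return doubling_period * math.log2(target / value_today)

platter_tb_in_2_years = projected(1.0, 2)          # 2.0 TB
platter_tb_in_4_years = projected(1.0, 4)          # 4.0 TB
density_in_4_years = projected(5.0, 4)             # 20.0 Tb/sqin
years_to_100_tb_sqin = years_to_reach(5.0, 100.0)  # between 8 and 9 years
```

At a steady 2-year doubling, going from ~5Tb/sqin to 100Tb/sqin is a little over four doublings, so the rate would have to hold for most of a decade.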

The only problem is what’s going to get them to 100Tb/sqin now?

Comments?

 

SPC-2 performance results MBPS/drive – chart of the month

(SCISPC121029-005B) (c) 2013 Silverton Consulting, Inc. All Rights Reserved

The above chart is from our October newsletter and is one of 5 charts we discussed in the Storage Performance Council benchmarks analysis.  There’s something intriguing about the above chart. Specifically, the band of results in numbers 2 through 10 ranges from a high of 45.7 to a low of 41.5 MBPS/drive.  The lone outlier is the SGI InfiniteStorage system which managed to achieve 67.7 MBPS/drive.

It turns out that the SGI system is actually a NetApp E5460 (from their LSI acquisition) with sixty 146GB disk drives in a RAID 6 configuration.  Considering that the configuration ASU (storage capacity used during the test) was 7TB and the full capacity was 8TB, it seemed to use all the drives to the fullest extent possible.  The only other interesting tidbit about the SGI/NetApp system was the 16GB of system memory (which I assume was mostly used for caching).  Other than that it just seemed to be a screamer of a system from a throughput perspective.

Earlier this year I was at an analyst session with NetApp where they were discussing their thoughts on where E-series was going to focus. One of the items was going to be high throughput intensive applications. From what we see here, they seem to have the right machine to go after this market.

The only storage to come close was an older Oracle J4200 series system which had no RAID protection, which we would not recommend for any data application.   Not sure what the IBM DS5300 series storage is OEMed from but it might be another older E-Series system.

A couple of caveats are in order for our MBPS/drive charts:

  • These are disk-only systems; any system using SSDs or FlashCache is excluded from this analysis.
  • These systems all use 140GB disks or larger. (Some earlier SPC benchmarks used 36GB drives).

Also, please note the MBPS SPC-2 metric is a composite (average) of Video-on-demand, Large database query and Large file processing workloads.

More information on SPC-2 performance as well as our SPC-1, SPC-2 and ESRP ChampionsCharts for block storage systems can be found in our SAN Storage Buying Guide (available for purchase on our web site).

~~~~

The complete SPC-1 and SPC-2 performance report went out in SCI’s October newsletter.  But a copy of the report will be posted on our dispatches page sometime this month (if all goes well).  However, you can get the latest storage performance analysis now and subscribe to future free newsletters by just using the signup form above right.

As always, we welcome any suggestions or comments on how to improve our SPC  performance reports or any of our other storage performance analyses.


 

Super long term archive

Read an article this past week in Scientific American about a new fused silica glass storage device from Hitachi Ltd., announced last September. The new media is recorded with lasers, burning dots into the media to represent binary 1s and leaving spaces to represent binary 0s.

As can be seen in the photos above, the data can readily be read by microscope which makes it pretty easy for some future civilization to read the binary data. However, knowing how to decode the binary data into pictures, documents and text is another matter entirely.

We have discussed the format problem before in our Today’s data and the 1000 year archive as well as Digital Rosetta stone vs. 3D barcodes posts. And this new technology would compete with the currently available M-disc long term archive-able DVD technology from Millenniata, which we have also talked about before.

Semi-perpetual storage archive!!

Hitachi tested the new fused silica glass storage media at 1000C for several hours which they say indicates that it can survive several 100 million years without degradation. At this level it can provide a 300 million year storage archive (M-disc only claims 1000 years).   They are calling their new storage device, “semi-perpetual” storage.  If 100s of millions of years is semi-perpetual, I gotta wonder what perpetual storage might look like.

At CD recording density, with higher densities possible

They were able to achieve CD levels of recording density with a four layer approach. This amounted to about 40Mb/sqin.  DVD technology is on the order of 330Mb/sqin and Blu-ray is ~15Gb/sqin, but neither of these technologies claims even a million year lifetime.   Also, there is the possibility of even more layers, so the 40Mb/sqin could potentially double or quadruple.

But data formats change every few years nowadays

My problem with all this is the data format issue. We will need something like a digital Rosetta stone for every data format ever conceived in order to make this a practical digital storage device.

Alternatively, we could plan to use it more like an analogue storage device, with something like black and white or grey scale photographs of the information to be retained imprinted in the media.  That way, a simple microscope could be used to see the photo image.  I suppose color photographs could be implemented using different plates per color, similar to four color magazine production processing. Texts could be handled by just taking a black and white photo of a document and printing it in the media.

According to a post I read about the size of the collection at the Library of Congress, they currently have about 3PB of digital data in their collections, which in 650MB CD chunks would be about 4.6M CDs.  So if there is an intent to copy this data onto the new semi-perpetual storage media for the year 300,002,012, we probably ought to start now.

Another tidbit to add to the discussion: at last month’s Hitachi Data Systems Influencers Summit, HDS was showing off some of their recent lab work and they had an optical jukebox on display that they claimed would be used for long term archive. I get the feeling that maybe they plan to commercialize this technology soon – stay tuned for more.
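The CD count is simple division, sketched below. It assumes decimal units (1PB = 10^15 bytes, 1MB = 10^6 bytes), which is a choice made here rather than something the Library of Congress post specified.

```python
PB = 10 ** 15   # decimal petabyte, an assumption
MB = 10 ** 6    # decimal megabyte, an assumption

collection_bytes = 3 * PB    # ~3PB of digital data
cd_bytes = 650 * MB          # one 650MB CD-sized chunk

# Ceiling division: a partly filled last disc still counts as a disc.
cds_needed = -(-collection_bytes // cd_bytes)
cds_in_millions = round(cds_needed / 1e6, 1)
```

That works out to about 4.6 million CD-sized chunks, matching the figure above.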

 

~~~~

Image: Hitachi.com website (c) 2012 Hitachi, Ltd.

Insecure SHA-1 imperils Internet security, PKI, and most password systems

safe 'n green by Robert S. Donovan (cc) (from flickr)

I suppose it’s inevitable but surprising nonetheless.  A recent article Faster computation will damage the Internet’s integrity in MIT Technology Review indicates that by 2018, SHA-1 will be crackable by any determined large  organization. Similarly, just a few years later,  perhaps by 2021 a much smaller organization will have the computational power to crack SHA-1 hash codes.

What’s a hash?

Cryptographic hash functions like SHA-1 are designed such that, when a string of characters is hashed, it generates a binary value which has a couple of great properties:

  • Irreversibility – given only a “hash_value” generated by hashing a “text_string”, there is no practical way to determine what the “text_string” was.
  • Uniqueness – given two different text strings, “text_string1” and “text_string2”, they should generate two different hash values, “hash_value1” and “hash_value2”.
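For anyone who wants to see a hash value, here is a minimal sketch using Python's standard hashlib module. The helper name sha1_hex and the input strings are just placeholders borrowed from the bullets above.

```python
import hashlib

def sha1_hex(text: str) -> str:
    """Return the SHA-1 hash of `text` as 40 hexadecimal characters."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

hash_value1 = sha1_hex("text_string1")
hash_value2 = sha1_hex("text_string2")

# The same input always gives the same hash ...
same_again = sha1_hex("text_string1") == hash_value1
# ... while these two different inputs give different hashes.
differ = hash_value1 != hash_value2
```

Every SHA-1 hash is 160 bits (40 hex characters) regardless of input length, so there are far more possible inputs than hash values. Collisions therefore have to exist; the design goal is only that they be impractical to find.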

Although hash functions are designed to be irreversible that doesn’t mean that they couldn’t be broken via a brute force attack. For example, if one were to try every known text string, sooner or later one would come up with a “text_string1” that hashes to “hash_value1”.

But perhaps even more serious, the SHA-1 algorithm is prone to hash collisions, which means it fails the uniqueness property above.  That is, there are different text strings that hash to the same “hash_value1”.

All this wouldn’t be much of a problem except that with Moore’s law in force and continuing for the next 6 years or so we will have processing power in chips capable of doing a brute force attack against SHA-1 to find text_strings that match any specific hash value.

So what’s the big deal?

Well it turns out that SHA-1 algorithms underpin almost all secure data transmissions today. That is, most Public-key infrastructure (PKI) depends on SHA-1 to sign digital certificates.  And although that’s pretty bad, what’s even worse is that Secure Socket Layer/Transport Layer Security (SSL/TLS) used by “https://” websites the world over also depends on SHA-1 to send key information used to encrypt/decrypt secure Internet transactions.

On top of all that, many of today’s secure systems with passwords, use SHA-1 to hash passwords and instead of storing actual passwords in plain-text on their password files, they only store the SHA-1 hash of the passwords.  As such, by 2021, anyone that can read the hashed password file can retrieve any password in plain text.
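A toy version of that attack on a hashed password file looks something like the sketch below. The stored password and the guess list are made up for illustration; a real attacker would run through enormous candidate lists on specialised hardware, which is where the extra processing power comes in.

```python
import hashlib

def sha1_hex(text: str) -> str:
    """Return the SHA-1 hash of `text` as a hex string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def crack(stored_hash, candidates):
    """Hash each candidate and return the first one matching `stored_hash`,
    or None if no candidate matches."""
    for guess in candidates:
        if sha1_hex(guess) == stored_hash:
            return guess
    return None

# Hypothetical entry from a password file: only the hash is stored.
stored = sha1_hex("letmein")

# Hypothetical guess list; real attacks use far larger ones.
guesses = ["password", "123456", "letmein", "qwerty"]
found = crack(stored, guesses)
```

Nothing here reverses the hash. The attacker simply hashes guesses until one matches, so the defence weakens as hashing gets cheaper and faster.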

What all this means is that by 2018 for some and 2021 or thereabouts for just about anybody else, today’s secure internet traffic, PKI and most system passwords will no longer be secure.

What needs to be done

It turns out that NSA knew about the failings of SHA-1 quite a while ago and as such, NIST released SHA-2 as a new hash algorithm and its functional replacement.  Probably just in time, this month, NIST announced a winner for a new SHA-3 algorithm as a functional replacement for SHA-2.

This may take a while. What needs to be done is to have all digital certificates that use SHA-1 invalidated and replaced with new ones generated using SHA-2 or SHA-3.  And of course, TLS and SSL Internet functionality all has to be re-coded to recognize and use SHA-2 or SHA-3, instead of SHA-1.

Finally, for most of those password systems, users will need to re-login and have their password hashes changed over from SHA-1 to SHA-2 or SHA-3.

Naturally, in order to use SHA-2 or SHA-3 many systems may need to be upgraded to later levels of code.  Seems like Y2K all over again, only this time it’s security that’s going to crash.  It’s good to be in the consulting business, again.

~~~~

But the real problem IMHO, is Moore’s law.  If it continues to double processing power/transistor density every two years or so, how long before SHA-2 or SHA-3 succumb to the same sorts of brute force attacks?  Given that, we appear destined to change hashing, encryption and other security algorithms every decade or so until Moore’s law slows down or god forbid, stops altogether.

Comments?

 

Million year optical disk

Read an article the other day about scientists creating an optical disk that would be readable in a million years or so. The disk described in the Science Mag article, titled A million-year hard disk, is intended to warn people in the far future about potential dangers that are being created today.

A while back I wrote about a 1000 year archive which was predominantly about disappearing formats. At the time, I believed given the growth in data density that information could easily be copied and saved over time but the formats for that data would be long gone by the time someone tried to read it.

The million year optical disk eliminates the format problem by using pixelated images etched on media. Which works just dandy if you happen to have a microscope handy.

Why would you need a million year disk

The problem is how do you warn people in the far future not to mess with radioactive waste deposits buried below. If the waste is radioactive for a million years, you need something around to tell people to keep away from it.

Stone markers last for a few thousand years at best but get overgrown and wear down in time. For instance, my grandmother’s tombstone in Northern Italy has already been worn down so much that it’s almost unreadable. And that’s not even 80 yrs old yet.

But a sapphire hard disk that could easily be read with any serviceable microscope might do the job.

How to create a million year disk

This new disk is similar to the old StorageTek 100K year optical tape. Both would depend on microscopic impressions, something like bits physically marked on media.

For the optical disk the bits are created by etching a sapphire platter with platinum. Apparently the prototype costs €25K but they’re hoping the prices go down with production.

There are actually two 20cm (7.9in) wide disks that are molecularly fused together and each disk can store 40K miniaturized pages that can hold text or images. They are doing accelerated life testing on the sapphire disks by bathing them in acid to ensure a 10M year life for the media and message.

Presumably the images are grey tone (or in this case platinum tone). If I assume 100Kbytes per page that’s about 4GB, something around a single layer DVD disk in a much larger form factor.
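The capacity estimate works out as follows. The 100Kbytes per page is the guess made above, and decimal units are assumed; the 4.7GB figure is the nominal capacity of a single layer DVD.

```python
pages = 40_000               # miniaturized pages, per the article
bytes_per_page = 100_000     # ~100Kbytes per page, the guess made above

total_bytes = pages * bytes_per_page
total_gb = total_bytes / 10 ** 9     # decimal gigabytes

SINGLE_LAYER_DVD_GB = 4.7            # nominal single layer DVD capacity
fits_on_one_dvd = total_gb <= SINGLE_LAYER_DVD_GB
```

So 40K pages at that size is 4GB, a bit under what one single layer DVD holds.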

Why sapphire

It appears that sapphire is available from industrial processes and it seems impervious to the wear that harms other materials. But that’s what they are trying to prove.

Unclear why they decided to “molecularly” fuse two platters together. It seems to me this could easily be a weak link in the technology over the course of a dozen millennia or so. On the other hand, more storage is always a good thing.

~~~~

In the end, creating dangers today that last millions of years requires some serious thought about how to warn future generations.

Image: Clock of the Long Now by Arenamontanus