Quantum computing at our doorsteps

Read an article the other day in MIT’s Technology Review, Google’s new chip is a stepping stone to quantum computing… about Google’s latest endeavor to create quantum computers. Although digital logic, or classical electronic computation, has been around since the middle of the last century, quantum logic does things differently, and there are many problems that are easier to compute with quantum computing but take much longer to solve with digital computing.

Qubits are weird

Classical or digital electronic computation follows the physical, mechanistic view of the world (for the most part), while quantum computing follows the quantum mechanical view of the world. Quantum computing uses quantum bits, or qubits, and the device that Google demonstrated has a 2×3 matrix of qubits, 6 in total.

Unlike a bit, which (theoretically) is a two-state system that can only take on the values 0 and 1, a qubit is a two-level system that can in reality take on infinitely many different states: a qubit’s state can be written as α|0⟩ + β|1⟩, where α and β are complex amplitudes with |α|**2 + |β|**2 = 1, and any such combination is a valid state. In practice, there are always two qubit states that are distinguishable from one another, but they can be any two of the infinitely many states a qubit can take on.

Also, reading out the state of a qubit is a probabilistic endeavor, and measurement changes the “value” of the qubit that can be read out afterwards.

There’s more to quantum computing and I am certainly no expert. So if you’re interested, I suggest starting with this arXiv article.

Faster quantum algorithms

In any case, some difficult and time-consuming arenas of classical computation seem to be easier and faster with quantum computation. For example:

  • Factoring large numbers – in classical computation this process takes an amount of time that is exponential in the number of bits of the “large number”: where B is the number of bits and ε (epsilon) is a constant > 0, the best current algorithms take O((1+ε)**B) time. But Shor’s quantum factorization algorithm takes only O(B**3) time, which is considerably faster for large numbers (see the rough comparison below this list). This is important because RSA cryptography and most key exchange algorithms in use today base their security on the difficulty of factoring large numbers. (See the Wikipedia article on Integer Factorization for more information.)
  • Searching an unstructured list – in classical computation, searching a list of N items takes O(N) time. But Grover’s quantum search algorithm takes only O(√N) time, which is considerably faster for large lists (see the toy simulation below this list). (See the arXiv paper for more information.)
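
For a rough numerical feel for those two claims, here’s a small Python snippet comparing the operation counts as the problem size grows. The ε = 0.1 below is an arbitrary pick for the constant, not a measured value:

    # Rough illustration of the two scaling claims above.
    # eps = 0.1 is an arbitrary choice for the constant in O((1+eps)**B).
    eps = 0.1

    print("Factoring a B-bit number:")
    for B in (64, 256, 1024):
        print(f"  B={B:4d}  classical ~ {(1 + eps) ** B:.2e}  Shor ~ {B ** 3:.2e}")

    print("Searching an unstructured list of N items:")
    for N in (10**6, 10**9, 10**12):
        print(f"  N={N:.0e}  classical ~ {N:.2e}  Grover ~ {N ** 0.5:.2e}")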
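
And to see where the √N comes from, here’s a minimal classical simulation (using numpy) of Grover’s amplitude-amplification loop. The qubit count and target index are made-up example values, not anything from Google’s device:

    import numpy as np

    # Toy simulation of Grover's search over a list of N = 2**n entries,
    # where "target" is the index of the single marked item we want.
    n = 3                      # number of qubits (example value)
    N = 2 ** n                 # list size
    target = 5                 # index of the marked item (example value)

    # Start in the uniform superposition over all N states.
    state = np.full(N, 1 / np.sqrt(N))

    # The optimal number of Grover iterations is about (pi/4) * sqrt(N).
    iterations = int(np.floor(np.pi / 4 * np.sqrt(N)))

    for _ in range(iterations):
        state[target] *= -1                  # oracle: flip the target's sign
        state = 2 * state.mean() - state     # diffusion: reflect about the mean

    # After ~sqrt(N) iterations the target is measured with high probability.
    print(f"P(target) after {iterations} iterations: {state[target] ** 2:.3f}")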

Using Shor’s factorization algorithm, researchers have already been able to factor the number 15 with 7 qubits.

There are many quantum algorithms available today (see the Quantum Algorithm Zoo at NIST), with more showing up all the time. Suffice it to say that quantum computing will be a more time-efficient, and thus more effective, approach to certain problems than classical computing.

Quantum computers starting to scale

Now back to the chip. According to the article, the new Google chip implements a 2×3 matrix of qubits.

For those old enough to remember, a 3-bit number was called an octal, ranging from 0 to 7, and two octals can range from 0..63. Octals were used for a long time to represent digital information on some computers (mostly minicomputers). This is in contrast to most computing nowadays, which uses hexadecimal numbers, or 4-bit numbers, ranging from 0..15, with two hexadecimal digits ranging from 0..255.
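
As a quick sanity check of those ranges in Python:

    # One and two octal digits vs. one and two hex digits.
    print(0o7, 0o77)    # 7, 63   -> a 3-bit digit and a pair of them
    print(0xF, 0xFF)    # 15, 255 -> a 4-bit digit and a pair of them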

Why are octals important? Well, if quantum computing can scale up to multiple octal numbers, then it can start representing really large numbers. According to the article, Google chose the 2×3 qubit structure because it’s easier to scale.

I assume all the piping surrounding the chip package in the above photo is cooling ports. It seems that quantum computing only works at very cold temperatures. And if this is a two-octal computer, scaling up to multiple octals is going to take lots of space.

How quickly will it scale?

For some history, Intel introduced their 4004 (4-bit) computing chip in 1971 (Wikipedia), their 8-bit Intel 8008 in 1972 (Wikipedia), and their 16-bit Intel 8086 between 1976 and 1978. So in 7 years we went from a 4-bit computer to a 16-bit computer whose (x86) architecture continues on today and rules the world.

Now the Intel 4004 had 16 4-bit registers, a data/instruction bus that could address 4096 words, a 3-level subroutine stack, and was a full-fledged 4-bit computer. It’s unclear what’s in Google’s chip. But if this 2×3-qubit chip were built out into a full computer, with multiple 2×3-qubit registers, a qubit storage bus, a multi-level qubit subroutine (register) stack, etc., then we would be well on our way to quantum computing being added to the world’s computational capabilities in less than 10 years.

And of course, Google’s not the only large organization working on quantum computing.

~~~~

So there you have it, Google and others are in the process of making your cryptography obsolete, rapidly speeding up unstructured searching and doing multiple other computations lots faster than today.

Photo Credit(s): from the MIT Technology Review article.

 

Crowdsourced vision for visually impaired

Read an article the other day in the Christian Science Monitor (CSM) on the Be My Eyes app. The app is from BeMyEyes.com and is available for iPhone and Android smartphones.

Essentially there are two groups of people that use the app:

  • Visually helpful volunteers – these people sign up for the app, and when a visually impaired person needs help, they provide visual aid by speaking to the person on the other end.
  • Visually impaired individuals – these people sign up for the app, and when they are having problems understanding what they are (or are not) looking at, they can turn on their camera and take video with their phone. The video is sent to a volunteer, whom they can then ask for help in deciding what they are looking at.

So, the visually impaired ask questions about the scenes they are shooting with their phone camera and volunteers will provide an answer.

It’s easy to register as Sighted and, I assume, as Blind. I downloaded the app, registered, and tried a test call in minutes. You have to enable notifications, microphone access, and camera access on your phone to use the app. The camera access is required to display the scene/video on your phone.

According to the app, there are 492K sighted individuals and 34.1K blind individuals signed up, and the blind have been helped 214K times.

Sounds like an easy way to help the world.

There was no request to identify a language to use, so it may only work for English speakers. And there was no way to disable/enable the app for periods when you don’t want to be disturbed. But maybe you would just close the app.

But other than that it was simple to use and seemed effective.

Now if only there were an app that would provide the same service for the hearing impaired, supplying captions or a “filtered” audio feed to ear buds.

The world needs more apps like this…

Comments?

There’s a new cluster filesystem on the block, Elastifile

At SFD12 last month we talked with the team from Elastifile. They are a new startup out of Israel working on a better cluster file system.

Elastifile was designed to support 1000s of nodes, 100,000s of users/clients, and 1000s of data containers (file systems/mount points), together with a practically infinite (64-bit) number of files and directories and up to exabytes (10**18 bytes) in capacity. They also offer a 100% SSD file store capability. I encourage you to view the videos of their presentations at SFD12 to learn more.

Elastifile features

Elastifile supports data compression and, optionally, deduplication, with NAND/flash (e.g., low-/high-endurance) storage tiering, cloud storage tiering, and multi-site storage. They also provide NFSv3/v4, SMB, AWS S3, and HDFS as native access protocols for their file storage.

They also offer non-disruptive hardware/software upgrades, n-way (2- or 3-way) data and metadata redundancy, self-healing capabilities, snapshots, and synchronous/asynchronous data replication or mirroring. Further, they provide multi-tenancy and QoS support.

Elastifile can be used in hyperconverged mode as well as in a dedicated storage server mode. For backend storage, they support heterogeneous, physical (block, I think?) storage systems as well as direct access storage in cluster nodes.

Internals matter

Elastifile’s architecture supports accessor, owner and data nodes. But these can all be colocated on the same server or segregated across different servers.

Owner nodes own all the metadata objects for a file or directory and cache the metadata working set in their memory. Ownership of file or directory metadata may change in the case of hardware failures.

Elastifile supports a dynamic write data path, which means they determine, in real time, where to write file data rather than having the data locations identified beforehand. They call this distributed write-anywhere semantics.

Notably, they don’t do data caching (with NVMe it doesn’t make sense); however, as noted above, they do use metadata caching.

Internally, Elastifile uses variable length objects for both file data and metadata.

  • File data is composed of three object types: a file metadata (FileMD) object, mapping data objects, and file data objects. FileMDs hold the normal file metadata (name, file size, create, access & modify ToDs, etc.) as well as pointers to all the mapping object IDs (OIDs). Mapping objects exist for each 0.5MB of file data and consist of a 128-element table, each element mapping 4KB of file address space to a data object (OID). Each data object holds 4KB of compressed file data plus journal log entries. (A sketch of this layout follows the list below.)
  • Directory metadata is composed of a directory metadata (DirMD) object and directory listing objects. Directory listing objects map file/directory names to FileMD or DirMD OIDs. Directory listing objects are accessed via an extensible hash table and contain a list of the file/directory names within the directory.
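
Here’s the promised hypothetical sketch of that layout in Python; the class and field names are my own invention for illustration, not Elastifile’s actual code:

    from dataclasses import dataclass, field
    from typing import List, Optional

    # Hypothetical sketch of the object layout described above; names and
    # types are illustrative only, not Elastifile's actual implementation.

    @dataclass
    class DataObject:
        compressed_data: bytes                  # 4KB of compressed file data
        journal_entries: List[bytes] = field(default_factory=list)

    @dataclass
    class MappingObject:
        # 128 elements, each mapping 4KB of file address space to a data
        # object OID, so one mapping object covers 128 * 4KB = 0.5MB.
        table: List[Optional[int]] = field(default_factory=lambda: [None] * 128)

    @dataclass
    class FileMD:
        name: str
        size: int
        create_tod: float
        access_tod: float
        modify_tod: float
        mapping_oids: List[int] = field(default_factory=list)

    def locate(offset: int) -> tuple:
        """Map a byte offset to (mapping object index, 4KB slot within it)."""
        mapping_idx = offset // (128 * 4096)    # which 0.5MB mapping object
        slot = (offset % (128 * 4096)) // 4096  # which 4KB table element
        return (mapping_idx, slot)

    print(locate(1_000_000))    # -> (1, 116): 2nd mapping object, 117th slot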

The Elastifile software architecture consists of three layers (a toy sketch of a write flowing through them appears after the list):

  • A protocol layer which terminates file system access protocols and translates requests into internal requests. The hashing and data compression of file data occur at this level.
  • A metadata layer which provides file system/directory name mapping to objects for owned files/directories and maintains file/directory metadata updates/journals/checkpoints.
  • A data layer which provides transaction consistency and n-way redundant persistent data storage for (file or metadata) objects.
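
Here’s that toy sketch of a write passing through the three layers; the class and method names are made up for illustration, and n-way replication and journaling are elided:

    import zlib

    # Toy sketch of a write flowing protocol -> metadata -> data layer.
    # Names are hypothetical; replication and journaling are elided.

    class DataLayer:
        def __init__(self):
            self.objects = {}                    # OID -> persisted blob

        def put(self, oid, blob):
            self.objects[oid] = blob             # would be n-way replicated

    class MetadataLayer:
        def __init__(self, data_layer):
            self.data_layer = data_layer
            self.name_to_oid = {}                # file name -> data object OID

        def write_file(self, name, oid, blob):
            self.data_layer.put(oid, blob)       # persist the data object
            self.name_to_oid[name] = oid         # then commit the name mapping

    class ProtocolLayer:
        def __init__(self, metadata_layer):
            self.metadata_layer = metadata_layer
            self.next_oid = 0

        def nfs_write(self, name, payload):
            compressed = zlib.compress(payload)  # compression at this layer
            self.next_oid += 1
            self.metadata_layer.write_file(name, self.next_oid, compressed)

    fs = ProtocolLayer(MetadataLayer(DataLayer()))
    fs.nfs_write("/home/ray/notes.txt", b"hello, Elastifile")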

Metadata operations are persisted via journaled transactions, which are distributed across the cluster. For instance, the journal entries for a mapping data object update are written to the same data object (OID) as the actual file data, i.e., the 4KB compressed data object.

There’s plenty of discussion on how they manage consistency for their metadata across cluster nodes. Elastifile invented and uses Bizur, a key-value, consensus-based DB. Their chief architect, Ezra Hoch (@EzraHoch), did a blog post and paper on Bizur that provide more information.
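
For a feel for why per-key consensus matters, below is a toy, single-process sketch of the idea as I understand it from the Bizur paper: keys hash into independent buckets, each ordering only its own updates, so unrelated keys never serialize through one shared replicated log. This is purely illustrative, not Bizur itself:

    # Toy sketch of Bizur's core idea: consensus state is kept per bucket,
    # not in one shared replicated log. Single-process and illustrative only.

    class Bucket:
        """One independently replicated key range with its own version counter."""
        def __init__(self):
            self.version = 0
            self.data = {}

        def write(self, key, value):
            self.version += 1        # each bucket orders only its own writes
            self.data[key] = value

    NUM_BUCKETS = 16
    buckets = [Bucket() for _ in range(NUM_BUCKETS)]

    def bucket_for(key):
        return buckets[hash(key) % NUM_BUCKETS]

    # These two writes (very likely) land in different buckets, so in a real
    # cluster they could proceed in parallel, with no common log to contend on.
    bucket_for("/a/file1").write("/a/file1", b"file1-metadata")
    bucket_for("/b/file2").write("/b/file2", b"file2-metadata")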

~~~~

New file systems generally take many years to mature and get out into the market, cluster file systems even longer. Elastifile, started in 2013 by some very smart engineers, is already on the market, just 4 years later. That’s impressive enough, but having their list of advanced functionality, plus cloud storage tiering and multi-site operations, all shipping in the current product is mind-blowing.

One lingering question is: does a market exist for another cluster file system? All-flash is interesting, but most current CFSs support this and ship it today. Cloud storage tiering is interesting and a long-term need, but some CFSs already have it and others are no doubt implementing it as we speak. CFSs’ use of objects for internal data and metadata management is not new; it may make internals cleaner but doesn’t really provide a lot of customer benefit.

Exascale raw capacity, support for 100K users, 1000s of nodes, 1000s of file systems, and an infinite # of files/directories is interesting. But most CFSs claim this level of support already, although it is more aspirational for some. And proving support at this scale is difficult, if not impossible.

On the other hand, Bizur is really neat. Its primary benefit is during recovery from hardware failures. For a CFS with 1000s of nodes, failures likely occur quite often. So Bizur’s advantage here may pay significant customer dividends.

Is that enough to market a new CFS?

To see what other SFD12 bloggers have written on Elastifile, please see:

AI’s Image recognition success feeds sound recognition improvements

I must do reCAPTCHA at least a dozen times a week for various websites I use. It’s become a real pain. And the fact that I know that what I am doing is helping some AI image recognition program do a better job of identifying street signs, mountains, or shop fronts doesn’t reduce my angst.

But that’s the thing with deep learning, machine learning, reinforcement learning, etc.: they all need massive amounts of annotated data, where the annotations are a correct interpretation of each scene, in order to train properly.

Computers to the rescue

So, when I read a recent article in MIT News, Computers learn to recognize sounds by watching video, I was intrigued. What the researchers at MIT have done is use advanced image recognition to annotate film clips with the names of the things making sounds in the film. They then fed this automatically annotated data into a sound-identification algorithm to improve its recognition capability.

They used this approach to train their sound recognition system to be able to identify natural and artificial sounds like bird song, speaking in crowds, traffic sounds, etc.
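
To make the approach concrete, here’s a small, self-contained Python toy: a synthetic “teacher” (standing in for the image recognizer) labels clips imperfectly, and a simple “student” classifier (standing in for the sound recognizer) trains on those machine-generated labels. All the data and numbers below are made up for illustration; this is not MIT’s actual system:

    import numpy as np

    # Toy illustration of cross-modal bootstrapping: train a "student" sound
    # classifier on labels produced by an imperfect "teacher" image recognizer.
    # All data is synthetic; the 92% teacher accuracy is an arbitrary pick.

    rng = np.random.default_rng(0)
    NUM_CLASSES, DIM, N = 5, 20, 1000

    # Synthetic "audio features" for N clips; each class has its own mean.
    class_means = rng.normal(size=(NUM_CLASSES, DIM))
    true_labels = rng.integers(0, NUM_CLASSES, size=N)
    audio_feats = class_means[true_labels] + 0.5 * rng.normal(size=(N, DIM))

    # The "teacher" labels each clip correctly 92% of the time, else randomly.
    teacher = np.where(rng.random(N) < 0.92,
                       true_labels,
                       rng.integers(0, NUM_CLASSES, size=N))

    # "Student": a nearest-centroid classifier fit on the teacher's labels.
    centroids = np.array([audio_feats[teacher == c].mean(axis=0)
                          for c in range(NUM_CLASSES)])
    pred = ((audio_feats[:, None, :] - centroids) ** 2).sum(-1).argmin(axis=1)
    print(f"student accuracy vs. true labels: {(pred == true_labels).mean():.2f}")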

They tested their newly, automatically trained sound recognition system against standard labeled sound sets, and it was able to categorize sounds with 92% accuracy for a 10-category data set and with 74% accuracy for a 50-category data set. Humans are able to categorize these sounds with 96% and 81% accuracy, respectively.

AI’s need for annotation

The problem with machine learning is that it needs a massive, properly annotated data set in order to learn properly. But getting annotated data takes too long or costs too much for many of the things we want AI to do.

Using one AI tool to annotate data to train another AI tool is a sort of bootstrapping of AI technology. It’s a cute trick but may have only limited application. I could think of only a few more applications of similar technology:

  • Use chest strap or EKG technology to annotate audio clips of heart beat sounds at a wrist or other appendage to train a system to accurately determine pulse rates through sound alone.
  • Use wave monitoring technology to annotate pictures and audio clips of sea waves to train a system to accurately determine wave levels for better tsunami detection.
  • Use image recognition to annotate pictures of food and then use this to train a system to recognize food smells (if they ever find a way to record smells).

But there may be many others. Further refinement of what they have built could lead to finer-grained people detection. For example, as (facial) image recognition gets better, it becomes possible to annotate film clips of people speaking and use them to train a sound recognition system to identify people just from hearing their speech. Intelligence applications for such technology are significant.

Nonetheless, I for one am happy that the next reCAPTCHA won’t be having me identify river sounds in a matrix of 9 sound clips.

But I fear there are enough GreyBeards on Storage podcast recordings and Storage Field Day video clips already available to train a system to identify Ray’s and, for sure, Howard’s voice anywhere on the planet…

Comments?

Photo Credit(s): Wave by Matthew Potter; Waves crashing on Puget Sound by mikeskatie; Day 16: Podcasting by Laura Blankenship