A steampunk Venusian rover

Read an article last week in The Engineer, “Designing a mechanical rover to explore … Venus”, about a group at JPL, led by Jonathan Sauder, that is working on a mechanical rover to study Venus.

Venus has a surface temperature of ~470°C, hot enough to melt lead, which will fry most electronics in seconds. Moreover, the Venusian surface is under a lot of pressure, roughly equivalent to a mile under water, or ~160X the air pressure at Earth’s surface (from NASA Venus in depth). Extreme conditions for any rover.

Going mobile

Sauder and his team were brainstorming mechanical rovers that operate similarly to Theo Jansen’s Strandbeest, which walks using wind energy alone. (Check out the video of the BEEST walking.)

Jansen had told Sauder’s team that his devices work much better on smooth surfaces and that uneven, beach like surfaces presented problems.

So, Sauder’s team started looking at using something with tracks instead of legs/feet, sort of like a World War 1 tank, one that could operate upside down as well as right-side up.

Rather than sails (as on the Strandbeest), they plan to use multiple vertical-axis wind turbines, called Savonius rotors, located inside the tank body to generate energy and store that energy in springs for future use.

Getting data

They’re not planning to ditch electronics altogether, but they do need to minimize the rover’s reliance on them.

There are some electronics, based on silicon carbide and gallium nitride, that can operate at 450°C, though they have a very low level of integration at this time, just ~100 transistors per chip. These could be used to add electronic processing and control to the mechanical rover.

Solar panels, which can also operate at 450°C, would supply electricity to the high temperature electronics.

But to get information off the rover and back to Earth, they plan to use a highly radio-reflective spot on the rover covered by a mechanical shutter. By opening and closing the shutter while an orbiting satellite pings the rover with radio pulses and records whether or not they reflect back, the rover can send Morse code to the satellite. The orbiting satellite could record this information and then transmit it to Earth.
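To make the scheme concrete, here is a minimal sketch of turning a message into a schedule of shutter open/close intervals. The Morse timing units are the standard convention; the shutter interface and the Python framing are purely my illustration, not anything from JPL.

```python
# Morse subset sufficient for the demo; standard Morse timing
# (dot = 1 unit, dash = 3 units; gaps of 1/3/7 units, shutter closed).
MORSE = {'E': '.', 'T': '-', 'S': '...', 'O': '---',
         'N': '-.', 'R': '.-.', 'U': '..-', 'V': '...-'}

def shutter_schedule(message):
    """Turn a message into (shutter_open, duration_units) intervals:
    shutter open reflects the satellite's radio pulse, closed doesn't."""
    schedule = []
    for wi, word in enumerate(message.upper().split()):
        if wi:
            schedule.append((False, 7))          # gap between words
        for li, letter in enumerate(word):
            if li:
                schedule.append((False, 3))      # gap between letters
            for si, sym in enumerate(MORSE[letter]):
                if si:
                    schedule.append((False, 1))  # gap within a letter
                schedule.append((True, 1 if sym == '.' else 3))
    return schedule

print(shutter_schedule('SOS')[:5])  # [(True, 1), (False, 1), (True, 1), ...]
```

Each tuple is one interval the shutter mechanism would hold; the satellite reconstructs dots and dashes from the reflected-pulse durations.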

The rover will make use of simple chemical reactions to measure soil, rock and atmospheric chemistry. Soil and rocks suitable for analysis can be scooped up, drilled out and moved to the analysis chamber(s) via mechanical devices. Wind speed and direction can be sensed with simple mechanical devices.

In order to avoid obstacles while roving around the planet, they plan to use a mechanical probe out the front (and back?) of the rover, with control systems attached to it to steer around obstacles. This way the rover can cover more of the planet’s surface.

Such a mechanical rover with high temperature electronics might also be suitable for other worlds in the solar system: Mercury for sure, and some moons of the Jovian planets also have extreme pressure environments.

And such an electrical-mechanical rover might also work well for probing volcanoes on Earth, although their temperatures run 700 to 1200°C, ~2 to 3X Venus. Maybe such a rover could be used in highly radioactive environments to record information and send it back to personnel outside, or even effect some preprogrammed repairs. Ocean vents are another place where such a rover might work well.

Possible improvements

Mechanical probes would need to move vertically and swing horizontally to be effective, and would necessarily have to poke outside the tank’s envelope to read obstacles ahead.

Sonar could work better. Sounds or clicks could be produced mechanically and their reflections received mechanically (a microphone is just a mechanical transducer). At Venusian pressures, sound should travel far.
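A rough feel for the numbers: distance to an obstacle follows from the echo’s round-trip time and the speed of sound. The ~410 m/s figure below is an assumed value for the dense CO2 atmosphere at Venus’s surface, used only to make the sketch concrete.

```python
def echo_distance(round_trip_s, speed_of_sound=410.0):
    """Distance to an obstacle from an echo's round-trip time.
    410 m/s is an assumed speed of sound at the Venusian surface;
    halve the product because the click travels out and back."""
    return speed_of_sound * round_trip_s / 2.0

print(echo_distance(0.1))  # a 100 ms echo -> ~20.5 m
```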

Morse code was designed to efficiently send alpha-numerics and not much else. It would seem that another codec could be designed to send scientific information faster. And if one mechanical spot is good, multiple spots would be better assuming the satellite could detect multiple radio reflective spots located in close proximity to one another on the rover.
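As a sketch of what a better codec might look like, a Huffman code assigns the shortest bit sequences to the most frequently sent symbols, rather than Morse’s fixed per-letter patterns. The telemetry symbols and frequencies below are made up purely for illustration.

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Build a Huffman code (symbol -> bitstring) from symbol frequencies:
    repeatedly merge the two least-frequent subtrees."""
    tie = count()  # tie-breaker so equal frequencies never compare dicts
    heap = [(f, next(tie), {sym: ''}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)
        f1, _, c1 = heapq.heappop(heap)
        merged = {s: '0' + bits for s, bits in c0.items()}
        merged.update({s: '1' + bits for s, bits in c1.items()})
        heapq.heappush(heap, (f0 + f1, next(tie), merged))
    return heap[0][2]

# Hypothetical telemetry symbol frequencies, invented for the example:
code = huffman_code({'wind': 50, 'temp': 30, 'pressure': 15, 'chem': 5})
print(code)
```

The most common reading (“wind”) gets a 1-bit code while the rarest gets 3 bits, so the average transmission is shorter than any fixed-length scheme over the same alphabet.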

Radio works, but why not use infrared? If there were some way to read an infrared signal from the probe, it could present more information per pass.

For instance, an infrared photo of the rover’s bottom or top, using a flat surface, could encode information in cold and hot spots located across that surface.

This could work at whatever infrared resolution available from the satellite orbiting overhead and would send much more information per orbital pass.

In fact, such an infrared surface readout might allow the rover to send B&W pictures up to the satellite. Sonar could provide a mechanism to record a (sound) picture of the environment being scanned. The infrared information could be encoded across the surface via pipes of cool and hot liquids, sort of like core memory of old.

What about steam power? At 450°C there ought to be more than enough heat to boil some liquid and cool it again via expansion. The cooled liquid could be used to cool electronic, chemical and solar devices. And as the high temperatures on Venus seem constant, steam power and liquid cooling would be available all the time, eliminating any need for springs to store energy.

And the cooling liquid from steam engines could be used to support an infrared signaling mechanism.

Still not sure why we need any electronics. A suitably configured, shrunken analytical engine could provide the rudimentary information processing necessary to work the shutter and other transmitter mechanisms; to initiate, read out and store the mechanical/chemical/sonar sensors; and to control the other items on the rover.

And with a suitably complex analytical engine there might be some way to mechanically program it with various modes using something like punched tape or cards. Such a device could be used to hold and load information for separate programs in minimal space and could also be used to store information for later transmission, supplying a 100% mechanical storage device.

Going 100% mechanical could also lead to a longer-lived rover than one mixing some electronics with mostly mechanical devices on a planet like Venus. Mechanical devices can fail, but their failure modes are normally less catastrophic and well understood. Perhaps with sufficient mechanical redundancy and attention to tribology, such a 100% mechanical rover could last an awfully long time without any maintenance, e.g., like Swiss watches.

Comments?

Photo Credit(s): World War One tank – mark 1 by Photos of the Past

Vintage Philmor morse code practice … by Joe Haupt

Accompanied by an instructor… by vy pham;

Core memory more detail by Kenneth Moore;

Model of the Analytical Engine By Bruno Barral (ByB), CC BY-SA 2.5;

Punched tape by Rositslav Lisovy

Steam locomotives by Jim Phillips

Know Fortran, optimize NASA code, make money

Read a number of articles this past week about NASA offering a Fortran optimization contest, the High Performance Fast Computing Contest (HPFCC) for their computational fluid dynamics (CFD) program. They want to speed up CFD by 10X to 1000X and are willing to pay for it.

The contest is being run through HeroX and TopCoder and they are offering $55K, across the various levels of the contests to the winners.

The FUN3D CFD code (manual) runs on NASA’s Pleiades Linux supercomputer complex, which sports over 245K cores. Even on the supercomputer complex, a typical FUN3D CFD run takes thousands to millions of core hours!

The program(s)

FUN3D does a hypersonic fluid analysis over a (fixed) surface which includes a “simulation of mixtures of thermally perfect gases in thermo-chemical equilibrium and non-equilibrium. The routines in PHYSICS_DEPS enable coupling of the new gas modules to the existing FUN3D infrastructure. These algorithms also address challenges in simulation of shocks and boundary layers on tetrahedral grids in hypersonic flows.”

Not sure what all that means, but I am certain there are a number of iterations over multiple Fortran modules, and that it does this over a 3D grid of points corresponding to both the surface being modeled and the gas mixture it’s running through at hypersonic speeds. Sounds easy enough.

The contest(s)

There are two levels to the contest: an Ideation phase (at HeroX) and an Architecture phase (at TopCoder). The $55K is split between them: the HeroX Ideation phase awards a total of $20K ($10K for the winner and two $5K runner-up prizes), and the TopCoder Architecture phase awards a total of $35K ($15K for the winner, $10K for 2nd place, and another $10K for a “Qualified improvement candidate”).

The (HeroX) Ideation phase looks for specific new or faster algorithms that could replace current ones in FUN3D which include “exploiting algorithmic developments in such areas as grid adaptation, higher-order methods and efficient solution techniques for high performance computing hardware.”

The (TopCoder) Architecture phase looks at specifically speeding up actual FUN3D code execution. “Ideal submission(s) may include algorithm optimization of the existing code base,  Inter-node dispatch optimization or a combination of the two.  Unlike the Ideation challenge, which is highly strategic, this challenge focuses on measurable improvements of the existing FUN3d suite and is highly tactical.”

Sounds to me like the Ideation phase is selecting algorithm designs, and the Architecture phase is implementing the new algorithms or just generally speeding up FUN3D code execution.

The equation(s)

There’s a Navier-Stokes equation algorithm that gets called maybe a trillion times during a run, until the flow settles down, and any minor improvement here would obviously be significant. Perhaps there are algorithmic changes to be found if you’re an aeronautical engineer, or perhaps there are compiler speedups to be found if you’re a Fortran expert. Both approaches can be validated/debugged/proved out on a desktop computer.

You have to be a US citizen to access the code, and you can apply here. You will receive an email to verify your email address; once you’re validated and back on the website, you need to approve the software use agreement. NASA will then verify your physical address by sending you a letter with a passcode that finally unlocks access to the code. The process may take weeks to complete, so if you’re interested in the contest, best to start now.

The Fortran(s)

I learned Fortran 66 a long time ago and may have dabbled with Fortran 77, but that’s the last Fortran I touched. Still, it’s like riding a bike: once you’ve done it, it’s easy to do again.

As I understand it, FUN3D uses Fortran 2003, and NASA suggests you use the GNU Fortran (gfortran) compiler, as the Intel one has some bugs. There is also a Fortran 2015 standard, but it’s not in mainstream use just yet.

A million core hours, just amazing. If you could save a millisecond out of the routine called a trillion times, you’d save 1 billion seconds, or ~280K core hours.
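Checking that arithmetic (one millisecond saved per call, a trillion calls):

```python
calls = 10**12                    # Navier-Stokes routine invocations per run
seconds_saved = calls / 1000      # 1 ms saved per call -> 1e9 seconds
core_hours_saved = seconds_saved / 3600
print(core_hours_saved)           # ~277,778 core hours
```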

Coders start your engines…

 

Quantum computing at our doorsteps

Read an article the other day in MIT’s Technology Review, Google’s new chip is a stepping stone to quantum computing…, about Google’s latest endeavor to create quantum computers. Although digital logic, or classical electronic computation, has been around since the middle of the last century, quantum logic does things differently, and there are many problems that are easier to compute with quantum computing but take much longer to solve with digital computing.

Qubits are weird

Classical or digital electronic computation follows the more mechanistic, physical view of the world (for the most part), while quantum computing follows the quantum mechanical view. Quantum computing uses quantum bits, or qubits; the device Google demonstrated has a 2×3 matrix of qubits, 6 in total.

Unlike a bit, which (theoretically) is a two-state system that can only take on the values 0 and 1, a qubit is a two-level system that can in reality take on infinitely many different states. In practice, with a qubit there are always two states that are distinguishable from one another, but they can be any two of the infinitely many states it can take on.

Also, reading out the state value of a qubit is a probabilistic endeavor, and the readout changes the state of the qubit afterwards.
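A toy simulation of that probabilistic readout (this is textbook quantum mechanics, nothing specific to Google’s chip; the simulation only models measurement statistics, not the post-measurement collapse):

```python
import math
import random

def readout_fraction(alpha, beta, trials=100_000, seed=1):
    """Prepare-and-measure a qubit state a|0> + b|1> many times.
    Each measurement yields 0 with probability |a|^2 and 1 with
    probability |b|^2; returns the observed fraction of zeros."""
    p0 = abs(alpha) ** 2
    rng = random.Random(seed)
    zeros = sum(rng.random() < p0 for _ in range(trials))
    return zeros / trials

# Equal superposition: either readout value is equally likely.
frac = readout_fraction(1 / math.sqrt(2), 1 / math.sqrt(2))
print(frac)  # ~0.5
```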

There’s more to quantum computing, and I am certainly no expert. So if you’re interested, I suggest starting with this arXiv article.

Faster quantum algorithms

In any case some difficult and time consuming arenas of classical computation seem to be easier and faster with quantum computation. For example,

  • Factoring large numbers – in classical computation this process takes time exponential in the number of bits of the “large number”: if B is the number of bits and E (epsilon) is a constant >0, the best current algorithms take O((1+E)**B) time. But Shor’s quantum factorization algorithm takes only O(B**3) time, which is considerably faster for large numbers. This is important because RSA cryptography and most key exchange algorithms in use today base their security on the difficulty of factoring large numbers. (See the Wikipedia article on Integer Factorization for more information.)
  • Searching an unstructured list – in classical computation, searching a list of N items takes O(N) time. But Grover’s quantum search algorithm takes only O(sqrt(N)), which is considerably faster for large lists. (See this arXiv paper for more information.)
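Ignoring constant factors, the gap between the two factoring bounds is easy to see numerically. The epsilon value below is an arbitrary choice for illustration, and the step counts are purely relative, not real operation counts.

```python
def classical_steps(bits, eps=0.1):
    """Quoted classical bound: O((1+eps)**bits); eps chosen arbitrarily."""
    return (1 + eps) ** bits

def shor_steps(bits):
    """Shor's algorithm bound: O(bits**3)."""
    return bits ** 3

# At RSA-sized keys the exponential term dwarfs the cubic one:
for b in (64, 512, 2048):
    print(b, classical_steps(b) / shor_steps(b))
```

For small inputs the cubic bound can actually be the larger number, which is why quantum speedups matter most at cryptographic key sizes.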

Using Shor’s factorization algorithm, researchers were able to factor the number 15 with 7 qubits.

There are many quantum algorithms available today (see the Quantum Algorithm Zoo at NIST) with more showing up all the time.  Suffice it to say that quantum computing will be a more time efficient and thus, more effective approach to certain problems than classical computing.

Quantum computers starting to scale

Now back to the chip. According to the article, the new Google chip implements a 2×3 matrix of qubits.

For those old enough to remember, a 3-bit number was called an octal digit, ranging from 0 to 7, and two octal digits can range from 0..63. Octals were used for a long time to represent digital information on some computers (mostly mini-computers). This is in contrast to most computing nowadays, which uses hexadecimal digits, or 4-bit numbers, ranging from 0..15, with two hexadecimal digits ranging from 0..255.
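Python can confirm the digit ranges:

```python
# One octal digit is 3 bits (0..7); two octal digits span 0..63.
assert int('77', 8) == 63
# One hex digit is 4 bits (0..15); two hex digits span 0..255.
assert int('ff', 16) == 255
print(oct(63), hex(255))  # 0o77 0xff
```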

Why are octals important? Well, if quantum computing can scale up to multiple octal digits’ worth of qubits, then it can start representing really large numbers. According to the article, Google chose the 2×3 qubit structure because it’s easier to scale.

I assume all the piping surrounding the chip package in the above photo are cooling ports; quantum computing seems to work only at very cold temperatures. And if this is a two-octal computer, scaling up to many octals is going to take lots of space.

How quickly will it scale?

For some history, Intel introduced their 4004 (4-bit) computing chip in 1971 (Wikipedia), their 8-bit Intel 8008 in 1972 (Wikipedia), their 16-bit Intel 8086 between 1976-78. So in 7 years we went from a 4-bit computer to a 16 bit computer whose (x86) architecture continues on today and rules the world.

Now the Intel 4004 had 16 4-bit registers, a data/instruction bus that could address 4096 4-bit words, a 3-level subroutine stack, and was a full-fledged 4-bit computer. It’s unclear what’s in Google’s chip. But if this 2×3-qubit computer scales similarly, with multiple 2×3-qubit registers, a qubit storage bus, a multi-level qubit subroutine (register) stack, etc., then we are well on our way to quantum computing being added to the world’s computational capabilities in less than 10 years.

And of course, Google’s not the only large organization working on quantum computing.

~~~~

So there you have it, Google and others are in the process of making your cryptography obsolete, rapidly speeding up unstructured searching and doing multiple other computations lots faster than today.

Photo Credit(s): from the MIT Technical Review article.

 

Testing filesystems for CPU core scalability

I attended the HotStorage’16 and USENIX ATC’16 conferences this past week, and there was a paper presented at ATC titled “Understanding Manycore Scalability of File Systems” (see p. 71 in the PDF) by Changwoo Min and others at Georgia Institute of Technology. This team of researchers set out to understand the bottlenecks in typical file systems as they scale from 1 to 80 (or more) CPU cores on the same server.

FxMark, a new scalability benchmark

They created a new benchmark to probe CPU core scalability, called FxMark (source code available at FxMark), consisting of 19 microbenchmarks stressing specific scalability scenarios and three application-level benchmarks representing popular file system activities.

The application benchmarks in FxMark included: standard mail server (Exim), a NoSQL DB (RocksDB) and a standard user file server (DBENCH).

In the microbenchmarks, they stressed 7 different components of file systems: 1) path name resolution; 2) the page cache for buffered IO; 3) inode management; 4) disk block management; 5) file offset to disk block mapping; 6) directory management; and 7) consistency guarantee mechanisms.

TPU and hardware vs. software innovation (round 3)

At the Google I/O conference this week, Google revealed (see Google supercharges machine learning tasks …) that it has been designing and operating its own processor chips in order to optimize machine learning.

They call the new chip a Tensor Processing Unit (TPU). According to Google, the TPU provides an order of magnitude more power-efficient machine learning than what’s achievable via off-the-shelf GPUs/CPUs. (TensorFlow is Google’s open-sourced machine learning software.)

This is very interesting, as Google and the rest of the hyper-scale crowd seem to have latched onto open source software and commodity hardware for all their innovation. This has led the industry to believe that hardware customization/innovation is dead and the only thing anyone needs is software developers. I believe this is incorrect, and that hardware innovation combined with software innovation is a better way (see my Commodity hardware always loses and Better storage through hardware posts).

Quasar, data center scheduling reboot

Microsoft Bing Maps’ datacenter by Robert Scoble

Read an article today from ZDNet called Data center scheduling as easy as watching a movie. It was about research out of Stanford University showing how short glimpses of an application in operation can be used to determine the best existing infrastructure to run it on (for more info, see “Quasar: Resource-Efficient and QoS-Aware Cluster Management” by Christina Delimitrou and Christos Kozyrakis).

What with all the world’s compute moving to the cloud, the cloud providers are starting to see poor CPU utilization. E.g., AWS EC2 average server utilization is typically between 3 and 17%, Google’s is between 25 and 35%, and Twitter’s is consistently below 20% (source: the paper above). Such poor utilization at cloud scale causes them to lose a lot of money.

Most cloud organizations and larger companies these days have a myriad of servers acquired over time, ranging from the latest multi-core behemoths to older servers that have seen better days.

Nonetheless, as new applications come into the mix, it’s hard to know whether they need the latest servers or could get by just as well with older equipment lying idle in the shop. This inability to ascertain the best infrastructure to run applications on leads to the over-provisioning/under-utilization we see today.

A better way to manage clusters

This is the classic problem that is trying to be solved by cluster management. There are essentially two issues in cluster management for new applications:

  • What resources the application will need to run, and
  • Which available servers can best satisfy those resource requirements.

The first issue is normally answered by the application developer/deployer, who gets to specify the resources. When they get this wrong, applications run on servers with more resources than needed, which end up being lightly utilized.

But what if there were a way to automate the first step in this process?

It turns out that if you run a new application for a short time, you can determine its execution characteristics. Then, if you could search a database of applications currently running on your infrastructure, you could match how the new application runs against how current applications run and determine a pseudo-optimal fit for the best place to run it.

Such a system would need to monitor the current applications in your shop and determine their server resource usage, e.g., memory use, IO activity, CPU utilization, etc. It would need to construct and maintain a database mapping applications to server resource utilization. You would also need a database of the current server resources in your cluster.
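Quasar itself uses collaborative filtering for this classification step; as a much simpler nearest-neighbor sketch of the matching idea, you could compare a new application’s short-run profile against known fingerprints with cosine similarity. The feature set and all the fingerprints below are made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two resource-usage vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def best_match(new_profile, known):
    """Return the known application whose profile most resembles the
    new application's short-run profile."""
    return max(known, key=lambda name: cosine(new_profile, known[name]))

# Hypothetical fingerprints: [CPU %, memory GB, disk IOPS, net MB/s]
known_apps = {
    'memcached-like': [20, 64, 500, 900],
    'hadoop-like':    [90, 48, 3000, 100],
    'web-like':       [35, 8, 200, 400],
}
match = best_match([85, 40, 2800, 120], known_apps)
print(match)  # hadoop-like
```

The new application’s profile is closest to the batch-analytics fingerprint, so the scheduler would place it on whatever server class has worked well for that kind of workload.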

But if you have all that in place, it seems like you could have a solution to the classic cluster management problem presented above.

What about performance critical apps

There’s a class of applications that have stringent QoS requirements that go beyond optimal runtime execution characteristics (latency/throughput sensitive workloads). These applications must be run in environments that can guarantee their latency requirements can be met. This may not be the most optimal location from a cluster perspective but it may be the only place it can run and meet its service objectives.

So any cluster management optimization would also need to factor in such application QoS requirements into its decision matrix on where to run new applications.

Quasar cluster management

The researchers at Stanford have implemented the Quasar cluster management solution, which does all that. Today it provides:

  1. A way for users to specify application QoS requirements for those applications that require special service levels,
  2. A quick run of each new application to ascertain its resource requirements and classify its characteristics against a database of currently running applications, and
  3. Allocation of new applications to the optimal server configurations available.

 

The paper cited above shows results from using Quasar cluster management on Hadoop, memcached, Cassandra and HotCRP clusters, as well as a cloud environment. For the cloud environment, Quasar boosted server utilization up to 65% on a 200-node cloud running 1200 workloads.

The paper goes into more detail and there’s more information on Quasar available on Christina Delimitrou’s website.

~~~

Comments?

IBM’s next generation, TrueNorth neuromorphic chip

Ok, I admit it: besides being a storage nut, I also have an enduring interest in AI. And as more sophisticated neuromorphic chips start to emerge, they seem to herald a whole new class of AI capabilities coming online. I suppose it’s both a bit frightening and exciting, which is why it interests me so.

IBM announced a new version of their neuromorphic chip line, called TrueNorth, with +5B transistors and the equivalent of ~1M neurons. There were a number of articles on this yesterday, but the one I found most interesting was in MIT Technology Review, IBM’s new brainlike chip processes data the way your brain does (based on a Science journal article, login required: A million spiking-neuron integrated circuit with a scalable communication network and interface). We discussed an earlier generation of their SyNAPSE chip in a previous post (see my IBM research introduces SyNAPSE chip post).

How does TrueNorth compare to the previous chip?

The previous-generation SyNAPSE chip took a multi-mode approach, using 65K “learning synapses” together with ~256K “programming synapses”. The current-generation TrueNorth chip has 256M “configurable synapses” and 1M “programmable spiking neurons”. So the current chip has roughly quadrupled the previous chip’s programmable elements (256K programming synapses vs. 1M programmable neurons) and multiplied the configurable synapses by a factor of 1000.

Not sure why the configurable synapses went up so high, but it could be an aspect of connectivity, something akin to a “complete graph”, which has a direct edge between every pair of nodes. In a complete graph with N nodes, the number of edges is N*(N-1)/2, which for 1M nodes would be ~5×10^11 edges. So at 256M synapses, the chip is nowhere near a complete graph; on average, each neuron connects to only 256 others.
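Running the connectivity numbers:

```python
def complete_graph_edges(n):
    """Edges in a complete graph on n nodes: n*(n-1)/2."""
    return n * (n - 1) // 2

edges = complete_graph_edges(1_000_000)      # ~5 x 10**11 edges
avg_synapses = 256_000_000 // 1_000_000      # 256 synapses per neuron
print(edges, avg_synapses)
```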

Analog vs. Digital?

When I last talked with IBM about their earlier-version chip, I wondered why they used digital logic to create it rather than analog. They said digital was the way to go in order to better follow the technology curve of normal chip electronics.

It seemed to me at the time that if you really  wanted to simulate a brains neural processing then you would want to use an analog approach and this should use much less power. I wrote a couple of posts on the subject, one of which was on MIT’s analog neuromorphic chip (see my MIT builds analog neuromorphic chip post) and the other was on why analog made more sense than digital technology for neuromorphic computation (see my Analog neural simulation or Digital neuromorphic computing vs. AI post).

The funny thing is that IBM’s TrueNorth chip uses a lot less power (1000X less: milliwatts vs. watts) than normal CMOS chips in use today. Not sure why this would be the case with digital logic, but if true, maybe there’s potential to utilize these sorts of chips in wider applications beyond traditional AI domains.

How do you program it?

I would really like to get a deeper look at the specs for TrueNorth and its programming model. There was a conference last year where IBM presented three technical papers on TrueNorth’s architecture and programming capabilities (see the MIT Technology Review report: IBM scientists show blueprints for brainlike computing).

Apparently the 1M programmable spiking neurons are organized into blocks of 256 neurons each (with a prodigious number of “configurable” synapses as well). These seem equivalent to what I would call a computational unit. One programs these blocks with “corelets”, which map out the neural activity that the 256-neuron blocks can perform. These corelet “programs” can also be linked together, or one can be subsumed within another, sort of like subroutines. As of last year, IBM had a library of 150 corelets which do things like detect visual artifacts, detect motion in a visual image, detect color, etc.

Scale-out neuromorphic chips?

The abstract of the Journal Science paper talked specifically about a communications network interface that allows the TrueNorth chips to be “tiled in two dimensions” to some arbitrary size. So it is apparent that with the TrueNorth design, IBM has somehow extended a within chip block interface that allows corelets to call one another, to go off chip as well. With this capability they have created a scale-out model with the TrueNorth chip.

Unclear why they went only two-dimensional rather than three, but it seems to mimic the sort of cortex-layer connections we have in our brains today. And even with only two-dimensional scaling, all sorts of interesting topologies are possible.

There doesn’t appear to be any theoretical limit to the number of chips that can be connected in this fashion, but I would suppose they would all need to be on a single board, or at least “close” together, because there’s presumably some propagation delay budget that can’t be exceeded, i.e., the time it takes for a spike to traverse from one chip to the farthest chip in the chain couldn’t exceed, say, 10msec or so.

So how close are we to brain level computations?

In one of my previous posts I reported Wikipedia stating that a typical brain has 86B neurons and on the order of 100 to 500 trillion synapses. I was able to find the 86B number again today but couldn’t track down the synapse-count quote. However, if these numbers are close to the truth, the human brain has far more synapses per neuron (thousands) than the TrueNorth chip (256). And TrueNorth would need about 86,000 chips connected together to match the neuronal count of a human brain.

I suppose the brain’s far richer synapse-per-neuron connectivity reflects the fact that electronic connections have to be fixed in place for a neuron-to-neuron connection to exist, whereas the brain can always grow synapse connections as needed. Also, I read somewhere (can’t remember where) that a human brain at birth has many more synapse connections than an adult brain, and that part of the learning that goes on during early life is trimming excess synapses down to something more manageable, or at least needed.
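The chip-count arithmetic, using the ~86B neuron figure:

```python
brain_neurons = 86_000_000_000   # ~86B neurons (the Wikipedia figure)
chip_neurons = 1_000_000         # TrueNorth
chip_synapses = 256_000_000

chips_for_brain = brain_neurons // chip_neurons                 # 86,000 chips
truenorth_synapses_per_neuron = chip_synapses // chip_neurons   # 256
print(chips_for_brain, truenorth_synapses_per_neuron)
```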

So to conclude, we (or at least IBM) seem to be making good strides toward a neuromorphic computational model and physical hardware, but we are still six or seven generations away from a human brain’s capabilities (assuming 1,000 of these chips could be connected together into one “brain”). If a neuromorphic chip generation takes ~2 years, then we should be getting pretty close to human levels of computation by 2028 or so.

The Tech Review article said that the 5B transistors on TrueNorth are more than on any other chip IBM has produced. So they seem to be at the edge of current technology capabilities with this chip design (which is probably proof that their selection of digital logic was a wise decision).

Let’s just hope it doesn’t take 18 years of programming/education to attain college-level understanding…

Comments?

Photo Credit(s): New 20x [view of mouse cortex] by Robert Cudmore

Vacuum tubes on silicon

Read an interesting article the other day about researchers at NASA having invented a vacuum tube on a chip (see ExtremeTech, Vacuum tube strikes back). Their report was based on an IEEE Spectrum article called Introducing the Vacuum Transistor.

Computers started out early in the last century as mechanical devices (card sorters), moved up to electronic sorters/calculators/computers with vacuum tubes, and eventually transitioned to solid state devices with the silicon transistor. Since then, MOS and CMOS transistors have pretty much ruled the world of electronic devices.

Vacuum tube?

Vacuum tubes had a number of problems, not the least of which were power consumption, size and reliability. It was nothing for a vacuum tube to burn out after every couple of power-ons, and the ENIAC (panel pictured here) had over 17,000 of them, took over 200 sq meters of space, used a lot of power (150KW) and weighed 27 metric tons.

Of course, each vacuum tube was the equivalent of just one transistor, and the latest generation of Intel quad-core processors has over 2B transistors. So implementing an Intel quad-core processor with vacuum tubes might take over 3,000 football fields of space and over 17GW of power/cooling.
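Scaling ENIAC’s numbers up (17,468 is the commonly cited tube count, and the ~5,350 m² football field area is my assumption):

```python
eniac_tubes = 17_468
eniac_area_m2 = 200.0
eniac_power_w = 150_000.0
transistors = 2e9                 # modern quad-core chip

scale = transistors / eniac_tubes
area_m2 = scale * eniac_area_m2           # ~2.3e7 square meters
power_gw = scale * eniac_power_w / 1e9    # ~17 GW
fields = area_m2 / 5_350                  # assumed football field area
print(round(fields), round(power_gw, 1))
```

This comes out around 4,300 fields and ~17GW, in line with the “over 3,000 football fields” and “over 17GW” ballparks.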

There were plenty of niceties to vacuum tubes, not the least of which were their ruler-flat frequency response and their ability to support much higher frequencies; they were also significantly less prone to noise and had fewer problems with radiation than transistors. This last item meant that vacuum tubes were less susceptible to electromagnetic pulses. Many modern musical instrument amplifiers are still made with vacuum tube technology due to their perceived better sound.

But their main problems were size and power consumption. If you could only shrink a vacuum tube to the size of a MOS field effect transistor (FET) and correspondingly reduce its power consumption, then you would have something.

NASA shrinks the vacuum tube

NASA researchers have shrunk the vacuum tube to nanometer dimensions in a vacuum-channel transistor. They believe it can be fabricated on standard CMOS technology lines and that it can operate at 460GHz.

This new vacuum-channel transistor marries the benefits of vacuum tubes to the fabrication advantages of MOSFET technology. Making them as small as MOSFET transistors eliminates all of the problems with vacuum tube technology and handily solves a serious problem or two with MOSFETs.


One problem with MOSFET technology today is that clock speeds can no longer be pushed much past 4-5GHz. This limit was reached in 2004, when Intel and others determined that clock speed couldn’t be increased much more without serious problems, so they started using additional transistors to offer multi-core processor chips. A lot of time and money continues to be spent on how best to offer even more cores, but in the end there’s only so much parallelism that can be achieved in most applications, and this limits the speedups that can be attained with multi-core architectures.

But a shrunken vacuum tube doesn’t seem to have the same issues with higher clock speeds.  Also, there is a serious reduction in power consumption that accrues along with reduction in size.

The vacuum in a vacuum tube was there to keep electrons from being interfered with by gases. With the vacuum-channel transistor, they don’t think a vacuum is needed anymore, due to the reduction in size and power; instead they plan to use a helium-filled enclosure, which they feel will work in place of a vacuum. NASA feels that with today’s chip packaging this shouldn’t be a problem.

Also, their current prototypes use 10V, but other researchers have gotten other vacuum-channel transistors down to only 1-2V. The NASA researchers haven’t yet fabricated their vacuum-channel transistors on a real CMOS line, but that’s the next major hurdle.

Imagine a much faster IT

A 400GHz processor in your desktop and maybe a 200GHz processor in your phone/tablet could all be possible with vacuum-channel transistors. They would be so much faster than today’s multi-core systems that it would be almost impossible to compare the two. Yes, there are some apps where multi-core could speed things up considerably, but something that’s 10X faster than today’s processors would operate much faster than a 10 core CPU. And it still doesn’t mean you couldn’t have multi-core vacuum-channel systems as well.
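Amdahl’s law makes the clock-vs-cores point concrete. A faster clock speeds up everything, while extra cores only speed up the parallel fraction of a workload. The 90% parallel fraction below is purely an illustrative assumption:

```python
# Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the fraction of
# the workload that parallelizes and n is the number of cores.
def amdahl_speedup(p: float, n: int) -> float:
    return 1.0 / ((1.0 - p) + p / n)

# Assuming 90% of an application parallelizes (an illustrative number):
ten_core = amdahl_speedup(0.90, 10)   # ~5.3x, even with 10 cores
clock_10x = 10.0                      # a 10x clock speeds up the whole workload
print(f"10 cores: {ten_core:.1f}x vs 10x clock: {clock_10x:.0f}x")
```

So even a generously parallel application on 10 cores falls well short of a uniform 10X clock improvement.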

SSD or NAND flash storage is essentially based on CMOS transistors, and the speed of flash is somewhat a function of the speed of its transistors. A 400GHz vacuum-channel transistor could speed up flash storage by an order of magnitude or more. Flash access times are already at the 7µsec level (see my posts on MCS and UltraDIMM storage here and here). How much of that 7µsec access time is due to the memory channel and how much is a function of the SanDisk SSD storage is an open question. But whatever portion is on the SSD side could potentially be reduced by a factor of 10 or more with the use of vacuum-channel transistors.
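To see why the channel/SSD split matters, here’s a quick sketch. The 50/50 split below is purely a hypothetical assumption, since the actual breakdown of the 7µsec is an open question:

```python
# Hypothetical split of the 7 usec flash access time; a 10x speedup only
# applies to the portion on the SSD side (the 50/50 split is assumed).
total_us = 7.0
ssd_us = 3.5                         # assumed portion on the SSD side
channel_us = total_us - ssd_us       # memory channel portion is untouched

faster_ssd_us = ssd_us / 10          # 10x faster vacuum-channel electronics
new_total_us = channel_us + faster_ssd_us
print(f"{total_us} usec -> {new_total_us:.2f} usec "
      f"(~{total_us / new_total_us:.1f}x overall)")
```

Under that assumption, a 10X faster SSD side yields only about a 1.8X overall access-time improvement; the memory channel becomes the bottleneck.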

From a disk perspective there are myriad issues that affect how much data can be stored linearly on a disk platter. But one of them is the switching speed of the electromagnetic (GMR) head and its electronics. Vacuum-channel transistors should be able to eliminate that issue, at least in the electronics and maybe, with some work, in the head as well, so disk densities would no longer be limited by switching speeds. Similar issues apply to magnetic tape densities as well.

It’s unclear to me how faster switching times would impact network transmission speeds. It seems apparent that optical transmission has already reached some sort of limit based on the light frequencies used for transmission. However, electronic networking transfer speeds could be enhanced significantly with faster switching.

Naturally, WIFI and other forms of radio transmission are seriously limited by the current frequency and power of electronic switching. That’s one of the reasons why radio stations still depend somewhat on vacuum tubes. With vacuum-channel transistors, however, those switching speed problems go away. Indeed, NASA researchers believe their vacuum-channel transistors should be able to reach terahertz (1000GHz) switching, which might make WIFI almost faster than any direct connect networking today.

~~~~
Comments?

Photo Credit(s): ENIAC panel (rear) by Erik Pittit, The Vacuum Tube Transistor from IEEE Spectrum