Vacuum tubes on silicon

Read an interesting article the other day about researchers at NASA having invented a vacuum tube on a chip (see ExtremeTech, Vacuum tube strikes back). Their report was based on an IEEE Spectrum article called Introducing the Vacuum Transistor.

Computers started out early in the last century as mechanical devices (card sorters), moved up to electronic sorters/calculators/computers with vacuum tubes and eventually transitioned to solid state devices with the silicon transistor. Since then the MOS and CMOS transistors have pretty much ruled the world of electronic devices.

Vacuum tube?

Vacuum tubes had a number of problems, not the least of which were power consumption, size and reliability. It was nothing for a vacuum tube to burn out every couple of times it was powered on, and the ENIAC (panel pictured here) had over 17,000 of them, took over 200 sq meters of space, used a lot (150kW) of power and weighed 27 metric tons.

Of course each vacuum tube was the equivalent of just one transistor, and the latest generation Intel Quad Core processors have over 2B transistors in them. So implementing an Intel Quad Core processor with vacuum tubes might take over 3,000 football fields of space and over 17GW for power/cooling.
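
As a rough back-of-the-envelope check, using the ENIAC figures above and assuming a ~7,000 m² (soccer-sized) football field, the scaling works out something like this:

```python
# Back-of-the-envelope scaling of ENIAC up to a 2B-transistor chip.
# Assumes one tube ~= one transistor and a ~7,000 m^2 football field.
eniac_tubes = 17_000          # "over 17,000" tubes
eniac_area_m2 = 200           # ~200 square meters
eniac_power_kw = 150          # ~150 kW
chip_transistors = 2_000_000_000

scale = chip_transistors / eniac_tubes          # ~117,600 ENIACs worth of tubes
area_m2 = scale * eniac_area_m2                 # ~23.5 million m^2
football_fields = area_m2 / 7_000               # ~3,300 fields
power_gw = scale * eniac_power_kw / 1_000_000   # ~17.6 GW

print(f"{football_fields:,.0f} football fields, {power_gw:.1f} GW")
```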

Vacuum tubes had plenty of niceties too, not the least of which were their ruler-flat frequency response, their ability to support much higher frequencies, significantly less susceptibility to noise and fewer problems with radiation than transistors. This last item meant that vacuum tubes were less susceptible to electromagnetic pulses. Many modern musical/instrument amplifiers are still made today using vacuum tube technology due to their perceived better sound.

But their main problems were size and power consumption. If you could only shrink a vacuum tube to the size of a MOS field effect transistor (FET) and correspondingly reduce its power consumption, then you would have something.

NASA shrinks the vacuum tube

NASA researchers have shrunk the vacuum tube to nanometer dimensions in a vacuum-channel transistor. They believe it can be fabricated on standard CMOS technology lines and that it can operate at 460GHz.

This new vacuum-channel transistor marries the benefits of vacuum tubes to the fabrication advantages of MOSFET technology. Making them as small as MOSFET transistors eliminates most of the traditional problems with vacuum tube technology and handily solves a serious problem or two with MOSFETs.

[Image: The Vacuum Tube Transistor, from IEEE Spectrum]

One problem with MOSFET technology today is that we can no longer speed it up much beyond 4-5GHz.  This limit was reached around 2004, when Intel and others determined that clock speed couldn’t be pushed much higher without serious problems, and as a result they started using additional transistors to offer multi-core processor chips.  A lot of time and money continues to be spent on how best to offer even more cores, but in the end there’s only so much parallelism that can be achieved in most applications, and this limits the speedups that can be attained with multi-core architectures.
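
Amdahl's law is the usual way to quantify that limit. As a rough illustration (the 90% figure below is just an assumed parallel fraction, not a measurement), even a large number of cores can't push such an application much past a 10X speedup:

```python
# Amdahl's law: speedup from N cores when a fraction p of the work parallelizes.
def amdahl_speedup(p: float, n_cores: int) -> float:
    return 1.0 / ((1.0 - p) + p / n_cores)

# With 90% parallelizable work, adding cores quickly hits diminishing returns.
for cores in (2, 4, 10, 100):
    print(cores, "cores ->", round(amdahl_speedup(0.9, cores), 2), "x")
# 2 -> 1.82x, 4 -> 3.08x, 10 -> 5.26x, 100 -> 9.17x
```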

But a shrunken vacuum tube doesn’t seem to have the same issues with higher clock speeds.  Also, a serious reduction in power consumption accrues along with the reduction in size.

The vacuum in a vacuum tube was there to keep electrons from being interfered with by gases. With the vacuum-channel transistor, the researchers don’t think they need a vacuum anymore due to the reduction in size and power; instead, a helium-filled enclosure should work in place of a vacuum, and the remaining question is how to create one. NASA feels that with today’s chip packaging this shouldn’t be a problem.

Also, their current prototypes use 10V, but other researchers have reduced other vacuum-channel transistors to use only 1-2V. As of yet the NASA researchers haven’t fabricated their vacuum-channel transistors on a real CMOS line, but that’s the next major hurdle.

Imagine a much faster IT

A 400GHz processor in your desktop and maybe a 200GHz processor in your phone/tablet could be possible with vacuum-channel transistors. They would be so much faster than today’s multi-core systems that it would be almost impossible to compare the two. Yes, there are some apps where multi-core can speed things up considerably, but since few applications parallelize perfectly, something that’s 10X faster than today’s processors would operate much faster than a 10-core CPU. And it still doesn’t mean you couldn’t have multi-core vacuum-channel systems as well.

SSD or NAND flash storage is essentially based on CMOS transistors, and the speed of flash is somewhat a function of the speed of its transistors.  A 400GHz vacuum-channel transistor could speed up flash storage by an order of magnitude or more. Flash access times are already at the 7µsec level (see my posts on MCS and UltraDIMM storage here and here).  How much of that 7µsec access time is due to the memory channel and how much is a function of the SanDisk SSD storage is an open question. But whatever portion is on the SSD side could potentially be reduced by a factor of 10 or more with the use of vacuum-channel transistors.
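
Just to illustrate the claim (the real split between the memory channel and the SSD side is unknown, so the numbers below are purely assumptions), suppose 2µsec of the 7µsec is the memory channel and 5µsec is SSD-side electronics:

```python
# Hypothetical split of a 7 microsecond flash access time; the real split is unknown.
channel_us = 2.0      # assumed memory-channel portion
ssd_us = 5.0          # assumed SSD-electronics portion
speedup = 10          # assumed vacuum-channel transistor speedup on the SSD side

new_access_us = channel_us + ssd_us / speedup
print(f"{new_access_us:.1f} usec")   # 2.5 usec, down from 7 usec
```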

From a disk perspective there are myriad issues that affect how much data can be stored linearly on a disk platter. But one of them is the switching speed of the electromagnetic (GMR) head and its electronics. Vacuum-channel transistors should be able to eliminate that issue, at least in the electronics and maybe, with some work, in the head as well, so disk densities would no longer have to worry about switching speeds. Similar issues apply to magnetic tape densities as well.

It’s unclear to me how faster switching times would impact network transmission speeds. It seems apparent that optical transmission has already reached some sort of limit based on the light frequencies used for transmission. However, electronic networking transfer speeds might be enhanced significantly by faster switching.

Naturally, WiFi and other forms of radio transmission are seriously impeded by the current frequency and power of electronic switching. That’s one of the reasons why radio stations still depend somewhat on vacuum tubes. However, with vacuum-channel transistors the switching speed problems go away.  Indeed, NASA researchers believe that their vacuum-channel transistors should be able to reach terahertz (1000GHz) switching, which might make WiFi faster than almost any direct-connect networking today.

~~~~
Comments?

Photo Credit(s): ENIAC panel (rear) by Erik Pittit, The Vacuum Tube Transistor from IEEE Spectrum

Thinly provisioned compute clouds

Thin provisioning has been around in storage since StorageTek’s Iceberg hit the enterprise market in 1995.  However, thin provisioning has never taken off for system servers or virtual machines (VMs).

But a recent paper out of MIT, Making cloud computing more efficient, discusses research that came up with the idea of monitoring system activity to model and predict application performance.

So how does this enable thinly provisioned VMs?

With a model like this in place, one could conceivably provide a thinly provisioned virtual server that could guarantee a QoS and still minimize resource consumption.  For example, have the application VM consume just the resources needed at any instant in time, adjusted as demands on the system change.  Thus, as an application’s needs grow, more resources could be supplied, and as its needs shrink, resources could be given up for other uses.
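
A minimal sketch of what that could look like (the names, thresholds and hypervisor interface here are hypothetical illustrations, not anything from the MIT paper): a hypervisor-side loop that uses a predicted demand figure to grow or shrink a VM’s share of CPU.

```python
# Sketch of a thinly provisioned VM controller; predict_demand values and the
# "hot add/remove" step are hypothetical stand-ins, not a real hypervisor API.
class ThinVM:
    def __init__(self, name, min_cores=1, max_cores=16):
        self.name = name
        self.cores = min_cores
        self.min_cores = min_cores
        self.max_cores = max_cores

def adjust_resources(vm: ThinVM, predicted_load: float, headroom: float = 0.2):
    """Grow or shrink a VM's core allocation toward predicted demand plus headroom."""
    target = min(vm.max_cores, max(vm.min_cores, round(predicted_load * (1 + headroom))))
    if target != vm.cores:
        # In a real hypervisor this is where CPU hot-add/hot-remove would happen --
        # the very interruption discussed further down.
        print(f"{vm.name}: {vm.cores} -> {target} cores")
        vm.cores = target

vm = ThinVM("oltp-db")
for predicted in (1.2, 3.8, 7.5, 2.1):   # predicted cores of demand over time
    adjust_resources(vm, predicted)
```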

With this sort of server QoS, certain classes of application VMs would need to have variable or no QoS, so they could be sacrificed in times of need to those that required guaranteed QoS. But in a cloud service environment a multiplicity of service classes like these could be offered at different price points.

Thin provisioning grew up in storage because it’s relatively straightforward for a storage subsystem to understand capacity demands at any instant in time.  A storage system only needs to monitor data write activity: if a data block was written or consumed, it gets backed by real storage; if it had never been written, it’s relatively easy to fabricate a block of zeros should it ever be read.
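
In code, the basic idea is just a map from logical blocks to real storage, allocated on first write. A minimal sketch, not any particular array’s implementation:

```python
# Minimal sketch of thin-provisioned storage: only written blocks consume real space.
BLOCK_SIZE = 4096

class ThinVolume:
    def __init__(self, logical_blocks):
        self.logical_blocks = logical_blocks   # advertised (virtual) capacity
        self.block_map = {}                    # logical block -> real data, filled on write

    def write(self, lba, data):
        self.block_map[lba] = data             # real capacity is consumed only here

    def read(self, lba):
        # Never-written blocks are fabricated as zeros instead of being backed by storage.
        return self.block_map.get(lba, b"\x00" * BLOCK_SIZE)

    def consumed_bytes(self):
        return len(self.block_map) * BLOCK_SIZE

vol = ThinVolume(logical_blocks=1_000_000)     # ~4GB advertised
vol.write(42, b"x" * BLOCK_SIZE)
print(vol.consumed_bytes())                    # only 4096 bytes actually backed
```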

Prior to thinly provisioned storage, fat provisioning required that storage be configured to the maximum capacity required of it. Similarly, fully (or fat) provisioned VMs must be configured for peak workloads. With the advent of thin provisioning on storage, otherwise wasted resources (capacity in the case of storage) could be shared across multiple thinly provisioned volumes (LUNs), thereby freeing up these resources for other users.

Problems with server thin provisioning

I see some potential problems with the model and my assumptions as to how a thinly provisioned VM would work. First, the modeled performance is a lagging indicator at best.  Just as system transactions start to get slower, a hypervisor would need to interrupt the VM to add more physical (or virtual) resources.  Naturally, during the interruption system performance would suffer.

It would be helpful if resources could be added to a VM dynamically, in real time without impacting the applications running in the VM. But it seems to me that adding physical or virtual CPU cores,  memory, bandwidth, etc., to a VM would require at least some sort of interruption to a pair of VMs [the one giving up the resource(s) and the one gaining the freed up resource(s)].

Similar issues occur for thinly provisioned storage. As storage is consumed for a thinly provisioned volume, allocating more physical capacity takes some amount of storage subsystem resources and time to accomplish.

How does the model work?

It appears that the software model works by predicting system performance based on a limited set of measurements. Indeed, their model is bi-modal; that is, there are two approaches:

  • Black box model – tracks server or VM indicators such as “number and type of user requests” as well as system performance and uses AI to correlate the two (a rough sketch follows this list). This works well for moderate fluctuations in demand but doesn’t help when requests for services fall beyond those boundaries.
  • Grey box model – is more sophisticated and is based on an understanding of a specific database’s functionality, such as how frequently it flushes host buffers, commits transactions to disk logs, etc.  In this case, they are able to predict system performance when demand peaks at 4X to 400X current system requirements.
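
A rough sketch of the black box idea (my own illustration with made-up numbers, not the MIT team’s code): collect request counts by type alongside measured resource use, fit a simple model, then predict resource demand for a new request mix.

```python
# Illustrative black-box model: correlate request-mix measurements with observed
# resource use via least squares. Not the MIT team's code, just the general idea.
import numpy as np

# Each row: [read requests/sec, write requests/sec]; target: observed CPU utilization.
request_mix = np.array([[100, 10], [200, 20], [150, 80], [300, 40]], dtype=float)
cpu_util = np.array([0.21, 0.41, 0.48, 0.63])

coeffs, *_ = np.linalg.lstsq(request_mix, cpu_util, rcond=None)

def predict_cpu(reads_per_sec, writes_per_sec):
    """Predicted CPU utilization for a new request mix (only valid near observed demand)."""
    return float(np.dot(coeffs, [reads_per_sec, writes_per_sec]))

print(round(predict_cpu(250, 30), 2))
```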

They have implemented the grey box model for MySQL and are in the process of doing the same for PostgreSQL.

Model validation and availability

They tested their prediction algorithm against published TPC-C benchmark results and achieved 80% accuracy for CPU use and 99% accuracy for disk bandwidth consumption.

It appears that the team has released their code as open source. At least one database vendor, Teradata, is porting it over to their own database machine to better allocate physical resources to data warehouse queries.

It seems to me that this would be a natural for cloud compute providers and even more important for hypervisor solutions such as vSphere, Hyper-V, etc.  Anywhere one could use more flexibility in assigning virtual or physical resources to an application or server would find a use for this performance modeling.

~~~~

Now, if they could just do something to help create thinly provisioned highways, …

Image: Intel Team Inside Facebook Data Center By IntelFreePress

IBM boosts System z processing speed

At this week’s Hot Chips Conference, Brian Curran, an IBM Distinguished Engineer, discussed IBM’s recently announced, faster processing chip for System z mainframe environments that runs at 5.2GHz.  (FYI, the first 31 minutes of the YouTube video link above are from Brian’s session and the first 10 minutes provide a good overview of the chip.)

Brian discussed System z environments, which mainly run large mission-critical applications such as OLTP that use large instruction and data caches.  System z is also now being used for Linux consolidation, with 1000s of Linux machines running on a mainframe.

The numbers

The new z196 processing core provides up to a 40% improvement executing mainframe applications.  Also, the new processor chip was measured at 50 billion instructions per second (Bips).

In addition, the z196 achieved a remarkable 40% per-thread improvement on constant (unchanged) code, and another 20-30% throughput improvement was attainable through re-compilation.  Moreover, they have shown a sustained system execution throughput (multi-thread/multi-application) of 400 Bips.  All this was done without increasing energy consumption over the current generation of System z processing chips.

Cache everywhere and lots of it

The z196 chip is a 45nm, 1.4B transistor, quad-core processor with two onboard, special purpose co-processors for cryptographic and compression acceleration. Each of the four cores has a private 64KB L1 I-cache (instruction), a private 128KB L1 D-cache (data) and a private 1.5MB L2 cache, all in SRAM.  There is also an onboard 24MB eDRAM L3 cache shared across the cores, and all cores in the z196 quad-core processor group run at the full 5.2GHz clock speed.

Each z196 processing core supports out-of-order instruction execution with a 40-instruction window size.   Further, all data is protected with ECC and hardened with parity and/or duplication for processing steps.

Six of these z196 processing chips combine to form a processor node on a multi-chip module (MCM).  There is an industry-first, additional 192MB eDRAM L4 cache shared across the six processing chips on an MCM.  Each System z MCM can interface with up to 750GB of main memory.

In a System z processing frame there can be up to four MCMs, which provides a total of 96 processing cores.  With the four MCMs, System z can address ~3TB of main memory.  Each MCM is fully interconnected with all other MCMs in a processing frame via a pair of redundant fabric interfaces.
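
The totals fall out directly from the figures above:

```python
# Totals implied by the numbers above: cores per frame and addressable memory.
cores_per_chip = 4
chips_per_mcm = 6
mcms_per_frame = 4
memory_per_mcm_gb = 750

print(cores_per_chip * chips_per_mcm * mcms_per_frame, "cores")       # 96 cores
print(mcms_per_frame * memory_per_mcm_gb / 1000, "TB of memory")      # 3.0 TB (~3TB)
```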

System z is a CISC architecture which, with the z196, has passed the 1000-instruction barrier (1079 instructions).  Whew, glad I am not coding in Assembler anymore.

IBM formally announced the chip a month ago and it will be in shipping System z products later this year.

There was some mention on WSJ blogs of Power 7+ systems going up to 5.5GHz, but I couldn’t locate a more definitive source for that news.

Comments?

Image: Z10 by Roberto Berlim