## A new way to compute

I read an article the other day on using using random pulses rather than digital numbers to compute with, see Computing with random pulses promises to simplify circuitry and save power, in IEEE Spectrum. Essentially they encode a number as a probability in a random string of bits and then use simple logic to compute with. This approach was invented in the early days of digital logic and was called stochastic computing.

# Stochastic numbers?

It’s pretty easy to understand how such logic can work for fractions. For example to represent 1/4, you would construct a bit stream that had one out of every four bits, on average, as a 1 and the rest 0’s. This could easily be a random string of bits which have an average of 1 out of every 4 bits as a one.

A nice result of such a numerical representation is that it easily results in more precision as you increase the length of the bit stream. The paper calls this progressive precision.

Progressive precision helps stochastic computing be more fault tolerant than standard digital logic. That is, if the string has one bit changed it’s not going to make that much of a difference from the original string and computing with an erroneous number like this will probably result in similar results to the correct number.  To have anything like this in digital computation requires parity bits, ECC, CRC and other error correction mechanisms and the logic required to implement these is extensive.

# Stochastic computing

Another advantage of stochastic computation and using a probability  rather than binary (or decimal) digital representation, is that most arithmetic functions are much simpler to implement.

They discuss two examples in the original paper:

• Multiplication – Multiplying two probabilistic bit streams together is as simple as ANDing the two strings.

• Addition – Adding two probabilistic bit strings together just requires a multiplexer, but you end up with a bit string that is the sum of the two divided by two.

I see a couple of problems with stochastic computing:,

• How do you represent  an irrational number, such as the square root of 2;
• How do you represent integers or for that matter any value greater than 1.0 in a probabilistic bit stream; and
• How do you represent negative values in a bit stream.

I suppose irrational numbers could be represented by taking a near-by, close approximation of the irrational number. For instance, using 1.4 for the square root of two, or 1.41, or 1.414, …. And this way you could get whatever (progressive) precision that was needed.

As for integers greater than 1.0, perhaps they could use a floating point representation, with two defined bit strings, one representing the mantissa (fractional part) and the other an exponent. We would assume that the exponent rather than being a probability from 0..1.0, would be inverted and represent 1.0…∞.

Negative numbers are a different problem. One way to supply negative numbers is to use something akin to complemetary representation. For example, rather than the probabilistic bit stream representing 0.0 to 1.0 have it represent -0.5 to 0.5. Then progressive precision would work for negative numbers as well a positive numbers.

One major downside to stochastic numbers and computation is that high precision arithmetic is very difficult to achieve.  To perform 32 bit precision arithmetic would require a bit streams that were  2³² bits long. 64 bit precision would require streams that were  2**64th bits long.

# Good uses for stochastic computing

One advantage of simplified logic used in stochastic computing is it needs a lot less power to compute. One example in the paper they use for stochastic computers is as a retinal sensor for in the body visual augmentation. They developed a neural net that did edge detection that used a stochastic front end to simplify the logic and cut down on power requirements.

Other areas where stochastic computing might help is for IoT applications. There’s been a lot of interest in IoT sensors being embedded in streets, parking lots, buildings, bridges, trucks, cars etc. Most have a need to perform a modest amount of edge computing and then send information up to the cloud or some edge consolidator intermediate

Many of these embedded devices lack access to power, so they will need to make do with whatever they can find.  One approach is to siphon power from ambient radio (see this  Electricity harvesting… article), temperature differences (see this MIT … power from daily temperature swings article), footsteps (see Pavegen) or other mechanisms.

The other use for stochastic computing is to mimic the brain. It appears that the brain encodes information in pulses of electric potential. Computation in the brain happens across exhibitory and inhibitory circuits that all seem to interact together.  Stochastic computing might be an effective way, low power way to simulate the brain at a much finer granularity than what’s available today using standard digital computation.

~~~~

Not sure it’s all there yet, but there’s definitely some advantages to stochastic computing. I could see it being especially useful for in body sensors and many IoT devices.

Photo Credit(s):  The logic of random pulses

2 bit by 2 bit multiplier, By Sodaboy1138 (talk) (Uploads) – Own work, CC BY-SA 3.0, wikimedia

AND ANSI Labelled, By Inductiveload – Own work, Public Domain, wikimedia

2 Input multiplexor

A battery free implantable neural sensor, MIT Technology Review article

Integrating neural signal and embedded system for controlling a small motor, an IntechOpen article

## Blockchains go mainstream…

I read an article a while back on Finland’s use of blockchain technology to provide bank accounts and identity services to immigrants (see  MIT TechReview article about Finland).

Blockchains were originally invented as a way of supporting financial transactions outside the current, government monitored, financial marketplace. With Finland’s experiment, the government is starting to use blockchains to support the unbanked and monitoring their financial activity – go figure.

## Debit cards on blockchain

Finland’s using a Helsinki based startup MONI, to assign a MONI card, essentially a prepaid MasterCard, to all immigrants. An immigrant can use their MONI card to pay for anything online or in real life, use it as a direct deposit account or to receive and track the use of government assistance.

Underlying the MONI card is public blockchain technology. That is MONI  is not using normal credit card services to support it’s bank accounts, MONI money transfers are done through the use of public blockchains.

MONI accounts are essentially (crypto currency) wallets but used as a debit card. The user merely enters a series of numbers into web forms or uses their MONI card at a credit card terminals throughout Europe. Transferring money between MONI users anywhere in the World is also free and instantaneous.

Finland also sees an immutable record of all immigrant financial transactions,  that can be monitored to track immigrant (financial) integration into the country.

MONI is intending to make this service more broadly available. A MONI card account costs €2/month and MONI take’s a small cut out of each monetary transaction.

## IDs on blockchain

I read another article the other day “Microsoft to implement blockchain-based ID system” in CoinTelegraph about using blockchains as a universal digital ID.

India has over the last decade, implemented a digital government ID using biometrics (see Aadhaar wikipedia article). Other countries have been moving to e-government where use of government services is implemented over the Internet (see EU article on eGovernment in Lithuania). Such eGovernment services depend on a digitized population registry.

Although it’s unclear whether Aadhaar and Lithuania make use of blockchain technology for their ID services, Microsoft’s definitely looking to blockchains to provide unique accounts/digital IDs to it’s population of users.

User signon’s has been a prevalent problem of the web for years. Each and every web and mobile App requires a person to signon to personalize their App. Nowadays, many Apps support using Google ID or Facebook ID for a single signon and there are other technologies being offered that provide similar services. Using a blockchain ID could easily support a single signon service.

The blockchain ID (wallet) public key could easily be used to encrypt an authentication transaction, identifying the App and the user. This authentication transaction would be processed by the blockchain digital ID service would use the private key to decrypt the transaction and use a backend ID App repository for the user to check to see that the user loging in, is the person that opened the account, acting as a sort of “proof of who you are”

## Storage on blockchain

Filecoin and StorJ are storage providers that use blockchain services to allow others to use your local (or networked) storage to provide storage to the world.

A while back I had written about (free) peer to peer storage and compute services  (see my Free P2P cloud storage … post). But the problem was how do people benefit from hosting the P2P storage or compute. Filecoin and Storj solved this by paying in cryptocurrencies for storage hosted on your hardware.

Filecoin offers a storage auction and hosting service that anyone worldwide can log into and use. The data stored is encrypted end-to-end so that no one can see what’s being stored and the data is also erasure coded so that it  is protected and accessible even with having one or more hosting sites be offline.

Filecoin uses “proofs of storage“, “proofs of space”, “proofs of data possession“, and “proofs of retrievability” as a way to guarantee their storage service works properly. They also use chained “proofs of replication” as “proofs of spacetime” as service validation checks. Proofs of Replication are a way of insuring that storage providers are not deduplicating data copies and charging for non-deduped storage. (See Filecoin’s Proof of Replication paper for more info).

Storj looks somewhat similar to Filecoin, but without as much sophistication behind it.

## Compute on blockchain

Ethereum was invented to support smart contracts that run on blockchain technology. IBM’s HyperLegder OpenLedger project (see our GreyBeardsOnStorga Podcast and RayOnStorage post on Hyperledger) is another example.

Smart contracts are essentially applications that run in a blockchains virtualized environment. Blockchain services are used to run an application and validate that’s it’s run only once. In some cases smart contracts use  external oracles to query as a way to verify something or some action has occurred outside the blockchain. Other oracles can be entirely digital entities that check on a particular commodity price, weather pattern, account value, etc. The oracle becomes a critical step in determining the go no go status of a smartcontract.

Salon, a news providing website, offers readers an option to see advertisements or to allow Salon to use their computer (browser) to mine crypto coins. (See Salon offers… article in CoinDesk).

I believe this offer is made when the website detects a viewer is using  ad blockers.

~~~~

Tthe trend is clear, people, organizations and even governments are looking at blockchain technology to provide basic and advanced services around the world.

If anyone would is interested in providing a pre-paid Visa card via blockchains, please contact me. I’d like to help.

Now if I could just find my GPU’s at a decent price somewhere…

Speaking of advertising… RayOnStorage doesn’t use advertising. But blogging like this takes time and money. If anyone’s interested in helping fund this blog, please consider sending some BTC our way, even 0.0001 BTC would help.

1MqBbAvMo6QbCVD6ZwtbLaPxmcUZGj9Ghw

Photo Credit(s): Blockchain and the public sector on OpenGovAsia.com

Unleash your design teams with single signon on Unifilabs.com

Understanding the difference between P2P and Client-server networks on LinkedIN

Blockgeek’s guide to smart contracts

## GPU growth and the compute changeover

Attended SC17 last month in Denver and Nvidia had almost as big a presence as Intel. Their VR display was very nice as compared to some of the others at the show.

## GPU past

GPU’s were originally designed to support visualization and the computation to render a specific scene quickly and efficiently. In order to do this they were designed with 100s to now 1000s of arithmetically intensive (floating point) compute engines where each engine could be given an individual pixel or segment of an image and compute all the light rays and visual aspects pertinent to that scene in a very short amount of time. This created a quick and efficient multi-core engine to render textures and map polygons of an image.

Image rendering required highly parallel computations and as such more compute engines meant faster scene throughput. This led to todays GPUs that have 1000s of cores. In contrast, standard microprocessor CPUs have 10-60 compute cores today.

## GPUs today

Funny thing, there are lots of other applications for many core engines. For example, GPUs also have a place to play in the development and mining of crypto currencies because of their ability to perform many cryptographic operations a second, all in parallel

Another significant driver of GPU sales and usage today seems to be AI, especially machine learning. For instance, at SC17, visual image recognition was on display at dozens of booths besides Intel and Nvidia. Such image recognition  AI requires a lot of floating point computation to perform well.

I saw one article that said GPUs can speed up Machine Learning (ML) by a factor of 250 over standard CPUs. There’s a highly entertaining video clip at the bottom of the Nvidia post that shows how parallel compute works as compared to standard CPUs.

GPU’s play an important role in speech recognition and image recognition (through ML) as well. So we find that they are being used in self-driving cars, face recognition, and other image processing/speech recognition tasks.

The latest Apple X iPhone has a Neural Engine which my best guess is just another version of a GPU. And the iPhone 8 has a custom GPU.

Tesla is also working on a custom AI engine for its self driving cars.

So, over time, GPUs will have an increasing role to play in the future of AI and crypto currency and as always, image rendering.

Photo Credit(s): SC17 logo, SC17 website;

GTX1070(GP104) vs. GTX1060(GP106) by Fritzchens Fritz;

Intel 2nd Generation core microprocessor codenamed Sandy Bridge wafer by Intel Free Press

## Blockchain, open source and trusted data lead to better SDG impacts

Read an article today in Bitcoin magazine IXO Foundation: A blockchain based response to UN call for [better] data which discusses how the UN can use blockchains to improve their development projects.

The UN introduced the 17 Global Goals for Sustainable Development (SDG) to be achieved in the world by 2030. The previous 8 Millennial Development Goals (MDG) expire this year.

Although significant progress has been made on the MDGs, one ongoing determent to  MDG attainment has been that progress has been very uneven, “with the poorest and economically disadvantaged often bypassed”.  (See WEF, What are Sustainable Development Goals).

Throughout the UN 17 SDG, the underlying objective is to end global poverty  in a sustainable way.

## Impact claims

In the past organizations performing services for the UN under the MDG mandate, indicated they were performing work toward the goals by stating, for example, that they planted 1K acres of trees, taught 2K underage children or distributed 20 tons of food aid.

The problem with such organizational claims is they were left mostly unverified. So the UN, NGOs and other charities funding these projects were dependent on trusting the delivering organization to tell the truth about what they were doing on the ground.

However, impact claims such as these can be independently validated and by doing so the UN and other funding agencies can determine if their money is being spent properly.

## Proving impact

Proofs of Impact Claims can be done by an automated bot, an independent evaluator or some combination of the two . For instance, a bot could be used to analyze periodic satellite imagery to determine whether 1K acres of trees were actually planted or not; an independent evaluator can determine if 2K students are attending class or not, and both bots and evaluators can determine if 20 tons of food aid has been distributed or not.

Such Proofs of Impact Claims then become a important check on what organizations performing services are actually doing.  With over \$1T spent every year on UN’s SDG activities, understanding which organizations actually perform the work and which don’t is a major step towards optimizing the SDG process. But for Impact Claims and Proofs of Impact Claims to provide such feedback but they must be adequately traced back to identified parties, certified as trustworthy and be widely available.

## The ixo Foundation

The ixo Foundation is using open source, smart contract blockchains, personalized data privacy, and other technologies in the ixo Protocol for UN and other organizations to use to manage and provide trustworthy data on SDG projects from start to completion.

Trustworthy data seems a great application for blockchain technology. Blockchains have a number of features used to create trusted data:

1. Any impact claim and proofs of impacts become inherently immutable, once entered into a blockchain.
2. All parties to a project, funders, services and evaluators can be clearly identified and traced using the blockchain public key infrastructure.
3. Any data can be stored in a blockchain. So, any satellite imagery used, the automated analysis bot/program used, as well as any derived analysis result could all be stored in an intelligent blockchain.
4. Blockchain data is inherently widely available and distributed, in fact, blockchain data needs to be widely distributed in order to work properly.

## The ixo Protocol

The ixo Protocol is a method to manage (SDG) Impact projects. It starts with 3 main participants: funding agencies, service agents and evaluation agents.

• Funding agencies create and digitally sign new Impact Projects with pre-defined criteria to identify appropriate service  agencies which can do the work of the project and evaluation agencies which can evaluate the work being performed. Funding agencies also identify Impact Claim Template(s) for the project which identify standard ways to assess whether the project is being performed properly used by service agencies doing the work. Funding agencies also specify the evaluation criteria used by evaluation agencies to validate claims.
• Service agencies select among the open Impact Projects whichever ones they want to perform.  As the service agencies perform the work, impact claims are created according to templates defined by funders, digitally signed, recorded and collected into an Impact Claim Set underthe IXO protocol.  For example Impact Claims could be barcode scans off of food being distributed which are digitally signed by the servicing agent and agency. Impact claims can be constructed to not hold personal identification data but still cryptographically identify the appropriate parties performing the work.
• Evaluation agencies then take the impact claim set and perform the  evaluation process as specified by funding agencies. The evaluation insures that the Impact Claims reflect that the work is being done correctly and that the Impact Project is being executed properly. Impact claim evaluations are also digitally signed by the evaluation agency and agent(s), recorded and widely distributed.

The Impact Project definition, Impact Claim Templates, Impact Claim sets, Impact Claim Evaluations are all available worldwide, in an Global Impact Ledger and accessible to any and all funding agencies, service agencies and evaluation agencies.  At project completion, funding agencies should now have a granular record of all claims made by service agency’s agents for the project and what the evaluation agency says was actually done or not.

Such information can then be used to guide the next round of Impact Project awards to further advance the UN SDGs.

## Ambly project

The Ambly Project is using the ixo Protocol to supply childhood education to underprivileged children in South Africa.

It combines mobile apps with blockchain smart contracts to replace an existing paper based school attendance system.

The mobile app is used to record attendance each day which creates an impact claim which can then be validated by evaluators to insure children are being educated and properly attending class.

~~~

Blockchains have the potential to revolutionize financial services, provide supply chain provenance (e.g., diamonds with Blockchains at IBM), validate company to company contracts (Ethereum enters the enterprise) and now improve UN SDG attainment.

Welcome to the new blockchain world.

Photo Credit(s): What are Sustainable Development Goals, World Economic Forum;

IXO Foundation website

Ambly Project webpage

## A tale of two storage companies – NetApp and Vantara (HDS-Insight Grp-Pentaho)

It was the worst of times. The industry changes had been gathering for a decade almost and by this time were starting to hurt.

The cloud was taking over all new business and some of the old. Flash’s performance was making high performance easy and reducing storage requirements commensurately. Software defined was displacing low and midrange storage, which was fine for margins but injurious to revenues.

Both companies had user events in Vegas the last month, NetApp Insight 2017 last week and Hitachi NEXT2017 conference two weeks ago.

As both companies respond to industry trends, they provide an interesting comparison to watch companies in transition.

## Company role

• NetApp’s underlying theme is to change the world with data and they want to change to help companies do this.
• Vantara’s philosophy is data and processing is ultimately moving into the Internet of things (IoT) and they want to be wherever the data takes them.

Hitachi Vantara is a brand new company that combines Hitachi Data Systems, Hitachi Insight Group and Pentaho (an analytics acquisition) into one organization to go after the IoT market. Pentaho will continue as a separate brand/subsidiary, but HDS and Insight Group cease to exist as separate companies/subsidiaries and are now inside Vantara.

NetApp sees transitions occurring in the way IT conducts business but ultimately, a continuing and ongoing role for IT. NetApp’s ultimate role is as a data service provider to IT.

## Customer problem

• Vantara believes the main customer issue is the need to digitize the business. Because competition is emerging everywhere, the only way for a company to succeed against this interminable onslaught is to digitize everything. That is digitize your manufacturing/service production, sales, marketing, maintenance, any and all customer touch points, across your whole value chain and do it as rapidly as possible. If you don’t your competition will.
• NetApp sees customers today have three potential concerns: 1) how to modernize current infrastructure; 2) how to take advantage of (hybrid) cloud; and 3) how to build out the next generation data center. Modernization is needed to free capital and expense from traditional IT for use in Hybrid cloud and next generation data centers. Most organizations have all three going on concurrently.

Vantara sees the threat of startups, regional operators and more advanced digitized competitors as existential for today’s companies. The only way to keep your business alive under these onslaughts is to optimize your value delivery. And to do that, you have to digitize every step in that path.

NetApp views the threat to IT as originating from LoB/shadow IT originating applications born and grown in the cloud or other groups creating next gen applications using capabilities outside of IT.

## Product direction

• NetApp is looking mostly towards the cloud. At their conference they announced a new Azure NFS service powered by NetApp. They already had Cloud ONTAP and NPS, both current cloud offerings, a software defined storage in the cloud and a co-lo hardware offering directly attached to public cloud (Azure & AWS), respectively.
• Vantara is looking towards IoT. At their conference they announced Lumada 2.0, an Industrial IoT (IIoT) product framework using plenty of Hitachi software functionality and intended to bring data and analytics under one software umbrella.

NetApp is following a path laid down years past when they devised the data fabric. Now, they are integrating and implementing data fabric across their whole product line. With the ultimate goal that wherever your data goes, the data fabric will be there to help you with it.

Vantara is broadening their focus, from IT products and solutions to IoT. It’s not so much an abandoning present day IT, as looking forward to the day where present day IT is just one cog in an ever expanding, completely integrated digital entity which the new organization becomes.

They both had other announcements, NetApp announced ONTAP 9.3, Active IQ (AI applied to predictive service) and FlexPod SF ([H]CI with SolidFire storage) and Vantara announced a new IoT turnkey appliance running Lumada and a smart data center (IoT) solution.

## Who’s right?

They both are.

Digitization is the future, the sooner organizations realize and embrace this, the better for their long term health. Digitization will happen with or without organizations and when it does, it will result in a significant re-ordering of today’s competitive landscape. IoT is one component of organizational digitization, specifically outside of IT data centers, but using IT resources.

In the mean time, IT must become more effective and efficient. This means it has to modernize to free up resources to support (hybrid) cloud applications and supply the infrastructure needed for next gen applications.

One could argue that Vantara is positioning themselves for the long term and NetApp is positioning themselves for the short term. But that denies the possibility that IT will have a role in digitization. In the end both are correct and both can succeed if they deliver on their promise.

## Axellio, next gen, IO intensive server for RT analytics by X-IO Technologies

We were at X-IO Technologies last week for SFD13 in Colorado Springs talking with the team and they showed us their new IO and storage intensive server, the Axellio. They want to sell Axellio to customers that need extreme IOPS, very high bandwidth, and large storage requirements. Videos of X-IO’s sessions at SFD13 are available here.

## The hardware

Axellio comes in 2U appliance with two server nodes. Each server supports  2 sockets of Intel E5-26xx v4 CPUs (4 sockets total) supporting from 16 to 88 cores. Each server node can be configured with up to 1TB of DRAM or it also supports NVDIMMs.

There are two key differentiators to Axellio:

1. The FabricExpress™, a PCIe based interconnect which allows both server nodes to access dual-ported,  2.5″ NVMe SSDs; and
2. Dense drive trays, the Axellio supports up to 72 (6 trays with 12 drives each) 2.5″ NVMe SSDs offering up to 460TB of raw NVMe flash using 6.4TB NVMe SSDs. Higher capacity NVMe SSDS available soon will increase Axellio capacity to 1PB of raw NVMe flash.

They also probably spent a lot of time on packaging, cooling and power in order to make Axellio a reliable solution for edge computing. We asked if it was NEBs compliant and they told us not yet but they are working on it.

Axellio can also be configured to replace 2 drive trays with 2 processor offload modules such as 2x Intel Phi CPU extensions for parallel compute, 2X Nvidia K2 GPU modules for high end video or VDI processing or 2X Nvidia P100 Tesla modules for machine learning processing. Probably anything that fits into Axellio’s power, cooling and PCIe bus lane limitations would also probably work here.

At the frontend of the appliance there are 1x16PCIe lanes of server retained for networking that can support off the shelf NICs/HCAs/HBAs with HHHL or FHHL cards for Ethernet, Infiniband or FC access to the Axellio. This provides up to 2x100GbE per server node of network access.

## Performance of Axellio

With Axellio using all NVMe SSDs, we expect high IO performance. Further, they are measuring IO performance from internal to the CPUs on the Axellio server nodes. X-IO says the Axellio can hit >12Million IO/sec with at 35µsec latencies with 72 NVMe SSDs.

Lab testing detailed in the chart above shows IO rates for an Axellio appliance with 48 NVMe SSDs. With that configuration the Axellio can do 7.8M 4KB random write IOPS at 90µsec average response times and 8.6M 4KB random read IOPS at 164µsec latencies. Don’t know why reads would take longer than writes in Axellio, but they are doing 10% more of them.

Furthermore, the difference between read and write IOP rates aren’t close to what we have seen with other AFAs. Typically, maximum write IOPs are much less than read IOPs. Why Axellio’s read and write IOP rates are so close to one another (~10%) is a significant mystery.

As for IO bandwitdh, Axellio it supports up to 60GB/sec sustained and in the 48 drive lax testing it generated 30.5GB/sec for random 4KB writes and 33.7GB/sec for random 4KB reads. Again much closer together than what we have seen for other AFAs.

Also noteworthy, given PCIe’s bi-directional capabilities, X-IO said that there’s no reason that the system couldn’t be doing a mixed IO workload of both random reads and writes at similar rates. Although, they didn’t present any test data to substantiate that claim.

## Markets for Axellio

They really didn’t talk about the software for Axellio. We would guess this is up to the customer/vertical that uses it.

Aside from the obvious use case as a X-IO’s next generation ISE storage appliance, Axellio could easily be used as an edge processor for a massive fabric of IoT devices, analytics processor for large RT streaming data, and deep packet capture and analysis processing for cyber security/intelligence gathering, etc. X-IO seems to be focusing their current efforts on attacking these verticals and others with similar processing requirements.

## X-IO Technologies’ sessions at SFD13

Other sessions at X-IO include: Richard Lary, CTO X-IO Technologies gave a very interesting presentation on an mathematically optimized way to do data dedupe (caution some math involved); Bill Miller, CEO X-IO Technologies presented on edge computing’s new requirements and Gavin McLaughlin, Strategy & Communications talked about X-IO’s history and new approach to take the company into more profitable business.

Again all the videos are available online (see link above). We were very impressed with Richard’s dedupe session and haven’t heard as much about bloom filters, since Andy Warfield, CTO and Co-founder Coho Data, talked at SFD8.

## Full Disclosure

X-IO paid for our presence at their sessions and they provided each blogger a shirt, lunch and a USB stick with their presentations on it.

## Google releases new Cloud TPU & Machine Learning supercomputer in the cloud

This year they are releasing a new version of their hardware called the Cloud TPU chip and making it available in a cluster on their Google Cloud.  Cloud TPU is in Alpha testing now. As I understand it, access to the Cloud TPU will eventually be free to researchers who promise to freely publish their research and at a price for everyone else.

## What’s different between TPU v1 and Cloud TPU v2

The differences between version 1 and 2 mostly seem to be tied to training Machine Learning Models.

TPU v1 didn’t have any real ability to train machine learning (ML) models. It was a relatively dumb (8 bit ALU) chip but if you had say a ML model already created to do something like understand speech, you could load that model into the TPU v1 board and have it be executed very fast. The TPU v1 chip board was also placed on a separate PCIe board (I think), connected to normal x86 CPUs  as sort of a CPU accelerator. The advantage of TPU v1 over GPUs or normal X86 CPUs was mostly in power consumption and speed of ML model execution.

Cloud TPU v2 looks to be a standalone multi-processor device, that’s connected to others via what looks like Ethernet connections. One thing that Google seems to be highlighting is the Cloud TPU’s floating point performance. A Cloud TPU device (board) is capable of 180 TeraFlops (trillion or 10^12 floating point operations per second). A 64 Cloud TPU device pod can theoretically execute 11.5 PetaFlops (10^15 FLops).

TPU v1 had no floating point capabilities whatsoever. So Cloud TPU is intended to speed up the training part of ML models which requires extensive floating point calculations. Presumably, they have also improved the ML model execution processing in Cloud TPU vs. TPU V1 as well. More information on their Cloud TPU chips is available here.

## So how do you code a TPU?

Both TPU v1 and Cloud TPU are programmed by Google’s open source TensorFlow. TensorFlow is a set of software libraries to facilitate numerical computation via data flow graph programming.

Apparently with data flow programming you have many nodes and many more connections between them. When a connection is fired between nodes it transfers a multi-dimensional matrix (tensor) to the node. I guess the node takes this multidimensional array does some (floating point) calculations on this data and then determines which of its outgoing connections to fire and how to alter the tensor to send to across those connections.

Apparently, TensorFlow works with X86 servers, GPU chips, TPU v1 or Cloud TPU. Google TensorFlow 1.2.0 is now available. Google says that TensorFlow is in use in over 6000 open source projects. TensorFlow uses Python and 1.2.0 runs on Linux, Mac, & Windows. More information on TensorFlow can be found here.

## So where can I get some Cloud TPUs

Google is releasing their new Cloud TPU in the TensorFlow Research Cloud (TFRC). The TFRC has 1000 Cloud TPU devices connected together which can be used by any organization to train machine learning algorithms and execute machine learning algorithms.

I signed up (here) to be an alpha tester. During the signup process the site asked me: what hardware (GPUs, CPUs) and platforms I was currently using to training my ML models; how long does my ML model take to train; how large a training (data) set do I use (ranging from 10GB to >1PB) as well as other ML model oriented questions. I guess there trying to understand what the market requirements are outside of Google’s own use.

Google’s been using more ML and other AI technologies in many of their products and this will no doubt accelerate with the introduction of the Cloud TPU. Making it available to others is an interesting play but this would be one way to amortize the cost of creating the chip. Another way would be to sell the Cloud TPU directly to businesses, government agencies, non government agencies, etc.

I have no real idea what I am going to do with alpha access to the TFRC but I was thinking maybe I could feed it all my blog posts and train a ML model to start writing blog post for me. If anyone has any other ideas, please let me know.

Photo credit(s): From Google’s website on the new Cloud TPU

## Know Fortran, optimize NASA code, make money

Read a number of articles this past week about NASA offering a Fortran optimization contest, the High Performance Fast Computing Contest (HPFCC) for their computational fluid dynamics (CFD) program. They want to speed up CFD by 10X to 1000X and are willing to pay for it.

The contest is being run through HeroX and TopCoder and they are offering \$55K, across the various levels of the contests to the winners.

The FUN3D CVD code (manual) runs on NASA’s Pliedes Linux supercomputer complex which sports over 245K cores. Even when running on the supercomputer complex, a typical CVD FUN3D run takes thousands to millions of core hours!

## The program(s)

FUN3D does a hypersonic fluid analysis over a (fixed) surface which includes a “simulation of mixtures of thermally perfect gases in thermo-chemical equilibrium and non-equilibrium. The routines in PHYSICS_DEPS enable coupling of the new gas modules to the existing FUN3D infrastructure. These algorithms also address challenges in simulation of shocks and boundary layers on tetrahedral grids in hypersonic flows.”

Not sure what all that means but I am certain there’s a number of iterations on multiple Fortran modules, and it does this over a 3D grid of points, which corresponds to both the surface being modeled and the gas mixture, it’s running through at hypersonic speeds. Sounds easy enough.

## The contest(s)

There are two levels to the contest: an Ideation phase (at HeroX) and an architectural phase (at TopCoder). The \$55000 is split up between the HeroX ideation phase which rewards a total of \$20K: \$10K for winner and 2-\$5K runner up prizes and the TopCoder architectural phase which rewards a total of \$35K: \$15K for winner and \$10K for 2nd place and another \$10K for “Qualified improvement candidate”.

The (HeroX) Ideation phase looks for specific new or faster algorithms that could replace current ones in FUN3D which include “exploiting algorithmic developments in such areas as grid adaptation, higher-order methods and efficient solution techniques for high performance computing hardware.”

The (TopCoder) Architecture phase looks at specifically speeding up actual FUN3D code execution. “Ideal submission(s) may include algorithm optimization of the existing code base,  Inter-node dispatch optimization or a combination of the two.  Unlike the Ideation challenge, which is highly strategic, this challenge focuses on measurable improvements of the existing FUN3d suite and is highly tactical.”

Sounds to me that the ideation phase is selecting algorithm designs and the architecture phase is running the new algorithms or just in general speeding up the FUN3D code execution.

## The equation(s)

There’s a Navier-Stokes equation algorithm that get’s called maybe a trillion times until the flow settles down, during a run and any minor improvement here would be obviously significant. Perhaps there are algorithmic changes that can be used, if your an aeronautical engineer or perhaps there are compiler speedups that can be found, if your a fortran expert. Both approaches can be validated/debugged/proved out on a desktop computer.

You have to be a US citizen to access the code and you can apply here. You will receive an email to verify your email address and then once your validated and back on the website, you need to approve the software use agreement. NASA will verify your physical address by sending a letter to you with a passcode to use to finally access the code. The process may take weeks to complete, so if your interested in the contest, best to start now.

## The Fortran(s)

I learned Fortran 66 a long time ago and may have dabbled with Fortran 77 but that’s the last touched fortran. But it’s like riding a bike, once you do it, it’s easy to do it again.

As I understand it the FUN3D uses Fortran 2003 and NASA suggests you use the Gnu Fortran GFortran compiler as the Intel one has some bugs in it. There appears to be a Fortran 2015 but it’s not in main use just yet.

A million core hours, just amazing. If you could save a millisecond out of the routine called a trillion times, you’d save 1 billion seconds, or ~280K core hours.