Learning machine learning – part 1

Saw an article this past week from AWS Re:Invent that they just released their Machine Learning curriculum and materials  free to the public. Google (Cloud Platform and elsewhere) TensorFlow,  (Facebook’s) PyTorch, and Microsoft Azure CNTK frameworks  education is also available and has been for awhile now.

My money is on PyTorch and Tensorflow as being the two frameworks most likely to succeed. However all the above use many open source facilities and there seems to be a lot of cross breeding across them. Both AWS ML solutions and Microsoft CNTK offer PyTorch and TensorFlow frameworks/APIs as one option among many others.  

AWS Machine Learning

I spent about an hour plus looking over the AWS SageMaker tutorial videos in the developer section of AWS machine learning curriculum. Signing up was fairly easy but I already had an AWS login. You also had to enroll/register for the course on your AWS login  but once that was through, you could access courses.

In the comments on the AWS blog post there were a number of entries indicating broken links and other problems but I didn’t have any issues. Then again, I didn’t start at the beginning, only looked at over one series of courses, and was using the websites one week after they were announced at Re:Invent.

Amazon SageMaker is an overarching framework that can be used to perform machine learning on AWS, all the way from gathering, analyzing and modifying the dataset(s), to training the model, to creating a inference engine available as an endpoint that can be used to perform the inferencing.

Amazon also has special purpose API based tools that allow customers to embed intelligence (inferencing) directly into their application, without needing to perform the ML training. These include:

  • Amazon Recognition which provides image (facial and other tagging) recognition services
  • Amazon Polly which provides text to speech services in multilple languages, and
  • Amazon Lex which provides speech recognition technology (used by Alexa) and together with Polly helps embed conversational interfaces into customer applications.

TensorFlow Machine Learning

In the past I looked over the TensorFlow tutorials and recently rechecked them out. I found them much easier to follow this time.

 

The Google IO 2018 video on TensorFlowGetting Started With TensorFlow High Level APIs, takes you through a brief introduction to the Colab(oratory),  a GCP solution that uses TensorFlow and how to use Tensorflow Keras, tf.data and TensorFlow Eager Execution to create machine learning models and perform machine learning.

 Keras on TensorFlow seems to be the easiest approach to  use machine learning technologies. The video spends most of the time discussing a Colab Keras code element,  ~9 lines, that loads a image classification dataset, defines a 1 level (one standard layer and one output layer), trains it, validates it and uses it to perform  inferencing.

The video also touches a bit on tf.data and TensorFlow Eager Execution but the main portion discusses the 9 line TensorFlow Keras machine learning example.

Both Colab and AWS Sagemaker use and discuss Jupiter Notebooks. These appear to be an open source approach to documenting and creating a workflow and executing Python code automatically.

GCP Colab is essentially a GCP-Google Drive based Jupiter notebook execution engine. With Colab you create a Jupiter notebook on google drive and interactively execute it under Colab. You can download your Jupiter notebook files and essentially execute them anywhere else that supports TensorFlow (that supports TensorFlow v1.7 or above, with Keras API support).

In the video, the Google IO   instructors (Josh Gordon and Lawrence Moroney) walk you through building a model to recognize handwritten digits and outputs a classification (0..9) of what the handwritten digit represents.

It uses a standard labeled handwriting to digits labeled data set, called the MNIST database of handwritten digits that’s already been broken up into a training set and a validation set. Josh calls this the “Hello World” of machine learning.

The instructor in the video walks you through the (Jupiter Notebook – Eager Execution-Keras) code that inputs the data set (line 2), builds a 1 level (really two layer, one neural net layer and one output layer) neural network model (lines 3-6), trains the model (line 7), tests/validates the model (line 8) and then uses it to perform an inference (line 9).

Josh spends a little time discussing neural networks and model optimizations and some of the other parameters used in the code above. He has a few visualizations of what this all means but for the most part, the code uses a simple way to build a neural net model and some standard optimization techniques for the network.

He then goes on to discuss tf.data which is an API that can be used to create machine learning datasets and provide this data to the neural net for training or inferencing.  Apparently tf.data has a number of nifty features that allow you to take raw data and transform it into something that can be used to feed neural nets. For example, separating the data into batches, shuffling (randomizing) the batches of data, pre-fetching it so as to not starve the GPU matrix multipliers, etc.

Then it goes into how machine learning is different than regular coding. And show how TensorFlow Eager Execution is really just like Python execution. They go through another example (larger) of machine learning, this one distinguishes between cats and dogs. While they use an open source Python IDE ,  PyCharm, to test and walk through their TF Eager Execution code, setting breakpoints and examining data along the way.

At the end of the video they show a link to a Google crash course on TensorFlow machine learning and they refer to a book Deep Learning with Python by Francois Chollet. They also mention a browser version of TensorFlow which uses Java Script and  your browser to develop, train and perform inferences using TensorFlow Keras machine learning.

~~~~

Never got around to Microsoft’s Azure training other than previewing some websites but plan to look over that soon.

I would have to say that the Google IO session on using TensorFlow high level APIs was a lot more enjoyable (~40 minutes) than the AWS multiple tutorial videos (>>40 minutes) that I watched to learn about SageMaker.

Not a fair comparison as one was a Google IO intro session on TensorFlow high level APIs and the other was a series of actual training videos on Amazon SageMaker and the AWS services you can use to take advantage of it.

But the GCP session left me thinking I can handle learning more and using machine learning (via TensorFlow, Keras, Eager Execution, & tf.data) to actually do something while the SageMaker sessions left me thinking, how much AWS facilities and AWS infrastructure services,  I would need to understand and use to ever get to actually developing a machine learning model.

I suppose one was more of an (AWS SageMaker) infrastructure tutorial  and the other was more of an intro into machine learning using TensorFlow wherever you wanted to execute it.

I think I’m almost ready to start creating and feeding a TensorFlow model with my handwriting and seeing if it can properly interpret it into searchable text. If it can do that, I would be a happy camper

Comments…

Photo credits: 

Screenshos from AWS Sagemaker series of tutorial video 1, 2, 3, 4 & 5, you may need a signin to view them

Screenshots from the Getting Started with TensorFlow High Level APIs YouTube video 

New GraphCore GC2 chips with 2PFlop performance in a Dell Server

I was at Dell EMC Analyst summit this past week and at the show they had a series of sessions describing some of Dell Venture capital investments. One of the sessions was about GraphCore, a UK design firm that’s working on a new AI chip.

Their new GC2 chip is now out and available for customers to use. The new chip offers unprecedented performance for AI NN computations.

Hardware

GraphCore’s new Colossus GC2 chip holds 1216 IPU-Cores™. Each IPU runs at 100GFlops and is capable of running 7 threads. The GC2 chip supports 300MB of memory, with an aggregate of 30TB/s of memory bandwidth.  Each IPU supports low precision floating point arithmetic in completely parallel/concrrent execution. The GC2 chip has 23.6B transistors.

Each GC2 chip supports 80 IPU-Links™ to connect to other GC2 chips with 2.5tbps of chip to chip bandwidth. Further, the chip includes a PCIe Gen 4 x16 link (31.5GB/s) to host processors. And each chip supports up to 8TB/s IPU-Exchange™ on the chip bandwidth for inter chip, IPU to IPU communications.

The GC2 chip is available on a PCIe accelerator board that includes 2 GC2 chips. It’s also available in a Dell server configuration with 8 of their PCIe accelerator boards. In the server, with 2 GC2 chips  per board, it has ~19.5K IPUs with ~2.0PFlops in total of IPU processing power.

Software

GC2 IPUs support GraphCore’s Poplar® software and API’s that allows users to code in many of their favorite AI framework, such as PyTorch and TensorFlow.

At the NIPS 2017 conference GraphCore showed some AI ResNet-50, DeepBench LSTM RNN, and DeepVoice WaveNet performance benchmark results with their GC2 accelerator cards..

The chart above shows DeepBench LSTMN RNN runs comparing their  GC2 accelerator card against an Nvidia P100 GPU board (longer is better).

DeepBench is intended to support a set of workloads that mimic or simulate typical deep neural net types of operations and is used to compare NN hardware systems. The chart above compares DeepBench RNN inference operations with  GC2 accelerator card  vs. Nvidia P100 cards at three levels of response times (<2msec, <5msec. and <7msec.).

As can be seen in the chart, the GraphCore GC2 accelerator card performed significantly (from 182X to 242X) better than the Nvidia accelerator card executing NN inferencing at <5msec and <7msec latency. And was able to perform ~42K Inferences at <2msec latency where Nvidia P100 was unable to do at all.

~~~~
The GC2 chip, accelerator card and Dell EMC servers that run them look to be a significant advance in AI NN computations. We didn’t see any technical spec’s for the server but we assume it comes in a 4U configuration and uses less power than 8 GPUs.

However, at the moment, the servers are sold out. No information on the GC2 accelerator cards but our guess is that they are sold out as well, and probably ditto for the chips. Dell didn’t quote us any pricing on the servers, so its hard to know whether we could afford one, even if they weren’t sold out.

Who wouldn’t want to own a 4U server with 2PFlops performance for their AI apps?

Comments?

Photo Credit(s): Photos taken during Dell EMC Analyst Summit GraphCore presentation

Photos from GraphCore NIPS 2017 presentations

University of Manchester fires up world’s largest neuromorphic computer

Read an article the other day about SpiNNaker, the University of Manchester’s neuromorphic supercomputer (see Live Science Article: Worlds largest supercomputer brain…). There’s also a wikipedia page on SpiNNaker and a SpiNNaker project page.

 

SpiNNaker is part of the European Union Human Brain Project (HBP), Brain Simulation Program.

SpiNNaker supercomputing hardware

(Most of the following information is from the SpiNNaker project home page and SpiNNaker architectural overview page.)

The system has 1 million ARM9 (968) cores and ~7TB of memory, with each core emulating 1000 spiking neurons. With this amount of computing power, it should be able to emulate a 1B (1 billion, 10^9) neuron brain (region).

The system will consist of 1200 PCBs with each PCB containing a 48 chip array and associated networking hardware & memory. Each node contains a SpiNNaker chip with its 18 ARM9 cores.

Each node has two chips stitch bonded together on top of one another. The bottom consists of the 18-ARM9 cores and the top the double DDR memory and networking layer.

Total bisectional networking bandwidth is 5 B packets/second with each packet consisting of 5 or 9 bytes of data.

SpiNNaker operates on 1W per chip or 90KW of power to run the entire machine. Given that each chip is 18 cores and each core is 1000 neurons, this means each neuron simulation takes about 55.5µW of power to run.

You can deploy a single board as IoT solution but @ ~48W per board it may be be too energy consumptive for IoT.

SpiNNaker supercomputing software

According to the home page and the Live Science article, SpiNNaker is intended to be used to model critical segments of the human brain such as the  basal ganglia brain area for the EU HBP brain simulation program.

The system architecture has three tiers, a host machine (layer) which communicates with the monitor layer to start and monitor application execution and uses “ybug” to communicate,  a monitor core (layer) which interacts with ybug at the host and uses “scamp” to communicate with the application processors, and the application processors (layer) consisting of the ARM cores, memory and packet networking hardware which runs the  SpiNNaker Application Runtime Kernel (sark).

Applications which run on sark can consist of spiking neural networks or multi-layer perceptrons (MLP), classical deep learning neural networks.

  • MLP applications use back propagation and a training and inference phases, familiar to any deep learning application and uses a fixed neural network topology.
  • Spiking neural network applications use ongoing learning so there’s no training or inference phases (it’s always learning), use a variable network topology (reconfiguring the ARM core-packet network) and currently supports the PyNN spiking neural network simulator.

Unfortunately most of the links in the SpiNNaker project pages referring to PyNN spiking networking applications are broken. But PyNN is a Python based spiking neural network simulator that can run on a number of different hardware platforms (including sark/SpiNNaker).

Most of the AI groups I’ve talked with mention PyTorch or TensorFlow as AI frameworks of choice these days. But it’s unclear to me whether these two support spiking neural network generation/simulation.

If you want to learn more about programming SpiNNaker please check out their Software for SpiNNaker wiki page.

~~~~

As you may recall, a homo sapiens brain has an estimated 16B to 86B neurons in its average cerebral cortex (see wikipedia “animals listed by neuron count” article, for low estimate, EU’s HBP Brain Simulation page, for high estimate), which puts SpiNNaker today, at about the equivalent of less than a average tufted capuchin cerebral cortex (@1.2B neurons).

Given the above and with SpiNNaker @1B neurons, we are only  4 to 7 generations away from human equivalence. That means we have at most ~14 years left before a 128B spiking neuronal simulation machine is available.

But SpiNNaker today is based on ARM9 cores and ARM11 cores already exist. So, if they redesigned/reimplemented the chip today, it would already be 2X the core count. aake that human equivalence is only a max of 12 years away.

The mean estimate for AGI (artificial general intelligence) seems to be 2040-2050 (see wikipedia Technological Singularity article). But given what University of Manchester’s SpiNNaker is capable of doing today, I don’t think we have that long to wait.

Photo Credits: All photos/charts above are from the SpiNNaker Project pages at the University of Manchester website

Low power spiking AI-Arm edge processing from voltage scaling on ETA TENSAI chip

Read an article in IEEE Spectrum (ETA Compute debuts spiking NN chip for edge AI) about a company producing a new AI IOT chip based on ARM microprocessors with DSPs for dedicated matrix computations.

The new ETA Compute TENSAI chip  technology supports spiking neural networks (NN) as well as more normal, convolutional NN  depending on  edge AI requirements. More  information can be found in an Embedded Computing article (Micropower intelligence for edge devices) and a EE News article (ETA adds spiking NN support to MCU)

We have discussed spiking NN  in prior posts  latest post: IBM using PCM for better AI -round 6). Any spiking NN more closely mimics real biological neurons present in the brains of humans and other life.

ETA also claims that spiking NN perform better unsupervised learning. They included some examples of this in videos. In one video, ETA trains a spiking NN to do the same job as a convolutional NN with 1/10th the pixel data.

In addition, spiking NN only use neural weight values of 0 or 1, whereas normal convolutional NN operations require 8 to 16 bit numbers. So spiking NN arithmetic really only uses addition while convolutional NN need 8-16 bit multiplicative arithmetic to determine resultant weights.

Voltage scaling saves power

The other claim is that the TENSAI chip can perform computations on the order of 10 microwatts of power per megahertz (~10 μW/MHz) using the ARM/DSP combination with voltage scaling

The new TENSAI chip is their 3rd generation with multiple ARM M3 cores and NXP digital signal processors. These cores are implemented in a sub-threshold, asynchronous mode process that allows them to operate at much lower voltage, ~0.2V and at varying clock frequencies. This is called voltage scaling electronics. Doing this took analog design, which ETA considers part of their special IP.

ETA claims the TENSAI chip can operate in listen mode (“Ok Google?”) and only consume 50 microwatts of power and once a key word is discovered, operate in full computational mode with 500 microwatts of power.

I couldn’t find a data sheet for the TENSAI chip, so was unable to see how many TOPS/W it could perform as discussed in prior posts (see: AI processing at the edge post). But it looks to be even more power efficient.

The TENSAI chip was announced at ARM Tech Con(terence) and won the Design Innovation of the Year award at the conference.

Comments?

Photo Credits: Photo and caption from Figure 12 in  A brain inspired architecture…,  AIP Journal of Applied Physics article

Photo from EE News ETA adds spiking NN support to MCU article

Chart from Embedded Computing  Micropower intelligence for edge devices article

Board photo from IEEE Spectrum ETA Compute debuts spiking NN chip for edge AI article

UK Biobank & the data economy – part 2

A couple of weeks back I wrote a post about repositories for all the data that users generate these days and what to do with it.  (See our post on Data banks, deposits, … data economy – part 1).

This past week I read an article (see ScienceDaily Genetics of brain structure … article) which partially exemplifies what that post talked about. The research used publicly available genetic information to tease out brain structure hereditary characteristics.  The Science Daily article was a summary of research done at the University of Oxford using information provided from the UK Biobank.

Biobank as a data bank

The Biobank has recruited 500K participants from the UK,  aged 40-69,  between 2006-2010, to share their anonymized health data with researchers and scientist around the world. The Biobank is set up as a Scottish charity, funded by various health organizations in UK both gov’t and private. 

In addition to information collected during the baseline assessment: 

  • 100K participants have worn a 24 hour health monitoring device for a week and 20K have signed up to repeat this activity.
  • 500K participants are providing have been genotyped (DNA sequencing to determine hereditary genes)
  • 100K participants will be medically scanned (brain, heart, abdomen, bones, carotid artery) with images stored in the Biobank
  • 100K participants have signed up to receive questionnaires asking  about diet, exercise, work history, digestive health and other medical indicators..

There’s more. Biobank is linking to electronic health records (EHR) of participants to track their health over time. The Biobank is also starting to provide blood analysis and other detailed medical measures of subjects in the study.

UK Biobank (data bank) information uses

“UK Biobank is an open access resource. The Resource is open to bona fide scientists, undertaking health-related research that is in the public good. Approved scientists from the UK and overseas and from academia, government, charity and commercial companies can use the Resource. ….” (from UK Biobank scientists page).

Somewhat like open source code, the Biobank resource is made available to anyone (academia as well as industry), that can make valid use of its data BUT any research derived from its data must be published and made freely available to the Biobank and the world.

Biobank’s papers page documents some of the research that has already been published using their data. It lists the paper on genetics of brain study mentioned above and dozens more.

Differences from Data Banks

In the original data bank post:

  1. We thought data was only needed by  AI/deep learning. That seems naive now. The Biobank shows that AI/deep learning is not the only application/research that needs data.
  2. We thought data would be collected by only by hyper-scalars and other big web firms during normal user web activity. But their data is not the only data that matters.
  3. We thought data would be gathered for free. Good data can take many forms, and some may cost money.
  4. We thought profits from selling data would be split between the bank and users and could fund data bank operations. But in the Biobank, funding came from charitable contributions and data is available for free (to valid researchers).

Data banks can be an invaluable resource and may take many forms. Data that’s difficult to find can be gathered by charities and others that use funding to create, operate and gather the specific information needed for targeted research.

Comments?

Photo Credit(s): Bank on it by Alan Levine

Latest MRI – two screws in the kneecap by Becky Stern

Other graphics from the Genetics of brain structure… paper

 

 

IBM using PCM to implement better AI – round 6

Saw a recent article that discussed IBM’s research into new computing architectures that are inspired by brain computational techniques (see A new brain inspired architecture … ). The article reports on research done by IBM R&D into using Phase Change Memory (PCM) technology to implement various versions of computer architectures for AI (see Tutorial: Brain inspired computation using PCM, in the AIP Journal of Applied Physics).

As you may recall, we have been reporting on IBM Research into different computing architectures to support AI processing for quite awhile now, (see: Parts 1, 2, 3, 4, & 5). In our last post, More power efficient deep learning through IBM and PCM, we reported on a unique hybrid PCM-silicon solution to deep learning computation.

Readers should also be familiar with PCM as well as it’s been discussed at length in a number of our posts (see The end of NAND is near, maybe; The future of data storage is MRAM; and New chip architectures with CPU, storage & sensors …). MRAM, ReRAM and current 3D XPoint seem to be all different forms of PCM (I think).

In the current research, IBM discusses three different approaches to support AI  utilizing PCM devices. All three approaches stem from the physical characteristics of PCM.

(Some) PCM physics

FIG. 2. (a) Phase-change memory is based on the rapid and reversible phase transition of certain types of materials between crystalline and amorphous phases by the application of suitable electrical pulses. (b) Transmission electron micrograph of a mushroom-type PCM device in a RESET state. It can be seen that the bottom electrode is blocked by the amorphous phase.

It turns out that PCM devices have many  characteristics that lend themselves to be useful for specialized computation. PCM devices crystalize and melt in order to change state. The properties associated with melting and crystallization of the PCM media cell can be used to support unique forms of computation. Some of these PCM characteristics include::

  • Analog, not digital memory – PCM devices are, at the core, an analog memory device. We mean that they don’t record just a 0 or 1 (actually resistant or conductive) state, but rather a continuum of values between those two.
  • PCM devices have an accumulation capability –   each PCM cell actually  accumulates a level of activation. This means that one cell can be more or less likely to change state depending on prior activity.
  • PCM devices are noisy – PCM cells arenot perfect recorders of state chang signals  but rather have a well known, random noise which impacts the state level attained, that can be used to introduce randomness into processing.

The other major advantage of PCM devices is that they take a lot less power than a GPU-CPU to work.

Three ways to use PCM for AI learning

FIG. 4. “In-memory computing,” computation is performed in place by exploiting the physical attributes of memory devices organized as a “computational memory” unit. For example, if data A is stored in a computational memory unit and if we would like to perform f(A), then it is not required to bring A to the processing unit. This saves energy and time that would have to be spent in the case of conventional computing system and memory unit. Adapted from Ref. 19.

The Applied Physics article describes three ways to use PCM devices in AI learning. These three include:

  1. Computational storage – which uses the analog capabilities of PCM to perform  arithmetic and learning computations. In a sort of combined compute and storage device.
  2. AI co-processor – which uses PCM devices, in an “all PCM nodes connected to all other PCM nodes” operation that could be used to perform neural network learning. In an AI co-processor there would be multiple all connected PCM modules, each emulating a neural network layer.
  3. Spiking neural networks –  which uses PCM activation accumulation characteristics & inherent randomness to mimic, biological spiking neuron activation.
FIG. 11.
A proposed chip architecture for a co-processor for deep learning based on PCM arrays.28

It’s the last approach that intrigues me.

Spiking neural nets (SNN)

FIG. 12. (a) Schematic illustration of a synaptic connection and the corresponding pre- and post-synaptic neurons. The synaptic connection strengthens or weakens based on the spike activity of these neurons; a process referred to as synaptic plasticity. (b) A well-known plasticity mechanism is spike-time-dependent plasticity (STDP), leading to weight changes that depend on the relative timing between the pre- and post-synaptic neuronal spike activities. Adapted from Ref. 31.

Biological neurons accumulate charge from all input (connected) neurons and when they reach some input threshold, generate an output signal or spike. This spike is then used to start the process with another neuron up stream from it

Biological neurons also exhibit randomness in their threshold-spiking process.

Emulating spiking neurons, n today’s neural nets, takes computation.  Also randomness takes more.

But with PCM SNN, both the spiking process and its randomness, comes from device physics. Using PCM to create SNN seems a logical progression.

PCM as storage, as memory, as compute or all the above

In the storage business, we look at Optane (see our 3D Xpoint post) SSDs as blazingly fast storage. Intel has also announced that they will use 3D Xpoint in a memory form factor which should provide sadly slower, but larger memory devices.

But using PCM for compute, is a radical departure from the von Neumann computer architectures we know and love today. HPE has been discussing another new computing architecture with their memristor technology, but only in prototype form.

It seems IBM, is also prototyping hardware done this path.

Welcome to the next computing revolution.

Photo & Caption Credit(s): Photo and caption from Figure 2 in AIP Journal of Applied Physics article

Photo and caption from Figure 4 in AIP Journal of Applied Physics article

Photo and caption from Figure 11 in AIP Journal of Applied Physics article

Photo and caption from Figure 12 in AIP Journal of Applied Physics article

 

 

Screaming IOP performance with StarWind’s new NVMeoF software & Optane SSDs

Was at SFD17 last week in San Jose and we heard from StarWind SAN (@starwindsan) and their latest NVMeoF storage system that they have been working on. Videos of their presentation are available here. Starwind is this amazing company from the Ukraine that have been developing software defined storage.

They have developed their own NVMe SPDK for Windows Server. Intel doesn’t currently offer SPDK for Windows today, so they developed their own. They also developed their own initiator (CentOS Linux) for NVMeoF. The target system was a multicore server running Windows Server with a single Optane SSD that they used to test their software.

Extreme IOP performance consumes cores

During their development activity they tested various configurations. At the start of their development they used a Windows Server with their NVMeoF target device driver. With this configuration and on a bare metal server, they found that they could max out the Optane SSD at 550K 4K random write IOPs at 0.6msec to a single Optane drive.

When they moved this code directly to run under a Hyper-V environment, they were able to come close to this performance at 518K 4K write IOPS at 0.6msec. However, this level of IO activity pegged 100% of 8 cores on their 40 core server.

More IOPs/core performance in user mode

Next they decided to optimize their driver code and move as much as possible into user space and out of kernel space, They continued to use Hyper-V. With this level off code, they were able to achieve the same performance as bare metal or ~551K 4K random write IOP performance at the 0.6msec RT and 2.26 GB/sec level. However, they were now able to perform only pegging 2 cores. They expect to release this initiator and target software in mid October 2018!

They converted this functionality to run under ESX/VMware and were able to see much the same 2 cores pegged, ~551K 4K random write IOPS at 0.6msec RT and 2.26 GB/sec. They will have the ESXi version of their target driver code available sometime later this year.

Their initiator was running CentOS on another server. When they decided to test how far they could push their initiator, they were able to drive 4 Optane SSDs at up to ~1.9M 4K random write IOP performance.

At SFD17, I asked what they could have done at 100 usec RT and Max said about 450K IOPs. This is still surprisingly good performance. With 4 Optane SSDs and consuming ~8 cores, you could achieve 1.8M IOPS and ~7.4GB/sec. Doubling the Optane SSDs one could achieve ~3.6M IOPS, with sufficient initiators and target cores with ~14.8GB/sec.

Optane based super computer?

ORNL Summit super computer, the current number one supercomputer in the world, has a sustained throughput of 2.5 TB/sec over 18.7K server nodes. You could do much the same with 337 CentOS initiator nodes, 337 Windows server nodes and ~1350 Optane SSDs.

This would assumes that Starwind’s initiator and target NVMeoF systems can scale but they’ve already shown they can do 1.8M IOPS across 4 Optane SSDs on a single initiator server. Aand I assume a single target server with 4 Optane SSDs and at least 8 cores to service the IO. Multiplying this by 4 or 400 shouldn’t be much of a concern except for the increasing networking bandwidth.

Of course, with Starwind’s Virtual SAN, there’s no data management, no data protection and probably very little in the way of logical volume management. And the ORNL Summit supercomputer is accessing data as files in a massive file system. The StarWind Virtual SAN is a block device.

But if I wanted to rule the supercomputing world, in a somewhat smallish data center, I might be tempted to put together 400 of StarWind NVMeoF target storage nodes with 4 Optane SSDs each. And convert their initiator code to work on IBM Spectrum Scale nodes and let her rip.

Comments?

New website monetization approaches

Historically, websites have made money by selling wares, services or advertising. In the last two weeks it seems like two new business models are starting to emerge. One more publicly supported and the other less publicly supported.

Europe’s new copyright law

According to an article I read recently (This newly approved European copyright law might break the Internet), Article 11 of Europe’s new Copyright Directive (not quite law yet) will require search engines, news aggregators and other users of Internet content to pay a “link tax” to copyright holders of anything they link to. As a long time blogger, podcaster and content provider, I find this new copyright policy very intriguing.

The article proposes that this will bankrupt small publishers as larger ones will charge less for the traffic. But presently, I get nothing for links to my content. And, I’d be delighted to get any amount – in fact I’d match any large publishers link tax amount that the market demands.

But my main concern is the impact this might have on site traffic. If aggregators pay a link tax, why would they want to use content that charges any tax. Yes at some point aggregators need content. But there are many websites full of content, certainly there would be some willing to forego tax fees for more traffic.

I also happen to be a copyright user. Most of my blog posts are from articles I read on the web. I usually link to an article in the 1st one or two paragraphs (see above and below) of a post and may refer (and link) to more that go deeper into a subject. Will I have to pay a link tax to the content owner?

How much of a link tax is anyone’s guess. I’m not sure it would amount to much. But a link tax, if done judiciously might even raise the quality of the content on the web.

Browser’s of the world, lay down your blockchains

The second article was a recent research paper (Digging into browser based crypto mining). Researchers at RWTH Aachen University had developed a new method to associate mined blocks to mining pools as a way to unearth browser-based mined crypto coins. With this technique they estimated that 1.8% of all Monero coins were mined by CoinHive using participant browsers to mine the coin or ~$250K/month from browser mining.

I see this as steeling compute power. But with that much coin being generated, it might be a reasonable way for an honest website to make some cash from people browsing their web pages. The browsing party would need to be informed of the mining operation in the page’s information, sort of like “we use cookies” today.

Just think, someone creates a WP plugin to do ETH mining and when activated, a WP website pops up a message that says “We mine coins while you browse – OK?”.

In another twist perhaps the websites could share the ETH mined on their browser with the person doing the browsing, similar to airline/hotel travel awards. Today most travel is done on corporate dime, but awards go to the person doing the traveling. Similarly, employees could browse using corporate computers but they would keep a portion of the ETH that’s mined while they browse away… Sounds like a deal.

Other monetization approaches

We’ve tried Google AdSense and other advertising but it only generated pennies a month. So, it wasn’t worth it.

We also sell research and occasionally someone buys some (see SCI Research Shop). And I do sell services but not through my website.

~~~

Not sure a link tax will fly. It would be a race to the bottom and anyone that charged a tax would suffer from less links until they decided to charge a $0 link tax.

Maybe if every link had a tax associated with it, whether the site owner wanted it or not there could be a level playing field. Recording, paying/receiving and accounting for all these link tax micro payments would be another nightmare altogether.

But a WP plugin, that announces and mines crypto coins with a user’s approval and splits the profit with them might work. Corporate wouldn’t like it but employees would just be browsing websites, where’s the harm in that.

Browse a website and share the mined crypto coin with site owner. Sounds fine to me.

Photo Credit(s): Strasburg – European Parliament|Giorgio Barlocco

Crypto News Daily – Telegram cancels ICO…

Photo of Bitcoin, Etherium and Litecoin|QuoteInspector