Google releases new Cloud TPU & Machine Learning supercomputer in the cloud

Last year about this time Google revealed their 1st generation TPU chip to the world (see my TPU and HW vs. SW … post for more info).

This year they are releasing a new version of their hardware, called the Cloud TPU, and making it available in a cluster on Google Cloud. Cloud TPU is in alpha testing now. As I understand it, access to Cloud TPU will eventually be free to researchers who promise to freely publish their research, and available at a price for everyone else.

What’s different between TPU v1 and Cloud TPU v2

The differences between version 1 and 2 mostly seem to be tied to training Machine Learning Models.

TPU v1 didn’t have any real ability to train machine learning (ML) models. It was a relatively dumb chip (8-bit integer ALUs), but if you already had an ML model created to do something like understand speech, you could load that model onto the TPU v1 board and have it executed very fast. The TPU v1 chip was also placed on a separate PCIe board (I think), connected to normal x86 CPUs as a sort of CPU accelerator. The advantage of TPU v1 over GPUs or normal x86 CPUs was mostly in power consumption and speed of ML model execution.
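
To make the 8-bit point concrete: running an already-trained float model on 8-bit integer hardware means the model’s numbers must first be quantized. Below is a minimal sketch of one common approach, symmetric quantization with a single shared scale per tensor. The scheme, function names and numbers are illustrative assumptions, not Google’s actual TPU v1 pipeline.

```python
# Illustrative symmetric int8 quantization (an assumption, not Google's scheme).

def quantize(values, num_bits=8):
    """Map floats to signed integers in [-127, 127] using one shared scale."""
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for 8 bits
    max_abs = max(abs(v) for v in values) or 1.0   # avoid dividing by zero
    scale = max_abs / qmax                         # float size of one integer step
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

def int8_dot(qa, scale_a, qb, scale_b):
    """Integer multiply-accumulate, then one float rescale at the very end.
    (Hardware would use a wider, e.g. 32-bit, accumulator; Python ints can't overflow.)"""
    acc = sum(a * b for a, b in zip(qa, qb))
    return acc * scale_a * scale_b

weights = [0.42, -1.30, 0.05, 0.88]   # trained earlier, in floating point
inputs = [1.00, 0.25, -0.60, 0.10]

qw, sw = quantize(weights)
qx, sx = quantize(inputs)
exact = sum(w * x for w, x in zip(weights, inputs))
approx = int8_dot(qw, sw, qx, sx)
print(f"float: {exact:.4f}  int8: {approx:.4f}")
```

The integer result lands close to the float one (about 0.152 vs. 0.153 here), which is part of why 8-bit hardware can work for executing models but is a poor fit for training.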

Cloud TPU v2 looks to be a standalone multi-processor device that’s connected to others via what look like Ethernet connections. One thing Google seems to be highlighting is Cloud TPU’s floating point performance. A Cloud TPU device (board) is capable of 180 TeraFlops (10^12, or a trillion, floating point operations per second). A pod of 64 Cloud TPU devices can theoretically execute 11.5 PetaFlops (10^15 FLOPS).
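
The pod figure is simply the per-device number multiplied out; a quick check using only the figures quoted above:

```python
TFLOPS_PER_DEVICE = 180   # per Cloud TPU device (board), as quoted by Google
DEVICES_PER_POD = 64

pod_tflops = TFLOPS_PER_DEVICE * DEVICES_PER_POD   # total TeraFlops per pod
pod_pflops = pod_tflops / 1_000                    # 1 PetaFlop = 1,000 TeraFlops
print(f"{pod_tflops} TFLOPS = {pod_pflops:.2f} PFLOPS")  # 11520 TFLOPS = 11.52 PFLOPS
```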

TPU v1 had no floating point capabilities whatsoever. So Cloud TPU is intended to speed up the training part of ML, which requires extensive floating point calculations. Presumably, they have also improved ML model execution in Cloud TPU vs. TPU v1. More information on their Cloud TPU chips is available here.

So how do you code a TPU?

Both TPU v1 and Cloud TPU are programmed via Google’s open source TensorFlow. TensorFlow is a set of software libraries that facilitate numerical computation via data flow graph programming.

Apparently, with data flow programming you have many nodes and many more connections between them. When a connection is fired between nodes, it transfers a multi-dimensional array (tensor) to the node. I guess the node takes this multi-dimensional array, does some (floating point) calculations on the data, and then determines which of its outgoing connections to fire and how to alter the tensor to send across those connections.
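
A toy version of the dataflow idea, written in plain Python rather than with the TensorFlow API, may help. In this simple model a node fires once all of its input tensors have arrived and sends its result along every outgoing edge; conditional routing (which TensorFlow supports through control-flow constructs such as tf.cond) is left out. The graph, node names and values are made up for illustration.

```python
from collections import deque

# Tensors are nested Python lists standing in for multi-dimensional arrays.
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def relu(a):
    return [[max(0.0, x) for x in row] for row in a]

# Each node: (operation, names of the nodes whose outputs it consumes).
graph = {
    "x": (None, []),              # source nodes: values are fed from outside
    "w": (None, []),
    "b": (None, []),
    "xw": (matmul, ["x", "w"]),
    "xwb": (add, ["xw", "b"]),
    "out": (relu, ["xwb"]),
}

def run(graph, feeds):
    """Fire each node as soon as all of its input tensors are available.
    Assumes an acyclic graph whose source nodes are all present in `feeds`."""
    values = dict(feeds)
    pending = deque(name for name in graph if name not in values)
    while pending:
        name = pending.popleft()
        op, inputs = graph[name]
        if all(i in values for i in inputs):
            values[name] = op(*(values[i] for i in inputs))
        else:
            pending.append(name)  # inputs not ready yet; retry later
    return values

result = run(graph, {"x": [[1.0, -2.0]],
                     "w": [[0.5, 1.0], [0.25, -1.0]],
                     "b": [[0.0, 0.5]]})
print(result["out"])  # [[0.0, 3.5]]
```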

Apparently, TensorFlow works with X86 servers, GPU chips, TPU v1 or Cloud TPU. Google TensorFlow 1.2.0 is now available. Google says that TensorFlow is in use in over 6000 open source projects. TensorFlow uses Python and 1.2.0 runs on Linux, Mac, & Windows. More information on TensorFlow can be found here.

So where can I get some Cloud TPUs?

Google is releasing their new Cloud TPU in the TensorFlow Research Cloud (TFRC). The TFRC has 1000 Cloud TPU devices connected together, which can be used by any organization to train and execute machine learning models.

I signed up (here) to be an alpha tester. During the signup process the site asked me: what hardware (GPUs, CPUs) and platforms I was currently using to train my ML models; how long my ML model takes to train; how large a training (data) set I use (ranging from 10GB to >1PB); as well as other ML model oriented questions. I guess they’re trying to understand what the market requirements are outside of Google’s own use.

Google’s been using more ML and other AI technologies in many of their products, and this will no doubt accelerate with the introduction of Cloud TPU. Making it available to others is an interesting play, but it would be one way to amortize the cost of creating the chip. Another way would be to sell the Cloud TPU directly to businesses, government agencies, non-governmental organizations, etc.

I have no real idea what I am going to do with alpha access to the TFRC, but I was thinking maybe I could feed it all my blog posts and train an ML model to start writing blog posts for me. If anyone has any other ideas, please let me know.

Comments?

Photo credit(s): From Google’s website on the new Cloud TPU

 

AI’s Image recognition success feeds sound recognition improvements

I must do reCAPTCHA at least a dozen times a week for various websites I use. It’s become a real pain. And the fact that I know that what I am doing is helping some AI image recognition program do a better job of identifying street signs, mountains, or shop fronts doesn’t reduce my angst.

But that’s the thing with deep learning, machine learning, reinforcement learning, etc.: they all need massive amounts of correctly annotated data (an accurate interpretation of each scene) in order to train properly.

Computers to the rescue

So, when I read a recent article in MIT News that Computers learn to recognize sounds by watching video, I was intrigued. What the researchers at MIT have done is use advanced image recognition to annotate film clips with the names of things that are making sounds on the film. They then fed this automatically annotated data into a sound identifying algorithm to improve its recognition capability.

They used this approach to train their sound recognition system to be able to identify natural and artificial sounds like bird song, speaking in crowds, traffic sounds, etc.
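
A toy version of this bootstrapping idea is sketched below. The MIT system used neural networks on both sides; here, as an illustrative assumption, a stub “teacher” turns image-model class scores into labels, and those labels train a simple nearest-centroid “student” that only ever sees audio features. All data and names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical clips: (image-model class scores for a video frame,
#                      audio feature vector for the same moment).
unlabeled_clips = [
    ({"bird": 0.92, "traffic": 0.05}, [0.9, 0.1]),
    ({"bird": 0.81, "traffic": 0.10}, [0.8, 0.2]),
    ({"bird": 0.07, "traffic": 0.88}, [0.1, 0.9]),
    ({"bird": 0.12, "traffic": 0.79}, [0.2, 0.8]),
    ({"bird": 0.40, "traffic": 0.35}, [0.5, 0.5]),   # teacher unsure: skipped
]

def image_teacher(scores, min_confidence=0.5):
    """Stub 'teacher': keep the top image label only if it is confident."""
    label, confidence = max(scores.items(), key=lambda kv: kv[1])
    return label if confidence >= min_confidence else None

def train_centroids(labeled):
    """'Student' training: average the audio features for each label."""
    grouped = defaultdict(list)
    for features, label in labeled:
        grouped[label].append(features)
    return {label: [sum(col) / len(col) for col in zip(*feats)]
            for label, feats in grouped.items()}

def classify(centroids, features):
    """Label new audio by its nearest centroid; no video needed."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

# Step 1: the image model annotates the clips automatically.
pseudo_labeled = [(audio, image_teacher(scores)) for scores, audio in unlabeled_clips]
pseudo_labeled = [(audio, label) for audio, label in pseudo_labeled if label is not None]

# Step 2: those automatic labels train the sound-only classifier.
centroids = train_centroids(pseudo_labeled)
print(classify(centroids, [0.85, 0.15]))   # bird
```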

They tested their newly, automatically trained sound recognition system against standard labeled sound sets, and it was able to categorize sounds with 92% accuracy on a 10-category dataset and 74% accuracy on a 50-category dataset. Humans are able to categorize these sounds with 96% and 81% accuracy, respectively.

AI’s need for annotation

The problem with machine learning is that it needs a massive, properly annotated data set in order to learn properly. But getting annotated data takes too long or is too expensive to do for many things that we want AI for.

Using one AI tool to annotate data to train another AI tool is sort of bootstrapping AI technology. It’s a cute trick but may have only limited application. I could think of only a few more applications of similar technology:

  • Use chest strap or EKG technology to annotate audio clips of heart beat sounds at a wrist or other appendage to train a system to accurately determine pulse rates through sound alone.
  • Use wave monitoring technology to annotate pictures and audio clips of sea waves to train a system to accurately determine wave levels for better tsunami detection.
  • Use image recognition to annotate pictures of food and then use this to train a system to recognize food smells (if they ever find a way to record smells).

But there may be many others. Just further refinement of what they have used could lead to finer grained people detection. For example, as (facial) image recognition gets better, it’s possible to annotate speaking film clips to train a sound recognition system to identify people from just hearing their speech. Intelligence applications for such technology are significant.

Nonetheless, I for one am happy that the next reCAPTCHA won’t be having me identify river sounds in a matrix of 9 sound clips.

But I fear there’s enough GreyBeards on Storage podcast recordings and Storage Field Day video clips already available to train a system to identify Ray’s and for sure, Howard’s voice anywhere on the planet…

Comments?

Photo Credit(s): Wave by Matthew Potter; Waves crashing on Puget Sound by mikeskatie; Day 16: Podcasting by Laura Blankenship

The fragility of public cloud IT

I have been reading AntiFragile again (by Nassim Taleb). And although he would probably disagree with my use of his concepts, it appears to me that IT is becoming more fragile, not less.

For example, recent outages at major public cloud providers display increased fragility for IT. Yet these problems, although almost national in scope, seldom deter individual organizations from their migration to the cloud.

Tragedy of the cloud commons

The issues are somewhat similar to the tragedy of the commons. When more and more entities use a common pool of resources, occasionally that common pool can become degraded. But because no one really owns the common resources, no one has any incentive to improve the situation.

Now the public cloud, although certainly a common pool of resources, is also most assuredly owned by corporations. So it’s not a true tragedy of the commons problem. Public cloud corporations have a real incentive to improve their services.

However, the fragility of IT in general, the web, and other electronic/data services all increases as they become more and more reliant on public cloud, common infrastructure. And I would propose this general IT fragility is really not owned by any one person, corporation or organization, let alone the public cloud providers.

Pre-cloud was less fragile, post-cloud more so

In the old days of last century, pre-cloud, if a human screwed up a CLI command, the worst that could happen was to take out one corporation’s data services. Nowadays, post-cloud, if a similar human screws up a CLI command, the worst that can happen is that major portions of a nation’s internet services go down.


Yes, over time, public cloud services have become better at not causing outages, but outages aren’t going away. And if anything, better public cloud services just encourage more corporations to use them for more data services, causing any subsequent cloud outage to be more impactful, not less.

The Internet was originally designed by DARPA to be more resilient to failures, outages and nuclear attack. But by centralizing IT infrastructure onto public cloud common infrastructure, we are reversing the web’s inherent fault tolerance and causing IT to be more susceptible to failures.

What can be done?

There are certainly things that can be done to improve the situation and make IT less fragile in the short and long run:

  1. Use the cloud for non-essential or temporary data services, that don’t hurt a corporation, organization or nation when outages occur.
  2. Build in fault tolerance, with automatic switchover of public cloud data services to other regions/clouds.
  3. Physically partition public cloud infrastructure into more regions and physically separate infrastructure segments within regions, such that any one admin has limited control over an amount of public cloud infrastructure.
  4. Divide an organization’s or nation’s data services across public cloud infrastructures, across as many regions and segments as possible.
  5. Create a National Public IT Safety Board, not unlike the one for transportation, that does a formal post-mortem of every public cloud outage, proposes fixes, and enforces fix compliance.
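
Points 2 and 4 amount to failover across regions or providers. A minimal client-side sketch, assuming a hypothetical call_region() stub and made-up region names (no real cloud SDK is used), might look like this:

```python
# Hypothetical sketch: no real cloud SDK; region names are made up.

class RegionDown(Exception):
    """Raised when a region (or a whole cloud) can't serve the request."""

def call_with_failover(regions, request, call):
    """Try regions in priority order and return (region, reply) from the first
    one that succeeds; raise only if every region fails."""
    errors = {}
    for region in regions:
        try:
            return region, call(region, request)
        except RegionDown as exc:
            errors[region] = str(exc)   # note the failure, move to the next region
    raise RuntimeError(f"all regions failed: {errors}")

OUTAGES = {"cloud-a/us-east"}   # simulate an outage in the primary region

def call_region(region, request):
    """Stub for a real service call in one region."""
    if region in OUTAGES:
        raise RegionDown(f"{region} unavailable")
    return f"{request} served by {region}"

# Spread across two regions of one cloud plus a second cloud.
regions = ["cloud-a/us-east", "cloud-a/us-west", "cloud-b/eu-central"]
print(call_with_failover(regions, "GET /orders", call_region))
```

A production version would add timeouts, health checks and data replication between regions; failing over the request path alone doesn’t help if the data only lives in the region that’s down.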

The National Public IT Safety Board

The National Transportation Safety Board (NTSB) has worked well for air transportation. It relies on the cooperation of multiple equipment vendors, airlines, countries and other parties. It performs formal post-mortems on air transportation failures. It also issues safety recommendations covering processes, procedures, training and other activities of equipment vendors, maintenance services, pilots, airlines and other entities that can impact public air transport safety, which regulators such as the FAA can then make mandatory. At the moment, air transport is probably the safest form of transportation available, and much of this is due to the NTSB.

We need something similar for public (cloud) IT services. Yes, most public cloud companies are doing this sort of work themselves, in isolation, but we have a pressing need to accelerate this process across cloud vendors to improve public IT reliability even faster.

The public cloud is here to stay and, if anything, will become more encompassing, running more and more of the world’s IT. And as IoT, AI and automation become more pervasive, the data processes that support these services, which will no doubt run in the cloud, can impact public safety. Just think of what would happen in the future if an outage occurred at a major cloud provider running the backend for self-guided car algorithms during rush hour.

If the public cloud is to remain (at this point almost inevitable), then the safety and continuous functioning of this infrastructure becomes a public concern. As such, having a National Public IT Safety Board seems like the only way to have some entity own IT’s increased fragility due to public cloud infrastructure consolidation.

~~~~

In the meantime, as corporations, government and other entities contemplate migrating data services to the cloud, they should consider the broader impact they are having on the reliability of public IT. When public cloud outages occur, all organizations suffer from the reduced public perception of IT service reliability.

Photo Credits: Fragile by Bart Everson; Fragile Planet by Dave Ginsberg; Strange Clouds by Michael Roper

Mixed progress on self-driving cars

Read an article the other day on the progress in self-driving cars in NewsAtlas (DMV reports self-driving cars are learning — fast). More details are available from their source (CA [California] DMV [Dept. of Motor Vehicles] report).

The article reported on what are called disengagement events that occurred on CA roads. This is where a driver has to take over from the self-driving automation to deal with a potential mis-cue, mistake, or accident.

Waymo (Google) way out ahead

It appears as if Waymo, Google’s self-driving car spin-off, is way ahead of the pack. It reported only 124 disengages for 636K mi (~1M km), or ~1 disengage every ~5.1K mi (~8K km). This is a ~4.3X better rate than last year’s 1 disengage for every ~1.2K mi (~1.9K km).

Competition far behind

Below I list some comparative statistics (from the DMV/CA report, noted above), sorted from best to worst:

  • BMW: 1 disengage for 638 mi (1027 km)
  • Ford: 3 disengages for 590 mi (~950 km) or 1 disengage every ~197 mi (~317 km)
  • Nissan: 23 disengages for ~3.5K mi (~5.6K km) or 1 disengage every ~151 mi (~243 km)
  • Cruise (GM) automation: 181 disengages for ~9.8K mi (~15.8K km) or 1 disengage every ~54 mi (~87 km)
  • Delphi: 149 disengages for ~3.1K mi (~5.0K km) or 1 disengage every ~21 mi (~34 km)

There was no information on previous years’ activities, so there’s no data on how competitors have improved over the last year.

Please note: the report only applies to travel on California (CA) roads. Other competitors are operating in other countries and other states (AZ, PA, & TX to name just a few). However, these rankings may hold up fairly well when combined with other state/country data. Thousand(s) of kilometers should be adequate to assess self-driving cars disengagement rates.

Waymo moving up the (supply chain) stack

In addition, according to a Recode article (The Google car was supposed to disrupt the car industry), Waymo is moving from being a (self-driving automation) software supplier to being a hardware and software supplier to the car industry.

Apparently, Google has figured out how to reduce their sensor (hardware) costs by a factor of 10X, bringing the sensor package down from $75K to $7.5K, (most probably due to a cheaper way to produce Lidar sensors – my guess).

So now Waymo is driving ~65 to ~1000X more (CA road) miles than any competitor, has a much (~8 to ~243X) better disengage rate, and is moving to become a major auto supplier of both hardware and software.
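
The per-mile rates and ratios above can be reproduced from the report figures. This sketch uses Waymo and the fleets at either end of the quoted ranges (Nissan is left out); miles are the rounded values quoted in the text, so the results are approximate.

```python
# Figures as quoted above (CA roads): (miles driven, disengagements).
reports = {
    "Waymo": (636_000, 124),
    "BMW": (638, 1),
    "Ford": (590, 3),
    "Cruise": (9_800, 181),
    "Delphi": (3_100, 149),
}

miles_per = {name: mi / n for name, (mi, n) in reports.items()}
waymo_miles = reports["Waymo"][0]

print(f"Waymo: {miles_per['Waymo']:.0f} mi/disengage, "
      f"{miles_per['Waymo'] / 1_200:.1f}x last year's ~1.2K mi")
for name, (mi, _) in reports.items():
    if name == "Waymo":
        continue
    print(f"{name}: {miles_per[name]:.0f} mi/disengage; Waymo drove "
          f"{waymo_miles / mi:.0f}x the miles with a "
          f"{miles_per['Waymo'] / miles_per[name]:.0f}x better rate")
```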

It’s going to be an interesting century.

If the 20th century was defined by the emergence of the automobile, the 21st will probably be defined by dominance of autonomous operations.

Comments?

Photo credits: Substance E′TS; and Waymo on the road

 

Hitachi and the coming IoT gold rush

Earlier this week I attended Hitachi Summit 2016, along with a number of other analysts and Hitachi executives, where Hitachi discussed their current and ongoing focus on the IoT (Internet of Things) business.

We have discussed IoT before (see QoM1608: The coming IoT tsunami or not, Extremely low power transistors … new IoT applications). Analysts and companies predict ~200B IoT devices by 2020 (my QoM prediction is 72.1B, with 0.7 probability). But in any case there’s a lot of IoT activity coming online very shortly. Hitachi is already active in IoT and, if anything, wants it to grow significantly.

Hitachi’s current IoT business

Hitachi is uniquely positioned to take on the IoT business over the coming decades, having a number of current businesses in industrial processes, transportation, energy production, water management, etc. Over time, all these industries and more are becoming much more data driven and smarter as IoT rolls out.

Some metrics indicating the scale of Hitachi’s current IoT business, include:

  • Hitachi is #79 in the Fortune Global 500;
  • Hitachi generated $5.4B (FY15) in IoT revenue;
  • Hitachi IoT R&D investment is $2.3B (over 3 years);
  • Hitachi has 15K customers Worldwide and 1400+ partners; and
  • Hitachi spends ~$3B in R&D annually and has 119K patents

Hitachi has been in the OT (Operational [industrial] Technology) business for over a century now. Hitachi has also had a very successful and ongoing IT business (Hitachi Data Systems) for decades. Their main competitors in this IoT business are GE and Siemens, but neither has the extensive history in IT that Hitachi has had. But both are working hard to catch up.

Hitachi Rail-as-a-Service

For one example of what Hitachi is doing in IoT, they recently won a 27.5-year Rail-as-a-Service contract to upgrade, ticket, maintain and manage all new trains for UK Rail. This entails upgrading all train rolling stock and providing upgraded rail signaling, traffic management systems, depot and station equipment, and ticketing services for all of UK Rail.

The success and profitability of this Hitachi service offering hinges on their ability to provide more cost efficient rail transport. A key capability they plan to deliver is predictive maintenance.

Today, in UK and most other major rail systems, train high availability is often supplied by using spare rolling stock, that’s pre-positioned and available to call into service, when needed. With Hitachi’s new predictive maintenance capabilities, the plan is to reduce, if not totally eliminate the need for spare rolling stock inventory and keep the new trains running 7X24.
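
Hitachi didn’t describe the algorithms behind its predictive maintenance. As a generic illustration only, one of the simplest techniques is a rolling z-score: flag any sensor reading that sits far outside the recent norm, so a part can be inspected before it fails. The sensor, readings and thresholds below are hypothetical.

```python
from collections import deque
from statistics import mean, stdev

def flag_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings more than `threshold` standard deviations
    from the mean of the previous `window` readings (a rolling z-score)."""
    history = deque(maxlen=window)
    flagged = []
    for i, value in enumerate(readings):
        if len(history) == window:
            recent = list(history)
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                flagged.append(i)
        history.append(value)
    return flagged

# Hypothetical axle-bearing temperatures (deg C) from one train's sensor feed.
temps = [61.0, 61.4, 60.8, 61.2, 61.1, 61.3, 60.9, 68.5, 61.0, 61.2]
print(flag_anomalies(temps))  # [7]
```

Real systems would combine many such signals (the trains capture tens of thousands of data items) and learn failure patterns from maintenance history, but the basic goal, spotting trouble before a breakdown forces a spare train into service, is the same.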

Hitachi said their new trains capture 48K data items and generate over ~25GB/train/day. All this data will be fed into their new Hitachi Insight Group Lumada platform, which includes Pentaho, HSDP (Hitachi Streaming Data Platform) and their Content Analytics, to analyze train data and determine how best to keep the trains running. Behind all this analytical power will no doubt be the HDS HCP object store, used to keep track of all the train sensor data and other information, Hitachi UCP servers to process it all, and other Hitachi software and hardware to glue it all together.

The new trains and services will be rolled out over time, but there’s a pretty impressive time table. For instance, Hitachi will add 120 new high speed trains to UK Rail by 2018. About the only thing that Hitachi is not directly responsible for in this Rail-as-a-Service offering is the communications network for the trains.
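
For a sense of scale, here is a back-of-envelope estimate for just the 120 new high-speed trains, assuming each produces the quoted ~25GB every day (decimal units; since Hitachi said “over ~25GB”, treat this as a lower bound):

```python
GB_PER_TRAIN_PER_DAY = 25     # "over ~25GB/train/day", per Hitachi
NEW_HIGH_SPEED_TRAINS = 120   # to be added to UK Rail by 2018

daily_tb = GB_PER_TRAIN_PER_DAY * NEW_HIGH_SPEED_TRAINS / 1_000   # decimal TB
yearly_pb = daily_tb * 365 / 1_000                                # decimal PB
print(f"~{daily_tb:.0f} TB/day, ~{yearly_pb:.1f} PB/year")  # ~3 TB/day, ~1.1 PB/year
```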

Hitachi’s other IoT offerings

Hitachi is actively seeking other customers for their Rail-as-a-service IoT service offering. But it doesn’t stop there, they would like to offer smart-water-as-a-service, smart-city-as-a-service, digital-energy-as-a-service, etc.

There’s almost nothing that Hitachi currently supplies as industrial products that they wouldn’t consider offering in an X-as-a-service solution. With HDS Lumada Analytics, HCP and HDS storage systems, Hitachi UCP converged infrastructure, Hitachi industrial products, and Hitachi consulting services, together they are primed to take over the IoT-industrial products/services market.

Welcome to the new Hitachi IoT world.

Comments?

TPU and hardware vs. software innovation (round 3)

At the Google I/O conference this week, they revealed (see Google supercharges machine learning tasks …) that they had been designing and operating their own processor chips in order to optimize machine learning.

They called the new chip a Tensor Processing Unit (TPU). According to Google, the TPU provides an order of magnitude better performance per watt for machine learning than what’s achievable with off-the-shelf GPUs/CPUs. TensorFlow is Google’s open sourced machine learning software.

This is very interesting, as Google and the rest of the hype-scale hive seem to have latched onto open sourced software and commodity hardware for all their innovation. This has led the industry to believe that hardware customization/innovation is dead and the only thing anyone needs is software developers. I believe this is incorrect and that hardware innovation combined with software innovation is a better way, (see Commodity hardware always loses and Better storage through hardware posts).

At Scale conference keynote, Facebook video experience re-engineered

The At Scale conference happened this past week in LA. Jay Parikh, Global Head of Engineering and Infrastructure at Facebook, kicked off the conference by talking about how Facebook is attempting to conquer some of its intrinsic problems as it scales up from over 1B users today. I was unable to attend the conference but watched a video of the keynote (on Facebook, of course).

The At Scale community is a group of large, hyper-scale web companies, such as Google, Microsoft, Twitter and of course Facebook, among a gaggle of others, that all have problems trying to scale up their infrastructure to handle more and more user activity. They had 1800 people registered for the At Scale 2015 conference on Monday, double last year’s count. The At Scale community is trying to push the industry’s pace of innovation faster, through a community of companies that need to work at hyper-scale.

Facebook’s video problem

At Facebook the current hot problem impacting customer satisfaction seems to be video uploads and playback (downloads). The issues with Facebook’s video experience are multifaceted and range from the time it takes to successfully upload a video, to the bandwidth it takes to play back a video, to the system requirements to support live streaming video to 100,000s of users.

Facebook started as a text-only service, migrated to a photo-oriented service, and now is quickly moving to a video-oriented user experience. But it doesn’t stop there: they can see on the horizon that augmented and virtual reality will become a significant driver of activity for Facebook users?!

Daily video views have grown from 1B/day last year to 4B/day now. They also launched a new service lately, LiveMentions, a live streaming service for celebrities (real time video streams). Several celebrities were live streaming to 150K of their subscribers. So video has become, and will continue to be, the main consumer of bandwidth at Facebook.

Struggling to enhance the Facebook user’s video experience over the past year, they have come up with three key engineering principles that have helped them: Planning, Iteration and Performance.

Planning

Facebook is already operating a terabit scale network, so doing something wrong to its network is going to cause major problems around the world. As a result, Facebook engineering focused early on incorporating lots of instrumentation into their network and infrastructure services. This has allowed them to constantly monitor the activity of their users across their infrastructure to identify problems and solutions.

One metric Parikh talked about was “playback success rate”: the percentage of video plays that start playing in under 1 second for a Facebook user. One chart he showed was a playback success rate colored over a world map but aggregated (averaged) at the country level. But with their instrumentation Facebook was able to drill down to regions within a country and even cities within a region. This allows engineering to identify problems at almost any level of granularity they need.
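
A metric like playback success rate is simple to compute once the instrumentation exists; drilling down is just a change of grouping key. A minimal sketch, assuming hypothetical log records of (country, city, seconds until playback started) and the 1-second threshold described above:

```python
from collections import defaultdict

# Hypothetical playback records: (country, city, seconds until playback started)
plays = [
    ("US", "Seattle", 0.4), ("US", "Seattle", 0.7), ("US", "Boise", 1.8),
    ("IN", "Mumbai", 0.9), ("IN", "Mumbai", 2.5), ("IN", "Patna", 3.1),
]

def playback_success_rate(records, key, threshold_s=1.0):
    """Percent of plays starting in under `threshold_s` seconds, grouped by key()."""
    started, total = defaultdict(int), defaultdict(int)
    for record in records:
        group = key(record)
        total[group] += 1
        started[group] += record[2] < threshold_s   # True counts as 1
    return {group: round(100.0 * started[group] / total[group], 1) for group in total}

by_country = playback_success_rate(plays, key=lambda r: r[0])
by_city = playback_success_rate(plays, key=lambda r: (r[0], r[1]))   # drill down
print(by_country)   # {'US': 66.7, 'IN': 33.3}
print(by_city)
```

The country averages hide the detail: in this made-up data Seattle is at 100% while Boise and Patna are at 0%, which is exactly the kind of problem the city-level drill-down exposes.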

One key takeaway from Planning is that if you have the instrumentation in place, have people to monitor and mine the data, and are willing to address the problems that crop up, then you can create a more flexible, efficient and effective environment and build a better product for your users.

Iteration

Iteration is not just about feature deployment, but it’s also about the Facebook user experience. Their instrumentation had told them that they were doing ok on video uploads but it turns out that when they looked at the details, they saw that some customers were not having a satisfactory video upload experience. For instance, one Facebook engineer had to wait 82 hours to upload a video.

The Facebook world is populated with 10s of thousands of unique devices with different memory, compute and storage. They had to devise approaches that could optimize the encoding for all the different devices, some of which was done on mobile phones.

They also had to optimize the network stack for different devices and mobile networking technologies. Parikh had another map showing network connectivity. Surprise: most of the world is not on LTE, and the vast majority of the world is on 2G and 3G cellular networks. So via iteration Facebook went about improving video upload by 1% here and 1% there, but with Facebook’s user base, these improvements impact millions of users. They used cross-functional teams to address the problems they uncovered.

However, video upload problems were not just in the device and connectivity realms. It turns out they had a big cancel upload button on the screen after the start of a video upload. This was sometimes clicked by mistake, and they found that almost 10% of users hit cancel upload. So they went through and re-examined the whole user experience to try to eliminate other hindrances to successful video uploads.

Performance

The key takeaway from this segment of the talk was that performance has to be considered from the get-go of a new service or service upgrade. It is impossible to improve performance after the fact, especially for At Scale environments.

In my CS classes, the view was make it work and then make it work fast.  What Facebook has found is that you never have the time after a product has shipped to make it fast. As soon as it works, they had to move on to the next problem.

As a result, if performance is not built in from the start, as a critical requirement/feature of the system architecture and design, it never gets addressed. Also, if all you focus on is making it work, then the design and all the code are built around feature functionality. Changing working functionality later to improve performance is an impossible task and typically represents a re-architecture/re-design/re-implementation of the functionality.

For instance, Facebook used to do video encoding serially on a single server. It often took a long time (10 to 30 minutes). Engineering reimplemented their video encoding to partition the video and distribute the encoding across multiple servers. Doing this sped up encoding time considerably.
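
The split-and-distribute approach can be sketched as below. Threads stand in for separate encoding servers and encode_segment() is a hypothetical stub; a real pipeline would also have to cut segments at keyframe boundaries and hand each one to an actual encoder.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_segments(frames, segment_len):
    """Cut the video (here just a list of frames) into fixed-size chunks."""
    return [frames[i:i + segment_len] for i in range(0, len(frames), segment_len)]

def encode_segment(segment):
    """Hypothetical stand-in for running a real encoder on one server."""
    return [f"enc({frame})" for frame in segment]

def distributed_encode(frames, segment_len=4, workers=4):
    segments = split_into_segments(frames, segment_len)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so segments stay in sequence
        encoded = list(pool.map(encode_segment, segments))
    return [frame for segment in encoded for frame in segment]   # stitch back together

video = [f"frame{i}" for i in range(10)]
out = distributed_encode(video)
print(len(out), out[0], out[-1])   # 10 enc(frame0) enc(frame9)
```

The total work is unchanged; the speedup comes from encoding segments in parallel, so wall-clock time drops roughly with the number of workers until stitching and coordination overhead dominate.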

But they didn’t stop there. With such a diverse user networking environment, they felt that they could save bandwidth and better optimize user playback if they could reduce playback video size. They were able to apply Facebook’s machine learning/AI investments to distributed video encoding, analyzing the video scene by scene and opportunistically reducing bandwidth load and storage size while still maintaining video playback quality. By implementing the new video encoding process they have achieved double digit reductions in bandwidth requirements for playback.

Another example of the importance of performance was the LiveMentions feature discussed above. Celebrities often record streams in places with poor networking infrastructure. So in order to ensure a good streaming experience, Facebook had to implement variable bit rate video upload to adjust upload bandwidth requirements based on the networking environment. Moreover, once a celebrity starts a live stream, all their fans in the world get notified. Then there’s a thundering herd (boot storms anyone?) to start watching the video stream. In order to support this mass streaming, Facebook implemented stream blocking, which holds off the start of live stream viewing until they have cached enough of the video stream at their edge servers, worldwide. This guaranteed that all the fans had a good viewing experience, once it started.
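
Both techniques can be sketched in a few lines. The bitrate ladder, the 80% headroom factor and the EdgeStreamGate class are hypothetical illustrations, not Facebook’s implementation: the uploader picks the highest bitrate its measured bandwidth can sustain, and an edge gate releases all waiting viewers at once, only after enough of the stream is cached.

```python
import threading

BITRATE_LADDER_KBPS = [200, 400, 800, 1500, 3000]   # hypothetical encoding options

def pick_upload_bitrate(measured_kbps, headroom=0.8):
    """Variable bit rate upload: choose the highest rung that fits within a
    safety margin of the measured uplink bandwidth (lowest rung as fallback)."""
    budget = measured_kbps * headroom
    fitting = [rate for rate in BITRATE_LADDER_KBPS if rate <= budget]
    return fitting[-1] if fitting else BITRATE_LADDER_KBPS[0]

class EdgeStreamGate:
    """Stream blocking at one edge server: viewers wait until `min_seconds`
    of the live stream is cached, then all start together from the cache."""

    def __init__(self, min_seconds):
        self.min_seconds = min_seconds
        self.buffered = 0.0
        self._lock = threading.Lock()
        self._ready = threading.Event()

    def add_chunk(self, seconds):
        """Called as stream chunks arrive from the origin."""
        with self._lock:
            self.buffered += seconds
            if self.buffered >= self.min_seconds:
                self._ready.set()              # release every waiting viewer

    def wait_to_play(self, timeout=None):
        """Block a viewer until enough is cached; True if playback may start."""
        return self._ready.wait(timeout)

print(pick_upload_bitrate(1000))     # 800: 80% of 1000 kbps allows the 800 rung
gate = EdgeStreamGate(min_seconds=3.0)
gate.add_chunk(2.0)
print(gate.wait_to_play(timeout=0))  # False: only 2 s cached
gate.add_chunk(2.0)
print(gate.wait_to_play(timeout=0))  # True
```

Because every viewer is served from the edge cache once the gate opens, the origin sees one stream per edge server rather than one per fan.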

There were a couple more videos of other show sessions, but I didn’t have time to review them. But Facebook sounds like a fun place to work, especially for infrastructure performance experts.

~~~~

Comments?

Existential threats

Not sure why but lately I have been hearing a lot about existential events. These are events that threaten the existence of humanity itself.

Massive Solar Storm

A couple of days ago I read about the Carrington Event which was a massive geomagnetic solar storm in 1859. Apparently it wreaked havoc with the communications infrastructure of the time (telegraphs). Researchers have apparently been able to discover other similar events in earth’s history by analyzing ice cores from Greenland which indicate that events of this magnitude occur once every 500 years and smaller events typically occur multiple times/century.

It’s unclear to me what a solar storm of the magnitude of the Carrington Event would do to the world as we know it today, but we are much more dependent on electronic communications, radio, electric power, etc. If such an event were to take out 50% of our electromagnetic infrastructure, such as frying power transformers, radio transceivers, magnetic storage/motors/turbines, etc., civilization as we know it would be brought back to the mid 1800s but with a 21st century population.

This would last until we could rebuild all the lost infrastructure, at tremendous cost. During this time we would be dependent on animal-human-water power, paper-optical based communications/storage, and animal-wind transport.

It appears that any optical based communication/computer systems would remain intact but powering them would be problematic without working transformers and generators.

One article (I couldn’t locate it again) stated that the odds of another Carrington Event happening are 12% by 2022. But the ice core research seems to indicate that it should be higher than this. By my reckoning, it’s been 155 years since the last event, which means we are ~1/3rd of the way through the next 500 years, so I would expect the probability of a similar event happening to be ~1/3 at this point and rising slightly every year until it happens again.
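
For comparison, a common simplifying assumption is that such storms arrive at random, as a Poisson process, here with the ice-core mean interval of 500 years. Under that (memoryless) assumption the years already elapsed don’t change the odds, and the probability of at least one event in the next t years is 1 - e^(-t/500). A sketch of that model, which is an assumption and not the source of the 12% figure:

```python
import math

MEAN_YEARS_BETWEEN = 500   # Carrington-scale storms, per the ice-core estimate

def prob_at_least_one(years, mean_interval=MEAN_YEARS_BETWEEN):
    """Chance of one or more events within `years`, for a Poisson process."""
    return 1 - math.exp(-years / mean_interval)

for horizon in (8, 50, 155, 500):
    print(f"next {horizon:3d} years: {prob_at_least_one(horizon):.1%}")
```

On this model the ~8 years from 2014 to 2022 give roughly a 1.6% chance and the next 155 years roughly 27%; a different recurrence estimate changes these numbers directly.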

Superintelligence

I picked up a book called Superintelligence: Paths, Dangers, Strategies by Nick Bostrom last week and started reading it last night. It’s about the dangers of AI gaining the ability to improve itself and, after that, becoming not just equivalent to Human Level Intelligence (HMLI) but greatly exceeding HMLI at a super-HMLI level (Superintelligence). This means a Superintelligent entity that would have more intelligence than our entire current population of humans, by many orders of magnitude.

Bostrom discusses the take off processes that would lead to Superintelligence and some of the ways we could hope to control it. But his belief is that trying to install any of these controls after it has reached HMLI would be fruitless.

I haven’t finished the book but what I have read so far, has certainly scared me.

Bostrom presents three scenarios for a Superintelligence take off: slow, medium and fast. He believes that in a slow take off scenario there may be many opportunities to control the emerging Superintelligence. In a moderate (medium) take off, we would know that something is wrong but would have only limited opportunity to control it. In a fast take off (literally 18 months from HMLI to Superintelligence in one scenario Bostrom presents), the likelihood of controlling it after it starts is non-existent.

The latter half of Bostrom’s book discusses potential control mechanisms and other ways to moderate the impacts of superintelligence. So far I don’t see much hope for mankind in the controls he has proposed. But I am only halfway through the book and hope to see more substantial mechanisms in the 2nd half.

In the end, any Superintelligence could substantially alter the resources of the world, and the impact this would have on humanity is essentially unpredictable. But by looking at recent history, one can see how other species have fared as humanity has altered the resources of the earth. Humanity’s rise has led to massive species die-offs for any species that happened to lie in the way of human progress.

The first part of Bostrom’s book discusses some estimates as to when the world will reach AI with HMLI. Most experts believe that we will see HMLI with a 90% probability by the year 2075 and a 50% probability by the year 2050. As for the duration of the take off to superintelligence, expert opinions are mixed, and Bostrom believes that they greatly underestimate the speed of take off.

Humanity’s risks

The search for extraterrestrial intelligence has so far found nothing. One of the parameters for the odds of a successful search is the number of habitable planets in the universe. But another parameter is the ability of a technological civilization to survive long enough to be noticed: the likelihood that a civilization survives any existential risk that comes up.

Superintelligence and massive solar storms represent just two such risks but there are a multitude of others that can be identified today, and tomorrow’s technological advances will no doubt give rise to more.

Existential risks like these are ever-present and appear to be growing as our technological prowess grows. My only problem is that the study of existential risks today seems, at best, ad hoc and, at worst, met with outright disregard.

I believe the best policy is to recognize known existential risks and have some intelligent debate on how probable they are and how we could potentially check them. There really needs to be some systematic study of existential risks around the world, bringing academics and technologists together to understand and mitigate them. The threats to humanity are real: we can continue to ignore them, study a few that gain human interest, or actively seek out and mitigate as many of them as we can.

Comments?

Photo Credit(s): C3-class Solar Flare Erupts on Sept. 8, 2010 [Detail] by NASA Goddard’s space flight center photo stream