The fragility of public cloud IT

I have been reading AntiFragile again (by Nassim Taleb). And although he would probably disagree with my use of his concepts, it appears to me that IT is becoming more fragile, not less.

For example, recent outages at major public cloud providers display increased fragility for IT. Yet these problems, although almost national in scope, seldom deter individual organizations from their migration to the cloud.

Tragedy of the cloud commons

The issues are somewhat similar to the tragedy of the commons. When more and more entities use a common pool of resources, occasionally that common pool can become degraded. But because no-one really owns the common resources no one has any incentive to improve the situation.

Now the public cloud, although certainly a common pool of resources, is also most assuredly owned by corporations. So it’s not a true tragedy of the commons problem. Public cloud corporations have a real incentive to improve their services.

However, the fragility of IT in general, the web, and other electronic/data services all increases as they become more and more reliant on public cloud, common infrastructure. And I would propose this general IT fragility is really not owned by any one person, corporation or organization, let alone the public cloud providers.

Pre-cloud was less fragile, post-cloud more so

In the old days of last century, pre-cloud, if a human screwed up a CLI command the worst they could happen was to take out a corporation’s data services. Nowadays, post-cloud, if a similar human screws up a CLI command, the worst that can happen is that major portions of the internet services of a nation go down.

Strange Clouds by michaelroper (cc) (from Flickr)

Yes, over time, public cloud services have become better at not causing outages, but they aren’t going away. And if anything, better public cloud services just encourages more corporations to use them for more data services, causing any subsequent cloud outage to be more impactful, not less

The Internet was originally designed by DARPA to be more resilient to failures, outages and nuclear attack. But by centralizing IT infrastructure onto public cloud common infrastructure, we are reversing the web’s inherent fault tolerance and causing IT to be more susceptible to failures.

What can be done?

There are certainly things that can be done to improve the situation and make IT less fragile in the short and long run:

  1. Use the cloud for non-essential or temporary data services, that don’t hurt a corporation, organization or nation when outages occur.
  2. Build in fault-tolerance, automatic switchover for public cloud data services to other regions/clouds.
  3. Physically partition public cloud infrastructure into more regions and physically separate infrastructure segments within regions, such that any one admin has limited control over an amount of public cloud infrastructure.
  4. Divide an organizations or nations data services across public cloud infrastructures, across as many regions and segments as possible.
  5. Create a National Public IT Safety Board, not unlike the one for transportation, that does a formal post-mortem of every public cloud outage, proposes fixes, and enforces fix compliance.

The National Public IT Safety Board

The National Transportation Safety Board (NTSB) has worked well for air transportation. It relies on the cooperation of multiple equipment vendors, airlines, countries and other parties. It performs formal post mortems on any air transportation failure. It also enforces any fixes in processes, procedures, training and any other activities on equipment vendors, maintenance services, pilots, airlines and other entities that can impact public air transport safety. At the moment, air transport is probably the safest form of transportation available, and much of this is due to the NTSB

We need something similar for public (cloud) IT services. Yes most public cloud companies are doing this sort of work themselves in isolation, but we have a pressing need to accelerate this process across cloud vendors to improve public IT reliability even faster.

The public cloud is here to stay and if anything will become more encompassing, running more and more of the worlds IT. And as IoT, AI and automation becomes more pervasive, data processes that support these services, which will, no doubt run in the cloud, can impact public safety. Just think of what would happen in the future if an outage occurred in a major cloud provider running the backend for self-guided car algorithms during rush hour.

If the public cloud is to remain (at this point almost inevitable) then the safety and continuous functioning of this infrastructure becomes a public concern. As such, having a National Public IT Safety Board seems like the only way to have some entity own IT’s increased fragility due to  public cloud infrastructure consolidation.


In the meantime, as corporations, government and other entities contemplate migrating data services to the cloud, they should consider the broader impact they are having on the reliability of public IT. When public cloud outages occur, all organizations suffer from the reduced public perception of IT service reliability.

Photo Credits: Fragile by Bart Everson; Fragile Planet by Dave Ginsberg; Strange Clouds by Michael Roper

AWS vs. Azure security setup for Linux

Strange Clouds by michaelroper (cc) (from Flickr)
Strange Clouds by michaelroper (cc) (from Flickr)

I have been doing some testing with both Azure and Amazon Web Services (AWS) these last few weeks and have observed a significant difference in the security setups for both of these cloud services, at least when it comes to Linux compute instances and cloud storage.

First, let me state at the outset, all of my security setups for both AWS and Amazon was done through using the AWS console or the Azure (classic) portal. I believe anything that can be done with the portal/console for both AWS and Azure can also be done in the CLI or the REST interface. I only used the portal/console for these services, so can’t speak to the ease of using AWS’s or Azure’s CLI or REST services.


EC2 instance security is pretty easy to setup and use, at least for Linux users:

  • When you set up an (Linux) EC2 instance you are asked to set up a Public Key Infrastructure file (.pem) to be used for SSH/SFTP/SCP connections. You just need to copy this file to your desktop/laptop/? client system. When you invoke SSH/SFTP/SCP, you use the “-i” (identity file) option and specify the path to the (.PEM) certificate file. The server is already authorized for this identity. If you lose it, AWS services will create another one for you as an option when connecting to the machine.
  • When you configure the AWS instance, one (optional) step is to configure its security settings. And one option for this is to allow connections only from ‘my IP address’, how nice. You don’t even have to know your IP address, AWS just figures it out for itself and configures it.

That’s about it. Unclear to me how well this secures your EC2 instance but it seems pretty secure to me. As I understand it, a cyber criminal would need to know and spoof your IP address to connect to or control remotely the EC2 instance. And if they wanted to use SSH/SFTP/SCP they would either have to access to the identity file. I don’t believe I ever set up a password for the EC2 instance.

As for EBS storage, there’s no specific security associated with EBS volumes. Its security is associated with the EC2 instance it’s attached to. It’s either assigned/attached to an EC2 instance and secured there, or it’s unassigned/unattache. For unattached volumes, you may be able to snapshot it (to an S3 bucket within your administration control) or delete it (if it’s unattached, but for either of these you have to be an admin for the EC2 domain.

As for S3 bucket security, I didn’t see any S3 security setup that mimicked the EC2 instance steps outlined above. But in order to use AWS automated billing report services for S3, you have to allow the service to have write access to your S3 buckets. You do this by supplying an XML-like security policy, and applying this to all S3 buckets you wish to report on (or maybe it’s store reports in). AWS provides a link to the security policy page which just so happens to have the XML-like file you will need to do this. All I did was copy this text and insert it into a window that was opened when I said I wanted to apply a security policy to the bucket.

I did find that S3 bucket security, made me allow public access (I think, can’t really remember exactly) to the S3 bucket to be able to list and download objects from the bucket from the Internet. I didn’t like this, but it was pretty easy to turn on. I left this on. But this PM I tried to find it again (to disable it) but couldn’t seem to locate where it was.

From my perspective all the AWS security setup for EC2 instances, storage, and S3 was straightforward to use and setup, it seemed pretty secure and allowed me to get running with only minimal delay.

For Azure

First, I didn’t find the more modern, new Azure portal that useful but then I am a Mac user, and it’s probably more suitable for Windows Server admins. The classic portal was as close to the AWS console as I could find and once I discovered it, I never went back.

Setting up a Linux compute instance under Azure was pretty easy, but I would say the choices are a bit overwhelming and trying to find which Linux distro to use was a bit of a challenge. I settled on SUSE Enterprise, but may have made a mistake (EXT4 support was limited to RO – sigh). But configuring SUSE Enterprise Linux without any security was even easier than AWS.

However, Azure compute instance security was not nearly as straightforward as in AWS. In fact, I could find nothing similar to securing your compute instance to “My IP” address like I did in AWS. So, from my perspective my Azure instances are not as secure.

I wanted to be able to SSH/SFTP/SCP into my Linux compute instances on Azure just like I did on AWS. But, there was no easy setup of the identity file (.PEM) like AWS supported. So I spent some time, researching how to create a Cert file with the Mac (didn’t seem able to create a .PEM file). Then more time researching how to create a Cert file on my Linux machine. This works but you have to install OpenSSL, and then issue the proper “create” certificate file command, with the proper parameters. The cert file creation process asks you a lot of questions, one for a pass phrase, and then for a network (I think) phrase. Of course, it asks for name, company, and other identification information, and at the end of all this you have created a set of cert files on your linux machine.

But there’s a counterpart to the .pem file that needs to be on the server to authorize access. This counterpart needs to be placed in a special (.ssh/authorized) directory and I believe needs to be signed by the client needing to be authorized. But I didn’t know if the .cert, .csr, .key or .pem file needed to be placed there and I had no idea how to” sign it”. After spending about a day and a half  on all this, I decided to abandon the use of an identity file and just use a password. I believe this provides less security than an identity file.

As for BLOB storage, it was pretty easy to configure a PageBlob for use by my compute instances. It’s security seemed to be tied to the compute instance it was attached to.

As for my PageBlob containers, there’s a button on the classic portal to manage access keys to these. But it said once generated, you will need to update all VMs that access these storage containers with the new keys. Not knowing how to do that. I abandoned all security for my container storage on Azure.

So, all in all, I found Azure a much more manual security setup for Linux systems than AWS and in the end, decided to not even have the same level of security for my Linux SSH/SFTP/SCP services that I did on AWS. As for container security, I’m not sure if there’s any controls on the containers at this point. But I will do some more research to find out more.

In all fairness, this was trying to setup a Linux machine on Azure, which appears  more tailored for Windows Server environments. Had I been in an Active Directory group, I am sure much of this would have been much easier. And had I been configuring Windows compute instances instead of Linux, all of this would have also been much easier, I believe.


All in all, I had fun using AWS and Azure services these last few weeks, and I will be doing more over the next couple of months. So I will let you know what else I find as significant differences between AWS and Azure. So stay tuned.


Learning to live with lattices or say goodbye to security

safe 'n green by Robert S. Donovan (cc) (from flickr)
safe ‘n green by Robert S. Donovan (cc) (from flickr)

Read an article the other day in Quantum Magazine: A tricky path to quantum encryption about the problems that will occur in current public key cryptology (PKC) schemes when quantum computing emerges over the next five to 30 years.  With advances in quantum computing our current PKC scheme that depends on the difficulty of factoring large numbers will be readily crackable. At that time, all current encrypted traffic, used by banks, the NSA, the internet, etc. will no longer be secure.

NSA, NIST, & ETSI looking at the problem

So there’s a search on for quantum-resistant cryptology (see this release from ETSI [European Telecommunications Standard Institute], this presentation from NIST [{USA} National Institute of Standards &Technology], and this report from Schneier on Security on NSA’s [{USA} National Security Agency] Plans for Post-Quantum world ). There are a number of alternatives being examined by all these groups but the most promising at the moment depends on multi-dimensional (100s of dimensions) mathematical lattices.


According to Wikipedia a lattice is a 3-dimensional space of equidistant points. Apparently, for security reasons, they had to increase the number of dimensions significantly beyond 3.

A secret is somehow inscribed in a route (vector) through this 500-dimensional lattice between two points: an original  point (the public key) in the lattice and another arbitrary point, somewhere nearby in the lattice. The problem from a cryptographic sense is that finding a route, in a 500 dimensional lattice, is a difficult task when you only have one of the points.

But can it be efficient for digital computers of today to use?

So the various security groups have been working on divising efficient algorithms for multi-dimensional public key encryption over the past decade or so. But they have run into a problem.

Originally, the (public) keys for a 500-dimensional lattice PKC were on the order of MBs, so they have been restricting the lattice computations to utilize smaller keys and in effect reducing the complexity of the underlying lattice. But in the process they have now reduced the security of the lattice PKC scheme. So they are having to go back to longer keys, more complex lattices and trying to ascertain which approach leaves communications secure but is efficient enough to implement by digital computers and communications links of today.

Quantum computing

The problem is that quantum computers provide a much faster way to perform certain calculations like factoring a number. Quantum computing can speed up this factorization, by on the order of the square root of a number, as compared to normal digital computing of today.

Its possible that similar quantum computing calculations for lattice routes between points could also be sped up by an equivalent factor.  So even when we all move to lattice based PKC, it’s still possible for quantum computers to crack the code hopefully, it just takes longer.

So the mathematics behind PKC will need to change over the next 5 years or so as quantum computing becomes more of a reality. The hope is that this change will will at least keep our communications secure, at least until the next revolution in computing comes along, or quantum computing becomes even faster than that envisioned today.


Another Y2K-like problem, this time Internet routers are the problem

Read an article today in Wired about The Internet has grown too big for its aging infrastructure showing up as a serious problem that’s soon to be more widespread.

This Y2K-like problem is associated with the Border Gateway Protocol  (BGP) routing tables entries which represent IP address prefixes.  Internet routers keep BGP tables in Tertiary Content Addressable Memory (TCAM, sort of like a virtual memory page table only for router addresses) and there are physical limits as to how many BGP entries will fit into any specific Internet router.  Some routers crash when they exceed their TCAM limit and others just ignore the BGP entries that exceed their limits – neither approach seems workable long term.

Apparently we are approaching one of those hard and fast limits, at least for older routers, as the BGP routing tables reach over 512K entries.  As of May 2014, there were in excess of 500,000 BGP prefixes (table entries).

Smoking gun points to …

It appears that this time Verizon was the perpetrator. Yesterday they added 15K BGP entries to the Internet BGP table, kicking some routers over their 512K limit. This was no doubt in anticipation of some growth in Internet addresses on their networks.

The result was that LiquidWeb’s network went down. Supposedly they have an older Cisco 7600 router and the latest addition to BGP entries exceeded its TCAM capacity, crashing their router. Oops!

Verizon quickly withdrew the offending 15K BGP entry addition and things seem back to normal for the moment. But we are once again close to some arbitrary computerized limit. Only this problem won’t happen at midnight December 31st. It won’t take that long to exceed the current BGP entry limits again and next time it might not be that easy to back out.

But it’s almost like there’s no stopping it…

Just guessing here but these types of routers probably have similar limits for BGP entries exceeding 1024K entries, 2048K, 4096K, etc. With the number of internet connected devices growing exponentially, especially with the Internet of Things, I predict similar problems over the coming years. Indeed, we went from ~400K to ~500K BGP entries in just under two years and the rate of growth seems to be accelerating.

It’s really just a matter of time before even todays routers run out of TCAM slots. Y2K-like, only this time there’s no way to stop it from happening again and again in the future.  I suppose it would be better if the routers just ignored the new BGP entries rather than crashing but that would seem to put some segment of Internet routers out of their reach?  There’s got to be a way to intelligently ignore some updates or summarize prefix updates when a router runs out of TCAM entries.

Welcome to the new 512K problem.



Photo Credit(s): Cisco 7609 @ itb for INHERENT by Affan Basalamah




Replacing the Internet?

safe 'n green by Robert S. Donovan (cc) (from flickr)
safe ‘n green by Robert S. Donovan (cc) (from flickr)

Was reading an article the other day from TechCrunch that said Servers need to die to save the Internet. This article talked about a startup called MaidSafe which is attempting to re-architect/re-implement/replace the Internet into a Peer-2-Peer, mesh network and storage service which they call the SAFE (Secure Access for Everyone) network. By doing so, they hope to eliminate the need for network servers and storage.

Sometime in the past I wrote a blog post about Peer-2-Peer cloud storage (see Free P2P Cloud Storage and Computing if  interested). But it seems MaidSafe has taken this to a more extreme level. By the way the acronym MAID used in their name stands for Massive Array of Internet Disks, sound familiar?

Crypto currency eco-system

The article talks about MaidSafe’s SAFE network ultimately replacing the Internet but at the start it seems more to be a way to deploy secure, P2P cloud storage.  One interesting aspect of the MaidSafe system is that you can dedicate a portion of your Internet connected computers’ storage, computing and bandwidth to the network and get paid for it. Assuming you dedicate more resources than you actually use to the network you will be paid safecoins for this service.

For example, users that wish to participate in the SAFE network’s data storage service run a Vault application and indicate how much internal storage to devote to the service. They will be compensated with safecoins when someone retrieves data from their vault.

Safecoins are a new BitCoin like internet currency. Currently one safecoin is worth about $0.02 but there was a time when BitCoins were worth a similar amount. MaidSafe organization states that there will be a limit to the number of safecoins that can ever be produced (4.3Billion) so there’s obviously a point when they will become more valuable if MaidSafe and their SAFE network becomes successful over time. Also, earned safecoins can be used to pay for other MaidSafe network services as they become available.

Application developers can code their safecoin wallet-ids directly into their apps and have the SAFE network automatically pay them for application/service use.  This should make it much easier for App developers to make money off their creations, as they will no longer have to use advertising support, or provide differenct levels of product such as free-simple user/paid-expert use types of support to make money from Apps.  I suppose in a similar fashion this could apply to information providers on the SAFE network. An information warehouse could charge safecoins for document downloads or online access.

All data objects are encrypted, split and randomly distributed across the SAFE network

The SAFE network encrypts and splits any data up and then randomly distributes these data splits uniformly across their network of nodes. The data is also encrypted in transit across the Internet using rUDPs (reliable UDPs) and SAFE doesn’t use standard DNS services. Makes me wonder how SAFE or Internet network nodes know where rUDP packets need to go next without DNS but I’m no networking expert. Apparently by encrypting rUDPs and not using DNS, SAFE network traffic should not be prone to deep packet inspection nor be easy to filter out (except of course if you block all rUDP traffic).  The fact that all SAFE network traffic is encrypted also makes it much harder for intelligence agencies to eavesdrop on any conversations that occur.

The SAFE network depends on a decentralized PKI to authenticate and supply encryption keys. All SAFE network data is either encrypted by clients or cryptographically signed by the clients and as such, can be cryptographically validated at network endpoints.

The each data chunk is replicated on, at a minimum, 4 different SAFE network nodes which provides resilience in case a network node goes down/offline. Each data object could potentially be split up into 100s to 1000s of data chunks. Also each data object has it’s own encryption key, dependent on the data itself which is never stored with the data chunks. Again this provides even better security but the question becomes where does all this metadata (data object encryption key, chunk locations, PKI keys, node IP locations, etc.) get stored, how is it secured, and how is it protected from loss. If they are playing the game right, all this is just another data object which is encrypted, split and randomly distributed but some entity needs to know how to get to the meta-data root element to find it all in case of a network outage.

Supposedly, MaidSafe can detect within 20msec. if a node is no longer available and reconfigure the whole network. This probably means that each SAFE network node and endpoint is responsible for some network transaction/activity every 10-20msec, such as a SAFE network heartbeat to say it is still alive.

It’s unclear to me whether the encryption key(s) used for rUDPs and the encryption key used for the data object are one and the same, functionally related, or completely independent? And how a “decentralized PKI”  and “self authentication” works is beyond me but they published a paper on it, if interested.

For-profit open source business model

MaidSafe code is completely Open Source (available at MaidSafe GitHub) and their APIs are freely available to anyone and require no API key. They also have multiple approved and pending patents which have been provided free to the world for use, which they use in a defensive capacity.

MaidSafe says it will take a 5% cut of all safecoin transactions over the SAFE network. And as the network grows their revenue should grow commensurately. The money will be used to maintain the core network software and  MaidSafe said that their 5% cut will be shared with developers that help develop/fix the core SAFE network code.

They are hoping to have multiple development groups maintaining the code. They currently have some across Europe and in California in the US. But this is just a start.

They are just now coming out of stealth, have recently received $6M USD investment (by auctioning off MaidSafeCoins a progenitor of safecoins) but have been in operation now, architecting/designing/developing the core code now for 8+ years now, which probably qualifies them for the longest running startup on the planet.

Replacing the Internet

MaidSafe believes that the Internet as currently designed is too dependent on server farms to hold pages and other data. By having a single place where network data is held, it’s inherently less secure than by having data spread out, uniformly/randomly across a multiple nodes. Also the fact that most network traffic is in plain text (un-encrypted) means anyone in the network data path can examine and potentially filter out data packets.

I am not sure how the SAFE network can be used to replace the Internet but then I’m no networking expert. For example, from my perspective, SAFE is dependent on current Internet infrastructure to store and forward rUDPs on along its trunk lines and network end-paths. I don’t see how SAFE can replace this current Internet infrastructure especially with nodes only present at the endpoints of the network.

I suppose as applications and other services start to make use of SAFE network core capabilities, maybe the SAFE network can become more like a mesh network and less dependent on the current hub and spoke current Internet we have today.  As a mesh network, node endpoints can store and forward packets themselves to locally accessed neighbors and only go out on Internet hubs/trunk lines when they have to go beyond the local network link.

Moreover, the SAFE can make any Internet infrastructure less vulnerable to filtering and spying. Also, it’s clear that SAFE applications are no longer executing in data center servers somewhere but rather are actually executing on end-point nodes of the SAFE network. This has a number of advantages, namely:

  • SAFE applications are less susceptible to denial of service attacks because they can execute on many nodes.
  • SAFE applications are inherently more resilient because the operate across multiple nodes all the time.
  • SAFE applications support faster execution because the applications could potentially be executing closer to the user and could potentially have many more instances running throughout the SAFE network.

Still all of this doesn’t replace the Internet hub and spoke architecture we have today but it does replace application server farms, CDNs, cloud storage data centers and probably another half dozen Internet infrastructure/services I don’t know anything about.

Yes, I can see how MaidSafe and its SAFE network can change the Internet as we know and love it today and make it much more secure and resilient.

Not sure how having all SAFE data being encrypted will work with search engines and other web-crawlers but maybe if you want the data searchable, you just cryptographically sign it. This could be both a good and a bad thing for the world.

Nonetheless, you have to give the MaidSafe group a lot of kudos/congrats for taking on securing the Internet and making it much more resilient. They have an active blog and forum that discusses the technology and what’s happening to it and I encourage anyone interested more in the technology to visit their website to learn more



EMCworld 2013 Day 2

IMG_1382The first session of the day was with  Joe Tucci EMC Chairman and CEO.  He talked about the trends transforming IT today. These include Mobile, Cloud, Big Data and Social Networking. He then discussed  IDC’s 1st, 2nd and 3rd computing platform framework where the first was mainframe, the second was client-server and the third is mobile. Each of these platforms had winers and losers.  EMC wants definitely to be one of the winners in the coming age of mobile and they are charting multiple paths to get there.

Mainly they will use Pivotal, VMware, RSA and their software defined storage (SDS) product to go after the 3rd platform applications.  Pivotal becomes the main enabler to help companies gain value out of the mobile-social networking-cloud computing data deluge.  SDS helps provide the different pathways for companies to access all that data. VMware provides the software defined data center (SDDC) where SDS, server virtualization and software defined networking (SDN) live, breathe and interoperate to provide services to applications running in the data center.

Joe started talking about the federation of EMC companies. These include EMC, VMware, RSA and now Pivotal. He sees these four brands as almost standalone entities whose identities will remain distinct and seperate for a long time to come.

Joe mentioned the internet of things or the sensor cloud as opening up new opportunities for data gathering and analysis that dwarfs what’s coming from mobile today. He quoted IDC estimates that says by 2020 there will be 200B devices connected to the internet, today there’s just 2 to 3B devices connected.

Pivotal’s debut

Paul Maritz, Pivotal CEO got up and took us through the Pivotal story. Essentially they have three components a data fabric, an application development fabric and a cloud fabric. He believes the mobile and internet of things will open up new opportunities for organizations to gain value from their data wherever it may lie, that goes well beyond what’s available today. These activities center around consumer grade technologies  which 1) store and reason over very large amounts of data; 2) use rapid application development; and 3) operate at scale in an entirely automated fashion.

He mentioned that humans are a serious risk to continuous availability. Automation is the answer to the human problem for the “always on”, consumer grade technologies needed in the future.

Parts of Pivotal come from VMware, Greenplum and EMC with some available today in specific components. However by YE they will come out with Pivotal One which will be the first framework with data, app development and cloud fabrics coupled together.

Paul called Pivotal Labs as the special forces of his service organization helping leading tech companies pull together the awesome apps needed for the technology of tomorrow, consisting of Extreme programming, Agile development and very technically astute individuals.  Also, CETAS was mentioned as an analytics-as-a-service group providing such analytics capabilities to gaming companies doing log analysis but believes there’s a much broader market coming.

IMG_1393Paul also showed some impressive numbers on their new Pivotal HD/HAWQ offering which showed it handled many more queries than Hive and Cloudera/Impala. In essence, parts of Pivotal are available today but later this year the whole cloud-app dev-big data framework will be released for the first time.

IMG_1401Next up was a media-analyst event where David Goulden, EMC President and COO gave a talk on where EMC has come from and where they are headed from a business perspective.

Then he and Joe did a Q&A with the combined media and analyst community.  The questions were mostly on the financial aspects of the company rather than their technology, but there will be a more focused Q&A session tomorrow with the analyst community.

IMG_1403 Joe was asked about Vblock status. He said last quarter they announced it had reached a $1B revenue run rate which he said was the fastest in the industry.  Joe mentioned EMC is all about choice, such as Vblock different product offerings, VSpex product offerings and now with ViPR providing more choice in storage.

Sometime today Joe had mentioned that they don’t really do custom hardware anymore.  He said of the 13,000 engineers they currently have ~500 are hardware engineers. He also mentioned that they have only one internally designed ASIC in current shipping product.

Then Paul got up and did a Q&A on Pivotal.  He believes there’s definitely an opportunity in providing services surrounding big data and specifically mentioned CETAS as offering analytics-as-a-service as well as Pivotal Labs professional services organization.  Paul hopes that Pivotal will be $1B revenue company in 5yrs.  They already have $300M so it’s well on its way to get there.

IMG_1406Next, there was a very interesting media and analyst session that was visually stimulating from Jer Thorp, co-founder of The Office for Creative Research. And about the best way to describe him is he is a data visualization scientist.

IMG_1409He took some NASA Kepler research paper with very dry data and brought it to life. Also he did a number of analyzes of public Twitter data and showed twitter user travel patterns, twitter good morning analysis, twitter NYT article Retweetings, etc.  He also showed a video depicting people on airplanes around the world. He said it is a little known fact but over a million people are in the air at any given moment of the day.

Jer talked about the need for data ethics and an informed data ownership discussion with people about the breadcrumbs they leave around in the mobile connected world of today. If you get a chance, you should definitely watch his session.IMG_1410

Next Juergen Urbanski, CTO T-Systems got up and talked about the importance of Hadoop to what they are trying to do. He mentioned that in 5 years, 80% of all new data will land on Hadoop first.  He showed how Hadoop is entirely different than what went before and will take T-Systems in vastly new directions.

Next up at EMCworld main hall was Pat Gelsinger, VMware CEO’s keynote on VMware.  The story was all about Software Defined Data Center (SDDC) and the components needed to make this happen.   He said data was the fourth factor of production behind land, capital and labor.

Pat said that networking was becoming a barrier to the realization of SDDC and that they had been working on it for some time prior to the Nicera acquisition. But now they are hard at work merging the organic VMware development with Nicera to create VMware NSX a new software defined networking layer that will be deployed as part of the SDDC.

Pat also talked a little bit about how ViPR and other software defined storage solutions will provide the ease of use they are looking for to be able to deploy VMs in seconds.

Pat demo-ed a solution specifically designed for Hadoop clusters and was able to configure a hadoop cluster with about 4 clicks and have it start deploying. It was going to take 4-6 minutes to get it fully provisioned so they had a couple of clusters already configured and they ran a pseudo Hadoop benchmark on it using visual recognition and showed how Vcenter could be used to monitor the cluster in real time operations.

Pat mentioned that there are over 500,000 physical servers running Hadoop. Needless to say VMware sees this as a prime opportunity for new and enhanced server virtualization capabilities.

That’s about it for the major keynotes and media sessions from today.

Tomorrow looks to be another fun day.

Bringing Internet to rural Africa using TV

Read an article the other day from BBC, named TV white space connecting rural Africa about how radio spectrum designed for TV is being used to bring Internet access to rural Africa.

The group promoting TV for Internet connectivity is the 4Afrika Initiative from Microsoft.  Their stated intent is to engage in the economic development of Africa to improve its global competitiveness.

Why TV?

Apparently, the TV spectrum has a number of attributes that make it very useful to provide Internet connectivity.  In the article they talked about 400mhz as being very resilient that propagates well around natural obstructions, through walls and goes long distances.

Although these days, Africa has plenty of undersea cables connecting it to the rest of the world, getting fiber connectivity to rural Africa has been too costly to date.  So if the last mile (or in the case of rural Africa, 100km) problem can be solved then Internet access can be available to all communities.

But the main problem is that this spectrum is usually licensed to TV stations. On the other hand, Africa probably has plenty of TV spectrum not currently being used for active broadcasting, especially across rural Africa.  As such, using this “white space” in TV signals to provide Internet access is a great alternative use of the spectrum.

With a solar powered base station libraries, schools, healthcare centers, government offices, etc., in rural Africa can now be connected to the Internet.  Presently many of these rural Africa locations have no electricity and no telephone lines whatsoever.

Providing internet access to such locations will enable e-learning, more informed access to agricultural markets as well as a plethora of advanced communications technologies currently absent from their villages.

Why Microsoft?

Microsoft has been actively engaged in Africa for over 20 years now.  And more  storage vendors have started listing Africa as a blossoming market for their gear, where they are all engaged in upgrading IT and telecommunications infrastructure. Microsoft has an interesting graphic on their involvement in Africa over the past two decades (see 4Africa Infographic).

We have discussed the emergence of mobile and cloud as a leap-frog technologies propelling Africa and especially Kenya into the information economy, (please see Mobile health (mHealth) takes off in Kenya and Is cloud a leabfrog technology posts). But Internet access is even broader than the just mobile or cloud and is certainly complementary (and for cloud, a necessary infrastructure) for both these technologies.

Africa, welcome to the Information Economy…


Photo Credits: DIY antenna (bottlenet) by robin.elaine

EU vs. US on data protection

Prison Planet by AZRainman (cc) (from Flickr)
Prison Planet by AZRainman (cc) (from Flickr)

Last year I was at SNW and talking to a storage admin from a large, international company who mentioned how data protection policies in EU were forcing them to limit where data gets copied and replicated.  Some of their problem was due to different countries having dissimilar legislation regarding data privacy and protection.

However, their real concern was how to effectively and automatically sanitize this information. It seems they would like to analyze it off shore but still adhere to EU country’s data protection legislation.

Recently, there has been more discussions in the EU about data protection requirement (See NY Times post on Consumer Data Protection Laws, an Ocean Apart and the Ars Technica post Proposed EU data protection reform could start a “trade war”).  It seems, EU proposals are becoming even more at odds with current US data protection environment.

Compartmentalized US data privacy

In the US, data protection seems much more compartmentalized and decentralized. We have data protection for health care information, video rentals, credit reports, etc. Each with their own provisions and protection regime.

This allows companies in different markets pretty much internal control over what they do with customer information but tightly regulates what happens with the data as it moves outside that environment.

Within such an data protection regime an internet company can gather all the information they want on a person’s interaction with their web services and that way better target services and advertising for the user.

EU’s broader data protection regime

In contrast, EU countries have a much broader regime in place that covers any and all personal information.  The EU wants to ultimately control how much information can be gathered by a company about what a person does online and provide an expunge on demand capability directly to the individual.

EU’s proposed new rules would standardize data privacy rules across the 27 country region but would also strengthen them in the process.  Doing so, would make it much harder to personalize services and the presumption is that the internet companies trying to do so would not make as much revenue in the EU because of this.

Although US companies and government officials have been lobbying heavily to change the new proposals it appears to be backfiring and causing a backlash.  EU considers the US position to be biased to commerce and commercial interests whereas, US considers the EU position to be more biased to the individual.

US data privacy is evolving

On this side of the Atlantic, the privacy tide may be rising as well.  Recently, the President has recently proposed a “Consumer Privacy Bill Of Rights” which would enshrine some of the same privacy rights present in the EU proposals. For instance, such a regime would include rights for individuals to see any and all information company’s have on them, rights to correct such information and rights to limit how much information companies collect on individuals.

This all sounds a lot closer to what the EU currently has and where they seem to want to go.

However, how this plays out in Congress and what ultimately emerges as data protection and privacy legislation is another matter. But for the moment it seems that governments on both sides of the Atlantic are pushing for more data protection not less.