Replacing the Internet?

safe 'n green by Robert S. Donovan (cc) (from flickr)

Was reading an article the other day from TechCrunch that said Servers need to die to save the Internet. This article talked about a startup called MaidSafe which is attempting to re-architect/re-implement/replace the Internet with a Peer-2-Peer mesh network and storage service, which they call the SAFE (Secure Access for Everyone) network. By doing so, they hope to eliminate the need for network servers and storage.

Sometime in the past I wrote a blog post about Peer-2-Peer cloud storage (see Free P2P Cloud Storage and Computing if interested). But it seems MaidSafe has taken this to a more extreme level. By the way, the acronym MAID used in their name stands for Massive Array of Internet Disks, sound familiar?

Crypto currency eco-system

The article talks about MaidSafe’s SAFE network ultimately replacing the Internet, but at the start it seems to be more a way to deploy secure P2P cloud storage. One interesting aspect of the MaidSafe system is that you can dedicate a portion of your Internet-connected computers’ storage, computing and bandwidth to the network and get paid for it. Assuming you dedicate more resources to the network than you actually use, you will be paid safecoins for this service.

For example, users that wish to participate in the SAFE network’s data storage service run a Vault application and indicate how much internal storage to devote to the service. They will be compensated with safecoins when someone retrieves data from their vault.

Safecoins are a new Bitcoin-like internet currency. Currently one safecoin is worth about $0.02, but there was a time when Bitcoins were worth a similar amount. The MaidSafe organization states that there will be a limit to the number of safecoins that can ever be produced (4.3 billion), so there’s obviously a point when they will become more valuable if MaidSafe and their SAFE network become successful over time. Also, earned safecoins can be used to pay for other MaidSafe network services as they become available.

Application developers can code their safecoin wallet-ids directly into their apps and have the SAFE network automatically pay them for application/service use. This should make it much easier for app developers to make money off their creations, as they will no longer have to rely on advertising support, or provide different product levels, such as free simple-user and paid expert-use tiers, to make money from apps. I suppose in a similar fashion this could apply to information providers on the SAFE network. An information warehouse could charge safecoins for document downloads or online access.

All data objects are encrypted, split and randomly distributed across the SAFE network

The SAFE network encrypts and splits any data up and then randomly distributes these data splits uniformly across their network of nodes. The data is also encrypted in transit across the Internet using rUDPs (reliable UDPs) and SAFE doesn’t use standard DNS services. Makes me wonder how SAFE or Internet network nodes know where rUDP packets need to go next without DNS but I’m no networking expert. Apparently by encrypting rUDPs and not using DNS, SAFE network traffic should not be prone to deep packet inspection nor be easy to filter out (except of course if you block all rUDP traffic).  The fact that all SAFE network traffic is encrypted also makes it much harder for intelligence agencies to eavesdrop on any conversations that occur.
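The split-and-scatter idea can be sketched in a few lines of Python. This is purely illustrative and assumes a content-addressed scheme of my own invention (MaidSafe's actual chunking, naming and routing are far more involved): each chunk is identified by its content hash, and that hash deterministically selects which node holds it.

```python
import hashlib
import os

CHUNK_SIZE = 1024 * 1024  # 1 MiB per chunk -- an illustrative size, not MaidSafe's

def split_and_scatter(data: bytes, node_ids: list) -> dict:
    """Split data into fixed-size chunks and map each chunk to a node
    chosen from the chunk's content hash (roughly uniform placement)."""
    placement = {}
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        chunk_id = hashlib.sha256(chunk).hexdigest()
        # The hash acts like a uniform random number, spreading chunks
        # evenly across the node list.
        placement[chunk_id] = node_ids[int(chunk_id, 16) % len(node_ids)]
    return placement

nodes = [f"node-{i}" for i in range(8)]
layout = split_and_scatter(os.urandom(3 * CHUNK_SIZE + 100), nodes)
print(len(layout), "chunks placed across", len(set(layout.values())), "nodes")
```

In a real content-addressed store the chunk id doubles as the retrieval key, so a chunk can be looked up by hash without any central directory of locations.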

The SAFE network depends on a decentralized PKI to authenticate and supply encryption keys. All SAFE network data is either encrypted by clients or cryptographically signed by the clients and as such, can be cryptographically validated at network endpoints.

Each data chunk is replicated on at least 4 different SAFE network nodes, which provides resilience in case a network node goes down/offline. Each data object could potentially be split up into 100s to 1000s of data chunks. Also, each data object has its own encryption key, derived from the data itself, which is never stored with the data chunks. Again this provides even better security, but the question becomes where does all this metadata (data object encryption key, chunk locations, PKI keys, node IP locations, etc.) get stored, how is it secured, and how is it protected from loss. If they are playing the game right, all this is just another data object which is encrypted, split and randomly distributed, but some entity needs to know how to get to the meta-data root element to find it all in case of a network outage.
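A key that is "dependent on the data itself" sounds like convergent encryption, where the key is derived by hashing the plaintext, so anyone holding the data can re-derive the key and the key never travels with the chunks. A minimal sketch of that key derivation plus a 4-way replica placement; the function names and placement rule are my own, not MaidSafe's:

```python
import hashlib

def content_key(plaintext: bytes) -> bytes:
    """Convergent-style key: derived from the data itself. Identical
    plaintext always yields the same key, so the key need never be
    stored alongside the chunks -- whoever has the data can re-derive it."""
    return hashlib.sha256(plaintext).digest()

def place_replicas(chunk_id: str, nodes: list, copies: int = 4) -> list:
    """Pick `copies` distinct nodes for a chunk by walking the node list
    from a position determined by the chunk id (illustrative placement)."""
    start = int(chunk_id, 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]

data = b"some document contents"
assert content_key(data) == content_key(data)            # deterministic
assert content_key(data) != content_key(b"other data")   # bound to content

nodes = [f"node-{i}" for i in range(10)]
replicas = place_replicas(hashlib.sha256(data).hexdigest(), nodes)
print(replicas)  # 4 distinct nodes
```

A side benefit of content-derived keys is deduplication: identical data encrypts to identical ciphertext, so the network stores it only once.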

Supposedly, MaidSafe can detect within 20msec. if a node is no longer available and reconfigure the whole network. This probably means that each SAFE network node and endpoint is responsible for some network transaction/activity every 10-20msec, such as a SAFE network heartbeat to say it is still alive.
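A 20msec. detection window implies some form of periodic liveness signal checked against a timeout. Here is a toy failure detector along those lines; this is purely my guess at the mechanism, not MaidSafe's actual code:

```python
import time

HEARTBEAT_TIMEOUT = 0.020  # 20 ms: a node silent longer than this is "down"

class LivenessTracker:
    """Track when each node was last heard from and flag the silent ones."""
    def __init__(self):
        self.last_seen = {}

    def heartbeat(self, node_id: str):
        # Called whenever any traffic (or an explicit heartbeat) arrives.
        self.last_seen[node_id] = time.monotonic()

    def dead_nodes(self) -> list:
        now = time.monotonic()
        return [n for n, t in self.last_seen.items()
                if now - t > HEARTBEAT_TIMEOUT]

tracker = LivenessTracker()
tracker.heartbeat("node-a")
tracker.heartbeat("node-b")
time.sleep(0.03)             # both nodes fall silent past the 20 ms window
tracker.heartbeat("node-b")  # node-b checks back in
print(tracker.dead_nodes())  # ['node-a']
```

In a real network the tracker itself would be distributed, with each node watching only its close neighbors, so no single observer has to track every endpoint.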

It’s unclear to me whether the encryption key(s) used for rUDPs and the encryption key used for the data object are one and the same, functionally related, or completely independent. And how a “decentralized PKI” and “self authentication” work is beyond me, but they published a paper on it, if interested.

For-profit open source business model

MaidSafe code is completely Open Source (available at MaidSafe GitHub) and their APIs are freely available to anyone and require no API key. They also have multiple approved and pending patents which have been provided free to the world for use and which they hold in a defensive capacity.

MaidSafe says it will take a 5% cut of all safecoin transactions over the SAFE network, and as the network grows their revenue should grow commensurately. The money will be used to maintain the core network software, and MaidSafe said that their 5% cut will be shared with developers who help develop/fix the core SAFE network code.

They are hoping to have multiple development groups maintaining the code. They currently have some across Europe and in California in the US. But this is just a start.

They are just now coming out of stealth and have recently received a $6M USD investment (by auctioning off MaidSafeCoins, a progenitor of safecoins), but have been in operation, architecting/designing/developing the core code, for 8+ years now, which probably qualifies them as the longest running startup on the planet.

Replacing the Internet

MaidSafe believes that the Internet as currently designed is too dependent on server farms to hold pages and other data. Holding network data in a single place is inherently less secure than spreading it uniformly/randomly across multiple nodes. Also, the fact that most network traffic is in plain text (un-encrypted) means anyone in the network data path can examine and potentially filter out data packets.

I am not sure how the SAFE network can be used to replace the Internet, but then I’m no networking expert. For example, from my perspective, SAFE is dependent on current Internet infrastructure to store and forward rUDPs along its trunk lines and network end-paths. I don’t see how SAFE can replace this current Internet infrastructure, especially with nodes only present at the endpoints of the network.

I suppose as applications and other services start to make use of SAFE network core capabilities, maybe the SAFE network can become more like a mesh network and less dependent on the hub-and-spoke Internet we have today. As a mesh network, node endpoints can store and forward packets themselves to locally accessed neighbors and only go out on Internet hubs/trunk lines when they have to go beyond the local network link.

Moreover, the SAFE network can make any Internet infrastructure less vulnerable to filtering and spying. Also, it’s clear that SAFE applications are no longer executing in data center servers somewhere but rather are actually executing on end-point nodes of the SAFE network. This has a number of advantages, namely:

  • SAFE applications are less susceptible to denial of service attacks because they can execute on many nodes.
  • SAFE applications are inherently more resilient because they operate across multiple nodes all the time.
  • SAFE applications support faster execution because they could potentially run closer to the user, with many more instances running throughout the SAFE network.

Still all of this doesn’t replace the Internet hub and spoke architecture we have today but it does replace application server farms, CDNs, cloud storage data centers and probably another half dozen Internet infrastructure/services I don’t know anything about.

Yes, I can see how MaidSafe and its SAFE network can change the Internet as we know and love it today and make it much more secure and resilient.

Not sure how having all SAFE data being encrypted will work with search engines and other web-crawlers but maybe if you want the data searchable, you just cryptographically sign it. This could be both a good and a bad thing for the world.

Nonetheless, you have to give the MaidSafe group a lot of kudos/congrats for taking on securing the Internet and making it much more resilient. They have an active blog and forum that discuss the technology and what’s happening to it, and I encourage anyone who wants to learn more about the technology to visit their website.

~~~~

Comments?

Biggest data security breach yet in the US

safe 'n green by Robert S. Donovan (cc) (from flickr)

Read an article today about a data breach that hit a number of financial institutions and retail centers that handle credit card information (please see Five Indicted In New Jersey For Largest Known Data Breach Conspiracy). In total, ~160M credit cards were compromised across these institutions, among other confidential information.

The hackers were from Russia and Ukraine and used an “SQL injection” attack, with malware to cover their tracks. SQL injection appends SQL commands to the end of an entry field; these get interpreted as valid SQL commands that can then be used to dump an SQL database.

This indictment documents the largest data breach in US judicial history. However, Verizon’s 2013 Data Breach Investigation Report (DBIR) indicates that there were 621 confirmed data breaches in 2012 which compromised 44 million records, and over the nine-year history collected in the VERIS Community Database more than 1.1 billion records have been compromised. So it’s hard to tell if this is a world record or just a US one. Small consolation to the customers and the institutions which lost the information.

Data security to the rescue?

In the data storage industry we talk a lot about data encryption of data-in-flight and data-at-rest. It’s unclear to me whether data storage encryption services would have done anything to help mitigate this major data breach, as the perpetrators gained SQL command access to a database which would normally have plaintext access to the data.

However, there are other threats where data storage encryption can help. Just a couple of years ago,

  • A commercial bank’s backup tapes were lost/stolen which contained over 1 million bank records containing Social Security information and other sensitive data.
  • A government laptop was stolen containing over 28 million discharged veterans’ Social Security numbers.

These are just two examples but I am sure there were more where proper data-at-rest encryption would have saved the data from being breached.

Data encryption is not enough

Nevertheless, data encryption is only one layer in a multi-faceted/multi-layered security perimeter that needs to be in place to reduce and someday perhaps, eliminate the risk of losing confidential customer information.

Apparently, SQL injection can be defeated by properly filtering or strongly typing all user input fields. Not exactly sure how hard this would be to do, but if it could save the security of 160 million credit cards and defeat one of the top ten web application vulnerabilities, it should have been a high priority on somebody’s to-do list.
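Parameterized queries are the standard fix: the input is bound as a single typed value rather than pasted into the SQL text. A small sqlite3 demonstration of both the attack and the fix (the table and data here are made up for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cards (owner TEXT, number TEXT)")
db.execute("INSERT INTO cards VALUES ('alice', '4111-1111'), ('bob', '4222-2222')")

user_input = "alice' OR '1'='1"  # classic injection payload

# Vulnerable: user input is pasted straight into the SQL text, so the
# OR clause becomes part of the query and matches every row.
leaked = db.execute(
    f"SELECT * FROM cards WHERE owner = '{user_input}'").fetchall()
print(len(leaked))  # 2 -- the whole table dumped

# Safe: a parameterized query binds the input as one literal value.
safe = db.execute(
    "SELECT * FROM cards WHERE owner = ?", (user_input,)).fetchall()
print(len(safe))    # 0 -- no owner has that literal name
```

The same `?`-placeholder discipline (or its equivalent in other database drivers) is exactly the "strong typing of user input" the OWASP guidance recommends.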

Comments?


CommVault’s Simpana 9 release

CommVault announced a new release of their data protection product today – Simpana® 9. The new software provides significantly enhanced support for VM backup, new source-level deduplication capabilities and other enhanced facilities.

Simpana 9 starts by defining 3 tiers of data protection based on their Snapshot Protection Client (SPC):

  • Recovery tier – using SPC, application-consistent hardware snapshots can be taken via storage interfaces to enable content-aware, granular-level recovery. Simpana 9 SPC now supports EMC, NetApp, HDS, Dell, HP, and IBM (including LSI) storage snapshot capabilities. Automation supplied with Simpana 9 allows the user to schedule hardware snapshots at various intervals throughout the day such that they can be used to recover data without delay.
  • Protection tier – using mounted snapshot(s) provided by SPC above, Simpana 9 can create an extract or physical backup set copy to any disk type (DAS, SAN, NAS) providing a daily backup for retention purposes. This data can be deduplicated and encrypted for increased storage utilization and data security.
  • Compliance tier – selective backup jobs can then be sent to cloud storage and/or archive appliances such as HDS HCP or Dell DX for long term retention and compliance, preserving CommVault’s deduplication and encryption. Alternatively, compliance data can be sent to the cloud. CommVault’s previous cloud storage support included Amazon S3, Microsoft Azure, Rackspace, Iron Mountain and Nirvanix; with Simpana 9, they add EMC Atmos providers and Mezeo to the mix.

Simpana 9 VM backup support

Simpana 9 also introduces a SnapProtect-enabled Virtual Server Agent (VSA) to speed up virtual machine datastore backups. With VSA’s support for storage hardware snapshot backups and VMware facilities to provide application-consistent backups, virtual server environments can now scale to 1000s of VMs without concern for backup’s processing and IO impact on ongoing activity. VSA snapshots can be mounted afterwards to a proxy server which, using VMware services, extracts file-level content that CommVault can then deduplicate, encrypt and offload to other media, allowing for granular content recovery.

In addition, Simpana 9 supports auto-discovery of virtual machines with auto-assignment of data protection policies.  As such, VM guests can be automatically placed into an appropriate, pre-defined data protection regimen without the need for operator intervention after VM creation.

Also, with all the meta-data content cataloguing, Simpana 9 now supplies a lightweight, file-oriented Storage Resources Manager capability via the CommVault management interface. Such services can provide detailed file-level analytics for VM data without the need for VM guest agents.

Simpana 9 new deduplication support

CommVault’s 1st-gen deduplication with Simpana 7 was at the object level. With Simpana 8, deduplication occurred at the block level, providing content-aware variable block sizes and adding software data encryption support for disk or tape backup sets. With today’s release, Simpana 9 shifts some deduplication processing out to the source (the client), increasing backup data throughput by reducing data transfer. All this sounds similar to EMC’s Data Domain Boost capability introduced earlier this year.

Such a change takes advantage of the CommVault’s intelligent Data Agent (iDA) running in the clients to provide pre-deduplication hashing and list creation rather than doing this all at CommVault’s Media Agent node, reducing data to be transferred.  Further, CommVault’s data deduplication can be applied across a number of clients for a global deduplication service that spans remote clients as well as a central data center repositories.
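The client-side hashing idea reduces to: hash chunks locally, ask the server which hashes are new, and send only those. A schematic sketch of that protocol; class and method names here are mine, not CommVault's iDA or Media Agent APIs:

```python
import hashlib

def chunk(data: bytes, size: int = 4096):
    """Split a backup stream into fixed-size chunks (real products use
    content-aware variable-size chunking)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

class MediaAgent:
    """Stands in for the central dedup store: it knows which chunk
    hashes it already holds."""
    def __init__(self):
        self.store = {}

    def unknown(self, hashes):
        return [h for h in hashes if h not in self.store]

    def put(self, h, blob):
        self.store[h] = blob

def client_backup(data: bytes, agent: MediaAgent) -> int:
    """Hash chunks at the client, then transfer only the chunks the
    agent doesn't already have. Returns bytes actually sent."""
    chunks = {hashlib.sha256(c).hexdigest(): c for c in chunk(data)}
    sent = 0
    for h in agent.unknown(list(chunks)):
        agent.put(h, chunks[h])
        sent += len(chunks[h])
    return sent

agent = MediaAgent()
data = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
first = client_backup(data, agent)   # everything is new
second = client_backup(data, agent)  # all chunks already known
print(first, second)  # 12288 0
```

Because only hashes cross the wire for known chunks, the second (unchanged) backup transfers nothing, which is the throughput win source-side dedup is after.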

Simpana 9 new non-CommVault backup reporting and migration capabilities

Simpana 9 provides a new data collector for NetBackup versions 6.0, 6.5, and 7.0 and TSM 6.1, which allows CommVault to discover other backup services in the environment, extract backup policies, client configurations, job histories, etc. and report on these foreign backup processes. In addition, once their data collector is in place, Simpana 9 also supports automated procedures that can roll out and convert all these other backup services to CommVault data protection over a weekend, vastly simplifying migration from non-CommVault to Simpana 9 data protection.

Simpana 9 new software licensing

CommVault is also changing their software licensing approach to include more options for capacity based licensing. Previously, CommVault supported limited capacity based licensing but mostly used CommVault architectural component level licensing.  Now, they have expanded the capacity licensing offerings and both licensing modes are available so the customer can select whichever approach proves best for them.  With CommVault’s capacity-based licensing, usage can be tracked on the fly to show when customers may need to purchase a larger capacity license.

There are probably other enhancements I missed here, as Simpana 9 was a significant changeover from Simpana 8. Nonetheless, this version’s best feature was their enhanced approach to VM backups, allowing more VMs to run on a single server without concern for backup overhead. The fact that they do source-level pre-deduplication processing just adds icing to the cake.

What do you think?

Chart of the month: SPC-1 LRT performance results

The above chart shows the top 12 LRT(tm) (least response time) results for the Storage Performance Council’s SPC-1 benchmark. The vertical axis is the LRT in milliseconds (msec.) for the top benchmark runs. As can be seen, the two subsystems from TMS (RamSan400 and RamSan320) dominate this category with LRTs significantly less than 2.5msec. The IBM DS8300 and its turbo cousin come in next, followed by a slew of others.

The 1msec. barrier

Aside from the blistering LRTs from the TMS systems, one significant item in the chart above is that the two IBM DS8300 systems crack the <1msec. barrier using rotating media. Didn’t think I would ever see the day; of course, this happened 3 or more years ago. Still, it’s kind of interesting that there haven’t been more vendors with subsystems that can achieve this.

LRT is probably most useful for high cache-hit workloads. For these workloads the data comes directly out of cache and the only thing between a server and its data is subsystem IO overhead, measured here as LRT.

Encryption cheap and fast?

The other interesting tidbit from the chart is that the DS5300 with full drive encryption (FDE) (drives which I believe come from Seagate) cracks into the top 12 at 1.8msec., exactly equivalent to the IBM DS5300 without FDE. Now FDE from Seagate is a hardware drive-encryption capability and might not be measurable at a subsystem level. Nonetheless, it shows that having data security need not reduce performance.

What is not shown in the above chart is that adding FDE to the base subsystem only cost an additional US$10K (base DS5300 listed at US$722K and FDE version at US$732K). Seems like a small price to pay for data security which in this case is simply turn it on, generate keys, and forget it.

FDE is a hard drive feature where the drive itself encrypts all data written to and decrypts all data read from the drive, and requires a subsystem-supplied drive key at power on/reset. In this way the data is never in plaintext on the drive itself. If the drive were taken out of the subsystem and attached to a drive tester, all one would see is ciphertext. Similar capabilities have been available in enterprise and SMB tape drives in the past, but to my knowledge the IBM DS5300 FDE is the first disk storage benchmark with drive encryption.

I believe the key manager for the DS5300 FDE is integrated within the subsystem. Most shops would need a separate, standalone key manager for more extensive data security. I believe the DS5300 can also interface with a standalone (IBM) key manager. In any event, it’s still an easy and simple step towards increased data security for a data center.

The full report on the latest SPC results will be up on my website later this week but if you want to get this information earlier and receive your own copy of our newsletter – email me at SubscribeNews@SilvertonConsulting.com?Subject=Subscribe_to_Newsletter.