One startup that caught my eye was SpaceBelt from Cloud Constellation Corporation, that’s planning to put PB (4X library of congress) of data storage in a constellation of LEO satellites.
The LEO storage pool will be populated by multiple nodes (satellites) with a set of geo-synchronous access points to the LEO storage pool. Customers use ground based secure terminals to talk with geosynchronous access satellites which communicate to the LEO storage nodes to access data.
Their main selling points appear to be data security and availability. The only way to access the data is through secured satellite downlinks/uplinks and then you only get to the geo-synchronous satellites. From there, those satellites access the LEO storage cloud directly. Customers can’t access the storage cloud without going through the geo-synchronous layer first and the secured terminals.
The problem with terrestrial data is that it is prone to security threats as well as natural disasters which take out a data center or a region. But with all your data residing in a space cloud, such concerns shouldn’t be a problem. (However, gaining access to your ground stations is a whole different story.
AWS and Lockheed-Martin supply new ground station service
The other company of interest is not a startup but a link up between Amazon and Lockheed Martin (see: Amazon-Lockheed Martin …) that supplies a new cloud based, satellite ground station as a service offering. The new service will use Lockheed Martin ground stations.
Currently, the service is limited to S-Band and attennas located in Denver, but plans are to expand to X-Band and locations throughout the world. The plan is to have ground stations located close to AWS data centers, so data center customers can have high speed, access to satellite data.
There are other startups in the ground station as a service space, but none with the resources of Amazon-Lockheed. All of this competition is just getting off the ground, but a few have been leasing idle ground station resources to customers. The AWS service already has a few big customers, like DigitalGlobe.
One thing we have learned, is that the appeal of cloud services is as much about the ecosystem that surrounds it, as the service offering itself. So having satellite ground stations as a service is good, but having these services, tied directly into other public cloud computing infrastructure, is much much better. Google, Microsoft, IBM are you listening?
Data centers in space
Why stop at storage? Wouldn’t it be better to support both storage and computation in space. That way access latencies wouldn’t be a concern. When terrestrial disasters occur, it’s not just data at risk. Ditto, for security threats.
Having whole data centers, would represent a whole new stratum of cloud computing. Also, now IT could implement space native applications.
If Microsoft can run a data center under the oceans, I see no reason they couldn’t do so in orbit. Especially when human flight returns to NASA/SpaceX. Just imagine admins and service techs as astronauts.
And yet, security and availability aren’t the only threats one has to deal with. What happens to the space cloud when war breaks out and satellite killers are set loose.
Yes, space infrastructure is not subject to terrestrial disasters or internet based security risks, but there are other problems besides those and war that exist such as solar storms and space debris clouds. .
In the end, it’s important to have multiple, non-overlapping risk profiles for your IT infrastructure. That is each IT deployment, may be subject to one set of risks but those sets are disjoint with another IT deployment option. IT in space, that is subject to solar storms, space debris, and satellite killers is a nice complement to terrestrial cloud data centers, subject to natural disasters, internet security risks, and other earth-based, man made disasters.
On the other hand, a large, solar storm like the 1859 one, could knock every data system on the world or in orbit, out. As for under the sea, it probably depends on how deep it was submerged!!
We have discussed blockchain before (see my post on BlockStack). Blockchains can be used to implement an immutable ledger useful for smart contracts, electronic asset tracking, secured financial transactions, etc.
BlockStack was being used to implement Private Key Infrastructure and to implement a worldwide, distributed file system.
IBM’s Blockchain-as-a-service offering has a plugin based consensus that can use super majority rules (2/3+1 of members of a blockchain must agree to ledger contents) or can use consensus based on parties to a transaction (e.g. supplier and user of a component).
BitCoin (an early form of blockchain) consensus used data miners (performing hard cryptographic calculations) to determine the shared state of a ledger.
There can be any number of blockchains in existence at any one time. Microsoft Azure also offers Blockchain as a service.
The potential for blockchains are enormous and very disruptive to middlemen everywhere. Anywhere ledgers are used to keep track of assets, information, money, etc, that undergo transformations, transitions or transactions as they are further refined, produced and change hands, can be easily tracked in blockchains. The only question is can these assets, information, currency, etc. be digitally fingerprinted and can that fingerprint be read/verified. If such is the case, then blockchains can be used to track them.
New uses for Blockchain
IBM showed a demo of their new supply chain management service based on z Systems blockchain in action. IBM component suppliers record when they shipped component(s), shippers would record when they received the component(s), port authorities would record when components arrived at port, shippers would record when parts cleared customs and when they arrived at IBM facilities. Not sure if each of these transitions were recorded, but there were a number of records for each component shipment from supplier to IBM warehouse. This service is live and being used by IBM and its component suppliers right now.
Leanne Kemp, CEO Everledger, presented another example at IBM Edge (presumably built on z Systems Hyperledger service) used to track diamonds from mining, to cutter, to polishing, to wholesaler, to retailer, to purchaser, and beyond. Apparently the diamonds have a digital bar code/fingerprint/signature that’s imprinted microscopically on the diamond during processing and can be used to track diamonds throughout processing chain, all the way to end-user. This diamond blockchain is used for fraud detection, verification of ownership and digitally certify that the diamond was produced in accordance of the Kimberley Process.
Everledger can also be used to track any other asset that can be digitally fingerprinted as they flow from creation, to factory, to wholesaler, to retailer, to customer and after purchase.
Why z System blockchains
What makes z Systems a great way to implement blockchains is its securely, isolated partitioning and advanced cryptographic capabilities such as z System functionality accelerated hashing, signing & securing and hardware based encryption to speed up blockchain processing. z Systems also has FIPS-140 level 4 certification which can provide the highest security possible for blockchain and other security based operations.
From IBM’s perspective blockchains speak to the advantages of the mainframe environments. Blockchains are compute intensive, they require sophisticated cryptographic services and represent formal systems of record, all traditional strengths of z Systems.
Aside from the service offering, IBM has made numerous contributions to the Hyperledger project. I assume one could just download the z Systems code and run it on any LinuxONE processing environment you want. Also, since Hyperledger is Linux based, it could just as easily run in any OpenPower server running an appropriate version of Linux.
Blockchains will be used to maintain the system of record of the future just like mainframes maintained the systems of record of today and the past.
At a former employer, I first talked with StorageGRID (Bycast at the time) a decade or so ago. At that time, they were focused on medical and healthcare verticals and had a RAIN (redundant array of independent nodes) storage solution. It has come a long way.
StorageGRID Business is booming
On the call, NetApp announced they sold 50PB of StorageGRID in FY’16 with 20PB of that in the last quarter and also reported 270% Y/Y revenue growth, which means they are starting to gain some traction in the marketplace. Are we seeing an acceleration of object storage adoption?
As you may recall, StorageGRID comes in a software only solution that runs on just about any white box server with DAS or as two hardware appliances: the SG5612 (12 drive); and the SG5660 (60 drive) nodes. You can mix and match any appliance with any white box software only solution, they don’t have to have the same capacity or performance. But all nodes need network and controller/admin node(s) access.
Somewhere during Bycast’s journey they developed support for tape archives and information lifecycle management (ILM) for objects. The previous generation, StorageGrid 10.2 had a number of features, including:
S3 cloud archive support that allowed objects to be migrated to AWS S3 as they were no longer actively accessed
NAS bridge support that allowed CIFS/SMB or NFS access to StorageGRID objects, which could also be read as S3 objects for easier migration to/from object storage;
Hierarchical erasure codingoption that was optimized for efficiently storing large objects;
Node level erasure coding support that can be used to rebuild data for node drive failures, without having to go outside the node data retrieval;
Object byte-granular range read support that allowed users to read an object at any byte offset without requiring rebuild;
Support for OpenStack Swift API that made StorageGRID objects natively available to any OpenStack service; and
Software support for running as Docker containers or as a VM under VMware ESX, or OpenStack KVM that allowed StorageGRID software to run just about anywhere.
StorageGRID present and future
But customers complained StorageGRID was too complex to install and update which required too much hand holding by NetApp professional services. StorageGRID Webscale 10.3 was targeted to address these deficiencies. Some of the features in StorageGrid 10.3, include:
Radically simplified, more modern UI, new dashboard and policy wizard/editor, so that it’s a lot easier to manage the StorageGRID. All features of the UI are also available via RESTfull API access and the UI is the same for white box, software only implementations as well as appliance configurations.
Simplified automated installation scripts, so that installations that used to take multiple steps, separate software installs and required professional services support, now use a full-solution software stack install, take only minutes and can be done by the customers alone;
S3 object versioning support, so that objects can have multiple versions, limited via the UI, if needed, but provide a snapshot-like capability for S3 data that protects against object accidental deletion.
ILM policy change predictions/modeling, so that admins can now see how changes to ILM policies will impact StorageGRID.
Even more flexibility in DAS storage, so that future StorageGRID configurations can support 10TB drives and 6TB FIPS-140 drive encryption support, which adds to the current drive capacity and data security options already available in StorageGRID.
To top it all off, StorageGRID 10.3 improves performance for both small (30KB) and large (300MB) object get/puts.
Small S3 Load Data Router (LDR, 1-thread) object performance has improved ~4X for both PUTs and GETs; and
Large S3 LDR (1-thread) object performance has improved ~2X for PUTs and ~4X for GETs.
Object storage market heating up
Apparently, service providers are adopting object storage to provide competition to AWS, Azure and Google cloud storage for backup and storage archives as well as for DR as a service. Also, many media and other customers managing massive data repositories are turning to object storage to support their multi-site, very large file libraries. And as more solution vendors support S3 object protocols for data access and archive, something like StorageGRID can become their onsite-offsite storage alternative.
And Amazon, Azure and Google are starting to realize that most enterprise customers are not going to leap to the cloud for everything they do. So, some sort of hybrid solution is needed for the long term. Having an on premises and off premises object storage solution that can also archive/migrate data to the cloud is a great hybrid alternative that takes enterprises one step closer to the cloud.
I have been doing some testing with both Azure and Amazon Web Services (AWS) these last few weeks and have observed a significant difference in the security setups for both of these cloud services, at least when it comes to Linux compute instances and cloud storage.
First, let me state at the outset, all of my security setups for both AWS and Amazon was done through using the AWS console or the Azure (classic) portal. I believe anything that can be done with the portal/console for both AWS and Azure can also be done in the CLI or the REST interface. I only used the portal/console for these services, so can’t speak to the ease of using AWS’s or Azure’s CLI or REST services.
EC2 instance security is pretty easy to setup and use, at least for Linux users:
When you set up an (Linux) EC2 instance you are asked to set up a Public Key Infrastructure file (.pem) to be used for SSH/SFTP/SCP connections. You just need to copy this file to your desktop/laptop/? client system. When you invoke SSH/SFTP/SCP, you use the “-i” (identity file) option and specify the path to the (.PEM) certificate file. The server is already authorized for this identity. If you lose it, AWS services will create another one for you as an option when connecting to the machine.
When you configure the AWS instance, one (optional) step is to configure its security settings. And one option for this is to allow connections only from ‘my IP address’, how nice. You don’t even have to know your IP address, AWS just figures it out for itself and configures it.
That’s about it. Unclear to me how well this secures your EC2 instance but it seems pretty secure to me. As I understand it, a cyber criminal would need to know and spoof your IP address to connect to or control remotely the EC2 instance. And if they wanted to use SSH/SFTP/SCP they would either have to access to the identity file. I don’t believe I ever set up a password for the EC2 instance.
As for EBS storage, there’s no specific security associated with EBS volumes. Its security is associated with the EC2 instance it’s attached to. It’s either assigned/attached to an EC2 instance and secured there, or it’s unassigned/unattache. For unattached volumes, you may be able to snapshot it (to an S3 bucket within your administration control) or delete it (if it’s unattached, but for either of these you have to be an admin for the EC2 domain.
As for S3 bucket security, I didn’t see any S3 security setup that mimicked the EC2 instance steps outlined above. But in order to use AWS automated billing report services for S3, you have to allow the service to have write access to your S3 buckets. You do this by supplying an XML-like security policy, and applying this to all S3 buckets you wish to report on (or maybe it’s store reports in). AWS provides a link to the security policy page which just so happens to have the XML-like file you will need to do this. All I did was copy this text and insert it into a window that was opened when I said I wanted to apply a security policy to the bucket.
I did find that S3 bucket security, made me allow public access (I think, can’t really remember exactly) to the S3 bucket to be able to list and download objects from the bucket from the Internet. I didn’t like this, but it was pretty easy to turn on. I left this on. But this PM I tried to find it again (to disable it) but couldn’t seem to locate where it was.
From my perspective all the AWS security setup for EC2 instances, storage, and S3 was straightforward to use and setup, it seemed pretty secure and allowed me to get running with only minimal delay.
First, I didn’t find the more modern, new Azure portal that useful but then I am a Mac user, and it’s probably more suitable for Windows Server admins. The classic portal was as close to the AWS console as I could find and once I discovered it, I never went back.
Setting up a Linux compute instance under Azure was pretty easy, but I would say the choices are a bit overwhelming and trying to find which Linux distro to use was a bit of a challenge. I settled on SUSE Enterprise, but may have made a mistake (EXT4 support was limited to RO – sigh). But configuring SUSE Enterprise Linux without any security was even easier than AWS.
However, Azure compute instance security was not nearly as straightforward as in AWS. In fact, I could find nothing similar to securing your compute instance to “My IP” address like I did in AWS. So, from my perspective my Azure instances are not as secure.
I wanted to be able to SSH/SFTP/SCP into my Linux compute instances on Azure just like I did on AWS. But, there was no easy setup of the identity file (.PEM) like AWS supported. So I spent some time, researching how to create a Cert file with the Mac (didn’t seem able to create a .PEM file). Then more time researching how to create a Cert file on my Linux machine. This works but you have to install OpenSSL, and then issue the proper “create” certificate file command, with the proper parameters. The cert file creation process asks you a lot of questions, one for a pass phrase, and then for a network (I think) phrase. Of course, it asks for name, company, and other identification information, and at the end of all this you have created a set of cert files on your linux machine.
But there’s a counterpart to the .pem file that needs to be on the server to authorize access. This counterpart needs to be placed in a special (.ssh/authorized) directory and I believe needs to be signed by the client needing to be authorized. But I didn’t know if the .cert, .csr, .key or .pem file needed to be placed there and I had no idea how to” sign it”. After spending about a day and a half on all this, I decided to abandon the use of an identity file and just use a password. I believe this provides less security than an identity file.
As for BLOB storage, it was pretty easy to configure a PageBlob for use by my compute instances. It’s security seemed to be tied to the compute instance it was attached to.
As for my PageBlob containers, there’s a button on the classic portal to manage access keys to these. But it said once generated, you will need to update all VMs that access these storage containers with the new keys. Not knowing how to do that. I abandoned all security for my container storage on Azure.
So, all in all, I found Azure a much more manual security setup for Linux systems than AWS and in the end, decided to not even have the same level of security for my Linux SSH/SFTP/SCP services that I did on AWS. As for container security, I’m not sure if there’s any controls on the containers at this point. But I will do some more research to find out more.
In all fairness, this was trying to setup a Linux machine on Azure, which appears more tailored for Windows Server environments. Had I been in an Active Directory group, I am sure much of this would have been much easier. And had I been configuring Windows compute instances instead of Linux, all of this would have also been much easier, I believe.
All in all, I had fun using AWS and Azure services these last few weeks, and I will be doing more over the next couple of months. So I will let you know what else I find as significant differences between AWS and Azure. So stay tuned.
Read an article the other day in Quantum Magazine: A tricky path to quantum encryption about the problems that will occur in current public key cryptology (PKC) schemes when quantum computing emerges over the next five to 30 years. With advances in quantum computing our current PKC scheme that depends on the difficulty of factoring large numbers will be readily crackable. At that time, all current encrypted traffic, used by banks, the NSA, the internet, etc. will no longer be secure.
According to Wikipedia a lattice is a 3-dimensional space of equidistant points. Apparently, for security reasons, they had to increase the number of dimensions significantly beyond 3.
A secret is somehow inscribed in a route (vector) through this 500-dimensional lattice between two points: an original point (the public key) in the lattice and another arbitrary point, somewhere nearby in the lattice. The problem from a cryptographic sense is that finding a route, in a 500 dimensional lattice, is a difficult task when you only have one of the points.
But can it be efficient for digital computers of today to use?
So the various security groups have been working on divising efficient algorithms for multi-dimensional public key encryption over the past decade or so. But they have run into a problem.
Originally, the (public) keys for a 500-dimensional lattice PKC were on the order of MBs, so they have been restricting the lattice computations to utilize smaller keys and in effect reducing the complexity of the underlying lattice. But in the process they have now reduced the security of the lattice PKC scheme. So they are having to go back to longer keys, more complex lattices and trying to ascertain which approach leaves communications secure but is efficient enough to implement by digital computers and communications links of today.
The problem is that quantum computers provide a much faster way to perform certain calculations like factoring a number. Quantum computing can speed up this factorization, by on the order of the square root of a number, as compared to normal digital computing of today.
Its possible that similar quantum computing calculations for lattice routes between points could also be sped up by an equivalent factor. So even when we all move to lattice based PKC, it’s still possible for quantum computers to crack the code hopefully, it just takes longer.
So the mathematics behind PKC will need to change over the next 5 years or so as quantum computing becomes more of a reality. The hope is that this change will will at least keep our communications secure, at least until the next revolution in computing comes along, or quantum computing becomes even faster than that envisioned today.
Move over Dropbox, Box and all you synch&share wannabees, there’s a new synch and share in town.
At SFD7 last month, we were visiting with Connected Data where CEO, Geoff Barrell was telling us all about what was wrong with today’s cloud storage solutions. In front of all the participants was this strange, blue glowing device. As it turns out, Connected Data’s main product is the File Transporter, which is a private file synch and share solution.
All the participants were given a new, 1TB Transporter system to take home. It was an interesting sight to see a dozen of these Transporter towers sitting in front of all the bloggers.
I was quickly, established a new account, installed the software, and activated the client service. I must admit, I took it upon myself to “claim” just about all of the Transporter towers as the other bloggers were still paying attention to the presentation. Sigh, they later made me give back (unclaim) all but mine, but for a minute there I had about 10TB of synch and share space at my disposal.
So what is it. The Transporter is both a device and an Internet service, where you own the storage and networking hardware.
The home-office version comes as a 1 or 2TB 2.5” hard drive, in a tower configuration that plugs into a base module. The base module runs a secured version of Linux and their synch and share control software.
As tower power on, it connects to the Internet and invokes the Transporter control service. This service identifies the node, who owns it, and provides access to the storage on the Transporter to all desktops, laptops, and mobile applications that have access to it.
At initiation of the client service on a desktop/laptop it creates (by default) a new Transporter directory (folder). Files that are placed in this directory are automatically synched to the Transporter tower and then synchronized to any and all online client devices that have claimed the tower.
Apparently you can have multiple towers that are claimed to the same account. I personally tested up to 10 ;/ and it didn’t appear as if there was any substantive limit beyond that but I’m sure there’s some maximum count somewhere.
A couple of nice things about the tower. It’s your’s so you can move it to any location you want. That means, you could take it with you to your hotel or other remote offices and have a local synch point.
Also, initial synchronization can take place over your local network so it can occur as fast as your LAN can handle it. I remember the first time I up-synched 40GB to DropBox, it seemed to take weeks to complete and then took less time to down-synch for my laptop but still days of time. With the tower on my local network, I can synch my data much faster and then take the tower with me to my other office location and have a local synch datastore. (I may have to start taking mine to conferences. Howard (@deepstorage.net, co-host on our GreyBeards on Storage podcast) had his operating in all the subsequent SFD7 sessions.
The Transporter also allows sharing of data. Steve immediately started sharing all the presentations on his Transporter service so the bloggers could access the data in real time.
They call the Transporter a private cloud but in my view, it’s more a private synch and share service.
The Transporter people were all familiar to the SFD crowd as they were formerly with Drobo which was at a previous SFD sessions (see SFD1). And like Drobo, you can install any 2.5″ disk drive in your Transporter and it will work.
There’s workgroup and business class versions of the Transporter storage system. The workgroup versions are desktop configurations (looks very much like a Drobo box) that support up to 8TB or 12TB supporting 15 or 30 users respectively. The also have two business class, rack mounted appliances that have up to 12TB or 24TB each and support 75 or 150 users each. The business class solution has onboard SSDs for meta-data acceleration. Similar to the Transporter tower, the workgroup and business class appliances are bring your own disk drives.
Connected Data’s presentation
Geoff’s discussion (see SFD7 video) was a tour of the cloud storage business model. His view was that most of these companies are losing money. In fact, even Amazon S3/Glacier appears to be bleeding money, although this may not stop Amazon. Of course, DropBox and other synch and share services all depend on cloud storage for their datastores. So, the lack of a viable, profitable business model threatens all of these services in the long run.
But the business model is different when a customer owns the storage. Here the customer owns the actual storage cost. The only thing that Connected Data provides is the client software and the internet service that runs it. Pricing for the 1TB and 2TB transporters with disk drives are $150 and $240.
Having a Transporter
One thing I don’t like is the lack of data-at-rest encryption. They use TLS for data transfers across your LAN and the Internet. But the nice thing about having possession of the actual storage is that you can move it around. But the downside is that you may move it to less secure environments (like conference hotel rooms). And as with the any disk storage, someone can come up to the device and steel the disk. Whether the data would be easily recognizable is another question but having it be encrypted would put that question to rest. There’s some indication on the Transporter support site that encryption may be coming for the business class solution. But nothing was said about the Transporter tower.
On the Mac, the Transporter folder has the shared folders as direct links (real sub-folders) but the local data is under a Transporter Library soft link. It turns out to be a hidden file (“.Transporter Library”) under the Transporter folder. When you Control click on this file your are given the option to view deleted files. You can also do this with shared files as well.
One problem with synch and share services is once someone in your collaboration group deletes some shared files they are gone (over time) from all other group users. Even if some of them wanted them. Transporter makes it a bit easier to view these files and save them elsewhere. But I assume at some point they have to be purged to free up space.
When I first installed the Transporter, it showed up as a network node on my finder shared servers. But the latest desktop version (3.1.17) has removed this.
Also some of the bloggers complained about files seeing files “in flux” or duplicates of the shared files but with unusual file suffixes appended to them, such as ” filename124224_f367b3b1-63fa-4d29-8d7b-a534e0323389.jpg”. Enrico (@ESignoretti) opened up a support ticket on this and it’s supposedly been fixed in the latest desktop and was a temporary filename used only during upload and should have been deleted-renamed after the upload was completed. I just uploaded 22MB with about 40 files and didn’t see any of this.
I really want encryption as I wanted one transporter in a remote office and another in the home office with everything synched locally and then I would hand carry the remote one to the other location. But without encryption this isn’t going to work for me. So I guess I will limit myself to just one and move it around to wherever I want to my data to go.
Here are some of the other blog posts by SFD7 participants on Transporter:
Was reading an article the other day from TechCrunch that said Servers need to die to save the Internet. This article talked about a startup called MaidSafe which is attempting to re-architect/re-implement/replace the Internet into a Peer-2-Peer, mesh network and storage service which they call the SAFE (Secure Access for Everyone) network. By doing so, they hope to eliminate the need for network servers and storage.
Sometime in the past I wrote a blog post about Peer-2-Peer cloud storage (see Free P2P Cloud Storage and Computing if interested). But it seems MaidSafe has taken this to a more extreme level. By the way the acronym MAID used in their name stands for Massive Array of Internet Disks, sound familiar?
Crypto currency eco-system
The article talks about MaidSafe’s SAFE network ultimately replacing the Internet but at the start it seems more to be a way to deploy secure, P2P cloud storage. One interesting aspect of the MaidSafe system is that you can dedicate a portion of your Internet connected computers’ storage, computing and bandwidth to the network and get paid for it. Assuming you dedicate more resources than you actually use to the network you will be paid safecoins for this service.
For example, users that wish to participate in the SAFE network’s data storage service run a Vault application and indicate how much internal storage to devote to the service. They will be compensated with safecoins when someone retrieves data from their vault.
Safecoins are a new BitCoin like internet currency. Currently one safecoin is worth about $0.02 but there was a time when BitCoins were worth a similar amount. MaidSafe organization states that there will be a limit to the number of safecoins that can ever be produced (4.3Billion) so there’s obviously a point when they will become more valuable if MaidSafe and their SAFE network becomes successful over time. Also, earned safecoins can be used to pay for other MaidSafe network services as they become available.
Application developers can code their safecoin wallet-ids directly into their apps and have the SAFE network automatically pay them for application/service use. This should make it much easier for App developers to make money off their creations, as they will no longer have to use advertising support, or provide differenct levels of product such as free-simple user/paid-expert use types of support to make money from Apps. I suppose in a similar fashion this could apply to information providers on the SAFE network. An information warehouse could charge safecoins for document downloads or online access.
All data objects are encrypted, split and randomly distributed across the SAFE network
The SAFE network encrypts and splits any data up and then randomly distributes these data splits uniformly across their network of nodes. The data is also encrypted in transit across the Internet using rUDPs (reliable UDPs) and SAFE doesn’t use standard DNS services. Makes me wonder how SAFE or Internet network nodes know where rUDP packets need to go next without DNS but I’m no networking expert. Apparently by encrypting rUDPs and not using DNS, SAFE network traffic should not be prone to deep packet inspection nor be easy to filter out (except of course if you block all rUDP traffic). The fact that all SAFE network traffic is encrypted also makes it much harder for intelligence agencies to eavesdrop on any conversations that occur.
The SAFE network depends on a decentralized PKI to authenticate and supply encryption keys. All SAFE network data is either encrypted by clients or cryptographically signed by the clients and as such, can be cryptographically validated at network endpoints.
The each data chunk is replicated on, at a minimum, 4 different SAFE network nodes which provides resilience in case a network node goes down/offline. Each data object could potentially be split up into 100s to 1000s of data chunks. Also each data object has it’s own encryption key, dependent on the data itself which is never stored with the data chunks. Again this provides even better security but the question becomes where does all this metadata (data object encryption key, chunk locations, PKI keys, node IP locations, etc.) get stored, how is it secured, and how is it protected from loss. If they are playing the game right, all this is just another data object which is encrypted, split and randomly distributed but some entity needs to know how to get to the meta-data root element to find it all in case of a network outage.
Supposedly, MaidSafe can detect within 20msec. if a node is no longer available and reconfigure the whole network. This probably means that each SAFE network node and endpoint is responsible for some network transaction/activity every 10-20msec, such as a SAFE network heartbeat to say it is still alive.
It’s unclear to me whether the encryption key(s) used for rUDPs and the encryption key used for the data object are one and the same, functionally related, or completely independent? And how a “decentralized PKI” and “self authentication” works is beyond me but they published a paper on it, if interested.
For-profit open source business model
MaidSafe code is completely Open Source (available at MaidSafe GitHub) and their APIs are freely available to anyone and require no API key. They also have multiple approved and pending patents which have been provided free to the world for use, which they use in a defensive capacity.
MaidSafe says it will take a 5% cut of all safecoin transactions over the SAFE network. And as the network grows their revenue should grow commensurately. The money will be used to maintain the core network software and MaidSafe said that their 5% cut will be shared with developers that help develop/fix the core SAFE network code.
They are hoping to have multiple development groups maintaining the code. They currently have some across Europe and in California in the US. But this is just a start.
They are just now coming out of stealth, have recently received $6M USD investment (by auctioning off MaidSafeCoins a progenitor of safecoins) but have been in operation now, architecting/designing/developing the core code now for 8+ years now, which probably qualifies them for the longest running startup on the planet.
Replacing the Internet
MaidSafe believes that the Internet as currently designed is too dependent on server farms to hold pages and other data. By having a single place where network data is held, it’s inherently less secure than by having data spread out, uniformly/randomly across a multiple nodes. Also the fact that most network traffic is in plain text (un-encrypted) means anyone in the network data path can examine and potentially filter out data packets.
I am not sure how the SAFE network can be used to replace the Internet but then I’m no networking expert. For example, from my perspective, SAFE is dependent on current Internet infrastructure to store and forward rUDPs on along its trunk lines and network end-paths. I don’t see how SAFE can replace this current Internet infrastructure especially with nodes only present at the endpoints of the network.
I suppose as applications and other services start to make use of SAFE network core capabilities, maybe the SAFE network can become more like a mesh network and less dependent on the current hub and spoke current Internet we have today. As a mesh network, node endpoints can store and forward packets themselves to locally accessed neighbors and only go out on Internet hubs/trunk lines when they have to go beyond the local network link.
Moreover, the SAFE can make any Internet infrastructure less vulnerable to filtering and spying. Also, it’s clear that SAFE applications are no longer executing in data center servers somewhere but rather are actually executing on end-point nodes of the SAFE network. This has a number of advantages, namely:
SAFE applications are less susceptible to denial of service attacks because they can execute on many nodes.
SAFE applications are inherently more resilient because the operate across multiple nodes all the time.
SAFE applications support faster execution because the applications could potentially be executing closer to the user and could potentially have many more instances running throughout the SAFE network.
Still all of this doesn’t replace the Internet hub and spoke architecture we have today but it does replace application server farms, CDNs, cloud storage data centers and probably another half dozen Internet infrastructure/services I don’t know anything about.
Yes, I can see how MaidSafe and its SAFE network can change the Internet as we know and love it today and make it much more secure and resilient.
Not sure how having all SAFE data being encrypted will work with search engines and other web-crawlers but maybe if you want the data searchable, you just cryptographically sign it. This could be both a good and a bad thing for the world.
Nonetheless, you have to give the MaidSafe group a lot of kudos/congrats for taking on securing the Internet and making it much more resilient. They have an active blog and forum that discusses the technology and what’s happening to it and I encourage anyone interested more in the technology to visit their website to learn more
Snowden at SXSW said last week that it’s up to the vendors to encrypt customer data. I think he was talking mostly about data-in-flight but there’s just a big an exposure for data-at-rest, maybe more so because then, all the data is available, at one sitting.
The documents said that Apple iMessage uses public key encryption where every IOS/OS X device generates a pair of public and private keys (one for messages and one for signing) which are used to encrypt the data while it is transmitted through Apple’s iMessage service. Apple encrypts the data on its iMessage App running in the devices with every destination device’s public key before it’s saved on the iMessage server cloud, which can then be decrypted on the device with its private key whenever the message is received by the device.
It’s a bit more complex for longer messages and attachments but the gist is that this data is encrypted with a random key at the device and is saved in encrypted form while residing iMessage servers. This random key and URI is then encrypted with the destination devices public keys which is then stored on the iMessage servers. Once the destination device retrieves the message with an attachment it has the location and the random key to decrypt the attachment.
According to Apple’s documentation when you start an iMessage you identify the recipient, the app retrieves the public keys for all these devices and then it encrypts the message (with each destination device’s public message key) and signs the message (with the originating device’s private signing key). This way Apple servers never see the plain text message and never holds the decryption keys.
Synch & share data security today
As mentioned in prior posts, I am now a Dropbox user and utilize this service to synch various IOS and OSX device file data. Which means a copy of all this synch data is sitting on Dropbox (AWS S3) servers, someplace (possibly multiple places) in the cloud.
Dropbox data-at-rest security is explained in their How secure is Dropbox document. Essentially they use SSL for data-in-flight security and AES-256 encryption with a random key for data-at-rest security.
This probably makes it easier to support multiple devices and perhaps data sharing because they only need to encrypt/save the data once and can decrypt the data on its servers before sending it through (SSL encrypted, of course) to other devices.
The only problem is that Dropbox holds all the encryption keys for all the data that sits on its servers. I (and possibly the rest of the tech community) would much prefer that the data be encrypted at the customer’s devices and never decrypted again except at other customer devices. This would be true end-to-end data security for sync&share
As far as I know from a data-at-rest security perspective Box looks about the same, so does EMC’s Syncplicity, Oxygen Cloud, and probably all the others. There are some subtle differences about how and where the keys are kept and how many security domains exist in each service, but in the end, the service holds the keys to all data that is encrypted on their storage cloud.
Public key cryptography to the rescue
I think we could do better and public key cryptography should show us the way. I suppose it would probably be easiest to follow the iMessage approach and just encrypt all the data with each device’s public key at the time you create/update the data and send it to the service but,
That would further delay the transfer of new and updated data to the synch service, also further delaying its availability at other devices linked to the login.
That would cause the storage requirement for your sync&share data to be multiplied by the number of devices you wish to synch with.
Synch data-at-rest security
If we just take on the synch side of the discussion first maybe it would be easiest. For example, if a new public and private key pair for encryption and signing were to be assigned to each new device at login to the service then the service could retain a directory of the device’s public keys for data encryption and signing.
The first device to login to a synch service with a new user-id, would assign a single encryption key for all data to be shared by all devices that could use this login. As other devices log into the service, the prime device sends the single service encryption key encrypted using the target device’s public key and signing the message with the source device’s private key. Actually any device in the service ring could do this but the primary device could be used to authenticate the new devices login credentials. Each device’s synch service would have a list of all the public keys for all the devices in the “synch” region.
As data is created or updated there are two segments of each file that are created, the AES-256 encrypted data package using the “synch” region’s random encryption key and the signature package, signed by the device doing the creation/update of the file. Any device could authenticate the signature package at the time it receives a file, as could the service. But ONLY the devices with the AES-256 encryption key would have access to the plain text version of the data.
There are some potential holes in this process, first is that the service could still intercept the random encryption key, at the primary device when it’s created or could retrieve it anytime later at its leisure using the app running in the device. This same exposure exists for the iMessage App running in IOS/OS X devices, the private keys in this instance could be sent to another party at any time. We would need to depend on service guarantees to not do this.
Share data-at-rest security
For Apple’s iMessage attachment security the data is kept in the cloud encrypted by a random key but the key and the URI are sent to the devices when they receive the original message. I suppose this could just as easily work for a file share service but the sharing activity might require a share service app running in the target device to create public-private key pairs and access the file.
Yes this leaves any “shared” data keys being held by the service but it can’t be helped. The data is being shared with others so maybe having it be a little more accessible to prying eyes would be acceptable.
I still prefer the iMessage approach, having multiple copies of encrypted shared data, that is encrypted by each device’s public key. It’s simpler this way, a bit more verifiable and doesn’t need to have as much out-of-channel communication (to send keys to other devices).
Yes it would cost more to store any amount of data and would take longer to transmit, but I feel we would all would be willing to support this extra constraints as long as the service guaranteed that private keys were only kept on devices that have logged into the service.
Data-at-rest and -in-flight security is becoming more important these days. Especially since Snowden’s exposure of what’s happening to web data. I love the great convenience of sync&share services, I just wish that the encryption keys weren’t so vulnerable…