At the USENIX ATC conference a couple of weeks ago, a group of researchers presented BlockStack, a global name space and storage system built on the Bitcoin blockchain. Their paper was titled “Blockstack: A global naming and storage system secured by blockchain” (see pp. 181-194 in the USENIX ATC’16 proceedings).
Bitcoin blockchain simplified
Blockchains like Bitcoin have a number of interesting properties, including a completely distributed understanding of current state, based on hashing and an append-only log of transactions.
Blockchain nodes all participate in validating the current block of transactions and some nodes (deemed “miners” in Bitcoin) supply new blocks of transactions for validation.
All blockchain transactions are sent to each node. The blockchain software on the node timestamps each transaction and accumulates them in an ordered append log (the “block”), which is then hashed. Each new block contains a hash of the previous block (the “chain” in blockchain).
The miner’s block is then compared against each non-miner node’s block (the hashes are compared), and if they are equal, everyone reaches consensus (agrees) that the transaction block is valid. Then the next miner supplies a new block of transactions, and the process repeats. (See Wikipedia’s article for more info.)
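As a rough illustration of the hash chaining and block comparison described above, here is a minimal Python sketch. It is a toy, not how Bitcoin actually works: real Bitcoin adds proof-of-work, Merkle trees and timestamps, all of which are left out here. The function names (block_hash, make_block, chain_is_valid) and the JSON encoding are my own assumptions for illustration.

```python
import hashlib
import json


def block_hash(block):
    """Return the SHA-256 hex digest of a block's canonical JSON form."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_block(transactions, prev_hash):
    """Bundle an ordered list of transactions with the previous block's hash."""
    return {"prev_hash": prev_hash, "transactions": list(transactions)}


def chain_is_valid(chain):
    """Check that every block points at the hash of the block before it."""
    for prev, current in zip(chain, chain[1:]):
        if current["prev_hash"] != block_hash(prev):
            return False
    return True


# Build a three-block chain; the first block uses an all-zero placeholder hash.
genesis = make_block([], prev_hash="0" * 64)
block1 = make_block(["alice pays bob 1"], block_hash(genesis))
block2 = make_block(["bob pays carol 1"], block_hash(block1))
chain = [genesis, block1, block2]

# A node that accumulated the same transactions in the same order ends up
# with the same block hash as the miner, which is the comparison step.
node_copy = make_block(["alice pays bob 1"], block_hash(genesis))
hashes_agree = block_hash(node_copy) == block_hash(block1)
```

Because each block embeds the previous block's hash, changing any transaction in an earlier block changes that block's hash and breaks the link from every block after it.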
All blockchain transactions are owned by a cryptographic address. Each cryptographic address has a public and private key associated with it.
BlockStack’s distributed global name service
BlockStack is a distributed storage system that uses the Bitcoin blockchain to securely define a global name space. The names are all tied to values that represent URIs (URLs) pointing to storage systems like AWS S3, but they could point to any cloud storage service.
BlockStack global name service
The architecture for BlockStack has four layers. At the bottom is the Blockchain layer. BlockStack name space transactions are encoded into normal Bitcoin transactions, using special fields to define a name and an operation to be performed on that name, as well as to provide a hash of the current zone file.
The next layer up is the Virtualchain layer, which provides a state machine for global name space processing. New names have to be preordered, and if that’s successful they can then be registered. Names are created in Internet-like domains, or zones. Each zone essentially lists all the names in its domain (like “.biz”) and their associated URIs (the S3 handle for the file). A name record can hold other information as well, such as a data hash. The zone record holds the name of the domain and the hash of the entire zone file (a list of the name records underneath that domain/zone name).
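The preorder-then-register rule can be pictured as a small state machine. The sketch below is my own simplification and not the Virtualchain code: the class name, method names and record fields are hypothetical, and it ignores everything else the paper covers (pricing, expiry, name updates and so on).

```python
class NameStateMachine:
    """Toy name registry: a name must be preordered before it is registered."""

    def __init__(self):
        self.preorders = {}  # name -> owner address that preordered it
        self.names = {}      # name -> {"owner": ..., "zone_hash": ...}

    def preorder(self, name, owner):
        """Reserve a name; fails if it is already preordered or registered."""
        if name in self.preorders or name in self.names:
            return False
        self.preorders[name] = owner
        return True

    def register(self, name, owner, zone_hash):
        """Register a name, but only for the owner who preordered it."""
        if self.preorders.get(name) != owner:
            return False
        del self.preorders[name]
        self.names[name] = {"owner": owner, "zone_hash": zone_hash}
        return True


registry = NameStateMachine()
registry.preorder("demo.biz", "address-1")
registry.register("demo.biz", "address-1", zone_hash="hash-of-zone-file")
```

Every node that replays the same ordered transactions through a state machine like this ends up with the same view of who owns which name.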
The next layer up is the Routing layer. This consists of the contents of the zone files and BlockStack nodes. BlockStack nodes store zone files in a distributed hash table (DHT) peer-to-peer network. Most (?) BlockStack servers store a complete image of all zone files in the system. Zone files are relatively small (4KB/file). Although the paper doesn’t discuss their DHT implementation, any distributed data structure that supports random access to a name-value pair would suffice. Zone file updates are only made if the zone file hash contained in the transaction matches the current hash of the zone file, and the transaction block has been validated.
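To make the zone file update rule concrete, here is a small sketch in which a plain Python dict stands in for the DHT. The class and function names are hypothetical, and SHA-256 is an assumption on my part; the hash function BlockStack actually uses may differ.

```python
import hashlib


def zone_file_hash(zone_text):
    """Hash a zone file's text (SHA-256 is assumed for illustration)."""
    return hashlib.sha256(zone_text.encode("utf-8")).hexdigest()


class ZoneFileStore:
    """Dict-backed stand-in for the DHT: maps zone file hash -> zone file."""

    def __init__(self):
        self.files = {}

    def put(self, announced_hash, zone_text, block_validated):
        """Store a zone file only if its block is validated and the hashes match."""
        if not block_validated:
            return False
        if zone_file_hash(zone_text) != announced_hash:
            return False
        self.files[announced_hash] = zone_text
        return True

    def get(self, announced_hash):
        """Look up a zone file by the hash announced on the blockchain."""
        return self.files.get(announced_hash)


store = ZoneFileStore()
zone = "demo.biz https://s3.example.com/bucket/demo"
accepted = store.put(zone_file_hash(zone), zone, block_validated=True)
rejected = store.put(zone_file_hash(zone), zone + " tampered", block_validated=True)
```

Since the hash comes from a validated blockchain transaction, a node does not need to trust whichever peer handed it the zone file; it only needs to recompute the hash.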
Finally, at the top is the Storage layer. This contains the data storage and could be any cloud storage that uses a URI or similar locator to provide access.
Data is supposed to be compressed and encrypted (with a private key), and stored with a digital signature tied to the “cryptographic address” owner of the name registered in the zone file, so that it can be checked against that owner’s public key.
For WORM (immutable) data, the name record in the domain/zone file includes a hash of the data and its digital signature, which are used to validate the data. For normal (modifiable) data, no hash is needed, as the data is rewritten each time, and the data is verified using the public key of the owner and the digital signature of the data.
In the paper the researchers showed some performance information on the “overhead” of BlockStack for reading/writing and storing 1MB, 10MB and 100MB files in Amazon S3.
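For the immutable case, the hash check might look something like the sketch below. The record layout (a "data_hash" field) and the use of SHA-256 are assumptions of mine, and the signature check is left out entirely because Python's standard library has no public-key signature support.

```python
import hashlib


def make_name_record(uri, data):
    """Build a hypothetical name record that pins the data by its hash."""
    return {"uri": uri, "data_hash": hashlib.sha256(data).hexdigest()}


def verify_immutable(data, name_record):
    """Return True if fetched data matches the hash in the name record."""
    return hashlib.sha256(data).hexdigest() == name_record["data_hash"]


original = b"published data set, version 1"
record = make_name_record("https://s3.example.com/bucket/dataset", original)

fetched_ok = verify_immutable(original, record)
fetched_bad = verify_immutable(b"silently altered data", record)
```

For modifiable data there is no pinned hash to compare against, so the check would instead rely on verifying the signature against the owner's public key.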
For the 100MB file the CPU bound overhead is around 3 seconds (verification and serialization time in chart) for the data read and about 2 seconds (signing and serialization time in chart) for data writes.
Mind you, their system was written in Python, and there are plenty of optimizations that could be made.
There’s a lot more in the paper about verifying names, bootstrapping new nodes, (Bitcoin) pricing for new name registration, etc. Their solution is up and operating on top of Bitcoin (join here) and it’s all open source.
There’s lots of potential here, especially for distributed cloud storage name spaces. But a global name space that spans the whole world seems a bit much for data storage and is somewhat redundant with the Internet DNS. Then again, DNS is not nearly as secure as BlockStack, nor does DNS supply any verification of the data located at the URI.
From my perspective, it makes an awful lot more sense as a public key infrastructure (PKI) solution; their original solution was a distributed PKI. But I could see something like this for published data sets, maybe.