Scality’s Open Source S3 Driver

The view from Scality’s conference room

We were at Scality last week for Cloud Field Day 1 (CFD1) and one of the items they discussed was their open source S3 driver. (Videos available here).

Scality was on the 25th floor of a downtown San Francisco office tower. And the view outside the conference room was great. Giorgio Regni, CTO, Scality, said on the two days a year it wasn’t foggy out, you could even see Golden Gate Bridge from their conference room.


img_6912As you may recall, Scality is an object storage solution that came out of the telecom, consumer networking industry to provide Google/Facebook like storage services to other customers.

Scality RING is a software defined object storage that supports a full complement of interface legacy and advanced protocols including, NFS, CIGS/SMB, Linux FUSE, RESTful native, SWIFT, CDMI and Amazon Web Services (AWS) S3. Scality also supports replication and erasure coding based on object size.

RING 6.0 brings AWS IAM style authentication to Scality object storage. Scality pricing is based on usable storage and you bring your own hardware.

Giorgio also gave a session on the RING’s durability (reliability) which showed they support 13-9’s data availability. He flashed up the math on this but it was too fast for me to take down:)

Scality has been on the market since 2010 and has been having a lot of success lately, having grown 150% in revenue this past year. In the media and entertainment space, Scality has won a lot of business with their S3 support. But their other interface protocols are also very popular.

Why S3?

It looks as if AWS S3 is becoming the defacto standard for object storage. AWS S3 is the largest current repository of objects. As such, other vendors and solution providers now offer support for S3 services whenever they need an object/bulk storage tier behind their appliances/applications/solutions.

This has driven every object storage vendor to also offer S3 “compatible” services to entice these users to move to their object storage solution. In essence, the object storage industry, like it or not, is standardizing on S3 because everyone is using it.

But how can you tell if a vendor’s S3 solution is any good. You could always try it out to see if it worked properly with your S3 application, but that involves a lot of heavy lifting.

However, there is another way. Take an S3 Driver and run your application against that. Assuming your vendor supports all the functionality used in the S3 Driver, it should all work with the real object storage solution.

Open source S3 driver

img_6916Scality open sourced their S3 driver just to make this process easier. Now, one could just download their S3server driver (available from Scality’s GitHub) and start it up.

Scality’s S3 driver runs ontop of a Docker Engine so to run it on your desktop you would need to install Docker Toolbox for older Mac or Windows systems or run Docker for Mac or Docker for Windows for newer systems. (We also talked with Docker at CFD1).

img_6933Firing up the S3server on my Mac

I used Docker for Mac but I assume the terminal CLI is the same for both.Downloading and installing Docker for Mac was pretty straightforward.  Starting it up took just a double click on the Docker application, which generates a toolbar Docker icon. You do need to enter your login password to run Docker for Mac but once that was done, you have Docker running on your Mac.

Open up a terminal window and you have the full Docker CLI at your disposal. You can download the latest S3 Server from Scality’s Docker hub by executing  a pull command (docker pull scality/s3server), to fire it up, you need to define a new container (docker run -d –name s3server -p 8000:8000 scality/s3server) and then start it (docker start s3server).

It’s that simple to have a S3server running on your Mac. The toolbox approach for older Mac’s and PC’S is a bit more complicated but seems simple enough.

The data is stored in the container and persists until you stop/delete the container. However, there’s an option to store the data elsewhere as well.

I tried to use CyberDuck to load some objects into my Mac’s S3server but couldn’t get it to connect properly. I wrote up a ticket to the S3server community. It seemed to be talking to the right port, but maybe I needed to do an S3cmd to initialize the bucket first – I think.

[Update 2016Sep19: Turns out the S3 server getting started doc said you should download an S3 profile for Cyberduck. I didn’t do that originally because I had already been using S3 with Cyberduck. But did that just now and it now works just like it’s supposed to. My mistake]


Anyways, it all seemed pretty straight forward to run S3server on my Mac. If I was an application developer, it would make a lot of sense to try S3 this way before I did anything on the real AWS S3. And some day, when I grew tired of paying AWS, I could always migrate to Scality RING S3 object storage – or at least that’s the idea.


NetApp updates their StorageGRID Webscale solution

grid001NetApp announced a new version of their object storage solution, the StorageGRID WebScale 10.3.

At a former employer, I first talked with StorageGRID (Bycast at the time) a decade or so ago. At that time, they were focused on medical and healthcare verticals and had a RAIN (redundant array of independent nodes) storage solution.  It has come a long way.

StorageGRID Business is booming

On the call, NetApp announced they sold 50PB of StorageGRID in FY’16 with 20PB of that in the last quarter and also reported 270% Y/Y revenue growth, which means they are starting to gain some traction in the marketplace. Are we seeing an acceleration of object storage adoption?

As you may recall, StorageGRID comes in a software only solution that runs on just about any white box server with DAS or as two hardware appliances: the SG5612 (12 drive); and the SG5660 (60 drive) nodes. You can mix and match any appliance with any white box software only solution, they don’t have to have the same capacity or performance. But all nodes need network and controller/admin node(s) access.

StorageGRID past

grid002Somewhere during Bycast’s journey they developed support for tape archives and information lifecycle management (ILM) for objects. The previous generation, StorageGrid 10.2 had a number of features, including:

  • S3 cloud archive support that allowed objects to be migrated to AWS S3 as they were no longer actively accessed
  • NAS bridge support that allowed CIFS/SMB or NFS access to StorageGRID objects, which could also be read as S3 objects for easier migration to/from object storage;
  • Hierarchical erasure coding option that was optimized for efficiently storing large objects;
  • Node level erasure coding support that can be used to rebuild data for node drive failures, without having to go outside the node data retrieval;
  • Object byte-granular range read support that allowed users to read an object at any byte offset without requiring rebuild;
  • Support for OpenStack Swift API that made StorageGRID objects natively available to any OpenStack service; and
  • Software support for running as Docker containers or as a VM under VMware ESX, or OpenStack KVM that allowed StorageGRID software to run just about anywhere.

StorageGRID present and future

grid003But customers complained StorageGRID was too complex to install and update which required too much hand holding by NetApp professional services. StorageGRID Webscale 10.3 was targeted to address these deficiencies. Some of the features in StorageGrid 10.3, include:

  • Radically simplified, more modern UI, new dashboard and policy wizard/editor, so that it’s a lot easier to manage the StorageGRID. All features of the UI are also available via RESTfull API access and the UI is the same for white box, software only implementations as well as appliance configurations.
  • Simplified automated installation scripts, so that installations that used to take multiple steps, separate software installs and required professional services support, now use a full-solution software stack install, take only minutes and can be done by the customers alone;
  • S3 object versioning support, so that objects can have multiple versions, limited via the UI, if needed, but provide a snapshot-like capability for S3 data that protects against object accidental deletion.
  • grid004ILM policy change predictions/modeling, so that admins can now see how changes to ILM policies will impact StorageGRID.
  • Even more flexibility in DAS storage, so that future StorageGRID configurations can support 10TB drives and 6TB FIPS-140 drive encryption support, which adds to the current drive capacity and data security options already available in StorageGRID.

To top it all off, StorageGRID 10.3 improves performance for both small (30KB) and large (300MB) object get/puts.

  • Small S3 Load Data Router (LDR, 1-thread) object performance has improved ~4X for both PUTs and GETs; and
  • Large S3 LDR (1-thread) object performance has improved ~2X for PUTs and ~4X for GETs.

Object storage market heating up

grid005Apparently, service providers are adopting object storage to  provide competition to AWS, Azure and Google cloud storage for backup and storage archives as well as for DR as a service. Also, many media and other customers managing massive data repositories are turning to object storage to support their multi-site, very large file libraries.  And as more solution vendors support S3 object protocols for data access and archive, something like StorageGRID can become their onsite-offsite storage alternative.

And Amazon, Azure and Google are starting to realize that most enterprise customers are not going to leap to the cloud for everything they do. So, some sort of hybrid solution is needed for the long term. Having an on premises and off premises object storage solution that can also archive/migrate data to the cloud is a great hybrid alternative that takes enterprises one step closer to the cloud.


Hedvig storage system, Docker support & data protection that spans data centers

Hedvig003We talked with Hedvig (@HedvigInc) at Storage Field Day 10 (SFD10), a month or so ago and had a detailed deep dive into their technology. (Check out the videos of their sessions here.)

Hedvig implements a software defined storage solution that runs on X86 or ARM processors and depends on a storage proxy operating in a hypervisor host (as a VM) and storage service nodes. Their proxy and the storage services can execute as separate VMs on the same host in a hyper-converged fashion or on different nodes as a separate storage cluster with hosts doing IO to the storage cluster.

Hedvig’s management team comes from hyper-scale environments (Amazon Dynamo/Facebook Cassandra) so they have lots of experience implementing distributed software defined storage at (hyper-)scale.
Continue reading “Hedvig storage system, Docker support & data protection that spans data centers”

When 64 nodes are not enough

Why would VMware with years of ESX development behind them want to develop a whole new virtualization system for Docker and other container frameworks. Especially since they already have a compatible Docker support in their current product line.

The main reason I can think of is that a 64 node cluster may be limiting to some container services and the likelihood of VMware ESX/vSphere to supporting 1000s of nodes in a single cluster seems pretty unlikely. So given that more and more cloud services are being deployed across 1000s of nodes using container frameworks, VMware had to do something or say goodbye to a potentially lucrative use case for virtualization.

Yes over time VMware may indeed extend vSphere clusters to 128 or even 256 nodes but by then the world will have moved beyond VMware services for these services and where will VMware be then – left behind.

Photon to the rescue

With the new Photon system VMware has an answer to anyone that needs 1000 to 10,000 server cluster environments. Now these customers can easily deploy their services on a VMware Photon Platform which is was developed off of ESX but doesn’t have any cluster limitations of ESX.

Thus, the need for Photon was now. Customers can easily deploy container frameworks that span 1000s of nodes. Of course it won’t be as easy to manage as a 64 node vSphere cluster but it will be easy automated and easier to deploy and easier to scale when necessary, especially beyond 64 nodes.

The claim is that the new Photon will be able to support multiple container frameworks without modification.

So what’s stopping you from taking on the Amazons, Googles, and Apples of the worlds data centers?

  • Maybe storage, but then there’s ScaleIO, and the other software defined storage solutions that are there to support local DAS clusters spanning almost incredible sizes of clusters.
  • Maybe networking, I am not sure just where NSX is in the scheme of things but maybe it’s capable of handling 1000s of nodes and maybe not but networking could be a clear limitation to what how many nodes can be deployed in this sort of environment.

Where does this leave vSphere? Probably continuation of the current trajectory, making easier and more efficient to run VMware clusters and over time extending any current limitations. So for the moment two development streams based off of ESX and each being enhanced for it’s own market.

How much of ESX survived is an open question but it’s likely that Photon will never see the VMware familiar services and operations that is readily available to vSphere clusters.


Photo Credit(s): A first look into Dockerfile system

#VMworld2015 day 1 announcements


IMG_5411It seemed like today was all about the cloud and cloud native apps. Among the many announcements, VMware announced two key new capabilities: VMware integrated containers and the Python Photon Platform.

Containers running on VMware

  • VMware vSphere Integrated Containers is an implementation of containers that runs natively under vSphere. The advantage of this solution is that now when developers fire up a multi-container app,  each container now exists as a separate VM under vSphere and can be managed, monitored and secured just like any other VM in the environment. Previously a multi-container app would be one VM per container engine  containing potentially many containers running under the single VM. But with vSphere Integrated Containers, the container engine and the light weight Linux kernel (Python Photon OS) are now integrated into the ESX hypervisor so each container runs as a native VM. Integrated containers is an follow on to a combination of Project Bonneville, Project Python Photon (OS) and Instant clones. Recall with Instant Clones one can spin up a clone of a VM in less than a second and its memory footprint is 0MB.
  • Python Photon Platform takes container execution to a whole new level, with a new deployment of a hypervisor tailor made to run containers (not VMs). With the Python Photon Platform one natively runs container frameworks underneath the platform. Python Photon Platform consists of Python Photon Machine which is Python Photon OS (lightweight Linux Kernel distro) & the new Microvisor (new light weight hypervisor for container hardware calls) and Python Photon Controller which is a distributed control plane and management API. With Python Photon Platform one can manage 100K to Millions of containers, running under 1000s of container frameworks.

Over time Python Photon Platform is intended to be open sourced. VMware also announced a bundling of Pivotal Cloud Foundry with the Python Photon Platform so as to better run cloud native apps implemented in Cloud Foundry. But the ultimate intent is to provide support for Google Kubernetes, Apache Mesos and any other container framework that comes out.

So now you can run your Docker container apps or any other container app solution in two different ways. One depends on vSphere standard management platform and runs container apps as a standard VMs. The other takes a completely green field approach and runs container frameworks natively in a ground up new hypervisor solution with a new management solution altogether that scales.

The advantage of Python Photon is that it scales to extreme, cloud level types of application environments. Python Photon is intended to run cloud-native apps.

vCloud Air extensions

One of the other major things that VMware demoed today was moving a VM from on premises to vCloud Air and back again – a real crowd pleaser. One VMware Exec said that after MIT had convinced them they needed to be able to move apps from on premises to the cloud for dev-test apps. They then turned around and decided they wanted to move dev-test activity back to their onprem environment and instead wanted to move their production to vCloud Air.

They demoed both capabilities using vMotion to move a VM to vCloud Air and using it again to move it back. The nice thing about all this is that all the security and other attributes of the VM can move to the cloud and back again along with the VM. All the while the VM continued to operate, with no disruption to execution. They mention that it could potentially take hours to move the data for the VM.

IMG_5413There were a number of other capabilities announced today including EVO SDDC (EVO: RACK reborn) which includes a new datacenter management solution. Customers can now roll in a rack of servers and have EVO SDDC manage them and deploy software defined data center on them in a matter of hours. Within EVO SDDC you can have application domains which span racks of servers but provide isolation and management multi-tennancy.

NSX 6.2 was also discussed and essentially is key to extending your networking from on premises to vCloud Air. With NSX 6.2 local routing, micro segmentation security and app firewalls can be configured locally and then be “extended” to the vCloud Air environment.

Lots of moving parts here and I probably missed some key components to these solutions and didn’t cover any of them well enough other than to give a feel for what they are.

But one thing is clear, VMware’s long term strategy is to take your native, on premises VMs to vCloud Air and back again as well as if your Dev-Ops group or any other BU wants to use containers to implement cloud apps, VMware has you covered coming and going.