Object store and hybrid clouds at Cloudian

Out of Japan comes another object storage provider called Cloudian. We talked with Cloudian at Storage Field Day 7 (SFD7) last month in San Jose (see the videos of their presentations here).

Cloudian history

Cloudian has been on the market since March of 2011 but we haven’t heard much about them, probably because their focus has been East Asia. The same day that the Tōhoku Earthquake and Tsunami hit, the company announced Cloudian, an Amazon S3 compliant, multi-tenant cloud storage solution.

Their timing couldn’t have been better. Over the next two years, Japanese IT organizations were beating down their door for a usable and (earthquake and tsunami) resilient storage solution.

Cloudian spent the next 2 years hardening their object storage system, HyperStore, and now they are ready to take on the rest of the world.

Currently Cloudian has about 20PB of storage under management and is shipping HyperStore as an appliance or as a software-only distribution. Cloudian’s solutions support S3 and NFS access protocols.

Their solution uses Cassandra, a highly scalable, distributed NoSQL database that came out of Facebook, as its meta-data database. This provides a scalable, non-sharable meta-data database for object meta-data storage and lookup.

Cloudian creates virtual storage pools on backend storage which can be optimized for small objects, replication or erasure coding and can include automatic tiering to any Amazon S3 and Glacier compatible cloud storage. I would guess this is how they qualify for Hybrid Cloud status.

The HyperStore appliance

Cloudian creates a HyperStore P2P ring structure. Each appliance has Cloudian management console services as well as the HyperStore engine which supports three different data stores: Cassandra, Replicas, and Erasure coding. Unlike Scality, it appears as if one HyperStore Ring must exist in a region. But it can be split across data centers. Unclear what their definition of a “region” is.

HyperStore hardware comes in entry level (HSA-2024: 24TB/1U), capacity optimized (HSA-2048: 48TB/1U), and performance optimized (HSA-2060: all flash, 60TB/2U) models.

Replication with Dynamic Consistency

The other thing that Cloudian supports is different levels of consistency for replicated data. Most object stores support eventual consistency (see Eventual Data Consistency and Cloud Storage post).  HyperStore supports 3 (well maybe 5) different levels of consistency:

  1. One – object written to one replica and committed there before responding to client
  2. Quorum – object written to N/2+1 replicas before responding to client
    1. Local Quorum – replicas are written to N/2+1 nodes in same data center  before responding to client
    2. Each Quorum – replicas are written to N/2+1 nodes in each data center before responding to client.
  3. All – all replicas must have received and committed the object write before responding to client

There are corresponding read consistency levels as well. Object writes have a “coordinator” node which handles this consistency. The implication is that consistency could be established on a per-object basis. It’s unclear to me whether the read and write consistency levels can be set differently.
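To make the N/2+1 arithmetic concrete, here is a minimal Python sketch of how many replica acknowledgements a coordinator would wait for at each level. The names (`Consistency`, `required_acks`, the `"*"` key meaning "any data center") and the replica counts are my own assumptions for illustration, not Cloudian's API or configuration.

```python
from enum import Enum


class Consistency(Enum):
    ONE = "one"
    QUORUM = "quorum"
    LOCAL_QUORUM = "local_quorum"
    EACH_QUORUM = "each_quorum"
    ALL = "all"


def quorum(n: int) -> int:
    """Majority of n replicas: N/2 + 1 using integer division."""
    return n // 2 + 1


def required_acks(level, replicas_per_dc, local_dc):
    """Return {scope: acks needed}; the scope "*" means any data center.

    replicas_per_dc maps a (hypothetical) data center name to the number
    of replicas kept there; local_dc is the coordinator's data center.
    """
    total = sum(replicas_per_dc.values())
    if level is Consistency.ONE:
        return {"*": 1}
    if level is Consistency.QUORUM:
        return {"*": quorum(total)}
    if level is Consistency.LOCAL_QUORUM:
        return {local_dc: quorum(replicas_per_dc[local_dc])}
    if level is Consistency.EACH_QUORUM:
        return {dc: quorum(n) for dc, n in replicas_per_dc.items()}
    if level is Consistency.ALL:
        return {"*": total}
    raise ValueError(f"unknown consistency level: {level}")


def read_sees_latest_write(read_acks: int, write_acks: int, replicas: int) -> bool:
    """General quorum rule: read and write sets overlap when R + W > N."""
    return read_acks + write_acks > replicas


if __name__ == "__main__":
    layout = {"dc1": 3, "dc2": 3}  # made-up replica layout
    for level in Consistency:
        print(level.name, required_acks(level, layout, "dc1"))
```

With 3 replicas in each of two data centers, Quorum waits for 4 acknowledgements from anywhere, Local Quorum for 2 in the coordinator's data center, Each Quorum for 2 in each data center, and All for 6. The `read_sees_latest_write` helper is the general rule for quorum systems; whether HyperStore lets you mix read and write levels this way is exactly the part I'm unsure of.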

Apparently small objects are also stored in the Cassandra datastore. That way HyperStore optimizes for object size. Also, HyperStore nodes can be added to a ring and the system will automatically rebalance the data across the old and new nodes.

Cloudian also supports object versioning, ACLs, and QoS services.

~~~

I was a bit surprised by Cloudian. I thought I knew all the object storage solutions out on the market. But then again they made their major strides in Asia and as an on-premises Amazon S3 solution, rather than a generic object store.

For more on Cloudian from SFD7 see:

Cloudian – Storage Field Day 7 Preview by @VirtualizedGeek (Keith Townsend)

Interesting sessions at SNIA DSI Conference 2015

I attended the SNIA Data Storage Innovation (DSI) Conference in Santa Clara, CA last week and ran into a number of old friends and met a few new ones. A few sessions seemed to bring the conference to life for me.

Microsoft Software Defined Storage Journey

Jose Barreto, Principal Program Manager – Microsoft, spent a little time on what’s currently shipping with Scale-out File Service, Storage Spaces and other storage components of Windows software defined storage solutions. Essentially, what Microsoft is learning from Azure cloud deployments is slowly but surely being implemented in Windows Server software and other solutions.

Microsoft’s vision is that customers can have their own private cloud storage with partner storage systems (SAN & NAS), with Microsoft SDS (Scale-out File Server with Storage Spaces), with hybrid cloud storage (StorSimple with Azure storage) and public cloud storage (Azure storage).

Jose also mentioned other recent innovations like the Cloud Platform System using Microsoft software, Dell compute, Force 10 networking and JBOD (PowerVault MD3060e) storage in a rack.

Some recent Microsoft SDS innovations include:

  • HDD and SSD storage tiering;
  • Shared volume storage;
  • System Center volume and unified storage management;
  • PowerShell integration;
  • Multi-layer redundancy across nodes, disk enclosures, and disk devices; and
  • Independent scale-out of compute or storage.

Probably a few more I’m missing here but these will suffice.

Then, Jose broke some news on what’s coming next in Windows Server storage offerings:

  • Quality of service (QoS) – Windows Server provides QoS capabilities which allow one to limit IO activity by specifying min and max IOPS or latency at a VM or VHD level. The scale-out storage service will balance the IO activity across the cluster to meet this QoS specification. Apparently the balancing algorithm came from Microsoft Research but Jose didn’t go into great detail on what it did differently other than being “fairer” in applying QoS constraints.
  • Rolling upgrades – Windows Server now supports a cluster running different versions of software. Now one can take a cluster node down and update its software and re-activate it into the same cluster. Previously, code upgrades had to take a whole cluster down at a time.
  • Synchronous replication – Windows Server now supports synchronous Storage Replicas at the volume level. Previously Storage Replicas were limited to asynchronous replication.
  • Higher VM storage resiliency – Windows will now pause a VM rather than terminate it during transient storage interruptions. This allows VMs to sustain operations across transient outages. VMs are in PausedCritical state until the storage comes back and then they are restarted automatically.
  • Shared-nothing Storage Spaces – Windows Storage Spaces can be configured across cluster nodes without shared storage. Previously, Storage Spaces required shared JBOD storage between cluster nodes. This feature removes this configuration constraint and allows JBOD storage to be accessible from only a single node.

Jose did not say what this “Vnext” was going to be called and didn’t provide a specific time frame other than that it’s coming out shortly.
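As a rough illustration of what min and max IOPS settings imply for a shared pool, here is a small Python sketch of one generic way to divide cluster IOPS among VMs: guarantee each minimum first, then hand out what is left in equal shares up to each maximum. This is only my assumption of a plausible scheme. It is not the Microsoft Research algorithm, which wasn't described, and the function name, VM names and numbers are all hypothetical.

```python
def allocate_iops(capacity, policies):
    """Split `capacity` IOPS across VMs given {name: (min_iops, max_iops)}.

    Every VM first receives its minimum. The remainder is then handed out
    in equal shares, round by round, to VMs still below their maximum.
    """
    alloc = {name: lo for name, (lo, hi) in policies.items()}
    remaining = capacity - sum(alloc.values())
    if remaining < 0:
        raise ValueError("capacity cannot satisfy the minimum reservations")

    # VMs that can still accept more IOPS.
    active = {name for name, (lo, hi) in policies.items() if hi > lo}
    while remaining > 1e-9 and active:
        share = remaining / len(active)
        for name in list(active):
            hi = policies[name][1]
            grant = min(share, hi - alloc[name])
            alloc[name] += grant
            remaining -= grant
            if alloc[name] >= hi:
                active.discard(name)  # capped at its maximum
    return alloc


if __name__ == "__main__":
    vms = {"vm-a": (100, 300), "vm-b": (100, 1000), "vm-c": (200, 250)}
    print(allocate_iops(1000, vms))
```

In this made-up example, vm-a and vm-c hit their caps and vm-b absorbs the rest. A real scheduler would also have to react to actual demand and latency targets, which this sketch ignores.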

Archival Disc Technology

Yasumori Hino from Panasonic and Jun Nakano from Sony presented information on a brand new removable media technology for cold storage. Prior to their session there was another one from HDS Federal Corporation on their Blu-ray jukebox, but Yasumori’s and Jun’s session was more noteworthy. The new Archival Disc is the next iteration in optical storage beyond Blu-ray and is targeted at long term archive or “cold” storage.

As a prelude to the Archival Disc discussion Yasumori played a CD that was pressed in 1982 (52nd Street, Billy Joel album) on his current generation laptop to show the inherent downward compatibility in optical disc technology.

In 1980 IBM 3380 disk drives were refrigerator sized, multi $10K devices, and held 2.3GB. As far as I know there aren’t any of these still in operation. And IBM/STK tape was reel to reel and took up a whole rack. There may be a few of these devices still operating these days but not many. I still have a CD collection (but then I am a GreyBeard 🙂) that I still listen to occasionally.

The new Archival Disc includes:

  • Media that is more resilient to high humidity, high temperature, salt water, and EMP and other magnetic disturbances. As proof, a Blu-ray disc was submerged in sea water for 5 weeks and was still able to be read. Data on Blu-ray and the new Archival Disc is recorded without using electromagnetics and is recorded in a very stable oxide recording material layer. They project that the new Archival Disc has a media life of 50 years at 50C and 1000 years at 25C under high humidity conditions.
  • Dual sided, triple layered media which uses land and groove recording to provide 300GB of data storage. Blu-ray also uses a land and groove disc format but only records on the land portion of the disc. Track pitch for Blu-ray is 320nm whereas for the Archival Disc it’s only 225nm.
  • Data transfer speeds of 90MB/sec with two optical heads, one per side. Each head can read/write data at 45MB/sec. They project doubling or quadrupling this data transfer rate by using more pairs of optical heads.

They also presented a roadmap for a 2nd gen 500GB and 3rd gen 1TB Archival Disc using higher linear density and better signal processing technology.
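For a feel of what these numbers mean in practice, here is a back-of-the-envelope Python sketch using only the figures quoted above. It assumes decimal units (1GB = 1000MB), capacity spread evenly over sides and layers, transfer rate scaling linearly with the number of head pairs, and the same per-head rate for the roadmap discs. None of that was confirmed in the session, and the function names are my own.

```python
SIDES = 2
LAYERS_PER_SIDE = 3
HEAD_RATE_MB_S = 45   # per optical head, as quoted
HEADS_PER_PAIR = 2    # one head per side


def capacity_per_layer_gb(disc_gb: float = 300) -> float:
    """Capacity of one recording layer, assuming an even split."""
    return disc_gb / (SIDES * LAYERS_PER_SIDE)


def transfer_rate_mb_s(head_pairs: int = 1) -> float:
    """Aggregate transfer rate, assuming linear scaling with head pairs."""
    return head_pairs * HEADS_PER_PAIR * HEAD_RATE_MB_S


def full_disc_seconds(disc_gb: float = 300, head_pairs: int = 1) -> float:
    """Time to read or write a whole disc at the aggregate rate."""
    return disc_gb * 1000 / transfer_rate_mb_s(head_pairs)


if __name__ == "__main__":
    print(f"per layer: {capacity_per_layer_gb():.0f}GB")
    for disc_gb in (300, 500, 1000):
        for pairs in (1, 2, 4):
            minutes = full_disc_seconds(disc_gb, pairs) / 60
            print(f"{disc_gb}GB, {pairs} head pair(s): {minutes:.1f} min")
```

Under those assumptions a 300GB disc works out to 50GB per layer and takes a little under an hour to read end to end at 90MB/sec, which is why the extra head pairs matter more as capacity grows.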

Cold storage is starting to get more interest these days, what with all the power consumption going into today’s data centers and the never ending data tsunami. Archival Disc and Blu-ray optical storage consume no power at rest and only consume power when mounting/dismounting and reading/writing/spinning. Also, with optical discs’ imperviousness to high temperature and humidity, optical media could be stored outside of air conditioned data centers.

The Storage Revolution

The final presentation of interest to me was by Andrea Nelson from Intel. Intel has lately been focusing on helping partners and vendors provide more effective storage offerings. These aren’t storage solutions but rather storage hardware, components and software developed in collaboration with storage vendors and partners that make it easier for them to offer storage solutions using Intel hardware. One example of this collaboration is IBM hardware assist Real Time Compression available on new V7000 and FlashSystem V9000 storage hardware.

As the world turns to software defined storage, Intel wants those solutions to make use of their hardware. (Although, at the show I heard from another new SDS vendor that was planning to use x86 as well as ARM servers.)

Intel has:

  • QuickAssist Acceleration technology – such as hardware assist data compression,
  • Storage Acceleration software libraries – open source erasure coding and other low-level compute intensive functions, and
  • Cache Acceleration software – uses Intel SSDs as a data cache for storage applications.

There wasn’t as much of a technical description of these capabilities as in other DSI sessions, but with the industry moving more and more to SDS, Intel’s got a vested interest in seeing it implemented on their hardware.

~~~~

That’s about it. I sat in on quite a number of other sessions but nothing else stuck out as being as significant or interesting to me as these three sessions.

Comments?

Transporter, a private Dropbox in a tower

Move over Dropbox, Box and all you synch&share wannabes, there’s a new synch and share in town.

At SFD7 last month, we were visiting with Connected Data where CEO Geoff Barrall was telling us all about what was wrong with today’s cloud storage solutions. In front of all the participants was this strange, blue glowing device. As it turns out, Connected Data’s main product is the File Transporter, which is a private file synch and share solution.

All the participants were given a new, 1TB Transporter system to take home. It was an interesting sight to see a dozen of these Transporter towers sitting in front of all the bloggers.

I quickly established a new account, installed the software, and activated the client service. I must admit, I took it upon myself to “claim” just about all of the Transporter towers while the other bloggers were still paying attention to the presentation. Sigh, they later made me give back (unclaim) all but mine, but for a minute there I had about 10TB of synch and share space at my disposal.

Transporters rule

So what is it? The Transporter is both a device and an Internet service, where you own the storage and networking hardware.

The home-office version comes as a 1 or 2TB 2.5” hard drive, in a tower configuration that plugs into a base module. The base module runs a secured version of Linux and their synch and share control software.

As the tower powers on, it connects to the Internet and invokes the Transporter control service. This service identifies the node and who owns it, and provides access to the storage on the Transporter to all desktops, laptops, and mobile applications that have access to it.

At initiation of the client service on a desktop/laptop it creates (by default) a new Transporter directory (folder). Files that are placed in this directory are automatically synched to the Transporter tower and then synchronized to any and all online client devices that have claimed the tower.

Apparently you can have multiple towers that are claimed to the same account. I personally tested up to 10 ;/ and it didn’t appear as if there was any substantive limit beyond that but I’m sure there’s some maximum count somewhere.

A couple of nice things about the tower. It’s yours, so you can move it to any location you want. That means you could take it with you to your hotel or other remote offices and have a local synch point.

Also, initial synchronization can take place over your local network so it can occur as fast as your LAN can handle it. I remember the first time I up-synched 40GB to Dropbox, it seemed to take weeks to complete and then took less time to down-synch to my laptop but still days. With the tower on my local network, I can synch my data much faster and then take the tower with me to my other office location and have a local synch datastore. (I may have to start taking mine to conferences. Howard (@deepstorage.net, co-host on our GreyBeards on Storage podcast) had his operating in all the subsequent SFD7 sessions.)

The Transporter also allows sharing of data. Steve immediately started sharing all the presentations on his Transporter service so the bloggers could access the data in real time.

They call the Transporter a private cloud but in my view, it’s more a private synch and share service.

Transporter heritage

The Transporter people were all familiar to the SFD crowd as they were formerly with Drobo, which was at a previous SFD session (see SFD1). And like Drobo, you can install any 2.5″ disk drive in your Transporter and it will work.

There are workgroup and business class versions of the Transporter storage system. The workgroup versions are desktop configurations (they look very much like a Drobo box) that support up to 8TB or 12TB and 15 or 30 users respectively. They also have two business class, rack mounted appliances that have up to 12TB or 24TB each and support 75 or 150 users each. The business class solution has onboard SSDs for meta-data acceleration. Similar to the Transporter tower, the workgroup and business class appliances are bring your own disk drives.

Connected Data’s presentation

Geoff’s discussion (see SFD7 video) was a tour of the cloud storage business model. His view was that most of these companies are losing money. In fact, even Amazon S3/Glacier appears to be bleeding money, although this may not stop Amazon. Of course, Dropbox and other synch and share services all depend on cloud storage for their datastores. So, the lack of a viable, profitable business model threatens all of these services in the long run.

But the business model is different when a customer owns the storage. Here the customer owns the actual storage cost. The only thing that Connected Data provides is the client software and the internet service that runs it. Pricing for the 1TB and 2TB Transporters with disk drives is $150 and $240, respectively.

Having a Transporter

One thing I don’t like is the lack of data-at-rest encryption. They use TLS for data transfers across your LAN and the Internet. The nice thing about having possession of the actual storage is that you can move it around. The downside is that you may move it to less secure environments (like conference hotel rooms). And as with any disk storage, someone can come up to the device and steal the disk. Whether the data would be easily recognizable is another question but having it be encrypted would put that question to rest. There’s some indication on the Transporter support site that encryption may be coming for the business class solution. But nothing was said about the Transporter tower.

On the Mac, the Transporter folder has the shared folders as direct links (real sub-folders) but the local data is under a Transporter Library soft link. It turns out to be a hidden file (“.Transporter Library”) under the Transporter folder. When you Control-click on this file you are given the option to view deleted files. You can do this with shared files as well.

One problem with synch and share services is once someone in your collaboration group deletes some shared files they are gone (over time) from all other group users. Even if some of them wanted them. Transporter makes it a bit easier to view these files and save them elsewhere. But I assume at some point they have to be purged to free up space.

When I first installed the Transporter, it showed up as a network node under shared servers in my Finder. But the latest desktop version (3.1.17) has removed this.

Also, some of the bloggers complained about seeing files “in flux” or duplicates of the shared files with unusual suffixes appended to them, such as “filename124224_f367b3b1-63fa-4d29-8d7b-a534e0323389.jpg”. Enrico (@ESignoretti) opened up a support ticket on this and it’s supposedly been fixed in the latest desktop version. It was a temporary filename used only during upload that should have been deleted or renamed after the upload completed. I just uploaded 22MB with about 40 files and didn’t see any of this.

I really want encryption, as I wanted one Transporter in a remote office and another in the home office with everything synched locally, and then I would hand carry the remote one to the other location. But without encryption this isn’t going to work for me. So I guess I will limit myself to just one and move it around to wherever I want my data to go.

Here are some of the other blog posts by SFD7 participants on Transporter:

Storage field day 7 – day 2 – Connected Data by Dan Firth (@PenguinPunk)

File Transporter, private Synch&Share made easy by Enrico Signoretti (@ESignoretti)

Transporter – Storage Field Day 7 preview by Keith Townsend (@VirtualizedGeek)

Comments?