Transporter, a private Dropbox in a tower

Move over Dropbox and all you synch&share wannabes, there's a new synch and share in town.

At SFD7 last month, we visited with Connected Data, where CEO Geoff Barrell told us all about what was wrong with today's cloud storage solutions. In front of all the participants was this strange, blue glowing device. As it turns out, Connected Data's main product is the File Transporter, a private file synch and share solution.

All the participants were given a new, 1TB Transporter system to take home. It was an interesting sight to see a dozen of these Transporter towers sitting in front of all the bloggers.

I quickly established a new account, installed the software, and activated the client service. I must admit, I took it upon myself to "claim" just about all of the Transporter towers while the other bloggers were still paying attention to the presentation. Sigh, they later made me give back (unclaim) all but mine, but for a minute there I had about 10TB of synch and share space at my disposal.

Transporters rule

So what is it? The Transporter is both a device and an Internet service, where you own the storage and networking hardware.

The home-office version comes as a 1 or 2TB 2.5” hard drive, in a tower configuration that plugs into a base module. The base module runs a secured version of Linux and their synch and share control software.

As the tower powers on, it connects to the Internet and invokes the Transporter control service. This service identifies the node and who owns it, and provides access to the Transporter's storage for all desktops, laptops, and mobile applications that are authorized to use it.

When the client service starts up on a desktop/laptop, it creates (by default) a new Transporter directory (folder). Files placed in this directory are automatically synched to the Transporter tower and then on to any and all online client devices that have claimed the tower.
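
To make the pattern concrete (this is my own illustrative sketch, not Connected Data's client code), here's roughly what such a sync-folder watcher does: poll a local directory, detect new or changed files, and hand them to an upload routine that replicates them to the tower. The folder path, poll interval, and upload function are all assumptions for illustration.

```python
import os
import time

SYNC_DIR = os.path.expanduser("~/Transporter")  # hypothetical local sync folder
POLL_SECONDS = 5

def upload_to_tower(path):
    """Stand-in for the real client's replication call to the tower."""
    print(f"syncing {path} to the Transporter tower...")

def watch(sync_dir=SYNC_DIR):
    seen = {}  # path -> last modification time already synched
    while True:
        for root, _dirs, files in os.walk(sync_dir):
            for name in files:
                path = os.path.join(root, name)
                mtime = os.path.getmtime(path)
                if seen.get(path) != mtime:   # new or changed file
                    upload_to_tower(path)
                    seen[path] = mtime
        time.sleep(POLL_SECONDS)

if __name__ == "__main__":
    watch()
```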

Apparently you can have multiple towers claimed to the same account. I personally tested up to 10, and it didn't appear that there was any substantive limit beyond that, but I'm sure there's some maximum count somewhere.

A couple of nice things about the tower: it's yours, so you can move it to any location you want. That means you could take it with you to your hotel or other remote offices and have a local synch point.

Also, initial synchronization can take place over your local network, so it can occur as fast as your LAN can handle it. I remember the first time I up-synched 40GB to DropBox: it seemed to take weeks to complete, and the down-synch to my laptop took less time but still days. With the tower on my local network, I can synch my data much faster and then take the tower with me to my other office location and have a local synch datastore. (I may have to start taking mine to conferences. Howard (@deepstorage.net, co-host on our GreyBeards on Storage podcast) had his operating in all the subsequent SFD7 sessions.)
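
To put rough numbers on that (my assumed link speeds, not anything Connected Data published): a gigabit LAN moves 40GB in minutes, while a typical consumer uplink takes the better part of a day.

```python
# Back-of-envelope sync times under assumed link speeds.
GIGABIT_LAN_MBPS = 1000   # local wired LAN
HOME_UPLOAD_MBPS = 5      # assumed consumer upstream bandwidth

def hours_to_sync(gigabytes, megabits_per_second):
    megabits = gigabytes * 8 * 1000          # GB -> megabits (decimal units)
    return megabits / megabits_per_second / 3600

print(f"40GB over gigabit LAN  : {hours_to_sync(40, GIGABIT_LAN_MBPS):.2f} hours")
print(f"40GB over 5Mb/s uplink : {hours_to_sync(40, HOME_UPLOAD_MBPS):.1f} hours")
```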

The Transporter also allows sharing of data. Steve immediately started sharing all the presentations on his Transporter service so the bloggers could access the data in real time.

They call the Transporter a private cloud but in my view, it’s more a private synch and share service.

Transporter heritage

The Transporter people were all familiar to the SFD crowd, as they were formerly with Drobo, which was at a previous SFD session (see SFD1). And like Drobo, you can install any 2.5″ disk drive in your Transporter and it will work.

There are workgroup and business-class versions of the Transporter storage system. The workgroup versions are desktop configurations (they look very much like a Drobo box) that support up to 8TB or 12TB and 15 or 30 users, respectively. They also have two business-class, rack-mounted appliances that hold up to 12TB or 24TB each and support 75 or 150 users each. The business-class solution has onboard SSDs for metadata acceleration. Similar to the Transporter tower, the workgroup and business-class appliances are bring-your-own-disk-drives.

Connected Data’s presentation

Geoff's discussion (see SFD7 video) was a tour of the cloud storage business model. His view was that most of these companies are losing money. In fact, even Amazon S3/Glacier appears to be bleeding money, although this may not stop Amazon. Of course, DropBox and other synch and share services all depend on cloud storage for their datastores. So, the lack of a viable, profitable business model threatens all of these services in the long run.

But the business model is different when a customer owns the storage. Here the customer bears the actual storage cost. The only things Connected Data provides are the client software and the Internet service that runs it. Pricing for the 1TB and 2TB Transporters with disk drives is $150 and $240, respectively.

Having a Transporter

One thing I don't like is the lack of data-at-rest encryption. They use TLS for data transfers across your LAN and the Internet. The nice thing about having possession of the actual storage is that you can move it around, but the downside is that you may move it to less secure environments (like conference hotel rooms). And as with any disk storage, someone can come up to the device and steal the disk. Whether the data would be easily recognizable is another question, but having it encrypted would put that question to rest. There's some indication on the Transporter support site that encryption may be coming for the business-class solution, but nothing was said about the Transporter tower.
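
Until data-at-rest encryption arrives, one workaround is to encrypt files yourself before dropping them into the Transporter folder. Here's a minimal sketch of that idea using the third-party Python cryptography package; the folder path, file name, and helper function are my own illustration, not a Connected Data feature.

```python
from pathlib import Path
from cryptography.fernet import Fernet  # pip install cryptography

def encrypt_into_transporter(src, transporter_dir, key):
    """Encrypt src and write only the ciphertext into the synched folder."""
    ciphertext = Fernet(key).encrypt(Path(src).read_bytes())
    dest = Path(transporter_dir) / (Path(src).name + ".enc")
    dest.write_bytes(ciphertext)
    return dest

# Generate the key once and keep it somewhere safe -- NOT in the synched folder.
key = Fernet.generate_key()
encrypt_into_transporter("some-private-file.pdf", Path.home() / "Transporter", key)
```

The obvious downside is that you lose the ability to browse or share those files from the mobile apps, which is why built-in data-at-rest encryption would be preferable.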

On the Mac, the Transporter folder has the shared folders as direct links (real sub-folders), but the local data is under a Transporter Library soft link. It turns out to be a hidden file (".Transporter Library") under the Transporter folder. When you Control-click on this file, you are given the option to view deleted files. You can do this with shared files as well.

One problem with synch and share services is that once someone in your collaboration group deletes some shared files, they are gone (over time) from all the other group users, even if some of them wanted to keep them. Transporter makes it a bit easier to view these deleted files and save them elsewhere, but I assume at some point they have to be purged to free up space.

When I first installed the Transporter, it showed up as a network node under my Finder's shared servers, but the latest desktop version (3.1.17) has removed this.

Also, some of the bloggers complained about seeing files "in flux" or duplicates of shared files with unusual suffixes appended to them, such as "filename124224_f367b3b1-63fa-4d29-8d7b-a534e0323389.jpg". Enrico (@ESignoretti) opened a support ticket on this, and it's supposedly been fixed in the latest desktop version; the suffix was a temporary filename used only during upload that should have been deleted/renamed after the upload completed. I just uploaded 22MB of about 40 files and didn't see any of this.

I really want encryption, as I wanted one Transporter in a remote office and another in the home office, with everything synched locally, so I could hand-carry the remote one to the other location. But without encryption this isn't going to work for me. So I guess I will limit myself to just one and move it around to wherever I want my data to go.

Here are some of the other blog posts by SFD7 participants on Transporter:

Storage field day 7 – day 2 – Connected Data by Dan Firth (@PenguinPunk)

File Transporter, private Synch&Share made easy by Enrico Signoretti (@ESignoretti)

Transporter – Storage Field Day 7 preview by Keith Townsend (@VirtualizedGeek)

Comments?

Are RAID's days numbered?

HP/EVA drive shelves in the HP/EVA lab in Colo. Springs
An older article that I recently came across, by Robin Harris of StorageMojo, said RAID 5 would be dead in 2009. In essence, it said that as drives get to 1TB or more, the time it takes to rebuild a drive requires going to RAID 6.

Another older article I came across said RAID is dead, all hail the storage robot. It seemed to say that when it comes to drive sizes, there needs to be more flexibility and support for different-capacity drives in a RAID group. Data Robotics' Drobo products now support this capability, which we discuss below.

I am here to tell you that RAID is not dead, not even on life support, and without it the storage industry would seize up and die. One must first realize that RAID as a technology is just a way to group together a bunch of disks and to protect the data on those disks. RAID comes in a number of flavors, which include definitions for:

  • RAID 0 – no protection
  • RAID 1 – mirrored data protection
  • RAID 2 through 5 – single parity protection
  • RAID 6 and DP – dual parity protection

The rebuild time problem with RAID

The problem with drive rebuild time is that rebuilding a 1TB or larger disk drive can take hours if not days, depending on how busy the storage system and the RAID group are. And of course, as 1.5 and 2TB drives come online, this just keeps getting longer. Rebuilds can be sped up by having larger single-parity RAID groups (more disk spindles in the RAID stripe), by using DP, which actually has two RAID groups cross-coupled (which means more disk spindles), or by using RAID 6, which often has more spindles in the RAID group.
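
A back-of-envelope calculation shows why (the drive bandwidth and rebuild share below are my assumptions, not vendor specs): even at a sustained 100MB/s with no competing I/O, a 2TB drive takes over five hours to rewrite, and a busy array may only give the rebuild a fraction of that bandwidth.

```python
def rebuild_hours(capacity_tb, drive_mb_per_s=100, rebuild_share=1.0):
    """Lower-bound rebuild time: drive capacity / bandwidth devoted to the rebuild."""
    capacity_mb = capacity_tb * 1_000_000
    return capacity_mb / (drive_mb_per_s * rebuild_share) / 3600

print(f"1TB, idle array        : {rebuild_hours(1.0):.1f} hours")
print(f"2TB, idle array        : {rebuild_hours(2.0):.1f} hours")
print(f"2TB, 25% of bandwidth  : {rebuild_hours(2.0, rebuild_share=0.25):.1f} hours")
```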

Regardless of how you cut it, there is some upper limit to the number of spindles that can be used to rebuild a failed drive – the number of active spindles in the storage subsystem. You could conceivably incorporate all these drives into a simple RAID 5 or 6 group (albeit a very large one).

The downside of this large a RAID group is that data overwrites could potentially cause a performance bottleneck on the parity disks. That is, whenever a block is overwritten in a RAID 2-6 group, the parity for that data block (usually located on one or more other drives) has to be read, recalculated, and rewritten back to the same location. The parity write can be buffered and lazily written, but the data is not actually protected until the parity is on disk someplace.
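
In code, that single-parity read-modify-write looks roughly like the toy model below (the textbook RAID 5 small-write path over in-memory byte arrays, not any array's actual firmware): read the old data and old parity, XOR both against the new data to get the new parity, then write data and parity back – four I/Os for one logical write.

```python
# Toy in-memory model of the RAID 5 small-write (read-modify-write) path.
BLOCK = 4  # toy block size in bytes

def xor_blocks(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# Three "drives": two data blocks and their parity (parity = data0 XOR data1).
data0 = bytearray(b"\x11" * BLOCK)
data1 = bytearray(b"\x22" * BLOCK)
parity = bytearray(xor_blocks(data0, data1))

def small_write(new_data):
    """Overwrite data0 with new_data, keeping parity consistent: 4 I/Os total."""
    old_data = bytes(data0)        # I/O 1: read old data
    old_parity = bytes(parity)     # I/O 2: read old parity
    new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)
    data0[:] = new_data            # I/O 3: write new data
    parity[:] = new_parity         # I/O 4: write new parity

small_write(b"\x99" * BLOCK)
assert bytes(parity) == xor_blocks(data0, data1)  # stripe is still consistent
```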

One way around this problem is to use a log-structured file system. Log-structured file systems never rewrite data in place, so there is no over-write penalty, nicely eliminating the problem.

Alas, not everyone uses log-structured file systems for backend storage. So for the rest of the storage industry, the write penalty is real and needs to be managed effectively so it doesn't become a performance problem. One way to manage this is to limit RAID group size to a small number of drives.

So the dilemma is that in order to provide reasonable drive rebuild times you want a wide (large) RAID group with as many drives as possible in it. But in order to minimize the (over-)write penalty you want as thin (small) a RAID group as possible. How can we solve this dilemma?

Parity declustering

Parity Declustering figure from Holland&Gibson 1992 paper

In the declustered parity scheme described by Holland and Gibson in their 1992 paper, parity and stripe data can be spread across more drives than just those in a RAID 5 or 6 group. They show an 8-drive system (see figure) where stripe data (three data block sets) and parity data (one parity block set) are rotated around the group of 8 physical drives in the array. In this way, all 7 remaining drives are used to service a failed 8th drive. Some blocks will be rebuilt from one set of 3 drives and other blocks from a different set of 3 drives. As you go through the failed drive's block sets, rebuilding them would involve all 7 remaining drives, but not all of them would be busy for every block. This should shrink the drive rebuild time considerably by utilizing more spindles.
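
A toy layout generator makes the idea concrete (a simplified rotation in the spirit of the Holland & Gibson figure, not their exact block design): place each 3+1 stripe on a different 4-drive subset of the 8 physical drives, then count how a failed drive's rebuild work spreads across the survivors.

```python
from itertools import combinations
from collections import Counter

DRIVES = 8        # physical drives in the array
STRIPE_WIDTH = 4  # 3 data blocks + 1 parity block per stripe

# Each stripe lands on a different 4-drive subset of the 8 drives.
layout = list(combinations(range(DRIVES), STRIPE_WIDTH))

failed = 7
reads = Counter()
for stripe in layout:
    if failed in stripe:          # this stripe lost a block to the failure
        for drive in stripe:
            if drive != failed:
                reads[drive] += 1 # each surviving partner contributes one read

print("stripes touched by the failure:", sum(failed in s for s in layout))
print("rebuild reads per surviving drive:", dict(reads))
```

In this toy layout all seven survivors share the rebuild reads evenly (15 apiece), whereas in a plain 3+1 RAID 5 group the same three partner drives would do all of the work.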

Because parity declustering distributes the parity, as well as the data, across a number of disk drives, no one disk holds the parity for all the others. This eliminates the hot-drive phenomenon, normally dealt with by using smaller RAID group sizes.

The mixed drive capacity problem with RAID today

The other problem with RAID today is that it assumes a homogeneous set of disk drives in the storage array, so that the same blocks/tracks/block sets can be set up as a RAID stripe across those disks to compute parity. Now, the original RAID paper by Patterson, Gibson, and Katz never explicitly stated a requirement for all the disk drives to be the same capacity, but it seems easiest to implement RAID that way. With drives of diverse capacity and performance, you would normally want them in separate RAID groups. You could create a single RAID group sized to the least common capacity (i.e., the smallest drive), but by doing this you waste all the excess storage in the larger disks. For example, striping at 100GB per drive across 100GB, 200GB, 400GB, and 800GB drives leaves 1,100GB of the larger drives unused.

Now, one solution to the above would be the declustered parity scheme mentioned earlier, but in the end you would need at least N drives of the same capacity for whatever your stripe size (N) was going to be. And if you had that many drives, why not just use RAID 5 or 6?

Another solution, popularized by Drobo, is to carve up the various disk drives into RAID group segments. So if you had 4 drives of 100GB, 200GB, 400GB, and 800GB, you could carve out 4 RAID groups: a 100GB RAID 5 group across all 4 drives; another 100GB RAID 5 group across 3 drives; a RAID 1 mirror of 200GB across the largest 2 drives; and a RAID 0 of 400GB on the largest drive. This could be configured as 4 LUNs or Windows drive letters and used any way you wish.
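
Running the numbers on that example (my arithmetic, using the same hypothetical 100/200/400/800GB drives) shows how much of the raw capacity the carve-up recovers compared with a single RAID 5 group limited to the smallest drive:

```python
# Usable capacity of the carved-up layout described above (GB),
# versus one RAID 5 group sized to the smallest (100GB) drive.
segments = [
    ("RAID 5, 100GB slice x 4 drives", 4 * 100, (4 - 1) * 100),  # (raw, usable)
    ("RAID 5, 100GB slice x 3 drives", 3 * 100, (3 - 1) * 100),
    ("RAID 1, 200GB slice x 2 drives", 2 * 200, 200),
    ("RAID 0, 400GB slice x 1 drive ", 1 * 400, 400),
]

raw = sum(r for _, r, _ in segments)
usable = sum(u for _, _, u in segments)
print(f"carved layout: {usable}GB usable of {raw}GB raw")

plain_raid5 = (4 - 1) * 100
print(f"plain RAID 5 : {plain_raid5}GB usable of {4 * 100}GB raw "
      "(the rest of the larger drives goes unused)")
```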

But is this RAID?

I would say "yes". Although this is at the sub-drive level, it still looks like RAID storage, using parity and data blocks across stripes of data. All that's been done is to take the unit of a drive and make it some portion of a drive instead. Marketing aside, I think it's an interesting concept and works well for a few drives of mixed capacity (just the market space Drobo is going after).

For larger configurations with intermixed drives, I like parity declustering. It has the best of bigger RAID groups without the problem of increased activity for over-writes. Given today's drive capacities, I might still lean towards a dual-parity scheme within the parity-declustering stripe, but that doesn't seem difficult to incorporate.

So when people ask if RAID’s days are numbered – my answer is a definite NO!