Cloud storage – Page 4 – Silverton Consulting

Transporter, a private Dropbox in a tower

Posted on April 4, 2015 by Ray in Cloud services, Cloud storage, data access, Data security, Ethernet, File Storage, Storage, Strategic Inflection Points, Strategy, Systems, WAN

Move over Dropbox, Box and all you synch&share wannabees, there’s a new synch and share in town.

At SFD7 last month, we were visiting with Connected Data where CEO, Geoff Barrell was telling us all about what was wrong with today’s cloud storage solutions. In front of all the participants was this strange, blue glowing device. As it turns out, Connected Data’s main product is the File Transporter, which is a private file synch and share solution.

All the participants were given a new, 1TB Transporter system to take home. It was an interesting sight to see a dozen of these Transporter towers sitting in front of all the bloggers.

I was quickly, established a new account, installed the software, and activated the client service. I must admit, I took it upon myself to “claim” just about all of the Transporter towers as the other bloggers were still paying attention to the presentation. Sigh, they later made me give back (unclaim) all but mine, but for a minute there I had about 10TB of synch and share space at my disposal.

Transporters rule

So what is it. The Transporter is both a device and an Internet service, where you own the storage and networking hardware.

The home-office version comes as a 1 or 2TB 2.5” hard drive, in a tower configuration that plugs into a base module. The base module runs a secured version of Linux and their synch and share control software.

As tower power on, it connects to the Internet and invokes the Transporter control service. This service identifies the node, who owns it, and provides access to the storage on the Transporter to all desktops, laptops, and mobile applications that have access to it.

At initiation of the client service on a desktop/laptop it creates (by default) a new Transporter directory (folder). Files that are placed in this directory are automatically synched to the Transporter tower and then synchronized to any and all online client devices that have claimed the tower.

Apparently you can have multiple towers that are claimed to the same account. I personally tested up to 10 ;/ and it didn’t appear as if there was any substantive limit beyond that but I’m sure there’s some maximum count somewhere.

A couple of nice things about the tower. It’s your’s so you can move it to any location you want. That means, you could take it with you to your hotel or other remote offices and have a local synch point.

Also, initial synchronization can take place over your local network so it can occur as fast as your LAN can handle it. I remember the first time I up-synched 40GB to DropBox, it seemed to take weeks to complete and then took less time to down-synch for my laptop but still days of time. With the tower on my local network, I can synch my data much faster and then take the tower with me to my other office location and have a local synch datastore. (I may have to start taking mine to conferences. Howard (@deepstorage.net, co-host on our GreyBeards on Storage podcast) had his operating in all the subsequent SFD7 sessions.

The Transporter also allows sharing of data. Steve immediately started sharing all the presentations on his Transporter service so the bloggers could access the data in real time.

They call the Transporter a private cloud but in my view, it’s more a private synch and share service.

Transporter heritage

The Transporter people were all familiar to the SFD crowd as they were formerly with Drobo which was at a previous SFD sessions (see SFD1). And like Drobo, you can install any 2.5″ disk drive in your Transporter and it will work.

There’s workgroup and business class versions of the Transporter storage system. The workgroup versions are desktop configurations (looks very much like a Drobo box) that support up to 8TB or 12TB supporting 15 or 30 users respectively. The also have two business class, rack mounted appliances that have up to 12TB or 24TB each and support 75 or 150 users each. The business class solution has onboard SSDs for meta-data acceleration. Similar to the Transporter tower, the workgroup and business class appliances are bring your own disk drives.

Connected Data’s presentation

Geoff’s discussion (see SFD7 video) was a tour of the cloud storage business model. His view was that most of these companies are losing money. In fact, even Amazon S3/Glacier appears to be bleeding money, although this may not stop Amazon. Of course, DropBox and other synch and share services all depend on cloud storage for their datastores. So, the lack of a viable, profitable business model threatens all of these services in the long run.

But the business model is different when a customer owns the storage. Here the customer owns the actual storage cost. The only thing that Connected Data provides is the client software and the internet service that runs it. Pricing for the 1TB and 2TB transporters with disk drives are $150 and $240.

Having a Transporter

One thing I don’t like is the lack of data-at-rest encryption. They use TLS for data transfers across your LAN and the Internet. But the nice thing about having possession of the actual storage is that you can move it around. But the downside is that you may move it to less secure environments (like conference hotel rooms). And as with the any disk storage, someone can come up to the device and steel the disk. Whether the data would be easily recognizable is another question but having it be encrypted would put that question to rest. There’s some indication on the Transporter support site that encryption may be coming for the business class solution. But nothing was said about the Transporter tower.

On the Mac, the Transporter folder has the shared folders as direct links (real sub-folders) but the local data is under a Transporter Library soft link. It turns out to be a hidden file (“.Transporter Library”) under the Transporter folder. When you Control click on this file your are given the option to view deleted files. You can also do this with shared files as well.

One problem with synch and share services is once someone in your collaboration group deletes some shared files they are gone (over time) from all other group users. Even if some of them wanted them. Transporter makes it a bit easier to view these files and save them elsewhere. But I assume at some point they have to be purged to free up space.

When I first installed the Transporter, it showed up as a network node on my finder shared servers. But the latest desktop version (3.1.17) has removed this.

Also some of the bloggers complained about files seeing files “in flux” or duplicates of the shared files but with unusual file suffixes appended to them, such as ” filename124224_f367b3b1-63fa-4d29-8d7b-a534e0323389.jpg”. Enrico (@ESignoretti) opened up a support ticket on this and it’s supposedly been fixed in the latest desktop and was a temporary filename used only during upload and should have been deleted-renamed after the upload was completed. I just uploaded 22MB with about 40 files and didn’t see any of this.

I really want encryption as I wanted one transporter in a remote office and another in the home office with everything synched locally and then I would hand carry the remote one to the other location. But without encryption this isn’t going to work for me. So I guess I will limit myself to just one and move it around to wherever I want to my data to go.

Here are some of the other blog posts by SFD7 participants on Transporter:

Storage field day 7 – day 2 – Connected Data by Dan Firth (@PenguinPunk)

File Transporter, private Synch&Share made easy by Enrico Signoretti (@ESignoretti)

Transporter – Storage Field Day 7 preview by Keith Townsend (@VirtualizedGeek)

Comments?

EMC acquisitions & other announcements at #EMCSummit last week

Posted on November 3, 2014 by Ray in Cloud services, Cloud storage, data protection, Mobile computing, Object storage

EMC’s recently announced the acquisition of Maginetics and Spanning, which are mainly for pushing data protection out to cloud storage and cloud storage data protection. The other major item to come out of EMC Global Analyst Summit last week was the announcement of EMC’s Hybrid Cloud Solution.

Magnetic for cloud onramp

Maginetics is intended to be yet another tier in the deep archive for Avatar, Data Domain and NetWorker but this one is intended to be in the cloud. MagFS Maginetic’s file system manages the file to object transition, provides their own deduplication and can replicate data to one or more cloud providers, even supporting different cloud services such as, AWS, Google Compute, Azzure, CleverSafe etc. I think ultimately this may be broadened beyond just data protection to be another tier for Unified storage to move to the cloud as well but that’s subject for another post.

How does Maginetics differs from another recent EMC acquisition, TwinStrata? TwinStrata is mainly targeted for primary storage moving a LUN to the cloud and maintaining data avaliability and something like reasonable responsiveness to the data that’s moved to the cloud. So where Maginetics is for data protection storage TwinStrata is for primary storage. Unclear where this leaves other file storage…

Spanning for protection of cloud data

Spanning is intended to be a data protection solution for data that’s born in the cloud. In this case, you can use Spanning to backup your cloud applications to different cloud service providers or even the same cloud service providers. Even if you don’t want to use Spanning to backup your cloud app’s data, with a cloud version of Data Protection Advisor (based somewhat on Spanning), you should be ultimately able to use it to monitor your current cloud provider’s replication/protection activities to insure they are copying and backing up your data properly across data center domains etc. In this way you can better monitor your cloud providers internal data replication/protection services.

EMC seems like it’s got a vision of where it intends to go and the cloud represents a significant new potential data stream and they want to be there to help protect it and use it to help protect other data.

EMC’s Hybrid Cloud announcement

The main announcement at the summit was on EMC’s Hybrid Cloud Offering which was pre-announced at EMC World last spring. With their Hybrid Cloud Offering making it easier for data centers to take advantage of the cloud to burst applications back and forth, EMC’s trying to cover anyway to use the cloud that makes sense.

EMC announced that their Hybrid Cloud solution will support a “federation” hybrid cloud solution based on VCE/VBLOCK or VSPEX and a software defined version based on ViPR storage controller. They also made a statement of direction to have their Hybrid Cloud solution support Microsoft Azure as well as OpenStack at some point in the future…

Well I think that about covers it for EMC cloud announcements from the EMC Global Summit last week.

Comments?

Cloud storage growth is hurting NAS & SAN storage vendors

Posted on August 20, 2014August 25, 2014 by Ray in Cloud services, Cloud storage, DAS, Information economy, Object storage, SSD storage, Storage, storage economics, Strategic Inflection Points, Strategy, Systems

My friend Alex Teu (@alexteu), from Oxygen Cloud wrote a post today about how Cloud Storage is Eating the World Alive. Alex reports that all major NAS and SAN storage vendors lost revenue this year over the previous year ranging from a ~3% loss to over a 20% loss (Q1-2014 compared to Q1-2013, from IDC).

Although an interesting development, it’s hard to say that this is the end of enterprise storage as we know it. I believe there are a number of factors that are impacting enterprise storage revenues and Cloud storage adoption may be only one of them.

Other trends impacting NAS & SAN storage adoption

One thing that has emerged over the last decade or so is the advance of Flash storage. Some of this is used in storage controllers to speed up IO access and some is used in servers to speed up IO access. But any speedup of IO could potentially reduce the need for high-performing disk drives and could allow customers to use higher capacity/slower disk drives instead. This could definitely reduce the cost of storage systems. A little bit of flash goes long way to speed up IO access.

The other thing is that disk capacity is trending upward, at exponential rates. Yesterday,s 2TB disk drive is todays 4TB disk drive and we are already seeing 6TB from Seagate, HGST and others. And this is also driving down the cost of NAS and SAN storage.

Nowadays you can configure 1PB of storage with just over 170 drives. Somewhere in there you might want a couple 100TB of Flash to speed up IO access to these slow disks but Flash is also coming down in ($/GB) price (see SanDISK’s recent consumer grade TLC drive at $0.44/GB). Also the move to MLC flash has increased the capacity of flash devices, leading to less SSDs/flash cache cards to store/speed up more data.

Finally, the other trend which seems to have emerged recently is the movement away from enterprise class storage to server storage. One can see this in VMware’s VSAN, HyperConverged systems such as Nutanix and Scale Computing, as well as a general trend in Windows Server applications (SQL Server, Exchange Server, etc.) to make better use of DAS storage. So some customers are moving their data to shared DAS storage today, whereas before this was more difficult to accomplish effectively and because of that they previously purchased networked storage.

What about cloud storage?

Yes, as Alex has noted, the price of cloud storage has declined precipitously over the last year or so. Alex’s cloud storage pricing graph is shows how the entry of Microsoft and Google has seemingly forced Amazon to match their price reductions. But the other thing of note is that they have all come down to about the same basic price of $0.024/GB/Month.

It’s interesting that Amazon delayed their first S3 serious price reductions by about 4 months after Azure and Google Cloud Storage dropped there’s and then within another month after that, they all were at price parity.

What’s cloud storage real growth?

I reported last August that Microsoft Azure and Amazon S3 were respectively storing 8 trillion and over 2 trillion objects (see my Is object storage outpacing structured and unstructured data growth). This year (April 2014) Microsoft mentioned at TechEd that Azure was storing 20 Trillion object and servicing 2 million request per second.

I could find no update to Amazon S3 numbers from last year but the ~~10x~~ 2.5x growth in Azure’s object count in ~8 months and the roughly doubling of request/second (In my post I didn’t mention last year they were processing 900K requests/second) say something interesting is going on in cloud storage.

I suppose Google’s cloud storage service is too new to report serious results and maybe Amazon wants to keep their growth a secret. But considering Amazon’s recent matching of Azure’s and Google’s pricing, it probably means that their growth wasn’t what they expected.

The other interesting item from the Microsoft discussions on Azure, was that they were already hosting 1M SQL databases in Azure and that 57% of Fortune 500 customers are currently using Azure.

In the “olden days”, before cloud storage, all these SQL databases and Fortune 500 data sets would have more than likely resided on NAS or SAN storage of some kind. And possibly due to the traditional storage’s higher cost and greater complexity, some of this data would never have been spun up in the first place if they had to use traditional storage, but with cloud storage so cheap, rapidly configurable and easy to use all this new data was placed in the cloud.

So I must conclude from Microsofts growth numbers and their implication for the rest of the cloud storage industry that maybe Alex was right, more data is moving to the cloud and this is impacting traditional storage revenues. With IDC’s (2013) data growth at ~43% per year, it would seem that Microsoft’s cloud storage is growing more rapidly than the worldwide data growth, ~14X faster!

On the other hand, if cloud storage was consuming most of the world’s data growth, it would seem to precipitate the collapse of traditional storage revenues, not just a ~3-20% decline. So maybe the most new cloud storage applications would never have been implemented before if they had to use traditional storage, which means that only some of this new data would ever have been stored on traditional storage in the first place, leading to a relatively smaller decline in revenue.

One question remains: is this a short term impact or more of a long running trend that will play out over the next decade or so? From my perspective, new applications spinning up on non-traditional storage is a long running threat to traditional NAS and SAN storage which will ultimately see traditional storage relegated to a niche. How big this niche will ultimately be and how well it can be defended needs to be the subject for another post?

~~~~

Comments?

Replacing the Internet?

Posted on July 25, 2014July 25, 2014 by Ray in Cloud services, Cloud storage, Crowdsourcing, Data availability, Data grid, Data security, Distributed computing, Information economy, Internet traffic, Networking, Object storage, Storage, storage economics, Strategic Inflection Points, Strategy, Systems, Visionary leadershp

safe 'n green by Robert S. Donovan (cc) (from flickr) — safe ‘n green by Robert S. Donovan (cc) (from flickr)

Was reading an article the other day from TechCrunch that said Servers need to die to save the Internet. This article talked about a startup called MaidSafe which is attempting to re-architect/re-implement/replace the Internet into a Peer-2-Peer, mesh network and storage service which they call the SAFE (Secure Access for Everyone) network. By doing so, they hope to eliminate the need for network servers and storage.

Sometime in the past I wrote a blog post about Peer-2-Peer cloud storage (see Free P2P Cloud Storage and Computing if interested). But it seems MaidSafe has taken this to a more extreme level. By the way the acronym MAID used in their name stands for Massive Array of Internet Disks, sound familiar?

Crypto currency eco-system

The article talks about MaidSafe’s SAFE network ultimately replacing the Internet but at the start it seems more to be a way to deploy secure, P2P cloud storage. One interesting aspect of the MaidSafe system is that you can dedicate a portion of your Internet connected computers’ storage, computing and bandwidth to the network and get paid for it. Assuming you dedicate more resources than you actually use to the network you will be paid safecoins for this service.

For example, users that wish to participate in the SAFE network’s data storage service run a Vault application and indicate how much internal storage to devote to the service. They will be compensated with safecoins when someone retrieves data from their vault.

Safecoins are a new BitCoin like internet currency. Currently one safecoin is worth about $0.02 but there was a time when BitCoins were worth a similar amount. MaidSafe organization states that there will be a limit to the number of safecoins that can ever be produced (4.3Billion) so there’s obviously a point when they will become more valuable if MaidSafe and their SAFE network becomes successful over time. Also, earned safecoins can be used to pay for other MaidSafe network services as they become available.

Application developers can code their safecoin wallet-ids directly into their apps and have the SAFE network automatically pay them for application/service use. This should make it much easier for App developers to make money off their creations, as they will no longer have to use advertising support, or provide differenct levels of product such as free-simple user/paid-expert use types of support to make money from Apps. I suppose in a similar fashion this could apply to information providers on the SAFE network. An information warehouse could charge safecoins for document downloads or online access.

All data objects are encrypted, split and randomly distributed across the SAFE network

The SAFE network encrypts and splits any data up and then randomly distributes these data splits uniformly across their network of nodes. The data is also encrypted in transit across the Internet using rUDPs (reliable UDPs) and SAFE doesn’t use standard DNS services. Makes me wonder how SAFE or Internet network nodes know where rUDP packets need to go next without DNS but I’m no networking expert. Apparently by encrypting rUDPs and not using DNS, SAFE network traffic should not be prone to deep packet inspection nor be easy to filter out (except of course if you block all rUDP traffic). The fact that all SAFE network traffic is encrypted also makes it much harder for intelligence agencies to eavesdrop on any conversations that occur.

The SAFE network depends on a decentralized PKI to authenticate and supply encryption keys. All SAFE network data is either encrypted by clients or cryptographically signed by the clients and as such, can be cryptographically validated at network endpoints.

The each data chunk is replicated on, at a minimum, 4 different SAFE network nodes which provides resilience in case a network node goes down/offline. Each data object could potentially be split up into 100s to 1000s of data chunks. Also each data object has it’s own encryption key, dependent on the data itself which is never stored with the data chunks. Again this provides even better security but the question becomes where does all this metadata (data object encryption key, chunk locations, PKI keys, node IP locations, etc.) get stored, how is it secured, and how is it protected from loss. If they are playing the game right, all this is just another data object which is encrypted, split and randomly distributed but some entity needs to know how to get to the meta-data root element to find it all in case of a network outage.

Supposedly, MaidSafe can detect within 20msec. if a node is no longer available and reconfigure the whole network. This probably means that each SAFE network node and endpoint is responsible for some network transaction/activity every 10-20msec, such as a SAFE network heartbeat to say it is still alive.

It’s unclear to me whether the encryption key(s) used for rUDPs and the encryption key used for the data object are one and the same, functionally related, or completely independent? And how a “decentralized PKI” and “self authentication” works is beyond me but they published a paper on it, if interested.

For-profit open source business model

MaidSafe code is completely Open Source (available at MaidSafe GitHub) and their APIs are freely available to anyone and require no API key. They also have multiple approved and pending patents which have been provided free to the world for use, which they use in a defensive capacity.

MaidSafe says it will take a 5% cut of all safecoin transactions over the SAFE network. And as the network grows their revenue should grow commensurately. The money will be used to maintain the core network software and MaidSafe said that their 5% cut will be shared with developers that help develop/fix the core SAFE network code.

They are hoping to have multiple development groups maintaining the code. They currently have some across Europe and in California in the US. But this is just a start.

They are just now coming out of stealth, have recently received $6M USD investment (by auctioning off MaidSafeCoins a progenitor of safecoins) but have been in operation now, architecting/designing/developing the core code now for 8+ years now, which probably qualifies them for the longest running startup on the planet.

Replacing the Internet

MaidSafe believes that the Internet as currently designed is too dependent on server farms to hold pages and other data. By having a single place where network data is held, it’s inherently less secure than by having data spread out, uniformly/randomly across a multiple nodes. Also the fact that most network traffic is in plain text (un-encrypted) means anyone in the network data path can examine and potentially filter out data packets.

I am not sure how the SAFE network can be used to replace the Internet but then I’m no networking expert. For example, from my perspective, SAFE is dependent on current Internet infrastructure to store and forward rUDPs on along its trunk lines and network end-paths. I don’t see how SAFE can replace this current Internet infrastructure especially with nodes only present at the endpoints of the network.

I suppose as applications and other services start to make use of SAFE network core capabilities, maybe the SAFE network can become more like a mesh network and less dependent on the current hub and spoke current Internet we have today. As a mesh network, node endpoints can store and forward packets themselves to locally accessed neighbors and only go out on Internet hubs/trunk lines when they have to go beyond the local network link.

Moreover, the SAFE can make any Internet infrastructure less vulnerable to filtering and spying. Also, it’s clear that SAFE applications are no longer executing in data center servers somewhere but rather are actually executing on end-point nodes of the SAFE network. This has a number of advantages, namely:

SAFE applications are less susceptible to denial of service attacks because they can execute on many nodes.
SAFE applications are inherently more resilient because the operate across multiple nodes all the time.
SAFE applications support faster execution because the applications could potentially be executing closer to the user and could potentially have many more instances running throughout the SAFE network.

Still all of this doesn’t replace the Internet hub and spoke architecture we have today but it does replace application server farms, CDNs, cloud storage data centers and probably another half dozen Internet infrastructure/services I don’t know anything about.

Yes, I can see how MaidSafe and its SAFE network can change the Internet as we know and love it today and make it much more secure and resilient.

Not sure how having all SAFE data being encrypted will work with search engines and other web-crawlers but maybe if you want the data searchable, you just cryptographically sign it. This could be both a good and a bad thing for the world.

Nonetheless, you have to give the MaidSafe group a lot of kudos/congrats for taking on securing the Internet and making it much more resilient. They have an active blog and forum that discusses the technology and what’s happening to it and I encourage anyone interested more in the technology to visit their website to learn more

~~~~

Comments?

Google cloud offers SSD storage

Posted on June 6, 2014 by Ray in Block Storage, Cloud services, Cloud storage, Disk storage, SSD storage, Storage, Storage performance

Read an article the other day on Google Cloud tests out fast, high I/O SSD drives. I suppose it was only a matter of time before cloud services included SSDs in their I/O mix.

Yet, it doesn’t seem to me to be as simple as adding SSDs to the storage catalog. Enterprise storage vendors have had SSDs arguably since January of 2008 (see my EMC introduced SSDs to DMX dispatch). And although there are certainly a class of applications that can take advantage of SSD low latency/high IOPs, the vast majority of applications don’t seem to require these services.

Storage systems use of SSDs today

That’s why most enterprise storage system vendors support some form of automated storage tiering or flash caching of normal I/O for their high-end storage systems. Together with offering just plain old SSDs as data storage. In this more sophisticated solution customers have the option to assign application data to SSDs only, hybrid SSD-disks, or disk only storage. In this way the customer get’s to decide whether they want some sort of mix or just pure SSD or disk IO to satisfy their application IO requirements.

Storage startups have emerged that take on both the hybrid SSD-disk and all-flash model and add quality of service to the picture. An example of all-flash that supplies QoS version of all-flash storage is SolidFire (learn more about SolidFire in our GreyBeardsOnStorage podcast with Dave Wright). An example that does the same sort of thing for hybrid storage is Fusion IOcontrol (formerly NexGen) storage.

Storage system QoS

In the case of SolidFire one can limit volume or volume groups with an IOPs max, throughput max, and a Burst max. The burst is sort of a credit that accrues on a time basis if the application doesn’t ask for the maximum IOPs/Througput which they then can consume above their maximums up to the burst max for a limited timeframe.

QoS capabilities are slowly making their way into enterprise storage systems as well but it will take some time for the instrumentation and capabilities to be put in place. But one can see limited QoS in IBM DS8000 priority IO, NetApp Storage QoS, EMC Unisphere QoS manager for VNX & SMC QoS for VMAX, and HDS SVOS QoS via partitioning. Most of these capabilities control access or partition cache, backend and frontend resources for host volumes. As such, they are not nearly as sophisticated or as easy to use as what SolidFire and other start ups are offering, but they are getting there.

Cloud SSD pricing

Back to the cloud offering. According to the GigaOm article, Google SSD volumes can sustain up to 15K IOPs and they are charging a premium price for this storage ($0.325/GB-month). Apparently Amazon AWS offers high IO EC2 storage as well with a maximum of 4K IOPs but charges a premium both for the storage ($0.125/GB month) and on an IOPs basis ($0.10/IOPS-month). GigaOM had a pricing comparison for 500GB and 2000 IOPs indicating that Google SSD storage would cost $163/month and the AWS provisioned SSD storage would cost $263 ($62.50 for storage and $200 for the 2000 IOPs).

The fact that you can drive the Google SSD to it’s limits without incurring any extra cost seems a serious advantage to me and would be very appealing to me to most enterprise customers.

But where’s latency

It seems to me after some IOPs level is attained, most mission critical applications are more interested in low latency IO (for more on why low latency matters seem my IO throughput vs. low latency post…). Many storage systems are capable of maximum of 100,000s of IOPS but most shops don’t run them that hard, ever. But with proper use of SSDs, most enterprise storage is now clocking IO at sub-msec. low latency IO.

However, I have yet to see any Cloud storage pricing or QoS for that matter that was based on latency guarantees. I think this is a serious omission.

In any event, SSDs in the cloud is a good think now they just need to offer flash caching, automatic storage tiering and sophisticated QoS. I realize this is partially re-inventing enterprise storage in the cloud but isn’t that what everyone actually wants, at cloud storage pricing of course.

Comments?

DS3, the BlackPearl and the way forward for … tape

Posted on October 10, 2013 by Ray in Cloud services, Cloud storage, Data, data access, Information economy, Object storage, Storage, Storage architecture, storage economics, Storage standards, Strategic Inflection Points, Systems, Tape storage, Visionary leadershp, WAN

Just got back from an analyst summit with Spectra Logic. They announced a new interface to tape called, Deep Simple Storage Service (DS3) and an appliance that implements this interface named the BlackPearl. The intent is to broaden the use of tape to include, todays more web services, application environments.

The main problems addressed by the new interface is how do you map an essentially sequential, high throughput but long latency access to first byte, removable media device to an essentially small file, get and put environment. And is there a market for such services. I think Spectra Logic has answered the first set of questions and is about to embark on a journey to answer the second set of questions.

The new interface – it’s all about simplifying tape

The DS3 interface answers the first set of questions. With DS3 Specra Logic has extended Amazon’s S3 interface to expose some of the sequentiality and removability of tape to the object storage world.

As you should recall, Amazon S3 is a RESTful, web interface that uses HTTP type GET and PUT commands to move data to and from the S3 storage service. The data you are moving is considered an object and the object name or identifier is unique across the storage service. When you “PUT” an object you get to add key-value pairs of information called meta-data to the object. When you “GET” an object you retrieve the data from the storage service. The other thing one needs to be aware of is that you get and put objects into “BUCKET”s.

With DS3, Spectra Logic has added essentially 4 new commands to S3 protocol, which are:

Bulk Put – this provides a list of objects that one wants to “PUT” into a DS3 storage service and the response from the DS3 storage service is an ordered list of which objects to PUT in sequence and which DS3 storage server node (essentially an IP address) to send the data.
Bulk Get – this supplies a list of objects that one wants to GET from a DS3 storage service and the response is an ordered list of the sequence to get those objects and the node address to use for those object gets
Export Bucket – this identifies a BUCKET that you wish to remove from a DS3 storage service. Presumably the response would be where the bucket can be found, the number of pieces of media to expect, and some identification of the media serial numbers that constitute a bucket on the DS3 storage service.
Import Bucket – this identifies a new bucket which will be imported into a DS3 storage service and will supply some necessary information such as how many pieces of media to expect and the serial numbers of the media. Presumably the response will be a location which can be used to import the media.

With these four simple commands and an appropriate DS3 client, DS3 server and DS3 storage backend one now has everything they need to support a removable media object store. I could see real value for export/import like this on the “rare occasion” when a cloud service provider goes out of business.

The DS3 interface will be publicly available and the intent is to both supply Spectra Logic developed clients as well a ISV/partner developed DS3 clients so as to provide removable media object stores for all sorts of other applications.

Spectra’s is providing developer tools and documentation so that anyone can write a DS3 client. To that end, the DS3 developer portal is up (couldn’t find a link this AM but will update this post when I find it) and available free of charge to anyone today (believe you need to register to gain access to the doc.). They have a DS3 server simulator that DS3 client developers can use to test out and validate their client software. They also have a try & buy service for client developers.

Essentially, the combination of DS3 clients, DS3 servers and DS3 backend storage create a really deep archive for object data. It’s not intended for primary or secondary storage access but it’s big, cheap, and power/space efficient storage that can be very effective if used for archive data.

BlackPearl, the first DS3 Server

Their second announcement is the first implementation of a DS3 server, Spectra Logic calls BlackPearl(™). The BlackPearl connects to one or more Spectra Logic tape libraries as a backend store which together essentially provides a DS3 object storage archive. The DS3 server talks to DS3 clients on the front end. BlackPearl uses SAS or FC connected tape transports, which can be any transport currently supported by SpectraLogic tape libraries, including IBM TS1140, LTO-4, -5 and -6.

In addition to BlackPearl, Spectra Logic is releasing the first DS3 client for Hadoop. In this case, the DS3 client implements a new version of the Hadoop DistCp (distributed copy) command which can be used to create a copy of an HDFS directory tree onto a DS3 storage service.

Current BlackPearl hardware is a standard 2U server with 4-400GB SSDs inside which act as sort of a speed matching buffer for the Object interface to SAS/FC tape interface.

We only saw a configuration with one BlackPearl in operation (GA of BlackPearl is expected this December). But the plan is to support multiple BlackPearl appliances to talk with the same DS3 backend storage. In that case, there will be a shared database and (tape) resource scheduler across all the appliances in the cluster.

Yes, but what about the market?

It’s a gutsy move for someone like Spectra Logic to define a new open interface to deep storage. The fact that the appliance exists outside the tape library itself and could potentially support any removable media offers interesting architectural capabilities. The current (beta) implementation lacked some sophistication but the expectation is that much of this will be resolved by GA or over time through incremental enhancements.

Pricing is appealing. When you add BlackPearl appliance(s), with a T950 Spectra Logic tape library using LTO drives which supports uncompressed data store of ~2.4PB of archive data, the purchase price is ~$0.10/GB. This compares especially well with current Amazon Glacier pricing of $0.01/GB/Month, so that for the price of 10 months of Glacier storage you could own your own DS3 storage service.

At larger capacities, such as BlackPearl using T950 with TS1140 tape drives supporting 6.4PB is even cheaper, at $0.09/GB. Other configurations are available and in general bigger congfigurations are cheaper on $/GB and smaller ones more expensive. The configurations are speced by Spectra Logic to have all the media, tape drives and BlackPearl systems be needed to support an archives object store.

As for markets, Spectra Logic already has beta interest from a large well known web services customer and a number of media & entertainment customers.

In the long run, Spectra Logic believes that if they can simplify access to tape for an application where it’s well qualified to support (deep archive), that this will enable new applications to take advantage of tape, that weren’t even dreamed of before. By opening up a Object Store interface to tape, anyone currently using S3 is a potential customer.

Amazon announced earlier this year that they have over 2 trillion objects is their S3. And as far as I can tell (see my post Who’s the next winner in storage?) they are growing with no end in sight.

~~~~

Comments?

Storage changes in vSphere 5.5 announced at VMworld 2013

Posted on August 27, 2013August 27, 2013 by Ray in Block Storage, Cloud storage, DAS, Disk storage, Object storage, Server virtualization, SSD storage, Storage, Storage virtualization

VMworld2013 is going on in San Francisco this week. The big news is the roll out of network virtualization in NSX and vCloud Hybrid Service (vCHS) but there were a few tidbits in the storage arena worth discussing.

Virtual SAN public beta – VSAN was released as a public beta and customers can now download a copy of VSAN from www.vsanbeta.com. VSAN will construct a pool of storage out of local attached disks and flash across two or more hosts. It uses the flash as a read-write cache for the local disks. With VSAN customers can elect to have multiple tiers of storage be supported within a single VSAN pool, as well as support different availability (replication) levels, and some other, select characteristics. VSAN can easily scale in performance and capacity by just adding more hosts that have local storage. Now all that stranded local storage and flash server level resources can be used as a VM storage pool. VMware stated that they see VSAN as usefull for tier 2/tier 3 application storage and/or backup-archive storage uses. However they showed one chart with a View Planner application simulation using a 3-host VSAN (presumably with lots of SSD and disk storage) compared against an all-flash array (vendor unknown). In this benchmark the VSAN exactly matched the all-flash external storage in performance (VMs supported). [late update] Lot’s of debate on what VSAN means to enterprise storage but it appears to be a limited in scope and mainly focused on SMB applications. Chad Sakac did a (real) lengthy post on EMC’s perspective on VSAN and Software Defined Storage if you want to know more check it out.
Virsto – VMware announced GA of Virsto which uses any external storage and creates a new global storage pool out of them. Apparently, it maps a log structured file system across the external SAN storage. By doing this it sequentializes all the random write IO coming off of ESX hosts. It supports thin provisioning, snapshot and read-write clones. One could see this as almost a write cache for VM IO activity but read IOs are also by definition spread across (extremely wide striped) across the storage pool which should improve read performance as well. You configure external storage as normal and present those LUNs to Virsto which then converts that storage pool into “vDisks” which can then be configured as VM storage. Probably more to see here but it’s available today. Before acquisition one had to install Virsto into each physical host that was going to define VMs using Virsto vDisks. It’s unclear how much Virsto has been integrated into the hypervisor but over time one would assume like VSAN this would be buried underneath the hypervisor and be available to any vSphere host.
vSphere Flash Read Cache – customers with PCIe flash cards and vCenter Ops Manager, can now use them to support a read cache for data access. vSphere Flash Read Cache is apparently vmotion aware such that as you move VMs from one ESX host to another the read cache buffer will move with it. Flash Read Cache is transparent to the VMs and can be assigned on a VMDK basis.
vSphere 5.5 low-latency support – unclear what VMware actually did but they now claim vSphere 5.5 now supports low latency applications, like FinServ apps. They claim to have reduced the “jitter” or variability in IO latency that was present in previous versions of vSphere. Presumably they shortened the IO and networking paths through the hypervisor which should help. I suppose if you have a VMDK which ends up on an SSD storage someplace one can have a more predictable response time. But the critical question is how much overhead does the hypervisor IO path add to the base O/S. When all-flash arrays now sporting latencies under 100 µsecs, adding another 10 or 100 µsecs can make a big difference. In VMware’s quest to virtualize any and all mission critical apps, low-latency apps are one of the last bastions of physical server apps left to conquer. Consider this a step to accommodate them.
vVols – VMware keeps talking about vVols as an attempt to extend their VSAN “policy driven control plane” functionality out to networked storage but there’s still no GA yet. The (VASA 2 or vVol) spec’s seem to be out for awhile now, and I have heard from at least two “major” vendors that they have support in place today but VMware still isn’t announcing formal availability yet. Unclear what the hold up is, but maybe the spec’s are more in a state of flux than what’s depicted externally.

Most of this week was spent talking about NSX, VMware’ network virtualization and vCloud Hybrid Services. When they flashed the list of NSX partners on the screen Cisco was absent. Not sure what this means but perhaps there’s some concern that NSX will take revenue away from Cisco.

As for vCHS apparently this is a VMware run public cloud with two now expanding to three data centers in US, that customers can use to support their own hybrid cloud services. VMware announced that SAVVIS is now offering vCHS services as well as VMware with data centers in NY and Chicago. There was some talk about vCHS offering object storage services like Amazon’s S3 but there was nothing specific about when. [Late update] Pat did mention that a future offering will provide DR-as-a-Service using vCHS as a target for SRM. That seems to be matching what Microsoft seems to be planning for Azzure and Hyper-V DR.

That’s about it as far as I can tell. Didn’t hear any other news on storage changes in vSphere 5.5. But this is the year of network virtualization. Can’t wait to see what they roll out next year.

Who’s the next winner in data storage?

Posted on August 15, 2013August 15, 2013 by Ray in CIFS/SMB, Cloud services, Cloud storage, data access, Data availability, data services, Distributed computing, FC, FCoE, NFS, Object storage, Storage, Storage availability, Strategic Inflection Points, Strategy, Visionary organizations

“The future is already here – just not evenly distributed”, W. Gibson

It starts as it always does outside the enterprise data center. In the line of businesses, in the development teams, in the small business organizations that don’t know any better but still have an unquenchable need for data storage.

It’s essentially an Innovator’s Dillemma situation. The upstarts are coming into the market at the lower end, lower margin side of the business that the major vendors don’t seem to care about, don’t service very well and are ignoring to their peril.

Yes, it doesn’t offer all the data services that the big guns (EMC, Dell, HDS, IBM, and NetApp) have. It doesn’t offer the data availability and reliability that enterprise data centers have come to demand from their storage. require. And it doesn’t have the performance of major enterprise data storage systems.

But what it does offer, is lower CapEx, unlimited scaleability, and much easier to manage and adopt data storage, albeit using a new protocol. It does have some inherent, hard to get around problems not the least of which is speed of data ingest/egress, highly variable latency and eventual consistency. There are other problems which are more easily solvable, with work, but the three listed above are intrinsic to the solution and need to be dealt with systematically.

And the winner is …

It has to be cloud storage providers and the big elephant in the room has to be Amazon. I know there’s a lot of hype surrounding AWS S3 and EC2 but you must admit that they are growing, doubling year over year. Yes it is starting from a much lower capacity point and yes, they are essentially providing “rentable” data storage space with limited or even non-existant storage services. But they are opening up whole new ways to consume storage that never existed before. And therein lies their advantage and threat to the major storage players today, unless they act to counter this upstart.

On AWS’s EC2 website there must be 4 dozen different applications that can be fired up in the matter of a click or two. When I checked out S3 you only need to signup and identify a bucket name to start depositing data (files, objects). After that, you are charged for the storage used, data transfer out (data in is free), and the number of HTTP GETs, PUTs, and other requests that are done on a per month basis. The first 5GB is free and comes with a judicious amount of gets, puts, and out data transfer bandwidth.

… but how can they attack the enterprise?

Aside from the three systemic weaknesses identified above, for enterprise customers they seem to lack enterprise security, advanced data services and high availability storage. Yes, NetApp’s Amazon Direct addresses some of the issues by placing enterprise owned, secured and highly available storage to be accessed by EC2 applications. But to really take over and make a dent in enterprise storage sales, Amazon needs something with enterprise class data services, availability and security with an on premises storage gateway that uses and consumes cloud storage, i.e., a cloud storage gateway. That way they can meet or exceed enterprise latency and services requirements at something that approximates S3 storage costs.

We have talked about cloud storage gateways before but none offer this level of storage service. An enterprise class S3 gateway would need to support all storage protocols, especially block (FC, FCoE, & iSCSI) and file (NFS & CIFS/SMB). It would need enterprise data services, such as read-writeable snapshots, thin provisioning, data deduplication/compression, and data mirroring/replication (synch and asynch). It would need to support standard management configuration capabilities, like VMware vCenter, Microsoft System Center, and SMI-S. It would need to mask the inherent variable latency of cloud storage through memory, SSD and hard disk data caching/tiering.. It would need to conceal the eventual consistency nature of cloud storage (see link above). And it would need to provide iron-clad, data security for cloud storage.

It would also need to be enterprise hardened, highly available and highly reliable. That means dually redundant, highly serviceable hardware FRUs, concurrent code load, multiple controllers with multiple, independent, high speed links to the internet. Todays, highly-available data storage requires multi-path storage networks, multiple-independent power sources and resilient cooling so adding multiple-independent, high-speed internet links to use Amazon S3 in the enterprise is not out of the question. In addition to the highly available and serviceable storage gateway capabilities described above it would need to supply high data integrity and reliability.

Who could build such a gateway?

I would say any of the major and some of the minor data storage players could easily do an S3 gateway if they desired. There are a couple of gateway startups (see link above) that have made a stab at it but none have it quite down pat or to the extent needed by the enterprise.

However, the problem with standalone gateways from other, non-Amazon vendors is that they could easily support other cloud storage platforms and most do. This is great for gateway suppliers but bad for Amazon’s market share.

So, I believe Amazon has to invest in it’s own storage gateway if they want to go after the enterprise. Of course, when they create an enterprise cloud storage gateway they will piss off all the other gateway providers and will signal their intention to target the enterprise storage market.

So who is the next winner in data storage – I have to believe its going to be and already is Amazon. Even if they don’t go after the enterprise which I feel is the major prize, they have already carved out an unbreachable market share in a new way to implement and use storage. But when (not if) they go after the enterprise, they will threaten every major storage player.

Yes but what about others?

Arguably, Microsoft Azure is in a better position than Amazon to go after the enterprise. Since their acquisition of StorSimple last year, they already have a gateway that with help, could be just what they need to provide enterprise class storage services using Azure. And they already have access to the enterprise, already have the services, distribution and goto market capabilities that addresses enterprise needs and requirements. Maybe they have it all but they are not yet at the scale of Amazon. Could they go after this – certainly, but will they?

Google is the other major unknown. They certainly have the capability to go after enterprise cloud storage if they want. They already have Google Cloud Storage, which is priced under Amazon’s S3 and provides similar services as far as I can tell. But they have even farther to go to get to the scale of Amazon. And they have less of the marketing, selling and service capabilities that are required to be an enterprise player. So I think they are the least likely of the big three cloud providers to be successful here.

There are many other players in cloud services that could make a play for enterprise cloud storage and emerge out of the pack, namely Rackspace, Savvis, Terremark and others. I suppose DropBox, Box and the other file sharing/collaboration providers might also be able to take a shot at it, if they wanted. But I am not sure any of them have enterprise storage on their radar just yet.

And I wouldn’t leave out the current major storage, networking and server players as they all could potentially go after enterprise cloud storage if they wanted to. And some are partly there already.

Comments?