Kasten, Kafka & the quest to protect data #CFD11

We attended Cloud Field Day 11 (CFD11) last week and among the vendors at the show was Kasten by Veeam, talking about their extensive Kubernetes data protection/DR/migration capabilities. All that was worthy of its own blog post, but somewhere near the end of their session they started discussing how they plan to back up Apache Kafka message traffic.

We have discussed Kafka before (see our Data in motion post). If you have read that post or are familiar with Kafka, you can skip the Kafka primer section below. And just for the record, I’m no Kafka expert.

Kafka primer

Kafka is a massively scalable message bus and real time stream processing system. Messages or records come in from Kafka producers and are sent to Kafka consumers or subscribers for processing. Kafka supplies a number of guarantees, one of which (when configured for exactly-once semantics) is that messages are processed once and only once.

Messages or records are key-value pairs that are generated continuously and are processed by Kafka apps. Message streams are split into topics.

Kafka connectors allow message traffic to be extracted/generated from external databases and other systems. Kafka stream processors use Kafka streaming primitives and stream flow graphs to construct applications that process unbounded, continuously updated data sets.

Topics in Kafka can have multiple producers and multiple subscribers. Each topic can be further split across partitions, which can be processed on different servers or brokers in a Kafka cluster. It’s this partitioning that allows Kafka to scale from processing 1 message to millions of messages per second.

Consumers/subscribers can also be producers, so a message that comes in and is processed by a subscriber could create other messages on topics that need to be processed. Messages are placed on topic partitions in the order they are received.

Messages can be batched and added to a partition or not. Recently Kafka added support for sticky partitions. Normally messages are assigned to partitions based on key hashing, but sometimes messages have null keys, and in this case Kafka assigns them to partitions in a round robin fashion. The sticky partitioning strategy amends that approach to use batches of messages rather than single messages when adding null key messages to partitions.
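
To make the three assignment strategies concrete, here is a minimal Python sketch. It is not Kafka's implementation: the partition count, the function names and the use of CRC32 as the key hash are all assumptions made for illustration, and the sticky version simply rotates through partitions in order once a batch fills.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical partition count for one topic


def keyed_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Hash a non-null key so equal keys always map to the same partition."""
    # Assumption: CRC32 stands in for whatever hash the real client uses.
    return zlib.crc32(key) % num_partitions


def round_robin_partitions(num_messages: int,
                           num_partitions: int = NUM_PARTITIONS) -> list:
    """Null-key messages, one at a time: each message goes to the next partition."""
    return [i % num_partitions for i in range(num_messages)]


def sticky_partitions(num_messages: int, batch_size: int,
                      num_partitions: int = NUM_PARTITIONS) -> list:
    """Null-key messages, sticky style: a whole batch shares one partition."""
    # Simplification: batches rotate through the partitions in order.
    return [(i // batch_size) % num_partitions for i in range(num_messages)]


if __name__ == "__main__":
    print(round_robin_partitions(6))           # [0, 1, 2, 3, 0, 1]
    print(sticky_partitions(6, batch_size=3))  # [0, 0, 0, 1, 1, 1]
```

With six null-key messages, round robin touches every partition, while the sticky approach fills one batch on a partition before moving to the next.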

Kafka essentially provides a realtime streaming message/event processing system that scales very well.

Data protection for Kafka streams without Kasten

This is our best guess as to what this looks like, so bear with us.

Sometime after messages are processed by a subscriber they can be sent to a logging system. If that system records those messages to external storage, they would be available to be copied off-line and backed up.

How far behind real time Kafka logs happen to be can vary significantly based on the incoming/outgoing message traffic, resources available for logging and the overall throughput of your logging process/storage. Could it be an hour or two behind? Possibly, but I doubt it. Could it be 1 second behind? Also unlikely. Somewhere in between those two extremes seems reasonable.

Log messages, just like any topic, could be partitioned, so any single log would only have the messages on that partition. There would then need to be some post process to stitch all these log partitions together to get a consistent message stream.

In any case, there is potentially a wide gulf between the message data that is being processed and the logs which hold them. But then the logs need to be backed up as well, so there’s another gap introduced between the backed up data and what is happening in real time. This is especially true the more messages that are being processed in your Kafka cluster.

(Possible) data protection for Kafka streams with Kasten

For starters, it wasn’t clear whether what Kasten presented was a current product offering, a beta offering or just a germ of an idea. So bear that in mind. See the videos of their CFD11 session here for their take on this.

Kafka supports topic replication. What this means is that any topic can be replicated to other servers in the Kafka cluster. Topics can have 0 to N replications and one server is always designated leader (or primary) for a topic and other replicas are secondary. The way it’s supposed to work is that the primary topic partition server will not acknowledge or formally accept any message until it is replicated to all other topic replicas.

What Kasten is proposing is to use topic replicas and take hot snapshots of replicated topic partitions at the secondary server. That way there should be a minimal impact on primary topic message processing. Once snapped, topic partition message data can be backed up and sent offsite for DR.

We have a problem with this chart. Our understanding is that if we take the numbers to be a message id or sequence number, each partition should be getting different messages rather than the same messages. Again, we are not Kafka experts. The Eds.

However, even though Kasten plans to issue hot snapshots, one after another, for each partition, there is still a small time difference between each snap request. As a result, the overall state of the topic partitions when snapped may be slightly inconsistent. In fact, there is a small possibility that when you stitch all the partition snapshots together for a topic, some messages may be missing.

For example, say a topic is replicated and some message, say Msg[2001], was delayed (due to server load) in being replicated for partition 0, while Kasten was hot snapping replica partition 15, which already held Msg[2002]. In this case the snapshots missed Msg[2001]. Thus hot snapshots, in aggregate, can be missing one or (potentially) more messages.

Kasten called this a crash consistent backup but it’s not the term we would use (not sure what the term would be but it’s worse than crash consistent). But to our knowledge, this was the first approach that Kasten (or anyone else) has described that comes close to providing data protection for Kafka messaging.

As another alternative, Kasten suggested one could make the backed up set of partitions better by post processing all the snapshots to find the last message point where all prior messages were available and jettisoning any follow-on messages. In this way, after post processing, the set of data derived from the hot snaps would be crash consistent up to that message point.
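
As a rough illustration of that post processing step, here is a Python sketch. It is our guess at the idea, not Kasten's code. It assumes, as in the Msg[2001] example above, that every message carries a topic-wide sequence number, which is a simplification since real Kafka offsets are tracked per partition. The function names and the dict-of-sets snapshot layout are hypothetical.

```python
from typing import Dict, Set


def consistent_cut(snapshots: Dict[int, Set[int]], first_id: int) -> int:
    """Return the highest id N such that every id from first_id to N
    appears in at least one partition snapshot (first_id - 1 if none do)."""
    seen = set().union(*snapshots.values())
    last = first_id - 1
    while last + 1 in seen:
        last += 1
    return last


def trim_to_cut(snapshots: Dict[int, Set[int]], cut: int) -> Dict[int, Set[int]]:
    """Jettison any messages that follow the consistent cut point."""
    return {part: {m for m in msgs if m <= cut}
            for part, msgs in snapshots.items()}


if __name__ == "__main__":
    # Partition 0 was snapped before Msg[2001] arrived; partition 15 holds 2002.
    snaps = {0: {1998, 2000}, 15: {1999, 2002}}
    cut = consistent_cut(snaps, first_id=1998)
    print(cut)                      # 2000
    print(trim_to_cut(snaps, cut))  # {0: {1998, 2000}, 15: {1999}}
```

Here Msg[2002] is dropped because Msg[2001] never made it into any snapshot, leaving a set that is complete up to Msg[2000].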

It seems to me, the next step to create an application consistent Kafka messaging backup would require Kafka to provide some way to quiesce a topic message stream. Once quiesced and after some delay to accept all in flight messages, hot snaps could be taken. The resultant snapshots in aggregate would have all the messages to the last one accepted.

Unclear whether quiescing a Kafka topic stream, even for a matter of seconds to minutes, is feasible for a system that processes 1000s to millions of messages per second.



IT in space

Read an article last week about all the startup activity that’s taking place in space systems and infrastructure (see: As rocket companies proliferate … new tech emerges leading to a new space race). This is a consequence of cheap(er) launch systems from SpaceX, Blue Origin, Rocket Lab and others.

SpaceBelt, storage in space

One startup that caught my eye was SpaceBelt from Cloud Constellation Corporation, which is planning to put PB (4X Library of Congress) of data storage in a constellation of LEO satellites.

The LEO storage pool will be populated by multiple nodes (satellites) with a set of geo-synchronous access points to the LEO storage pool. Customers use ground based secure terminals to talk with geosynchronous access satellites which communicate to the LEO storage nodes to access data.

Their main selling points appear to be data security and availability. The only way to access the data is through secured satellite downlinks/uplinks and then you only get to the geo-synchronous satellites. From there, those satellites access the LEO storage cloud directly. Customers can’t access the storage cloud without going through the geo-synchronous layer first and the secured terminals.

The problem with terrestrial data is that it is prone to security threats as well as natural disasters which take out a data center or a region. But with all your data residing in a space cloud, such concerns shouldn’t be a problem. (However, gaining access to your ground stations is a whole different story.)

AWS and Lockheed-Martin supply new ground station service

The other company of interest is not a startup but a link up between Amazon and Lockheed Martin (see: Amazon-Lockheed Martin …) that supplies a new cloud based, satellite ground station as a service offering. The new service will use Lockheed Martin ground stations.

Currently, the service is limited to S-Band and antennas located in Denver, but plans are to expand to X-Band and locations throughout the world. The plan is to have ground stations located close to AWS data centers, so data center customers can have high speed access to satellite data.

There are other startups in the ground station as a service space, but none with the resources of Amazon-Lockheed. All of this competition is just getting off the ground, but a few have been leasing idle ground station resources to customers. The AWS service already has a few big customers, like DigitalGlobe.

One thing we have learned is that the appeal of cloud services is as much about the ecosystem that surrounds them as the service offering itself. So having satellite ground stations as a service is good, but having these services tied directly into other public cloud computing infrastructure is much, much better. Google, Microsoft, IBM, are you listening?

Data centers in space

Why stop at storage? Wouldn’t it be better to support both storage and computation in space? That way access latencies wouldn’t be a concern. When terrestrial disasters occur, it’s not just data at risk. Ditto for security threats.

Having whole data centers in orbit would represent a whole new stratum of cloud computing. Also, IT could now implement space native applications.

If Microsoft can run a data center under the oceans, I see no reason they couldn’t do so in orbit. Especially when human flight returns to NASA/SpaceX. Just imagine admins and service techs as astronauts.

And yet, security and availability aren’t the only threats one has to deal with. What happens to the space cloud when war breaks out and satellite killers are set loose?

Yes, space infrastructure is not subject to terrestrial disasters or internet based security risks, but there are other problems besides those and war, such as solar storms and space debris clouds.

In the end, it’s important to have multiple, non-overlapping risk profiles for your IT infrastructure. That is, each IT deployment may be subject to one set of risks, but that set should be disjoint from the risks facing another IT deployment option. IT in space, which is subject to solar storms, space debris, and satellite killers, is a nice complement to terrestrial cloud data centers, which are subject to natural disasters, internet security risks, and other earth-based, man made disasters.

On the other hand, a large solar storm like the 1859 one could knock out every data system on the world or in orbit. As for under the sea, it probably depends on how deep it was submerged!!

Photo Credit(s): Screen shots from SpaceBelt youtube video (c) SpaceBelt

Screen shot from AWS Ground Station as a Service sign up page (c) Amazon-Lockheed

Screen shots from Microsoft’s Under the sea news feature (c) Microsoft

Disaster recovery from VMware to AWS using Dell EMC Avamar & Data Domain

I was at Dell EMC World 2017 last week and although most of the news was on Dell’s new 14th generation server and Dell-EMC integration progress, Wednesday’s keynote was devoted to storage and non-server infrastructure news.

There was plenty of non-server news but one item that caught my attention was new functionality from Dell EMC Data Protection Division that used Avamar and Data Domain to provide disaster recovery for VMware VMs directly to AWS.

Data Domain (AWS) Cloud DR

Dell EMC Data Domain Cloud DR (DDCDR) is a new capability that enables DD to back up to AWS S3 object storage and, when needed, restart the virtual machines within AWS.

DDCDR requires that a customer with Avamar backup and Data Domain (DD) storage install an OVA which deploys an “add-on” to their on-prem Avamar/DD system and install a lightweight VM (Cloud DR server) utility in their AWS domain.

Once the OVA is installed, it will read the changed data and will segment, encrypt, and compress the backup data and then send this and the backup metadata to AWS S3 objects. Avamar/DD policies can be established to control how many daily backup copies are to be saved to S3 object storage. There’s no need for Data Domain or Avamar to run in AWS.

When there’s a problem at the primary data center, an admin can click on an Avamar GUI button and have the Cloud DR server uncompress, decrypt, rehydrate and restore the backup data into EBS volumes, translate the VMware VM image to an AMI image and then restart the AMI on an AWS virtual server (EC2) with its data on EBS volume storage. The Cloud DR server will use the backup metadata to select the AWS EC2 instance with the proper CPU and RAM needed to run the application. Once this completes, the VM is running standalone, in an AWS EC2 instance. Presumably, you have to have EC2 and EBS storage volume resources available under your AWS domain to be able to install the application and restore its data.

For simplicity purposes, the user can control almost all of the required functionality for DDCDR from the Avamar GUI alone. But in case of a site outage, the user can initiate the application DR from a portal supplied by the Cloud DR server utility.

There you have it, simplified, easy to use (AWS) Cloud DR for your VM applications all through Dell EMC Avamar, Data Domain storage and DDCDR. At the moment, it only works with AWS cloud but it’s likely to be available for other public clouds in the near future.


There was much more infrastructure news at Dell EMC World 2017. I’ll discuss more details on their new storage offerings in my upcoming Storage Intelligence newsletter, due out the end of this month. If you’re interested in receiving your own copy of my newsletter, check out the signup button in the upper right of this page.


[Edits were made for readability and technical accuracy after this post was published. Ed]

Existential threats

Not sure why but lately I have been hearing a lot about existential events. These are events that threaten the existence of humanity itself.

Massive Solar Storm

A couple of days ago I read about the Carrington Event which was a massive geomagnetic solar storm in 1859. Apparently it wreaked havoc with the communications infrastructure of the time (telegraphs). Researchers have apparently been able to discover other similar events in earth’s history by analyzing ice cores from Greenland which indicate that events of this magnitude occur once every 500 years and smaller events typically occur multiple times/century.

Unclear to me what a solar storm of the magnitude of the Carrington Event would do to the world as we know it today, but we are much more dependent on electronic communications, radio, electric power, etc. If such an event were to take out 50% of our electro-magnetic infrastructure, frying power transformers, radio transceivers, magnetic storage/motors/turbines, etc., civilization as we know it would be brought back to the mid 1800’s but with a 21st century population.

This would last until we could rebuild all the lost infrastructure, at tremendous cost. During this time we would be dependent on animal-human-water power, paper-optical based communications/storage, and animal-wind transport.

It appears that any optical based communication/computer systems would remain intact but powering them would be problematic without working transformers and generators.

One article (couldn’t locate this) stated that the odds of another Carrington Event happening are 12% by 2022. But the ice core research seems to indicate that it should be higher than this. By my reckoning, it’s been 155 years since the last event, which means we are ~1/3rd of the way through the next 500 years, so I would expect the probability of a similar event happening to be ~1/3 at this point and rising slightly every year until it happens again.
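
For comparison, here is a small Python sketch of one other way to put numbers on the ice core figure. It makes an assumption that is not in the research as described here: that these storms arrive independently at an average rate of one per 500 years (a Poisson process), so the chance of seeing one depends only on the length of the window you look at. The function name and default are ours.

```python
import math


def prob_at_least_one(years: float, mean_interval_years: float = 500.0) -> float:
    """Chance of at least one event in a window of the given length,
    assuming independent arrivals averaging one per mean_interval_years."""
    return 1.0 - math.exp(-years / mean_interval_years)


if __name__ == "__main__":
    # Any single decade, then the 155 years since the Carrington Event.
    print(round(prob_at_least_one(10), 3))   # 0.02
    print(round(prob_at_least_one(155), 3))  # 0.267
```

Under that assumption, any given decade carries about a 2% chance, and the chance of at least one event somewhere in a 155 year span is about 27%, not far from the ~1/3 reckoned above, although the two models are counting different things.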


I picked up a book called Superintelligence: Paths, Dangers, Strategies by Nick Bostrom last week and started reading it last night. It’s about the dangers of AI gaining the ability to improve itself and after that becoming not just equivalent to Human Level Intelligence (HMLI) but greatly exceeding HMLI at a super-HMLI level (Superintelligent). This means some Superintelligent entity that would have more intelligence than our current population of humans today, by many orders of magnitude.

Bostrom discusses the take off processes that would lead to Superintelligence and some of the ways we could hope to control it. But his belief is that trying to install any of these controls after it has reached HMLI would be fruitless.

I haven’t finished the book but what I have read so far, has certainly scared me.

Bostrom presents three scenarios for a Superintelligence take off: slow take off, fast take off and medium take off. He believes that in a slow take off scenario there may be many opportunities to control the emerging Superintelligence. In a moderate or medium take off, we would know that something is wrong but would have only some limited opportunity to control it. In the fast take off (literally 18 months from HMLI to Superintelligence in one scenario Bostrom presents), the likelihood of controlling it after it starts is non-existent.

The latter half of Bostrom’s book discusses potential control mechanisms and other ways to moderate the impacts of superintelligence. So far I don’t see much hope for mankind in the controls he has proposed. But I am only half way through the book and hope to see more substantial mechanisms in the 2nd half.

In the end, any Superintelligence could substantially alter the resources of the world and the impact this would have on humanity is essentially unpredictable. But by looking at recent history, one can see how other species have fared as humanity has altered the resources of the earth. Humanity’s rise has led to massive species die offs, for any species that happened to lie in the way of human progress.

The first part of Bostrom’s book discusses some estimates as to when the world will reach AI with HMLI. Most experts believe that we will see HMLI like this with a 90% probability by the year 2075 and a 50% probability by the year 2050. As for the duration of take off to superintelligence, the expert opinions are mixed and he believes that they highly underestimate the speed of take off.

Humanity’s risks

The search for extraterrestrial intelligence has so far found nothing. One of the parameters for the odds of a successful search was the number of inhabitable planets in the universe. But another parameter is the ability of a technological civilization to survive long enough to be noticed: the likelihood of a civilization surviving any existential risk that comes up.

Superintelligence and massive solar storms represent just two such risks but there are a multitude of others that can be identified today, and tomorrow’s technological advances will no doubt give rise to more.

Existential risks like these are ever-present and appear to be growing as our technological prowess grows. My only problem is that today the study of existential risks seems at best ad hoc and at worst outright disregarded.

I believe the best policy is to recognize known existential risks and have some intelligent debate on how probable they are and how we could potentially check them. There really needs to be some systematic study of existential risks around the world, bringing academics and technologists together to understand and to mitigate them. The threats to humanity are real. We can continue to ignore them, study a few that gain human interest, or actively seek out and mitigate all of them we can.


Photo Credit(s): C3-class Solar Flare Erupts on Sept. 8, 2010 [Detail] by NASA Goddard’s space flight center photo stream

DR preparedness in real time

As many may have seen there has been serious flooding throughout the front range of Colorado.  At the moment the flooding hasn’t impacted our homes or offices but there’s nothing like a recent, nearby disaster to focus one’s thoughts on how prepared we are to handle a similar situation.


What we did when serious flooding became a possibility

As I thought about what I should be doing last night with flooding in nearby counties, I moved my computers, printer and some other stuff from the basement office to an upstairs area in case of basement flooding. I also moved my “Time Machine” backup disk upstairs, which holds the iMac’s backups (hourly for last 24 hrs, daily for past month and weekly backups [for as many weeks as can be held on a 2TB disk drive]). I have often depended on Time Machine backups to recover files I inadvertently overwrote, so it’s good to have around.
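
For the curious, the hourly/daily/weekly schedule above can be sketched in a few lines of Python. This is only an illustration of the retention rule as described, not how Time Machine itself is implemented. The function name is hypothetical, and keeping the earliest backup of each day or week is our assumption.

```python
from datetime import datetime, timedelta


def backups_to_keep(backups, now):
    """Thin a list of backup timestamps: keep every backup from the last
    24 hours, one per calendar day for the past 30 days, and one per
    ISO week for anything older."""
    keep = []
    seen_days, seen_weeks = set(), set()
    for ts in sorted(backups):  # oldest first, so the earliest in a bucket wins
        age = now - ts
        if age <= timedelta(hours=24):
            keep.append(ts)
        elif age <= timedelta(days=30):
            if ts.date() not in seen_days:
                seen_days.add(ts.date())
                keep.append(ts)
        else:
            week = tuple(ts.isocalendar())[:2]  # (ISO year, ISO week number)
            if week not in seen_weeks:
                seen_weeks.add(week)
                keep.append(ts)
    return keep


if __name__ == "__main__":
    now = datetime(2013, 9, 13, 8, 0)
    hourly = [now - timedelta(hours=h) for h in range(48)]
    print(len(backups_to_keep(hourly, now)))
```

Running this over 48 hourly backups keeps the most recent day's worth in full and collapses the older ones to one per day. The disk-full case (dropping the oldest weeklies) is left out.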

I also charged up all our mobiles, laptops & iPads and made sure software and email were as up-to-date as possible. I packed up my laptop & iPad, with my most recent monthly and weekly backups and some other recent work printouts, into my backpack and left it upstairs ready to go at a moment’s notice.

The next day post-mortem

This morning, with less panic and more time to think, I realized the printer was probably the least of my concerns, but the internet and telecommunications equipment (phones & headset) should probably have been moved upstairs as well.

Although we have multiple mobile phones, (AT&T) reception is poor in the office and home. It would have been pretty difficult to conduct business here with the mobile alone if we needed to.  I use a cable provider for business phones but also have a land line for our home. So I (technically) have triple backup for telecom, although to use the mobile effectively, we would have had to leave the office.

Internet access

Internet is another matter though. We also use cable for internet and the modem that supplies office internet connects to a cable close to where it enters the house/office. All this is downstairs, in the basement. The modem is powered using basement plugs (although it does have a battery as well) and there’s a hard ethernet link between the cable modem and an Airport Express base station (also downstairs) which provides WiFi to the house and LAN for the house iMacs/PCs.

Considering what I could do to make this a bit more flood tolerant, I should have probably moved the cable modem and Airport Express upstairs connecting it to the TV cable and powering it using upstairs power. Airport Express WiFi would have provided sufficient Internet access to work but with the modem upstairs connecting an ethernet cable to a desktop would also have been a possibility.

I do have the hotspot/tethering option for my mobile phone but as discussed above, reception is not that great. As such, it may have not sufficed for the household, let alone a work computer.

Internet is available at our local library and at many nearby coffee shops.  So, worst case was to take my laptop and head to a coffee shop/library that still had power/WiFi and camp out all day, for potentially multiple days.

I could probably do better with Internet access. With the WiFi and tethering capabilities available on cellular iPads these days, if I just purchased one for the office, with a suitable data plan, I could use the iPad as another hot spot, independent of my mobile. Of course, I would probably go with a different carrier so that reception issues could also be minimized (hoping where one [AT&T] is poor the other [Verizon?] carrier would be fine).

Data availability

Data access outside of the Time Machine disk and the various hard drive backups was another item I considered this morning. I have monthly hard-drive backups, normally kept in a safety deposit box at a local bank.

The bank is in the same flood/fire plain that I am in, but they tell me it’s floodproof, fireproof and earthquake proof. Call me paranoid but I didn’t see any fire suppression equipment visible in the vault. The vault door, although a large quantity of steel and other metals, didn’t seem to have waterproof seals surrounding it. As for earthquakes, concrete walls and a steel door don’t mean it’s earthquake proof. But then again, I am paranoid; it would probably survive much better than anything in our home/office.

Also, I keep weekly encrypted backups in the house, alternating between two hard disk drives and keep the most recent upstairs. So between the weeklies, monthlies, and Time Machine I have three distinct tiers of data backups. Of course, the latest monthly was sitting in the house waiting to be moved to the safety deposit box – not good news.

I also have  a (manual) copy of work data on the laptop, current to the last hard backup (also at home). So of my three tiers of backup every single current one of them was in the home/office.

I could do better. Looking at Dropbox and Box, for about $100/year/100GB (that’s Dropbox; Box is ~40% cheaper) I could keep our important work and home data on cloud storage and have access to it from any Internet accessible location (including with mobile devices) with proper security credentials. Not sure how long it would take to seed this backup. We have about 20GB of family and work office documents and probably another 120GB or so of photos that I would want to keep around, or about 140GB of info. This could provide 5-way redundancy, with Time Machine, weekly hard drive and monthly hard drive backups, and now Box/Dropbox for a fourth (office and home) backup, with the laptop being a fifth (office only) backup. Seems like cheap insurance at the moment.

The other thing that Box/DropBox would do for me is to provide a synch service with my laptop so that files changed on either device would synch to the cloud and then be copied to all other devices.  This would substitute my current 4th tier of (work) backups with a more current, cloud backup. It would also eliminate the manual copy process performed during every backup to keep my laptop up to date.

I have some data security concerns with using cloud storage, but according to Dropbox they use Amazon S3 for their storage and AES-256 data encryption so that others can’t read your data. They use SSL to transfer data to the cloud.

Where all the keys are held is another matter and with all the hullabaloo with NSA, anything on the internet can be provided to the gov’t with a proper request. But the same could be said for my home computer and all my backups.

There are plenty of other solutions here, Google drive and Microsoft’s SkyDrive to name just a few. But from what I have heard Dropbox is best, especially if you have a large number of files.

The major downsides (besides the cost) are that when you power up your system it can take longer while Dropbox scans for out-of-synch files, and the time it takes to seed your Dropbox account. This is all dependent on your internet access, but according to a trusted source Dropbox seeding starts with the smallest files and works up to the larger ones over time. So there is a good likelihood your office files (outside of PPT) might make it to the cloud sooner than your larger media, databases, and other large files. I figure we have about 140GB to be copied to the cloud. I promise to update the post with the time it took to copy this data to the cloud.
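
In the meantime, a back-of-the-envelope estimate is easy to sketch in Python. The 5 Mbit/s upload speed below is purely a hypothetical figure (I haven’t measured ours), the function name is made up, and the arithmetic ignores protocol overhead, throttling and any dedup or compression.

```python
def seed_time_days(data_gb: float, upload_mbps: float) -> float:
    """Days needed to upload data_gb (decimal gigabytes) over a link that
    sustains upload_mbps (megabits per second)."""
    bits = data_gb * 8 * 1000**3
    seconds = bits / (upload_mbps * 1000**2)
    return seconds / 86400


if __name__ == "__main__":
    # ~140GB of documents and photos over an assumed 5 Mbit/s uplink.
    print(round(seed_time_days(140, 5), 1))  # 2.6
```

At that assumed rate the initial seed would take a bit over two and a half days of continuous uploading; halve the upload speed and it doubles.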

Power and other emergency preparedness

Power is yet another concern.  I have not taken the leap to purchase a generator for the home/office. But now think this unwise. Although power has gotten a lot more reliable in our home/office over the years, there’s still a real possibility that there could be a disruption. The areas with serious flooding all around us are having power blackouts this morning and no telling when their power might get back on. So a generator purchase is definitely in my future.

Listening to the news today, there was talk of emergency personnel notifying people that they had 30 minutes to evacuate their houses.  So, next time there is a flood/fire warning in the area I think I will take some time to pack up more than my laptop. Perhaps some other stuff like clothing and medicines that will help us survive and continue to work.

Food and water are also serious considerations. In Florida for hurricane preparedness  they suggest filling up your bathtubs with water or having 1 gallon of water per person per day set aside in case of emergency – didn’t do this last night but should have.  Florida’s family emergency preparedness plan also suggests enough water for 5-7 days.

I think we have enough dry food around the house to sustain us for a number of days (maybe not 7 though). If we consider what’s in the freezer and fridge that probably goes up to a couple of weeks or so, assuming we can keep it cold.

Cooking food is another concern. We have propane and camp stoves which would provide rudimentary ability to cook outdoors if necessary as well as an old charcoal grill and bag of charcoal in our car-camping stuff. Which should suffice for a couple of days but probably not a week.

As for important documents they are in that safety deposit box in our flood plain. (May need to rethink that). Wills and other stuff are also in the hands of appropriate family members and lawyers so that’s taken care of.

Another item on their list of things to have for a hurricane is flashlights and fresh batteries. These are all available in our camping stuff but would be difficult to access at a moment’s notice. So a couple of rechargeable flashlights that were easier to access might be a reasonable investment. The Florida plan further suggests you have a battery operated radio. I happen to have an old one upstairs with the batteries removed, so I just need to make sure to have some fresh batteries around someplace.

They don’t mention gassing up your car. But we do that as a matter of course anytime harsh weather is forecast.

I think this is about it for now. Probably other stuff I didn’t think of. I have a few fresh fire extinguishers around the home/office but have no pumps. May need to add that to the list…



Photo Credits: September 12 [2013], around 4:30pm [Water in Smiley Creek – Boulder Flood]



Enterprise file synch

Strange Clouds by michaelroper (cc) (from Flickr)

Last fall at SNW in San Jose there were a few vendors touting enterprise file synchronization services each having a slightly different version of the requirements.   The one that comes most readily to mind was Egnyte which supported file synchronization across a hybrid cloud (public cloud and network storage) which we discussed in our Fall SNWUSA wrap up post last year.

The problem with BYOD

With bring your own devices (BYOD), corporate end users are quickly abandoning any pretense of IT control and turning to consumer class file synchronization services to help synch files across desktop, laptop and all the mobile devices they haul around. But the problem with solutions such as DropBox, Box, OxygenCloud and others is that they are really outside of IT’s control.

Which is why there’s a real need today for enterprise class file synchronization solutions that exhibit the ease of use and set up available from consumer file synch systems but  offer IT security, compliance and control over the data that’s being moved into the cloud and across corporate and end user devices.

EMC Syncplicity and EMC on premises storage

Last week EMC announced an enterprise version of their recently acquired Syncplicity software that supports on-premises Isilon or Atmos storage, EMC’s own cloud storage offering.

In previous versions of Syncplicity storage was based in the cloud and used Amazon Web Services (AWS) for cloud orchestration and AWS S3 for cloud storage. With the latest release, EMC adds on premises storage to host user file synchronization services that can span mobile devices, laptops and end user desktops.

New Syncplicity users must download desktop client software to support file synchronization or mobile apps for mobile device synchronization.  After that it’s a simple matter of identifying which if any directories and/or files are to be synchronized with the cloud and/or shared with others.

However, with the Business (read enterprise) edition one also gets the Security and Compliance console which supports access control to define users and devices that can synchronize or share data, enforcement of data retention policies, remote wipe of corporate data, and native support for single sign-on services. In addition, one also gets centralized user and group management services to grant, change or revoke user and group access to data. Also, one now obtains enterprise security with AES-256 data-at-rest encryption, separate data centers for key management and for data storage, quadruple replication of data for high fault tolerance against disasters and SAS70 Type II compliant data centers.

If the client wants to use on premises storage, they would also need to deploy a VM virtual appliance somewhere in the data center to act as the gateway to file synchronization service requests. The file synch server would also presumably need access to the on premises storage and it’s unclear if the virtual appliance is in-band or out-of-band (see discussion on Egnyte’s solution options below).

Egnyte’s solution

Egnyte comes as a software only solution building a file server in the cloud for end user  storage. It also includes an Egnyte app for mobile hardware and the ever present web file browser.  Desktop file access is provided via mapped drives which access the Egnyte cloud file server gateway running as a virtual appliance.

One major difference between Syncplicity and Egnyte is that Egnyte offers a combination of both cloud and on premises storage but you cannot have just on premises storage. Syncplicity only offers one or the other storage for file data, i.e., file synchronization data can only be in the cloud or on local on premises storage but cannot be in both locations.

The other major difference is that Egnyte operates with just about anybody’s NAS storage such as EMC, IBM, and HDS for the on premises file storage.  It operates as an in-band, software appliance solution that traps file activity going to your on premises storage. In this case, one would need to start using a new location or directory for data to be synchronized or shared.

But for NetApp storage only (today), they utilize ONTAP APIs to offer out-of-band file synchronization solutions.  This means that you can keep NetApp data where it resides and just enable synchronization/shareability services for the NetApp file data in current directory locations.

Egnyte promises enterprise class data security with AD, LDAP and/or SSO user authentication, AES-256 data encryption and their own secure data centers.  No mention of separate key security in their literature.

As for cloud backend storage, Egnyte has its own public cloud or supports other cloud storage providers such as AWS S3, Microsoft Azure, NetApp Storage Grid and HP Public Cloud.

There’s more to Egnyte’s solution than just file synchronization and sharing but that’s the subject of today’s post. Perhaps we can cover them at more length in a future post if there’s interest.

File synchronization, cloud storage’s killer app?

The nice thing about these capabilities is that now IT staff can regain control over what is and isn’t synched and shared across multiple devices. Up until now all this was happening outside the data center and external to IT control.

From Egnyte’s perspective, they are seeing more and more enterprises wanting data both on premises for performance and compliance as well as in cloud storage for ubiquitous access. They feel it’s both a shareability demand among an enterprise’s far flung team members, and potentially client/customer personnel, as well as a need to access, edit and propagate silo’d corporate information using the new mobile devices that everyone has these days.

In any event, enterprise file synchronization and sharing is emerging as one of the killer apps for cloud storage. Up to this point cloud gateways made sense for SME backup or disaster recovery solutions but, IMO, didn’t really take off beyond that space. But if you can package a robust and secure file sharing and synchronization solution around cloud storage then you just might have something that enterprise customers are clamoring for.



Oracle (finally) releases StorageTek VSM6

[Full disclosure: I helped develop the underlying hardware for VSM 1-3 and also way back, worked on HSC for StorageTek libraries.]

Virtual Storage Manager System 6 (VSM6) is here. Not exactly sure when VSM5 or VSM5E were released but it seems like an awful long time in Internet years.  The new VSM6 migrates the platform to Solaris software and hardware while expanding capacity and improving performance.

What’s VSM?

Oracle StorageTek VSM is a virtual tape system for mainframe, System z environments. It provides a multi-tiered storage system which includes both physical disk and (optional) tape storage for long term big data requirements for z/OS applications.

VSM6 emulates up to 256 virtual IBM tape transports but actually moves data to and from VSM Virtual Tape Storage Subsystem (VTSS) disk storage and backend real tape transports housed in automated tape libraries.  As VSM data ages, it can be migrated out to physical tape such as a StorageTek SL8500 Modular [Tape] Library system that is attached behind the VSM6 VTSS or system controller.

VSM6 offers a number of replication solutions for DR to keep data in multiple sites in synch and to copy data to offsite locations.  In addition, real tape channel extension can be used to extend the VSM storage to span onsite and offsite repositories.

One can cluster together up to 256 VSM VTSSs  into a tapeplex which is then managed under one pane of glass as a single large data repository using HSC software.

What’s new with VSM6?

The new VSM6 hardware increases volatile cache to 128GB from 32GB (in VSM5).  Non-volatile cache goes up as well, now supporting up to ~440MB, up from 256MB in the previous version.  Power, cooling and weight all seem to have also gone up (the wrong direction??) vis a vis VSM5.

The new VSM6 removes the ESCON option of previous generations and moves to 8 FICON and 8 GbE Virtual Library Extension (VLE) links. FICON channels are used for both host access (frontend) and real tape drive access (backend).  VLE was introduced in VSM5 and offers a ZFS based commodity disk tier behind the VSM VTSS for storing data that requires longer residency on disk.  Also, VSM supports a tapeless or disk-only solution for high performance requirements.

System capacity moves from 90TB (gosh that was a while ago) to now support up to 1.2PB of data. I believe much of this comes from supporting the new T10000C tape cartridge and drive (5TB uncompressed). With the ability of VSM to cluster more VSM systems to the tapeplex, system capacity can now reach over 300PB.
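The "over 300PB" figure follows from the numbers quoted above: up to 256 VTSSs in a tapeplex, each holding up to 1.2PB. A minimal sketch of that arithmetic, assuming decimal units (1PB = 1000TB) and that every clustered system is a fully configured VSM6; the function and constant names are illustrative only and not part of any Oracle tooling:

```python
# Figures quoted in this post; decimal units (1 PB = 1000 TB) are an assumption.
MAX_VTSS_PER_TAPEPLEX = 256   # VTSSs that can be clustered into one tapeplex
VSM6_CAPACITY_PB = 1.2        # maximum capacity of a single VSM6
VSM5_CAPACITY_TB = 90         # maximum capacity of the previous generation


def tapeplex_capacity_pb(systems=MAX_VTSS_PER_TAPEPLEX,
                         per_system_pb=VSM6_CAPACITY_PB):
    """Total tapeplex capacity in PB if every system is fully configured."""
    return systems * per_system_pb


def generation_gain(new_pb=VSM6_CAPACITY_PB, old_tb=VSM5_CAPACITY_TB):
    """How many times larger a single VSM6 is than its 90TB predecessor."""
    return (new_pb * 1000) / old_tb


if __name__ == "__main__":
    print(f"Full tapeplex: {tapeplex_capacity_pb():.1f} PB")
    print(f"Per-system gain: {generation_gain():.1f}x")
```

So a full tapeplex of VSM6 systems works out to roughly 307PB, and a single VSM6 holds a bit more than 13 times what a 90TB system did.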

Somewhere along the way VSM started supporting triple redundancy  for the VTSS disk storage which provides better availability than RAID6.  Not sure why they thought this was important but it does deal with increasing disk failures.

Oracle stated that VSM6 supports up to 1.5GB/sec of throughput. Presumably this is landing data on disk or transferring the data to backend tape but not both. There doesn’t appear to be any standard benchmarking for these sorts of systems so we will take their word for it.

Why would anyone want one?

Well it turns out plenty of mainframe systems use tape for a number of things such as data backup, HSM, and big data batch applications.  Once you get past the sunk  costs for tape transports, automation, cartridges and VSMs, VSM storage can be a pretty competitive data storage solution for the mainframe environment.

The fact that most mainframe environments grew up with tape and have long ago invested in transports, automation and new cartridges probably makes VSM6 an even better buy.  But tape is also making a comeback in open systems with LTO-5 and now LTO-6 coming out and with Oracle’s 5TB T10000C cartridge and IBM’s 4TB 3592 JC cartridge.

Not to mention Linear Tape File System (LTFS) as a new tape format that provides a file system for tape data which has brought renewed interest in all sorts of tape storage applications.

Competition not standing still

EMC introduced their Disk Library for Mainframe 6000 (DLm6000) product that supports two different backends to deal with the diversity of tape use in the mainframe environment. Moreover, IBM has continuously enhanced their Virtual Tape Server, the TS7700, but I would have to say it doesn’t come close to these capacities.

Lately, when I talked with long time StorageTek mainframe tape customers, they have all asked the same thing: when is VSM6 coming out and when will Oracle get their act in gear and start supporting us again? Hopefully this signals a new emphasis on this market. Although who is losing and who is winning in the mainframe tape market is the subject of much debate, there is no doubt that the lack of any update to VSM has hurt Oracle’s StorageTek tape business.

Something tells me that Oracle may have fixed this problem.  We hope that we start to see some more timely VSM enhancements in the future, for their sake and especially for their customers.




Image credit: Interior of StorageTek tape library at NERSC (2) by Derrick Coetzee


VMworld first thoughts kickoff session

[Edited for readability. RLL] The drummer band was great at the start but we couldn’t tell if it was real or lip-synched. It turned out that each of the big VMWORLD letters had a digital drum pad on it, which meant it was live, in real time.

Paul got a standing ovation as he left the stage after introducing Pat, the new CEO. With Paul on the stage, there was much discussion of how far VMware has come in the last four years. But IDC stats probably say it better than most: in 2008 about 25% of Intel X86 apps were virtualized, in 2012 it’s about 60%, and Gartner says that VMware has about 80% of that activity.
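To put those percentages together: if about 60% of x86 apps are virtualized and VMware has about 80% of that activity, then VMware runs a little under half of all x86 apps. A rough sketch of that calculation, assuming the IDC and Gartner figures can be multiplied directly (they come from different analysts, so treat the result as a ballpark only); the names below are illustrative:

```python
# Percentages quoted in the keynote; combining IDC and Gartner numbers
# directly is an assumption, so the output is only a ballpark estimate.
VIRTUALIZED_2008 = 0.25             # share of x86 apps virtualized in 2008
VIRTUALIZED_2012 = 0.60             # share of x86 apps virtualized in 2012
VMWARE_SHARE_OF_VIRTUALIZED = 0.80  # VMware's share of virtualized apps


def vmware_share_of_all_apps(virtualized, vmware_share):
    """Fraction of all x86 apps that run on VMware."""
    return virtualized * vmware_share


def growth_factor(start, end):
    """How many times larger the virtualized share became."""
    return end / start


if __name__ == "__main__":
    share = vmware_share_of_all_apps(VIRTUALIZED_2012,
                                     VMWARE_SHARE_OF_VIRTUALIZED)
    growth = growth_factor(VIRTUALIZED_2008, VIRTUALIZED_2012)
    print(f"VMware share of all x86 apps in 2012: {share:.0%}")
    print(f"Growth in virtualized share, 2008 to 2012: {growth:.1f}x")
```

That is roughly 48% of all x86 apps on VMware, with the virtualized share itself growing about 2.4 times over the four years.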

Pat got up on stage and it was like nothing’s changed. VMware is still going down the path they believe is best for the world: a virtual data center that spans private, on premises equipment and external cloud service providers’ equipment.

There was much ink on software defined data center which is taking the vSphere world view and incorporating networking, more storage, more infrastructure to the already present virtualized management paradigm.

It’s a bit murky as to what’s changed, what’s acquired functionality and what’s new development but suffice it to say that VMware has been busy once again this year.

A single “monster vm” (has its own Facebook page) now supports up to 64 vCPUs, 1TB of RAM, and can sustain more than a million IOPS. It seems that this should be enough for most mission critical apps out there today. No statement on the latency of those IOPS but with a million IOs a second and 64 vCPUs we are probably talking flash somewhere in the storage hierarchy.

Pat mentioned that the vRAM concept is now officially dead. And the pricing model is now based on physical CPUs and sockets. It no longer has a VM or vRAM component to it. Seemed like this got lots of applause.

There are now so many components to vCloud Suite that it’s almost hard to keep track of them all: vCloud Director, vCloud Orchestrator, vFabric applications director, vCenter Operations Manager and of course vSphere. And that’s not counting relatively recent acquisitions DynamicOps, a cloud dashboard, and Nicira SDN services, and I am probably missing some of them.

In addition to all that, VMware has been working on Serengeti, which is a layer added to vSphere to virtualize Hadoop clusters. In the demo they spun up and down a Hadoop cluster with MapReduce operating to process log files. (I want one of these for my home office environment.)

They showed another demo of the vCloud suite in action, spinning up a cloud data center and deploying applications to it in real time. Literally it took ~5 minutes from starting it up until they were deploying applications to it. It was a bit hard to follow as it went a lot into the WAN-like networking environment, configuring load balancing, firewalls and other edge security and workload characteristics. But it all seemed pretty straightforward and they configured an actual cloud in minutes.

I missed the last part about Socialcast but apparently it builds a social network around VMs? [Need to listen better next time]

More to follow…