New cloud storage and Hadoop managed service offering from Spring SNW

Strange Clouds by michaelroper (cc) (from Flickr)

Last week I posted my thoughts on Spring SNW in Dallas, but two more items keep coming back to me (aside from the tornadoes).  The first was Symform, a new startup in cloud storage, and the other was an announcement from SunGard about their new Hadoop managed services offering.

Symform

Symform offers an interesting alternative in cloud storage that avoids the build-out of large multi-site data centers and instead uses your desktop storage as a crowd-sourced storage cloud, a sort of BitTorrent-style cloud storage.

You may recall I discussed such peer-to-peer cloud storage and computing services in a post a couple of years ago.  It seems Symform has taken this task on, at least for storage.

A customer downloads the (Windows or Mac) software, which is installed and executes on your desktop.  After providing security credentials, the first thing you have to do is identify which directories will be moved to the cloud; the second is to indicate whether you wish to contribute to Symform’s cloud storage and where that contributed storage is located.  Symform maintains a cloud management data center which records all the metadata about your cloud-resident data and everyone’s contributed storage space.

Symform cloud data is split up into 64MB blocks and encrypted (AES-256) using a randomly generated key (known only to Symform). Each block is then broken up into 64 fragments, with 32 parity fragments added via erasure coding, and the resulting 96 fragments are written to 96 different locations.  With this arrangement, the system could potentially lose 31 of the 96 fragments and still reconstitute your 64MB of data.  The metadata supporting all this activity sits in Symform’s data center.
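To make the arithmetic concrete, here's a minimal Python sketch of the fragment scheme described above. The block size and fragment counts come from the description; the exact loss tolerance depends on the code Symform actually uses (a textbook MDS erasure code would survive up to 32 lost fragments, while the figure quoted above is 31).

```python
# Back-of-envelope sketch of the fragment scheme described above.
# The counts come from the post; placement and coding details are assumptions.

BLOCK_SIZE_MB = 64
DATA_FRAGMENTS = 64
PARITY_FRAGMENTS = 32
TOTAL_FRAGMENTS = DATA_FRAGMENTS + PARITY_FRAGMENTS   # 96 storage locations

fragment_size_mb = BLOCK_SIZE_MB / DATA_FRAGMENTS      # 1 MB per fragment
stored_mb = TOTAL_FRAGMENTS * fragment_size_mb         # 96 MB stored per 64 MB block
overhead = stored_mb / BLOCK_SIZE_MB - 1                # 50% redundancy overhead

# With a textbook MDS erasure code, any DATA_FRAGMENTS of the TOTAL_FRAGMENTS
# suffice to rebuild the block, so up to PARITY_FRAGMENTS losses are survivable.
max_losses = TOTAL_FRAGMENTS - DATA_FRAGMENTS

print(f"fragment size: {fragment_size_mb:.0f} MB")
print(f"stored per block: {stored_mb:.0f} MB ({overhead:.0%} overhead)")
print(f"fragments that can be lost (MDS code): up to {max_losses}")
```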

It’s unclear to me what you have to provide as far as ongoing access to your contributed storage.  I would guess you need to provide 7X24 access to this storage, but the 32 parity fragments are there to cover the network/power failures outside your control.

Cloud storage performance is an outcome of the many fragments dispersed throughout their storage cloud. It’s similar to a BitTorrent stream, with all 96 locations participating in reconstituting your 64MB of data.  Of course, not all 96 locations have to be active, just some subset holding at least 64 fragments, but it’s still cloud storage, so data access latency is on the order of internet time (many seconds).  Nonetheless, once data transfer begins, throughput can be pretty high, which means your data should arrive shortly thereafter.

Pricing seemed comparable to other cloud storage services, with a monthly base access fee and a fee for the amount of storage used over that.  But you can receive significant discounts if you contribute storage, and your first 200GB is free as long as you contribute 200GB of storage space to the Symform cloud.

SunGard’s new Apache Hadoop managed service

Hadoop Logo (from http://hadoop.apache.org website)

We are well aware of SunGard’s business continuity/disaster recovery (BC/DR) services, an IT mainstay for decades now. But sometime within the last decade or so SunGard began expanding outside this space by moving into managed availability services.

Apparently this began when SunGard noticed the number of new web apps being deployed each year exceeded the number of client-server apps. Then along came virtualization, which reduced the need for lots of server and storage hardware for BC/DR.

As evidence of this trend, last year SunGard announced a new enterprise-class cloud computing service.  In last week’s announcement, SunGard teamed up with EMC Greenplum to supply an enterprise-ready Apache Hadoop managed service offering.

Recall that EMC Greenplum offers its own supported Apache Hadoop distribution, Greenplum HD.  SunGard is basing their service on this distribution. But there’s more.

In conjunction with Hadoop, SunGard adds Greenplum appliances.  With this configuration SunGard can load Hadoop-processed and structured data into a Greenplum relational database for high-performance data analytics.  Once there, standard SQL analytics and queries can be used to analyze the data.
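As a rough illustration of that last step, here's a hedged Python sketch of running SQL analytics against a Greenplum database once Hadoop output has been loaded. The connection details and the clickstream_summary table are made up for the example; Greenplum is PostgreSQL-based, so a standard driver like psycopg2 works, but this is not SunGard's actual setup.

```python
# Minimal sketch: once Hadoop output has been loaded into a Greenplum table,
# plain SQL does the analytics. The host, credentials and table name below
# are hypothetical placeholders.
import psycopg2

conn = psycopg2.connect(host="greenplum.example.com", dbname="analytics",
                        user="analyst", password="secret")
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT customer_segment, COUNT(*) AS events, AVG(session_secs) AS avg_session
        FROM clickstream_summary        -- table populated from Hadoop output
        WHERE event_date >= %s
        GROUP BY customer_segment
        ORDER BY events DESC
    """, ("2012-04-01",))
    for segment, events, avg_session in cur.fetchall():
        print(segment, events, avg_session)
conn.close()
```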

With these services SunGard is attempting to provide a unified analytics service that spans all structured, semi-structured and unstructured data.

~~~~

There was probably more at Spring SNW, but given my limited time on the exhibition floor and in vendor discussions, these items and my previously published post cover what seemed of most interest to me.

IT as a service on the Cloud is not the end

Prison Planet by AZRainman (cc) (from Flickr)

[Long post] Read another intriguing post by Dave Vellante at Wikibon today about the emergence of IT shops becoming service organizations to their industries, using the cloud to host these services.  I am not in complete agreement with Dave, but he certainly describes a convincing picture.

His main points are:

  • Cloud storage and cloud computing are emerging as a favorite platform for IT-as-a-service.
  • Specialization and economies of scale will generate an IT-as-a-service capability for any organization’s information processing needs.

I would have to say another tenet of his overall discussion is that IT matters, a lot, and I couldn’t agree more.

Cloud reality

For some reason I have been talking a lot about cloud storage these past couple of weeks, in multiple distinct venues.  On the one hand, I was talking with a VAR the other day and they were extremely excited about the opportunity in cloud storage. It seems getting SMB customers to sign up for a slice of storage is easy, and once they have that, using more becomes a habit they can’t get rid of.

I thought maybe the enterprise would be immune to such inducements, but no.  Another cloud storage gateway vendor, StorSimple, that I talked with recently was touting the great success they were having displacing tier 2 storage in the enterprise.

Lately, I heard that some small businesses/startups have decided to abandon their own IT infrastructure altogether and depend entirely on cloud offerings from Amazon, RackSpace and others for all they need.  They argue that such infrastructure, for all its current faults, will have less downtime than anything they could create on their own within a limited budget.

So, cloud seems to be taking off, everywhere I look.

Vertical support for IT as a service

Dave mentions plenty of examples in his lengthy post of sophisticated IT organizations taking their internal services and becoming IT-as-a-service profit centers.  Yes, it’s hard to disagree with this one as well.

But, it’s not the end of IT organizations

However, where I disagree with Dave is that he sees this as a winning solution, taking over all internal IT activities.  In his view, either your IT group becomes an external service profit center or it’s destined to be replaced by someone else’s service offering(s).

I don’t believe this. To say that IT as a service will displace 50+ years of technology development in the enterprise is just overstatement.

Dave talks about WINTEL displacing mainframes, the two monopolies created in IT.  But the fact remains that WINTEL has not eliminated mainframes.  Mainframes still exist and, arguably, are still expanding throughout the world today.

Dave states that the introduction of WINTEL reduced the switching cost of mainframes, and that the internet, and the cloud that follows it, have reduced those costs yet again. I agree.  But that doesn’t mean the switching cost is zero.

Ask anyone whether SalesForce.com switching costs inhibit them from changing services and more than likely they will say yes.  Switching costs have come down, but they are still a viable barrier to change.

Cloud computing and storage generate similar switching costs, not to mention the time it takes to transfer TBs of data over a WAN.  Whether a cloud service uses the AWS interfaces, OpenStack, Azure or any of the other REST/SOAP cloud storage/cloud computing protocols, moving between them is a formidable barrier to change.  It would be great if OpenStack were to take over, but it hasn’t yet, and most likely won’t in the long run, mainly because the entrenched suppliers don’t want to help their competition.
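To put some rough numbers on the data-movement part of that switching cost, here's a quick back-of-envelope sketch; the link speeds are just assumed examples, not any particular provider's.

```python
# Rough sketch of why "just move the data" isn't a quick fix: time to push
# a dataset over a WAN link at various (assumed) sustained throughputs.

def transfer_days(terabytes, megabits_per_sec):
    """Days to move `terabytes` at a sustained `megabits_per_sec`."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (megabits_per_sec * 1e6)
    return seconds / 86400

for tb in (1, 10, 100):
    for mbps in (45, 100, 1000):          # T3, fast Ethernet, gigabit-class links
        print(f"{tb:>4} TB at {mbps:>5} Mb/s ~ {transfer_days(tb, mbps):6.1f} days")
```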

IT matters, a lot to my organization

What I see happening is not that much different from what Dave sees; it’s only a matter of degree.  Some IT shops will become service organizations to their vertical, but there will remain a large proportion of IT shops that believe:

  • That their technology is a differentiator.
  • That their technology is not something they want their competition using.
  • That their technology is too important to their corporate advantage to sell to others.

How much of this is reality vs. fiction is another matter.

Nonetheless, I firmly believe that a majority of IT shops that exist today will not convert to using IT as a service.   Some of this is due to sunk costs but a lot will be due to the belief that they are truly better than the service.

That’s not to say that new organizations just starting out won’t be more interested in utilizing IT as a service.  For these entities, service offerings are going to be an appealing alternative.

However, a small portion of these startups may just as likely conclude that they can do better or believe it’s more important for them to develop their own IT services to help them get ahead.  Similarly, how much of this is make believe is TBD.

In the end, I believe IT as a service will take its place alongside IT-developed services and IT-outsourced development as yet another capability that any company can deploy to provide information processing for their organization.

The real problem

In my view, the real problem with IT-developed services today is development disease.  Most organizations would like increased functionality, and want it ASAP, but they just can’t develop working functionality fast enough.  I call slow functionality development, missing critical features, and lots of bugs “development disease.”  And it’s everywhere today and has never really gone away.

Some of this is due to poor IT infrastructure, some is due to the inability to use new development frameworks, and some of it is due to a lack of skills.  If IT had some pill they could take to help them develop business processing faster, consuming fewer resources, with far fewer bugs and fuller functionality, they would never consider IT as a service.

That’s where the new frameworks of Ruby on Rails, SpringForce and the like are exciting. Their promise is to provide faster functionality with fewer failures. When that happens, organizations will move away from IT as a service in droves, and back to internally developed capabilities.

But, we’re not there yet.

—-

Comments?

Is cloud a leapfrog technology?

Mobile Phone with Money in Kenya by whiteafrican (cc) (from Flickr)

Read an article today about Safaricom creating a domestic cloud service offering outside Nairobi in Kenya (see Chasing the African Cloud).

But this got me thinking that cloud services may be just like mobile phones: developing countries can use them to skip over older technologies such as wired phone lines and gain the advantages of more recent technology that offers similar services, without the expense and time of building telephone wires across the land.

Leapfrogging IT infrastructure buildout

In the USA, cloud computing, cloud storage, and SaaS offerings based in the cloud are essentially taking the place of small business IT infrastructure today.  Many small businesses skip over building their own IT infrastructure, absolutely necessary years ago for email, web services, back office processing, etc., and move directly to using cloud service providers for these capabilities.

In some cases, it’s even more than just the IT infrastructure, as the application, data and processing services can all be supplied by SaaS providers.

Today, it’s entirely possible to run a complete, very large business this way without owning a stitch of IT infrastructure (other than desktops, laptops, tablets and mobile phones).

Developing countries can show us the way

Developing countries can do much the same for their economic activity. Rather than have their small businesses spend time building out homegrown IT infrastructure, they can just lease it from one or more domestic (or international) cloud service providers and skip the time, effort and cost of doing it themselves.

Hanging out with Kenya Techies by whiteafrican (cc) (from Flickr)

Given this dynamic, cloud service vendors ought to be focusing more time and money on developing countries, which should adopt such services more rapidly because they don’t have the sunk costs of current, private IT infrastructure and applications.

China moves into the cloud

I probably should have caught on earlier.  Earlier this year I was at a vendor analyst meeting, having dinner with a colleague from the China Center for Information Industry Development (CCID) Consulting.  He mentioned that cloud was one of a select set of technologies that China was focusing considerable state and industry resources on.   At the time, I just thought this was prudent thinking to keep up with industry trends. What I didn’t realize at the time was that the cloud could be a leapfrog technology that would help them avoid a massive IT infrastructure buildout in millions of small companies in their nation.

One can see that early adopter nations have understood that with the capabilities of mobile phones they can create a fully functioning telecommunications infrastructure almost overnight.  Much the same can be done with cloud computing, storage and services.

Now if they can only get WiMAX up and running to eliminate cabling their cities for internet access.

—-

Comments?

The sensor cloud comes home

We thought the advent of smart power meters would be the killer app for building the sensor cloud in the home.  But, this week Honeywell announced a new smart thermostat that attaches to the Internet and uses Opower’s cloud service to record and analyze home heating and cooling demand.  Looks to be an even better bet.

9/11 Memorial renderings, aerial view (c) 9/11 Memorial.org (from their website)

Just this past week, on the PBS NOVA telecast Engineering Ground Zero, about building the 9/11 memorial in NYC, it was mentioned that all the trees planted in the memorial have individual sensors to measure soil chemistry, dampness, and other tree health indicators. Yes, even trees are getting on the sensor cloud.

And of course the buildings going up at Ground Zero are all smart buildings as well, containing sensors embedded in the structure, the infrastructure, and anywhere else that matters.

But what does this mean in terms of data?

Data requirements will explode as the smart home and other sensor clouds build out.  For example, even if a smart thermostat only issues a message every 15 minutes and the message is only 256 bytes, the data from the 130 million households in the US alone would be an additional ~3.2TB/day.  And that’s just one sensor per household.

If you add the smart power meter, lawn sensor, intrusion/fire/chemical sensor and, god forbid, the refrigerator and freezer product sensors to the mix, that’s another ~16TB/day of incoming data.

And that’s just assuming a 256 byte payload per sensor every 15 minutes.  The intrusion sensors could easily be a combination of multiple, real time exterior video feeds as well as multi-point intrusion/motion/fire/chemical sensors which would generate much, much more data.

But we have smart roads/bridges, smart cars/trucks, smart skyscrapers, smart port facilities, smart railroads, smart boats/ferries, etc. still to come.   I could go on but the list seems long enough already.  Each of these could generate another ~19TB/day data stream, if not more.  Some of these infrastructure entities/devices are much more complex than a house, and there are a lot more cars on the road than houses in the US.
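For what it's worth, here's the back-of-envelope arithmetic behind the numbers above, as a quick Python sketch; the household count, message size and sensor count are the same assumptions used in the text.

```python
# Back-of-envelope check of the numbers above: 256-byte messages every
# 15 minutes from 130 million US households.

HOUSEHOLDS = 130e6
MSG_BYTES = 256
MSGS_PER_DAY = 24 * 60 // 15            # one message every 15 minutes = 96/day

per_sensor_tb_day = HOUSEHOLDS * MSG_BYTES * MSGS_PER_DAY / 1e12
print(f"one sensor per household: ~{per_sensor_tb_day:.1f} TB/day")   # ~3.2 TB/day

extra_sensors = 5   # power meter, lawn, intrusion/fire/chemical, fridge, freezer
print(f"{extra_sensors} more sensors:      ~{extra_sensors * per_sensor_tb_day:.0f} TB/day")  # ~16 TB/day
print(f"household total:          ~{(1 + extra_sensors) * per_sensor_tb_day:.0f} TB/day")     # ~19 TB/day
```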

It’s great to be in the (cloud) storage business

All that data has to be stored somewhere and that place is going to be the cloud.  The Honeywell smart thermostat uses Opower’s cloud storage and computing infrastructure specifically designed to support better power management for heating and cooling the home.  Following this approach, it’s certainly feasible that more cloud services would come online to support each of the smart entities discussed above.

Naturally, using this data to provide real-time understanding of the infrastructure being monitored will require big data analytics. Hadoop and its counterparts are the only platforms around today that are up to this task.

—-

So cloud computing, cloud storage, and big data analytics have yet another part to play, this time in the upcoming sensor cloud that will envelop the world and all of its infrastructure.

Welcome to the future, it’s almost here already.

Comments?


Is cloud computing/storage decentralizing IT, again?

IBM Card Sorter by Pargon (cc) (From Flickr)

Since IT began, computing services have run through massive phases of decentralization out to departments followed by consolidation back to the data center.  In the early years of computing, from the 50s to the 60s, the only real distributed alternative to mainframe or big-iron data processing was the sophisticated card sorter.

Consolidation-decentralization Wars

Back in the 70s, the consolidation-decentralization wars were driven by the availability of mini-computers competing with mainframes for applications and users.  During the 80s, the PC emerged as the dominant decentralizer, taking applications away from mainframes and big servers, and in the 90s it was small, off-the-shelf Linux servers and continuing use of high-powered PCs that took applications out of data center control.

In those days it seemed that most computing decentralization was driven by the ease of developing applications for these upstarts and the relative low-cost of the new platforms.

Server virtualization, the final solution

Since 2000, another force has come along to solve the consolidation quandary – server virtualization.  With server virtualization from VMware, Citrix and others, IT has once again driven massive consolidation of outlying departmental computing services to bring them all, once again, under one roof, centralizing IT control.  Virtualization provided an optimum answer to the one issue that decentralization could never seem to address – utilization efficiency.  With most departmental servers running at 5-10% utilization, virtualization offered demonstrable cost savings when those workloads were consolidated onto data center hardware.

Cloud computing/storage mutiny

But with the insurrection that is cloud computing and cloud storage, once again departments can easily acquire storage and computing resources on demand, and utilization is no longer an issue because it’s a “pay only for what you use” solution. And they don’t even need to develop their own applications, because SaaS providers can supply most of their application needs using cloud computing and cloud storage resources alone.

Virtualization was a great solution to the poor utilization of systems and storage resources. But with the pooling available in cloud computing and storage, utilization effectiveness now occurs outside the bounds of today’s data center.  As such, with cloud services, utilization effectiveness in $/MIP or $/GB can be approximately equivalent to that of any highly virtualized data center infrastructure (perhaps even better).  Thus, cloud services can provide these very same utilization enhancements at reduced costs to any departmental user without the need for centralized data center services.
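As a simple illustration of the utilization argument (with made-up prices, not anyone's actual rates), the effective cost per used GB is just the raw cost divided by utilization:

```python
# Illustration of the utilization argument: the effective cost per *used* GB
# is the raw cost divided by utilization. The price here is a placeholder.

RAW_COST_PER_GB_MONTH = 0.10   # hypothetical owned-infrastructure cost, $/GB-month

for utilization in (0.10, 0.50, 0.80):
    effective = RAW_COST_PER_GB_MONTH / utilization
    print(f"{utilization:.0%} utilized -> ${effective:.2f} per used GB-month")

# A pay-for-what-you-use cloud price is effectively quoted at ~100% utilization,
# which is why poorly utilized departmental gear compares so badly.
```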

Other decentralization issues that cloud solves

Traditionally, the other problems with departmental computing services were lack of security and the unmanageability of distributed services, both of which held back some decentralization efforts, but these are partially being addressed with cloud infrastructure today.  Insecurity continues to plague cloud computing, but some cloud storage gateways (see Cirtas Surfaces and other cloud storage gateway posts) are beginning to use encryption and other cryptographic techniques to address these issues.  How this is solved for cloud computing is another question (see Securing the cloud – Part B).

Cloud computing and storage can be just as diffuse and difficult to manage as a proliferation of PCs or small departmental Linux servers.  However, such unmanageability is a very different issue, one intrinsic to decentralization and much harder to address.  Although it’s fairly easy to get a bill for any cloud service, it’s unclear whether IT will be able to see all of them in order to manage them.  Also, nothing seems able to stop some department from signing up for SalesForce.com or even using Amazon EC2 to support an application they need.  The only remedy to this problem, as far as I can see, is adherence to strict corporate policy and practice.  So unmanageability remains an ongoing issue for decentralized computing for some time to come.

—-

Nonetheless, it seems as if decentralization via the cloud is back, at least until the next wave of consolidation hits.  My guess is that the next driver of consolidation will be making application development much easier and quicker to accomplish on centralized data center infrastructure – application frameworks anyone?

Comments?

Data processing logistics

IBM System/370 Model 145 By jovike (cc) (from Flickr)

Chuck Hollis wrote a great post on “information logistics” as a new paradigm IT centers have to consider as they deploy applications around the globe and into the cloud.  The problem is that there’s lots of data to move around in order to make all this work.

Supercomputing’s Solution

Big data/supercomputing groups have been thinking about this problem for a long time and have some solutions that might help, but it all harkens back to batch processing and JCL (job control language) of the last century.  In my comment on Chuck’s post I mentioned the University of Wisconsin’s Condor(R) Project, which can be used to schedule data transmission and data processing across distributed server nodes in a network, but there are others, namely the Globus Toolkit 4 (GT4), which creates a data grid to support collaborative research on PBs of data and is currently being used by CERN for LHC data, by the EU for their data grid, and by others.  We have discussed Condor in our Free Cloud Storage and Cloud Computing post and GT4 in our 15PB a year created by CERN post.

These supercomputing projects were designed to move data around so that analysis could be done locally, with results shared within the community.  However, at least with GT4, they replicate data at a number of nodes, which may not be storage efficient but does provide quicker access for data analysis.  At CERN, there is a hierarchy of nodes which participate in a GT4 data grid, and the data is replicated between tiers and within peer nodes just to provide better access to it.

In olden days, …

With JCL, someone would code up a sequence of batch steps, each of which could be conditional on previous steps, that would manipulate data into some transient and, at the end, final form.  Sometimes JCL would invoke another job (another set of JCL) as a follow-on step if everything in this job worked as planned.  The JCL would wait in a queue until the data and execution resources were available for it.

This could mean mounting removable media, creating disk storage “datasets”, or waiting until other jobs were done with the datasets being needed. Jobs would execute in a priority sequence, and scheduling options could include using different hosts (servers) that would coordinate to provide job execution services.   For all I know, z/OS still supports JCL for batch processing, but it’s been a long time since I have used it.
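For readers who never suffered through JCL, here's a loose Python sketch of the idea, not real JCL: a job as a sequence of steps, each optionally conditioned on earlier return codes, with a follow-on job submitted only if everything ran clean. The step names and commands are invented for illustration.

```python
# Sketch of the JCL idea in Python terms: a job is an ordered list of steps,
# each step may be conditioned on the return codes of earlier steps, and a
# follow-on job is submitted only if everything ran clean. Purely illustrative.
import subprocess

def run_job(steps, follow_on=None):
    """steps: list of (name, command, condition) where condition(return_codes)
    decides whether the step runs, given results so far (None = always run)."""
    return_codes = {}
    for name, command, condition in steps:
        if condition is not None and not condition(return_codes):
            print(f"step {name}: skipped (condition not met)")
            continue
        rc = subprocess.call(command)
        return_codes[name] = rc
        print(f"step {name}: rc={rc}")
    if follow_on is not None and all(rc == 0 for rc in return_codes.values()):
        run_job(follow_on)          # like JCL invoking another job on success
    return return_codes

# Example: sort the data, then summarize only if the sort step succeeded.
job = [
    ("SORT",    ["sort", "-o", "data.sorted", "data.txt"], None),
    ("SUMMARY", ["wc", "-l", "data.sorted"],
                lambda rcs: rcs.get("SORT") == 0),
]
# run_job(job)
```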

Cloud computing and storage services

Where does that bring us today? Cloud computing and cloud storage are bringing this execution paradigm back into vogue. But instead of batch jobs, we are talking about virtual machines, web applications or anything else that can be packaged up and run generically on anybody’s hardware and storage.

The only problem is that there are only application-specific ways to control these execution activities.  I am thinking here of web services that hand off web requests to any web server that happens to have cycles to support them.  Similarly, database machines seem capable of handing off queries to any database server that has idle ergs to process with.  There are myriad others like this, but they all seem specific to one application domain.  Nothing exists that is generic or can cross many application domains.

That’s where something like Condor, GT4 or god forbid, JCL can make some sense.  In essence, all of these approaches are application independent.  By doing so, they can be used for any number of applications to take advantage of cloud computing and cloud storage services.

Just had to get this out.  Chuck’s post had me thinking about JCL again and there had to be another solution.

Free P2P-Cloud Storage and Computing Services?

FFT_graph from Seti@home

What would happen if somebody came up with a peer-to-peer cloud (P2P-Cloud) storage or computing service?  I see this as:

  • It would operate a little like Napster/Gnutella, where many people come together and share out their storage/computing resources.
  • It could operate in a centralized or decentralized fashion.
  • It would allow access to data/computing resources from anywhere on the internet.

Everyone joining the P2P-cloud would need to set aside computing and/or storage resources they were willing to devote to the cloud.  By doing so, they would gain access to an equivalent amount (minus overhead) of other nodes’ computing and storage resources to use as they see fit.

P2P-Cloud Storage

For cloud storage the P2P-Cloud would create a common cloud data repository spread across all nodes in the network:

  • Data would be distributed across the network in such a way that would allow reconstruction within any reasonable time frame and would handle any reasonable amount of node outages without loss of data.
  • Data would be encrypted before being sent to the cloud rendering the data unreadable without the key.
  • Data would NOT necessarily be shared, but would be hosted on other users systems.

As such, if I were to offer up 100GB of storage to the P2P-Cloud, I would get back roughly 100GB (less overhead) of protected storage elsewhere on the cloud to use as I see fit.  Some percentage of this would be lost to administration, say 1-3%, and redundancy protection, say ~25%, but the remaining ~72GB of off-site storage could be very useful for DR purposes.

P2P-Cloud storage would provide a reliable, secure, distributed file repository easily accessible from any internet location.  At a minimum, the service would be free and equivalent to what someone supplies (less overhead) to the P2P-Cloud Storage service.  If storage needs exceeded your commitment, more cloud storage could be provided at a modest cost to the consumer.  Such fees would be shared by all the participants offering excess [= offered – (consumed + overhead)] storage to the cloud.
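A quick sketch of that storage accounting, using the rough overhead estimates above (the percentages are guesses, not measurements):

```python
# Sketch of the storage accounting described above, using the post's rough
# overhead figures (admin and redundancy percentages are estimates).

def usable_offsite_gb(contributed_gb, admin_overhead=0.03, redundancy=0.25):
    """Off-site capacity you'd get back for what you contribute."""
    return contributed_gb * (1 - admin_overhead - redundancy)

def excess_gb(offered_gb, consumed_gb, overhead_gb):
    """Storage a participant could be paid for: offered - (consumed + overhead)."""
    return offered_gb - (consumed_gb + overhead_gb)

print(usable_offsite_gb(100))        # ~72 GB back for a 100 GB contribution
print(excess_gb(500, 200, 100))      # 200 GB of billable excess, for example
```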

P2P-Cloud Computing

Cloud computing is definitely more complex, but generally follows the Seti@HOME/BOINC model:

  • P2P-Cloud computing suppliers would agree to use something like a “new screensaver” which would perform computation while generating a viable screensaver.
  • Whenever the screensaver was invoked, it would start execution on the last assigned processing unit.  Intermediate work results would need to be saved and when completed, the answer could be sent to the requester and a new processing unit assigned.
  • Processing units would be assigned by the P2P-Cloud computing consumer, would be timeout-able and re-assignable at will.

Computing users won’t gain much if the computing time they consume is <= the computing time they offer (less overhead).  However, time-shifting computation may be worth something, i.e., computing time now might be more valuable than computing time tonight, which may offer a slight margin of value to help get this off the ground.  As such, P2P-Cloud computing suppliers would need to be able to specify when their computing resources are mostly available, along with their type, quality and quantity.
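Here's a minimal sketch of the processing-unit bookkeeping such a service might need, assuming units are handed out with deadlines and reclaimed on timeout; it's purely illustrative and omits any real P2P transport or security.

```python
# Sketch of the processing-unit bookkeeping implied above: the consumer hands
# out work units, each with a deadline, and re-assigns any unit whose node has
# gone quiet. Entirely illustrative; no real P2P transport here.
import time

class WorkUnitPool:
    def __init__(self, units, timeout_secs=3600):
        self.pending = list(units)          # not yet assigned
        self.assigned = {}                  # unit -> (node, deadline)
        self.timeout = timeout_secs

    def assign(self, node):
        self._reclaim_expired()
        if not self.pending:
            return None
        unit = self.pending.pop(0)
        self.assigned[unit] = (node, time.time() + self.timeout)
        return unit

    def complete(self, unit):
        self.assigned.pop(unit, None)       # result received, unit done

    def _reclaim_expired(self):
        now = time.time()
        for unit, (node, deadline) in list(self.assigned.items()):
            if now > deadline:              # node timed out; re-queue the unit
                del self.assigned[unit]
                self.pending.append(unit)

pool = WorkUnitPool(units=range(10), timeout_secs=1800)
unit = pool.assign(node="desktop-42")
```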

It’s unclear how to secure the processing unit, and this makes legal issues more prevalent.  That may not be much of a problem, as a complex distributed computing task makes little sense in isolation. But the (il)legality of some data processing activities could conceivably put the provider in a precarious position. (Somebody from the legal profession would need to clarify all this, but I would think that some Amazon EC2-like licensing might offer safe harbor here.)

P2P-Cloud computing services wouldn’t necessarily be amenable to the more normal, non-distributed or linear computing tasks but one could view these as just a primitive version of distributed computing tasks.  In either case, any data needed for computation would need to be sent along with the computing software to be run on a distributed node.  Whether it’s worth the effort is something for the users to debate.

BOINC can provide a useful model here.  Also, the Condor(R) project at U. of Wisconsin/Madison can provide a similar framework for scheduling the work of a “less distributed” computing task model.  In my mind, both types of services ultimately need to be provided.

To generate more compute servers, SETI@home and similar BOINC projects rely on doing good deeds.  As such, if you can make your computing task do something of value to most users, then maybe that’s enough. In that case, I would suggest joining up as a BOINC project. For the rest of us, doing more mundane data processing, just offering our compute services to the P2P-Cloud will need to suffice.

Starting up the P2P-Cloud

Bootstrapping the P2P-Cloud might take some effort, but once going it should be self-sustaining (assuming no centralized infrastructure).  I envision an open source solution, taking off from the work done on Napster & Gnutella and/or BOINC & Condor.

I believe the P2P-Cloud Storage service would be the easiest to get started.  BOINC and SETI@home (see the list of active BOINC projects) have been around a lot longer than cloud storage, but their existence suggests that, with the right incentives, even the P2P-Cloud Computing service can make sense.