Top 10 storage technologies over the last decade

Aurora's Perception or I Schrive When I See Technology by Wonderlane (cc) (from Flickr)

Some of these technologies were in development prior to 2000, some were available in other domains but not in storage, and some were in a few subsystems but had yet to become as popular as they are today.  In no particular order, here are my top 10 storage technologies for the decade:

  1. NAND-based SSDs – DRAM and other solid state drives (SSDs) were available last century, but over the last decade NAND flash-based devices have come to dominate SSD technology and have altered the storage industry forevermore.  Today, it's nigh impossible to find enterprise-class storage that doesn't support NAND SSDs.
  2. GMR heads – Giant magnetoresistance (GMR) disk heads have become commonplace over the last decade and have allowed disk drive manufacturers to double data density every 18-24 months.  Now GMR heads are starting to transition over to tape storage and will enable that technology to increase data density dramatically.
  3. Data deduplication – Deduplication technologies emerged over the last decade as a complement to higher density disk drives and as a means to back up data more efficiently.  Deduplication can be found in many different forms today, ranging from file and block storage systems and backup storage systems to backup software-only solutions.
  4. Thin provisioning – Thin provisioning arguably emerged last century, but it took the last decade for it to really find its place in the storage pantheon.  One can hardly find a data center class storage device that does not support thin provisioning today.
  5. Scale-out storage – Last century, if you wanted higher IOPS from a storage subsystem you could add cache or disk drives, but at some point you hit a subsystem performance wall.  With scale-out storage, one can now add more processing elements to a storage system cluster, without replacing the controller, to obtain more IO processing power.  The link reference talks about the use of commodity hardware to provide added performance, but scale-out storage can also be done with non-commodity hardware (see Hitachi's VSP vs. VMAX).
  6. Storage virtualization – Server virtualization has taken off as the dominant data center paradigm over the last decade, and its counterpart in storage has also become more viable.  Storage virtualization was originally used to migrate data from old subsystems to new storage, but today it can be used to manage and migrate data across PBs of physical storage, dynamically optimizing data placement for cost and/or performance.
  7. LTO tape – When IBM dominated IT in the mid-to-late last century, the tape format du jour always matched IBM's tape technology.  As the decade dawned, IBM was no longer the dominant player and tape technology was starting to diverge into a babble of differing formats.  As a result, IBM, Quantum, and HP put their technology together and created a standard tape format, called LTO, which has become the new dominant tape format for the data center.
  8. Cloud storage – It's unclear just when over the last decade cloud storage emerged, but it seemed to arrive as a supplement to cloud computing, which also appeared this past decade.  Storage service providers had existed earlier but, due to bandwidth limitations and storage costs, didn't survive the dotcom bubble.  Over this past decade both bandwidth and storage costs have come down considerably, and cloud storage has now become a viable technological solution to many data center issues.
  9. iSCSI – SCSI has taken on many forms over the last couple of decades, but iSCSI has altered the dominant block storage paradigm from a single, pure FC-based SAN to a plurality of technologies.  Nowadays, SMB shops can have block storage without the cost and complexity of FC SANs, over the LAN networking technology they already use.
  10. FCoE – One could argue that this technology is still maturing today, but once again SCSI has opened up another way to access storage.  FCoE has the potential to offer all the robustness and performance of FC SANs over data center Ethernet hardware, simplifying and unifying data center networking onto one technology.

No doubt others would differ on their top 10 storage technologies over the last decade, but I strove to find technologies that significantly changed data storage between 2000 and today.  These 10 seemed to me to fit the bill better than most.

Comments?

EMC to buy Isilon Systems

Isilon X series nodes (c) 2010 Isilon from Isilon's website

I understand the rationale behind EMC's purchase of Isilon's scale-out NAS technology for big data applications.  More and more data is being created every day, and most of it is unstructured.  How can one begin to support the multiple PBs of file data coming online in the next couple of years without scale-out NAS?  Scale-out NAS has the advantage that, within the same architecture, one can scale from TBs to PBs of file storage by just adding storage and/or accessor nodes.  Sounds great.

Isilon for backup storage?

But what's surprising to me is the use of Isilon NL-Series storage in more mundane applications like database backup.  A couple of weeks ago I wrote a post on how Oracle RMAN compressed backups don't dedupe very well.  The impetus for that post was that a very large enterprise customer I was talking with had just started deploying Isilon NAS systems in their backup environment to handle non-dedupable data.  The customer was backing up PBs of storage, a good portion of which was non-dedupable, and as such they planned to use Isilon systems to store this data.

I had never seen scale-out NAS systems used for backup storage, so I was intrigued to find out why.  Essentially, this customer was in the throes of replacing tape, and between deduplication appliances and Isilon storage they believed they had the solutions to eliminate tape from their backup systems forever.

All this raises the question of where EMC puts Isilon – with Celerra and other storage platforms, with Atmos and other cloud services, or with Data Domain and other backup systems?  It seems one could almost break out the three Isilon storage systems and split them across these three business groups, but given Isilon's flexibility it probably belongs with storage platforms.

However, I would think that BRS would have an immediate market requirement for Isilon's NL-Series storage to complement its other backup systems.  I guess we will know shortly where EMC puts it – until then it's anyone's guess.

Cloud storage replication does not suffice for backups – revisited

Free Whipped Cream Clouds on True Blue Sky Creative Commons by Pink Sherbet Photography (cc) (from Flickr)

I was talking with another cloud storage gateway provider today and asked them if they do any sort of backup for data sent to the cloud.  Their answer disturbed me – they depend on the backend cloud storage provider's replication services to provide data protection – sigh.  Curtis and I have written about this before (see my Does Cloud Storage need Backup? post and Replication is not backup by W. Curtis Preston).

Cloud replication is not backup

Cloud replication is not data protection for anything but hardware failures!  Much more common than hardware failures are mistakes by end-users who inadvertently delete, overwrite, or corrupt files, or systems that corrupt files, any of which would just be replicated in error throughout the cloud storage multi-verse.  (In fact, cloud storage itself can lead to corruption; see Eventual data consistency and cloud storage.)

Replication does a nice job of covering a data center or hardware failure that leaves data at one site inaccessible, by allowing access to a replica of the data from another site.  As far as I am concerned there's nothing better than replication for these sorts of DR purposes, but it does nothing for someone deleting the wrong file.  (I once did an "rm * *" on a shared Unix directory – it wasn't pretty.)

As one solution to this problem, some cloud storage (backend) vendors delay the deletion of blobs/containers until some time later.  By doing this, the data "stays around" for "some time" after being deleted and can be restored via special request to the cloud storage vendor.  The only problem with this is that "some time" is an ill-defined, nebulous concept which is not guaranteed or specified in any way.  Also, depending on the "fullness" of the cloud storage, this time frame may be much shorter or longer.  End-user data protection cannot depend on such a wishy-washy arrangement.

Other solutions to data protection for cloud storage

One way is to keep a local backup of any data located in cloud storage.  But this kind of defeats the purpose of cloud storage and means the cloud data is stored both locally (as backups) and remotely.  I suppose the backup data could be sent to another cloud storage provider, but someone, somewhere would need to support some sort of versioning to keep multiple iterations of the data around, e.g., 90 days' worth of backups.  Sounds like a backup package front-ending cloud storage to me…

Another approach is to have the gateway provider supply some sort of backup internally, using the very same cloud storage to hold various versions of the data.  As long as the user can specify how many days or versions of backups are held, this works great: cloud replication supports availability in the face of hardware failures, and multiple versions support availability in the face of finger checks/logical corruption.
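As a sketch of what that gateway-internal versioning could look like, consider keeping one cloud object per version under a per-file prefix and expiring anything past the retention count. The key scheme, retention default, and the dict standing in for the cloud bucket below are illustrative only, not any vendor's implementation.

```python
import time

class VersionedGatewayBackup:
    """Keep the last N versions of each file as separate objects in cloud storage."""

    def __init__(self, versions_to_keep: int = 30):
        self.versions_to_keep = versions_to_keep
        self.bucket = {}     # stand-in for the cloud store: object key -> object data

    def backup(self, name: str, data: bytes) -> None:
        # One object per version, keyed by a zero-padded timestamp so keys sort by age
        self.bucket[f"backup/{name}/{time.time_ns():020d}"] = data
        versions = sorted(k for k in self.bucket if k.startswith(f"backup/{name}/"))
        for old in versions[:-self.versions_to_keep]:     # expire beyond the retention count
            del self.bucket[old]

    def restore(self, name: str, version: int = -1) -> bytes:
        versions = sorted(k for k in self.bucket if k.startswith(f"backup/{name}/"))
        return self.bucket[versions[version]]             # -1 = most recent version
```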

This problem can be solved in many ways, but just using cloud replication is not one of them.

Listen up, folks: whenever you think about putting data in the cloud, you need to ask about backups, among other things.  If they say they only offer the data replication provided by the cloud storage backend, go somewhere else.  Trust me, there are solutions out there that really back up cloud data.

Poor deduplication with Oracle RMAN compressed backups

Oracle offices by Steve Parker (cc) (from Flickr)

I was talking with one large enterprise customer today and he was lamenting how poorly Oracle RMAN compressed backupsets dedupe.  Apparently, non-compressed RMAN backupsets generate anywhere from 20 to 40:1 deduplication ratios, but when they use RMAN backupset compression, their deduplication ratios drop to 2:1.  Given that RMAN compression probably only adds about a 2:1 compression ratio, the overall data reduction comes out to roughly 4:1.
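To put rough numbers on that, here's a back-of-the-envelope sketch using the ratios quoted above (assumed figures, not measured results):

```python
# Back-of-the-envelope data reduction using the ratios quoted above
plain_dedupe_ratio = 30   # mid-point of the 20-40:1 range for uncompressed backupsets
rman_compression   = 2    # assumed ~2:1 from RMAN backupset compression
compressed_dedupe  = 2    # observed dedupe ratio on compressed backupsets

plain_reduction      = plain_dedupe_ratio                    # ~30:1 overall
compressed_reduction = rman_compression * compressed_dedupe  # ~4:1 overall

backup_tb = 100   # hypothetical 100 TB of RMAN backup data
print(f"uncompressed backupsets: ~{backup_tb / plain_reduction:.1f} TB stored")
print(f"compressed backupsets:   ~{backup_tb / compressed_reduction:.1f} TB stored")
```

In other words, for this kind of workload, compressing before deduplicating can cost the better part of an order of magnitude in stored capacity.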

RMAN compression

It turns out Oracle RMAN supports two different compression algorithms: zlib (or gzip) and bzip2.  I assume the default is zlib, and if one wants, one can specify bzip2 for even higher compression ratios with commensurately slower, more processor-intensive compression activity.

  • Zlib is pretty standard repeating-string elimination followed by Huffman coding, which uses shorter bit strings to represent more frequent characters and longer bit strings to represent less frequent ones.
  • Bzip2 also uses Huffman coding, but only after a number of other transforms such as run-length encoding (changing duplicated characters to a count:character sequence), the Burrows–Wheeler transform (rearranging the data stream so that repeating characters come together), a move-to-front transform (recoding each symbol by how recently it was last seen, so runs of repeated characters become runs of small values), another run-length encoding step, and then the Huffman encoding, followed by another couple of steps to decrease the data length even more…

The net of all this is that a block of data that is bzip2 encoded may look significantly different if even one character is changed.  Similarly, zlib compressed data will look different after a single character insertion, though perhaps not as much.  This will depend on the character and where it's inserted, but even if the new character doesn't change the Huffman encoding tree, adding a few bits to a data stream will necessarily alter its byte groupings significantly downstream from that insertion.  (See Huffman coding to learn more.)
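A quick experiment with Python's standard zlib and bz2 modules illustrates the effect. The data here is synthetic and the prefix comparison is crude, but it shows how thoroughly a single inserted byte scrambles the compressed stream from that point on:

```python
import bz2
import zlib

def common_prefix_len(a: bytes, b: bytes) -> int:
    """How many leading bytes two byte strings have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

# A synthetic "backupset" and a copy with one byte inserted near the front
original = b"some repeating database row contents, " * 5000
modified = original[:1000] + b"X" + original[1000:]

for name, compress in (("zlib", zlib.compress), ("bzip2", bz2.compress)):
    a, b = compress(original), compress(modified)
    print(f"{name}: outputs agree for only the first "
          f"{common_prefix_len(a, b)} of {len(a)} compressed bytes")
```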

Deduplicating RMAN compressed backupsets

Sub-block level deduplication often depends on seeing the same sequence of data, possibly skewed or shifted by one to N bytes, in two different data blocks.  But as discussed above, with bzip2 or zlib (or any Huffman-encoded) compression algorithm, the sequence of bytes looks distinctly different downstream from any character insertion.
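For reference, here is a minimal sketch of the content-defined chunking idea that sub-block dedupe engines typically rely on to tolerate such shifts: chunk boundaries are derived from the data itself, so an insertion only disturbs the chunk it lands in. The chunker, window size, and mask below are purely illustrative, not any vendor's actual algorithm.

```python
import hashlib
import random
import zlib

def chunks(data: bytes, window: int = 32, mask: int = 0x0FFF):
    """Cut data wherever the checksum of the trailing `window` bytes hits a
    fixed bit pattern (~4KB average chunks with this mask)."""
    start = 0
    for i in range(window, len(data)):
        if (zlib.crc32(data[i - window:i]) & mask) == mask:
            yield data[start:i]
            start = i
    if start < len(data):
        yield data[start:]

def fingerprints(data: bytes) -> set:
    return {hashlib.sha1(c).hexdigest() for c in chunks(data)}

random.seed(42)
original = bytes(random.getrandbits(8) for _ in range(200_000))
shifted  = original[:1000] + b"X" + original[1000:]      # one-byte insertion

orig_fp = fingerprints(original)
common  = orig_fp & fingerprints(shifted)
print(f"{len(common)} of {len(orig_fp)} chunks still dedupe after the insertion")
```

Run the same comparison on zlib- or bzip2-compressed copies of these two streams and almost nothing matches, which is exactly the problem with compressed backupsets.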

One way to obtain decent deduplication rates from RMAN compressed backupsets would be to decompress the data at the dedupe appliance and then run the deduplication algorithm on it, but dedupe appliance ingest rates would suffer accordingly.  Another approach is to not use RMAN compressed backupsets at all, but the advantages of compression are very appealing: less network bandwidth, faster backups (because less data is transferred), and quicker restores.

Oracle RMAN OST

On the other hand, what might work is some form of Data Domain OST/Boost-like support from Oracle RMAN, which would partially deduplicate the data at the RMAN server and then send the deduplicated stream to the dedupe appliance.  This would use less network bandwidth and speed up backups, but it may not do anything for restores.  Perhaps a tradeoff worth investigating.

As for the likelihood that Oracle would make such services available to deduplication vendors, I would have said this was unlikely, but ultimately the customers have a say here.  It's unclear why Symantec created OST, but it turned out to be a money maker for them, and something similar could be supported by Oracle.  Once an Oracle RMAN OST-like capability was in place, it shouldn't take much to provide Boost functionality on top of it.  (Although EMC Data Domain is so far the only dedupe vendor that offers Boost, whether for OST or for their own NetWorker version.)

—-

When I first started this post I thought that if the dedupe vendors just understood the format of RMAN compressed backupsets, they would be able to achieve the same dedupe ratios as seen for normal RMAN backupsets.  As I investigated the compression algorithms being used, I became convinced that extracting duplicate data from RMAN compressed backupsets is a computationally "hard" problem and ultimately would probably not be worth it.

So, if you use RMAN backupset compression, you probably ought to avoid deduplicating this data for now.

Anything I missed here?

EMC NetWorker 7.6 SP1 surfaces

Photo of DD880 appliance (from EMC.com)

This week EMC released NetWorker 7.6 SP1 with new Boost support for Data Domain (DD) appliances, which allows NetWorker's storage node (media server) and the DD appliance to work jointly on providing deduplication services.  Earlier this year EMC DD announced the new Boost functionality, which at the time only worked with Symantec's OST interface.  With this latest service pack (SP1), NetWorker also offers this feature, and EMC takes another step toward integrating DD systems and functionality across its product portfolio.

DD Boost integration with NetWorker

DD Boost functionality resides on the NetWorker storage node, which transfers data to backend storage.  Boost offloads the work of cutting data into segments, fingerprinting those segments, and passing the fingerprints to DD.  Thereafter, NetWorker only passes unique data between the storage node and the DD appliance.
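Conceptually, the exchange works something like the sketch below. To be clear, this is not EMC's actual Boost API or wire protocol, just a toy illustration of client-side segmenting and fingerprinting in which only unknown segments cross the wire; all the names are made up.

```python
import hashlib

SEGMENT_SIZE = 8 * 1024          # illustrative fixed-size segments

class DedupeTarget:
    """Stand-in for the appliance's fingerprint index and segment store."""
    def __init__(self):
        self.store = {}                              # fingerprint -> segment data

    def unknown(self, fingerprints):
        return {fp for fp in fingerprints if fp not in self.store}

    def write(self, fp, data):
        self.store[fp] = data

def boost_style_backup(stream: bytes, target: DedupeTarget) -> int:
    """Fingerprint on the 'storage node' side and send only unique segments;
    returns the number of data bytes actually transferred."""
    segments = [stream[off:off + SEGMENT_SIZE]
                for off in range(0, len(stream), SEGMENT_SIZE)]
    fps = [hashlib.sha1(s).hexdigest() for s in segments]
    needed = target.unknown(fps)                     # one exchange of fingerprints
    sent = 0
    for fp, data in zip(fps, segments):
        if fp in needed:
            target.write(fp, data)
            sent += len(data)
            needed.discard(fp)                       # skip duplicates within this backup too
    return sent

target = DedupeTarget()
full_backup = b"database pages " * 100_000
print("first full backup:  sent", boost_style_backup(full_backup, target), "bytes")
print("second full backup: sent", boost_style_backup(full_backup, target), "bytes")
```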

Doing this reduces the processing workload on the DD appliance, uses less network bandwidth, and reduces the processing requirements on the NetWorker storage node itself.  While this last reduction may surprise some, realize that the storage node primarily moves data, and with DD Boost it moves less data, consuming less processing power.  All in all, NetWorker with DD Boost shows a SIGNIFICANT improvement in data ingest performance/throughput versus NetWorker using DD in NFS mode.

DD cloning controlled by NetWorker

Also, the latest SP incorporates DD management integration, such that an admin can control Data Domain replication from the NetWorker management console alone.  Thus, the operator no longer needs to use the DD management interface to schedule, monitor, or terminate DD replication services.

Additionally, NetWorker can now be aware of all DD replicas and, as such, can establish separate retention periods for each replica, all from the NetWorker management interface.  Another advantage is that tape clones of DD data can now be completely managed from the NetWorker management console.

Furthermore, one can now configure new DD appliances as NetWorker resources using new configuration wizards.  NetWorker also supports monitoring and alerting on DD appliances through the NetWorker management console, including capacity utilization and dedupe rates.

Other enhancements made to NetWorker

  • NetWorker cloning – scheduling of clones no longer requires CLI scripts and can now be managed within the GUI as well.  NetWorker cloning is the process that replicates save sets to other storage media.
  • NetWorker Checkpoint/Restart – resuming backups from known good points after a failure.  Checkpoint/Restart can be used for very large save sets which cannot complete within a single backup window.

New capacity based licensing for NetWorker

It seems like everyone is simplifying their licensing (see CommVault's Simpana 9 release).  With this version of NetWorker, EMC now supports a capacity-based licensing option in addition to its current component- and feature-based licensing.  With all the features of the NetWorker product, component-based licensing has become complex and cumbersome to use.  The new Capacity License Option charges based on the amount of data being protected, and all NetWorker features are included at no additional charge.

The new licensing option is available worldwide, with no feature-based tiers of capacity licensing, i.e., one level of capacity-based licensing.  Capacity-based licensing can be more cost effective for those using advanced NetWorker features, should be easier to track, and will be easier to install.  Anyone under current maintenance can convert to the new licensing model, but it requires this release of the NetWorker software.

—-

NetWorker 7.6 SP1 is not a full release, but it is substantial nonetheless, not least for the DD Boost and management integration being rolled out.  Also, I believe the new licensing option may appeal to a majority of the customer base, but one has to do the math.  There are probably other enhancements I missed here, but these seem the most substantial.

What do you think?

CommVault’s Simpana 9 release

CommVault announced a new release of their data protection product today – Simpana® 9.  The new software provides significantly enhanced support for VM backup, new source-level deduplication capabilities, and other enhanced facilities.

Simpana 9 starts by defining three tiers of data protection based on its Snapshot Protection Client (SPC):

  • Recovery tier – using SPC, application-consistent hardware snapshots can be taken via storage array interfaces to provide content-aware, granular-level recovery.  Simpana 9 SPC now supports EMC, NetApp, HDS, Dell, HP, and IBM (including LSI) storage snapshot capabilities.  Automation supplied with Simpana 9 allows the user to schedule hardware snapshots at various intervals throughout the day such that they can be used to recover data without delay.
  • Protection tier – using mounted snapshot(s) provided by the SPC above, Simpana 9 can create an extract or physical backup set copy to any disk type (DAS, SAN, NAS), providing a daily backup for retention purposes.  This data can be deduplicated and encrypted for increased storage utilization and data security.
  • Compliance tier – selected backup jobs can then be sent to cloud storage and/or archive appliances such as HDS HCP or Dell DX for long-term retention and compliance, preserving CommVault's deduplication and encryption.  Alternatively, compliance data can be sent to the cloud.  CommVault's previous cloud storage support included Amazon S3, Microsoft Azure, Rackspace, Iron Mountain, and Nirvanix; with Simpana 9 they add EMC Atmos providers and Mezeo to the mix.

Simpana 9 VM backup support

Simpana 9 also introduces a SnapProtect-enabled Virtual Server Agent (VSA) to speed up virtual machine datastore backups.  With VSA's support for storage hardware snapshot backups and VMware facilities to provide application-consistent backups, virtual server environments can now scale to 1000s of VMs without concern for backup's processing and IO impact on ongoing activity.  VSA snapshots can be mounted afterwards on a proxy server, which uses VMware services to extract file-level content that CommVault can then deduplicate, encrypt, and offload to other media, allowing granular content recovery.

In addition, Simpana 9 supports auto-discovery of virtual machines with auto-assignment of data protection policies.  As such, VM guests can be automatically placed into an appropriate, pre-defined data protection regimen without the need for operator intervention after VM creation.

Also, with all its metadata content cataloguing, Simpana 9 now supplies a lightweight, file-oriented storage resource management capability via the CommVault management interface.  Such services can provide detailed file-level analytics for VM data without the need for VM guest agents.

Simpana 9 new deduplication support

CommVault's first-generation deduplication, in Simpana 7, was at the object level.  With Simpana 8, deduplication occurred at the block level, providing content-aware variable block sizes, and added software data encryption support for disk or tape backup sets.  With today's release, Simpana 9 shifts some deduplication processing out to the source (the client), increasing backup throughput by reducing data transfer.  All this sounds similar to EMC's Data Domain Boost capability introduced earlier this year.

Such a change takes advantage of CommVault's intelligent Data Agent (iDA) running in the clients to perform pre-deduplication hashing and list creation, rather than doing it all at CommVault's Media Agent node, reducing the data to be transferred.  Further, CommVault's deduplication can be applied across a number of clients for a global deduplication service that spans remote clients as well as central data center repositories.

Simpana 9 new non-CommVault backup reporting and migration capabilities

Simpana 9 provides a new data collector for NetBackup versions 6.0, 6.5, and 7.0 and TSM 6.1, which allows CommVault to discover other backup services in the environment; extract backup policies, client configurations, job histories, etc.; and report on these foreign backup processes.  In addition, once the data collector is in place, Simpana 9 also supports automated procedures that can roll out and convert all these other backup services to CommVault data protection over a weekend, vastly simplifying migration from non-CommVault to Simpana 9 data protection.

Simpana 9 new software licensing

CommVault is also changing its software licensing approach to include more options for capacity-based licensing.  Previously, CommVault supported limited capacity-based licensing but mostly used architectural component-level licensing.  Now they have expanded the capacity licensing offerings, and both licensing modes are available so the customer can select whichever approach proves best for them.  With CommVault's capacity-based licensing, usage can be tracked on the fly to show when customers may need to purchase a larger capacity license.

There are probably other enhancements I missed here, as Simpana 9 is a significant changeover from Simpana 8.  Nonetheless, this version's best feature is its enhanced approach to VM backups, allowing more VMs to run on a single server without concern for backup overhead.  The fact that it does source-level pre-deduplication processing just adds icing to the cake.

What do you think?

Cloud storage, CDP & deduplication

Strange Clouds by michaelroper (cc) (from Flickr)

Somebody needs to create a system that encompasses continuous data protection, deduplication and cloud storage.  Many vendors have various parts of such a solution but none to my knowledge has put it all together.

Why CDP, deduplication and cloud storage?

We have written about cloud problems in the past (eventual data consistency and what's holding back the cloud); despite all that, backup is a killer app for cloud storage.  Many of us would like to keep backup data around for a very long time, but storage costs govern how long data can be retained.  Cloud storage, with its low cost per GB per month, can help minimize such concerns.

We have also blogged about dedupe in the past (describing dedupe) and have written in the industry press and our own StorInt dispatches on dedupe product introductions and enhancements.  Deduplication can reduce storage footprint and works especially well for backup, which often saves the same data over and over again.  By combining deduplication with cloud storage we can reduce the data transferred to and stored in the cloud, minimizing costs even more.

CDP is more troublesome and yet still worthy of discussion.  Continuous data protection has always been something of a stepchild in the backup business.  As a technologist, I understand its limitations (application consistency) and understand why it has been unable to take off effectively (false starts).  But in theory, at some point CDP will work, at some point CDP will use the cloud, and at some point CDP will embrace deduplication, and when that happens it could be the start of an ideal backup environment.

Deduplicating CDP using cloud storage

Let me describe the CDP-cloud-deduplication appliance that I envision.  Whether through O/S, hypervisor, or storage (sub-)system agents, the system traps all writes (forks the write) and sends the data and metadata in real time to another appliance.  Once in the CDP appliance, the data can be deduplicated, and any unique data plus metadata can be packaged up, buffered, and deposited in the cloud.  All this happens in an ongoing fashion throughout the day.

Sometime later, a restore is requested. The appliance looks up the appropriate mapping for the data being restored, issues requests to read the data from the cloud and reconstitutes (un-deduplicates) the data before copying it to the restoration location.
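To make the data flow concrete, here is a toy model of such an appliance. The in-memory "cloud" and all the class and method names are hypothetical; a real implementation would sit in front of a cloud object store and keep a persistent, protected metadata journal.

```python
import hashlib
import time
from collections import defaultdict

class InMemoryCloud:
    """Stand-in for a cloud object store client (exists/put/get by key)."""
    def __init__(self):
        self.objects = {}
    def exists(self, key): return key in self.objects
    def put(self, key, data): self.objects[key] = data
    def get(self, key): return self.objects[key]

class CDPDedupeGateway:
    def __init__(self, cloud):
        self.cloud = cloud
        # (volume, offset) -> list of (timestamp, fingerprint) versions
        self.journal = defaultdict(list)

    def on_write(self, volume: str, offset: int, data: bytes) -> None:
        """Called for every forked write from the O/S, hypervisor, or array agent."""
        fp = hashlib.sha256(data).hexdigest()
        if not self.cloud.exists(fp):                # only unique blocks travel to the cloud
            self.cloud.put(fp, data)
        self.journal[(volume, offset)].append((time.time(), fp))

    def restore(self, volume: str, offset: int, as_of: float) -> bytes:
        """Reconstitute the newest version of a block at or before `as_of`."""
        versions = [(ts, fp) for ts, fp in self.journal[(volume, offset)] if ts <= as_of]
        if not versions:
            raise KeyError("no version captured before that point in time")
        _, fp = max(versions)
        return self.cloud.get(fp)

gateway = CDPDedupeGateway(InMemoryCloud())
gateway.on_write("vol1", 0, b"original block contents")
gateway.on_write("vol1", 0, b"overwritten block contents")
print(gateway.restore("vol1", 0, as_of=time.time()))
```

Note that nothing in this sketch addresses application consistency, which is exactly the hard part discussed below.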

Problems?

The problems with this solution include:

  • Application consistency
  • Data backup timeframes
  • Appliance throughput
  • Cloud storage throughput

By tying the appliance to a storage (sub-)system one may be able to get around some of these problems.

One could configure the appliance throughput to match the typical write workload of the storage.  This would provide an upper limit on when the data is at least duplicated in the appliance, though not necessarily backed up yet (a pseudo backup timeframe).

As for throughput, if we could somehow understand the average write and deduplication rates, we could configure the appliance and cloud storage pipes accordingly.  In this fashion, we could match appliance and cloud storage throughput to the deduplicated write workload.
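For example, a rough pipe-sizing calculation might look like the following; every number here is an assumption for illustration only.

```python
# Rough sizing of appliance ingest vs. cloud uplink (assumed numbers)
avg_write_rate_mb_s = 200     # average forked-write workload hitting the appliance
dedupe_ratio        = 10      # expected dedupe ratio for this workload
metadata_overhead   = 1.05    # ~5% extra for fingerprints and journal metadata

cloud_uplink_mb_s = avg_write_rate_mb_s / dedupe_ratio * metadata_overhead
print(f"size appliance ingest for ~{avg_write_rate_mb_s} MB/s")
print(f"size cloud uplink for    ~{cloud_uplink_mb_s:.0f} MB/s of deduplicated writes")
```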

Application consistency is a more substantial concern.  For example, copying every write to a file doesn't mean one can recover the file.  The problem is that at some point the file is actually closed, and that's the only time it is in an application-consistent state.  Recovering to a point before or after this leaves a partially updated, potentially corrupted file, of little use to anyone without major effort to transform it into a valid and consistent file image.

To provide application consistency, one needs to somehow understand when files are closed or applications quiesced.  Application consistency needs would argue for some sort of O/S or hypervisor agent rather than a storage (sub-)system interface.  Such an approach could be more cognizant of file closure or application quiesce, allowing a sync point to be inserted in the metadata stream for the captured data.

Most backup software long ago mastered application consistency through the use of application and/or O/S APIs and other facilities to synchronize backups with times when the application or user community is quiesced.  CDP must take advantage of the same facilities.

Seems simple enough: tie cloud storage behind a CDP appliance that supports deduplication.  Something like this could be packaged up in a cloud storage gateway or similar appliance.  Such a system could be an ideal application for cloud storage and would make backups transparent and very efficient.

What do you think?

Problems solved, introduced and left unsolved by cloud storage

Cloud whisps (sic) by turtlemom4bacon (cc) (from flickr)

When I first heard about cloud storage I wondered just what exactly it was trying to solve.  There are many storage problems within the IT shop nowadays; cloud storage can solve a few of them, but it introduces others and leaves a few unsolved.

Storage problems solved by cloud storage

  • Dynamic capacity – storage capacity is fixed once purchased/leased.  Cloud storage provides an almost infinite amount of storage for your data.  One pays for this storage in GB or TB per month increments, with added storage services (multi-site replication, high availability, etc.) at extra charge.  Such capacity can be reduced or expanded at a moment's notice.
  • Offsite DR – disaster recovery for many small shops is often non-existent or rudimentary at best. Using cloud storage, data can be copied to the cloud and accessed anywhere via the internet. Such data copies can easily support rudimentary DR for a primary data center outage.
  • Access anywhere – storage is typically local to the IT shop and can normally only be accessed at that location. Cloud storage can be accessed from any internet access point. Applications that are designed to operate all over the world can easily take advantage of such storage.
  • Data replication – data should be replicated for high availability. Cloud storage providers can replicate your data to multiple sites so that if one site goes down other sites can still provide service.

Storage problems introduced by the cloud

  • Variable access times – local storage access times range from 1 to 100 milliseconds.  However, accessing cloud storage can take from hundreds of milliseconds to minutes, depending on network connectivity.  Many applications cannot endure such variable access times.
  • Different access protocols – local storage supports fairly standard access protocols like FC, iSCSI, NFS, and/or CIFS/SMB.  Barring the few (but lately increasing number of) cloud providers that offer an NFS access protocol, most cloud storage requires rewriting applications to use new protocols such as REST to store and access cloud file data.
  • Governance over data – local storage is by definition all located inside one data center.  Many countries do not allow personal and/or financial data to be stored outside the country of origin.  Some cloud storage providers will not guarantee that data stored in the cloud stays within the borders and jurisdiction of a single country.

Storage problems not solved by the cloud:

  • Data backups – data protection via some form of backup is essential.  Nothing says that cloud storage providers cannot provide backup of data in the cloud, but few, if any, provide such a service.  See my Are backups needed in the cloud post.
  • Data security – data security remains an ongoing problem for the local data center; moving the data to the cloud just makes security more difficult.  Many cloud storage providers offer rudimentary security for stored data, but none seem to have integrated the strong authentication and encryption services that might provide true data security.
  • Energy consumption – today's storage consumes power and cooling.  Although cloud storage can be more efficient than onsite storage, this does not eliminate the environmental cost of storage.
  • Data longevity – data stored in the cloud can just as easily go obsolete as data stored locally.

There are probably some I have missed here, but these are a good start.