What is cloud storage good for?

Facebook friend carrousel by antjeverena (cc) (from flickr)

Cloud storage has emerged as a viable business service in the last couple of years, but what does cloud storage really do for the data center? Moving data out to the cloud makes for unpredictable access times with potentially unsecured and unprotected data. So what does the data center gain by using cloud storage?

  • Speed – it often takes a long time (days, weeks, or months) to add storage to in-house data center infrastructure.  In this case, having a cloud storage provider where one can buy additional storage by the GB/Month may make sense if one is developing/deploying new applications where speed to market is important.
  • Flexibility – data center storage is often leased or owned for long time periods.  If an application’s data storage requirements vary significantly over time then cloud storage, which can be purchased or retired at a moment’s notice, may be just right.
  • Distributed data access – some applications require data to be accessible around the world.  Most cloud providers have multiple data centers throughout the world that can be used to host one’s data. Such multi-site data centers can often be accessed much quicker than going back to a central data center.
  • Data archive – backing up data that is infrequently accessed wastes time and resources. As such, this data could easily reside in the cloud with little trouble.  References to such data would need to be redirected to one’s cloud provider but that’s about all that needs to be done.
  • Disaster recovery – disaster recovery for many data centers is very low on their priority list.  Cloud storage provides an easy, ready made solution to accessing one’s data outside the data center.  If you elect to copy all mission critical data out to the cloud on a periodic basis, then this data could theoretically be accessed anywhere, usable in many DR scenarios.

Probably some I am missing here but these will do for now.  Most cloud storage providers can provide any and all of these services.

Of course all these capabilities can be done in-house with additional onsite infrastructure, multi-site data centers, archive systems, or offsite backups.  But the question then becomes which is more economical.  Cloud providers can amortize their multi-site data centers across many customers and as such, may be able to provide these services much cheaper than could be done in-house.

Now if they could only solve that unpredictable access time, …

7 grand challenges for the next storage century

Clock tower (4) by TJ Morris (cc) (from flickr)

I saw a recent IEEE Spectrum article on engineering’s grand challenges for the next century and thought something similar should be done for data storage. So this is a start:

  • Replace magnetic storage – most predictions show that magnetic disk storage has another 25 years and magnetic tape another decade after that before they run out of steam. Such end-dates have been wrong before but it is unlikely that we will be using disk or tape 50 years from now. Some sort of solid state device seems most probable as the next evolution of storage. I doubt this will be NAND considering its write endurance and other long-term reliability issues but if such issues could be resolved maybe it could replace magnetic storage.
  • 1000 year storage – paper can be printed today with non-acidic ink and retain its image for over 1000 years. Nothing in data storage today can claim much more than 100 year longevity. The world needs data storage that lasts much longer than 100 years.
  • Zero energy storage – today SSD/NAND and rotating magnetic media consume energy constantly in order to be accessible. Ultimately, the world needs some sort of storage that only consumes energy when read or written. Such storage would provide “online access with offline power consumption”.
  • Convergent fabrics running divergent protocols – whether it’s ethernet, infiniband, FC, or something new, all fabrics should be able to handle any and all storage (and datacenter) protocols. The internet has become so ubiquitous because it handles just about any protocol we throw at it. We need the same or something similar for datacenter fabrics.
  • Securing data – securing books or paper is relatively straightforward today, just throw them in a vault/safety deposit box. Securing data seems simple, yet data security is not widely used today. It doesn’t have to be that way. We need better, longer lasting tools and methodology to secure our data.
  • Public data repositories – libraries exist to provide access to the output of society in the form of books, magazines, papers and other printed artifacts. No such repository exists today for data. Society would be better served if there were library-like institutions where we could store and retrieve data. Most of the obstacles are legal, due to data ownership, but technological issues exist here as well.
  • Associative accessed storage – Sequential and random access have been around for over half a century now. Associative storage could complement these and be another approach allowing storage to be retrieved by its content. We can kind of do this today by keywording and indexing data. Biological memory is accessed through associations or linkages to other concepts; once accessed, memory seems almost sequentially accessed from there. Something comparable to biological memory may be required to build more intelligent machines.
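
The keywording-and-indexing approximation of associative access mentioned in the last item can be sketched in a few lines. This is only a toy illustration under my own assumptions: the class name, method names, and sample data are hypothetical, everything lives in memory, and it is nowhere near biological-style association.

```python
from collections import defaultdict


class AssociativeStore:
    """Toy store that retrieves objects by content keywords instead of by address."""

    def __init__(self):
        self._objects = {}               # object id -> stored text
        self._index = defaultdict(set)   # keyword -> ids of objects containing it

    def put(self, text):
        """Store text and index every lowercased word in it; return the object id."""
        object_id = len(self._objects)
        self._objects[object_id] = text
        for word in text.lower().split():
            self._index[word].add(object_id)
        return object_id

    def recall(self, *keywords):
        """Return the stored texts that contain all of the given keywords."""
        if not keywords:
            return []
        # .get() avoids creating empty index entries for unknown keywords
        id_sets = [self._index.get(k.lower(), set()) for k in keywords]
        ids = set.intersection(*id_sets)
        return [self._objects[i] for i in sorted(ids)]


# Hypothetical sample data
store = AssociativeStore()
store.put("tape library robotics")
store.put("disk RAID storage")
store.put("tape cartridge storage")
matches = store.recall("tape", "storage")
```

Here the caller never supplies an address or a file name, only content. A real associative store would need to follow linkages between concepts rather than just intersect keyword lists.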

Some of these are already being pursued and yet others receive no interest today. Nonetheless, I believe they all deserve investigation, if storage is to continue to serve its primary role to society, as a long term storehouse for society’s culture, thoughts and deeds.

Comments?

Storage strategic inflection points

EMC vs S&P 500 Stock price chart - 20 yrs from Yahoo Finance

Both EMC and Spectra Logic celebrated their 30 years in business this month and it got me to thinking. Both companies started at the same time but one is a ~$14B revenue (’09 projected) behemoth and the other a relatively successful, but relatively mid-size storage company (Spectra Logic is private and does not report revenues). What’s the big difference between these two? As far as I can tell both companies have been adequately run for some time now by very smart people. Why is one two or more orders of magnitude bigger than the other? In my view, recognizing strategic inflection points is key.

So what is a strategic inflection point? Andy Grove may have coined the term and calls a strategic inflection point a point “… where the old strategic picture dissolves and gives way to the new.” In my view EMC has been more successful at recognizing storage strategic inflection points than Spectra Logic and this explains a major part of their success.

EMC’s history in brief

In his talk at EMC Analyst Days this week, Joe Tucci talked about the rather humble beginnings of EMC. It started out selling furniture and memory for mainframes (I think) but Joe said it really took off in 1991, almost 12 years after it was founded. It seems they latched onto some DRAM based, SSD-like storage technology and converted it to use disk as a RAID storage device in the mainframe and later open systems arena. RAID killed off the big (14″ platter) disk devices that had dominated storage at that time and once started could not be stopped. Whether by luck or smarts, EMC’s push into RAID storage made them what they are today. It was probably a little of both.

It was interesting to see how this played out in the storage market space. RAID used smaller disks, first 8″, then 5.25″ and now 3.5″. When first introduced, manufacturing costs for the RAID storage were so low that one couldn’t help but make a profit selling against big disk devices that held 14″ platters. The more successful RAID became, the more available and reliable the smaller disks became which led to a virtuous cycle culminating in the highly reliable 3.5″ disk devices available today. Not sure Joe was at EMC at the time but if he was he would probably have called that transition between big platter disks and RAID a “strategic inflection point” in the storage industry at the time.

Most of EMC’s competitors and customers would probably say that aggressive marketing also helped propel EMC to the top of the storage heap. I am not sure which came first, the recognition of a strategic inflection point like RAID or the EMC marketing machine but, together, they gave EMC a decided advantage that re-constructed the storage industry.

Spectra Logic’s history in brief

As far as I can tell Spectra Logic has been in the backup software business for a long time and later started supporting tape technology, for which they are well known today. Spectra Logic has disk storage systems as well but they seem better known for their tape and backup technology.

The big changes in tape technology over the past 30 years have been tape cartridges and robotics. Although tape cartridges were introduced by IBM (for the IBM 3480 in 1985), the first true tape automation was introduced by Storage Technology Corp. (with the STK 4400 in 1987). Storage Technology rode the wave of the robotics revolution throughout the late 80’s into the mid 90’s and was very successful for a time. Spectra Logic’s entry into tape robotics was sometime later (1995) but by the time they got onboard it was a very successful and mature technology.

Nonetheless, the revolution in tape technology and operations brought on by these two advances probably held off the decline in tape for a decade or two, yet it could not ultimately stem the decline in tape use apparent today (see my post on Repositioning of tape). Spectra Logic has recently introduced a new tape library.

Another strategic inflection point that helped EMC

Proprietary “Open” Unix systems had started to emerge in the late 80’s and early 90’s and by the mid 90’s were beginning to host most new and sophisticated applications. The FC interface also emerged in the early to mid 90’s as a replacement for HPC-HPPI technology and for a while battled it out against SSA technology from IBM but by 1997 emerged victorious. Once FC and the follow-on higher level protocols (resulting in SAN) were available, proprietary Unix systems had the IO architecture to support any application needed by the enterprise and they both took off, feeding on each other. This was yet another strategic inflection point. I am not sure if EMC was the first entry into this market but they sure were the biggest and as such, quickly emerged to dominate it. In my mind EMC’s real accelerated growth can be tied to this timeframe.

EMC’s future bets today

Again, today, EMC seems to be in the fray for the next inflection. Their latest bets are on virtualization technology in VMware, NAND-SSD storage and cloud storage. They bet large on the VMware acquisition and it’s working well for them. They were the largest company and earliest to market with NAND-SSD technology in the broad market space and seem to enjoy a commanding lead. Atmos is not the first cloud storage service out there, but once again EMC was one of the largest companies to go after this market.

One can’t help but admire a company that swings for the bleachers every time they get a chance at bat. Not every one is going out of the park but when they get ahold of one, sometimes they can change whole industries.

Ibrix reborn as HP X9000 Network Storage

HP X9000 appliances pictures from HP(c) presentation

On Wednesday 4 November, HP announced a new network storage system based on the Ibrix Fusion file system called the X9000. Three versions were announced:

  • X9300 gateway appliance which can be attached to SAN storage (HP EVA, MSA, P4000, or 3rd party SAN storage) and provides scale out file system services
  • X9320 performance storage appliance which includes a fixed server gateway and storage configuration in one appliance targeted at high performance application environments
  • X9720 extreme storage appliance using blade servers for file servers and separate storage in one appliance but can be scaled up (with additional servers and storage) as well as out (by adding more X9720 appliances) to target more differentiated application environments

The new X9000 appliances support a global name space of 16PB by adding additional X9000 network storage appliances to a cluster. The X9000 supports a distributed metadata architecture which allows the system to scale performance by adding more storage appliances.

X9000 Network Storage appliances

With the X9300 gateway appliance, storage can be increased by adding more SAN arrays. Presumably, multiple gateways can be configured to share the same SAN storage creating a highly available file server node. The gateway can be configured with GigE, 10GbE, and/or QDR (40Gb/s) InfiniBand interfaces for added throughput.

The Extreme appliance (X9720) comes with 82TB in the starting configuration and storage can be increased in 82TB raw capacity block increments (7u-1/2rack wide/35*2 drive enclosures + 1-12 drive tray for each capacity block) up to a maximum of 656TB in a two rack (42U) configuration. Capacity blocks are connected to the file servers via 3Gb/s SAS, and the X9720 includes a SAS switch as well as two ProCurve 10GbE ethernet switches. Also, file system performance can be scaled by independently adding performance blocks, essentially C-class HP blade servers. The starter configuration includes 3 performance blocks (blades) but up to 8 can be added to one X9720 appliance.
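
As a quick check of the capacity numbers above, the following sketch works out how many 82TB capacity blocks fit under the 656TB ceiling. The two constants come from the figures quoted here; the function and variable names are just mine for illustration.

```python
BLOCK_TB = 82    # raw capacity per capacity block (figure quoted above)
MAX_TB = 656     # maximum raw capacity of the two rack configuration (figure quoted above)


def raw_capacity_tb(blocks):
    """Raw capacity in TB for a given number of capacity blocks."""
    max_blocks = MAX_TB // BLOCK_TB
    if not 1 <= blocks <= max_blocks:
        raise ValueError(f"blocks must be between 1 and {max_blocks}")
    return blocks * BLOCK_TB


max_blocks = MAX_TB // BLOCK_TB                      # whole blocks that fit under the ceiling
capacities = [raw_capacity_tb(n) for n in range(1, max_blocks + 1)]
```

So the 656TB maximum corresponds to exactly 8 capacity blocks, the starter block plus 7 added ones.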

For the X9320 scale out appliance, performance and capacity are fixed in a 12U rack mountable appliance that includes 2-X9300 gateways and 21.7TB SAS or 48TB SATA raw storage per appliance. The X9320 comes with either GigE or 10Gbe attachments for added performance. The 10Gbe version supports up to 700MB/s raw potential throughput per gateway (node).

X9000 capabilities

All these systems have separate, distinct internal-like storage devoted to O/S, file server software and presumably metadata services. In the X9300 and X9320 storage, this internal storage is packaged in the X9300 gateway server itself. In the X9720, presumably this internal storage is configured via storage blades in the blade server cabinet which would need to be added with each performance block.

All X9000 storage is now based on the Fusion file system technology acquired by HP from Ibrix, an acquisition which closed this summer. Ibrix’s Fusion file system provided a software only implementation of a distributed (or segmented) metadata serviced file system which allowed the product to scale out performance and/or capacity, independently by adding appropriate hardware.

HP’s X9000 supports both NFS and CIFS interfaces. Moreover, advanced storage features such as continuous remote file replication, snapshot, high availability (with two or more gateways/performance blocks), and automated policy driven data tiering also come with the X9000 Network Storage system. In addition, file data is automatically re-distributed across all nodes in an X9000 appliance to balance storage performance across nodes. Every X9000 Network Storage system requires a separate management server to manage the X9000 Network Storage nodes but one server can support the whole 16PB name space.

I like the X9720 and look forward to seeing some performance benchmarks on what it can do. In the past Ibrix never released a SPECsfs(tm) benchmark, presumably because they were a software only solution. But now that HP has instantiated it with top-end hardware there seems to be no excuse for not providing benchmark comparisons.

Full disclosure: I have a current contract with another group within HP StorageWorks, not associated with HP X9000 storage.

Symantec's FileStore

Data Storage Device by BinaryApe (cc) (from flickr)

Earlier this week Symantec GA’ed their Veritas FileStore software. This software was an outgrowth of earlier Symantec Veritas Cluster File System and Storage Foundation software which were combined with new frontend software to create scaleable NAS storage.

FileStore is another scale-out, cluster file system (SO/CFS) implemented as a NAS head via software. The software runs on a hardened Linux OS and can run on any commodity x86 hardware. It can be configured with up to 16 nodes. Also, it currently supports any storage supported by Veritas Storage Foundation which includes FC, iSCSI, and JBODs. Symantec claims FileStore has the broadest storage hardware compatibility list in the industry for a NAS head.

As a NAS head FileStore supports NFS, CIFS, HTTP, and FTP file services and can be configured to support anywhere from under a TB to over 2PB of file storage. Currently FileStore can support up to 200M files per file system and up to 100K file systems.

FileStore nodes work in an Active-Active configuration. This means any node can fail and the other, active nodes will take over providing the failed node’s file services. Theoretically this means that in a 16 node system, 15 nodes could fail and the lone remaining node could continue to service file requests (of course performance would suffer considerably).

As a cluster file system, FileStore supports quick failover of active nodes, which can be accomplished in under 20 seconds. In addition, FileStore supports asynchronous replication to other FileStore clusters to support DR and BC in the event of a data center outage.

One of the things that FileStore brings to the table is that it runs standard Linux O/S services. This means other Symantec functionality can also be hosted on FileStore nodes. The first Symantec service to be co-hosted with FileStore functionality is NetBackup Advanced Client services. Such a service can have the FileStore node act as a media server for its own backup, considerably cutting the network traffic required to do a backup.

FileStore also supports storage tiering whereby files can be demoted and promoted between storage tiers in the multi-volume file system. Also, Symantec EndPoint Protection can be hosted on a FileStore node, providing anti-virus protection completely onboard. Other Symantec capabilities will soon follow to add to the capabilities already available.

FileStore’s NFS performance

Regarding performance, Symantec has submitted a 12 node FileStore system for the SPECsfs2008 NFS performance benchmark. I looked today to see if it was published yet and it’s not available, but they claim to currently be the top performer for SPECsfs2008 NFS operations. I asked about CIFS and they said they had yet to submit one. Also they didn’t mention what the backend storage looked like for the benchmark, but one can assume it had lots of drives (look to the SPECsfs2008 report whenever it’s published to find out).

In their presentation they showed a chart depicting FileStore performance scaleability. According to this chart, at 16 nodes, the actual NFS Ops performance was 93% of theoretical NFS Ops performance. In my view, scaleability is great but you often hit diminishing returns: as the number of nodes increases, the net performance improvement from each added node decreases. The fact that they were able to hit, with 16 nodes, 93% of a linear extrapolation of NFS ops performance from 2 to 8 nodes is pretty impressive. (I asked to show the chart but hadn’t heard back by post time.)
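
To make the “93% of theoretical” figure concrete, here is a minimal sketch of how such a scaling efficiency could be computed: fit a straight line through the 2 and 8 node results, extend it to 16 nodes, and divide the measured result by that projection. The NFS ops numbers below are made up purely for illustration (I don’t have the chart’s data), and the function names are my own.

```python
def linear_extrapolation(n1, ops1, n2, ops2, n_target):
    """Extend the straight line through (n1, ops1) and (n2, ops2) out to n_target nodes."""
    slope = (ops2 - ops1) / (n2 - n1)
    return ops1 + slope * (n_target - n1)


def scaling_efficiency(actual_ops, theoretical_ops):
    """Fraction of the linearly extrapolated (theoretical) performance actually achieved."""
    return actual_ops / theoretical_ops


# Hypothetical measurements, NOT Symantec's published numbers
ops_at_2_nodes = 20_000
ops_at_8_nodes = 80_000
actual_at_16_nodes = 148_800

theoretical_at_16 = linear_extrapolation(2, ops_at_2_nodes, 8, ops_at_8_nodes, 16)
efficiency = scaling_efficiency(actual_at_16_nodes, theoretical_at_16)
```

With these invented inputs the projection at 16 nodes is 160,000 ops and the efficiency works out to 0.93. Anything close to 1.0 means the added nodes are paying off nearly linearly.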

Pricing and market space

At the low end, FileStore is meant to compete with Windows Storage Server and would seem to provide better performance and availability than Windows. At the high end, I am not sure, but the competition would be with HP/PolyServe and standalone NAS heads from EMC, NetApp/IBM and others. List pricing is about US$7K/node, so that top performing SPECsfs2008 12-node system would set you back about $84K for the software alone (please note that list pricing is not street pricing). You would need to add node hardware and the storage hardware to provide a true apples-to-apples pricing comparison with other NAS storage.
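
The software pricing arithmetic above is simple enough to sketch. The per-node list price is the approximate figure quoted here; the function name is my own, and node and storage hardware are deliberately left out since I have no numbers for them.

```python
LIST_PRICE_PER_NODE_USD = 7_000   # approximate list price per node quoted above


def software_list_cost(nodes, price_per_node=LIST_PRICE_PER_NODE_USD):
    """Software-only list cost for a cluster; hardware and storage are not included."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes * price_per_node


benchmark_system_cost = software_list_cost(12)    # the 12-node benchmark configuration
max_cluster_cost = software_list_cost(16)         # the 16-node maximum configuration
```

That gives $84K for the 12-node benchmark system and $112K for a fully populated 16-node cluster, at list, for software alone.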

Current customers range from large e-retailers and SAAS providers (Symantec SAAS offering) at the high end (>1PB) to universities and hospitals at the low end (<10TB). FileStore, with its inherent scaleability and ability to host storage applications from Symantec on the storage nodes, can offer a viable solution to many hard file system problems.

We have discussed scale-out and cluster file systems (SO/CFS) in a prior post (Why SO/CFS, Why Now) so I won’t elaborate on why they are so popular today. But, suffice it to say Cloud and SAAS will need SO/CFS to be viable solutions and everybody is responding to supply that market as it emerges.

Full disclosure: I currently have no active or pending contracts with Symantec.

Chart of the month: SPC-1 LRT performance results

Chart of the Month: SPC-1 LRT(tm) performance results

The above chart shows the top 12 LRT(tm) (least response time) results for Storage Performance Council’s SPC-1 benchmark. The vertical axis is the LRT in milliseconds (msec.) for the top benchmark runs. As can be seen the two subsystems from TMS (RamSan400 and RamSan320) dominate this category with LRTs significantly less than 2.5msec. IBM DS8300 and its turbo cousin come in next followed by a slew of others.

The 1msec. barrier

Aside from the blistering LRT from the TMS systems, one significant item in the chart above is that the two IBM DS8300 systems crack the <1msec. barrier using rotating media. I didn’t think I would ever see the day, although of course this happened 3 or more years ago. Still, it’s kind of interesting that there haven’t been more vendors with subsystems that can achieve this.

LRT is probably most useful for high cache hit workloads. For these workloads the data comes directly out of cache and the only thing between a server and its data is subsystem IO overhead, measured here as LRT.

Encryption cheap and fast?

The other interesting tidbit from the chart is that the DS5300 with full drive encryption (FDE) drives, which I believe come from Seagate, cracks into the top 12 at 1.8msec, exactly equivalent to the IBM DS5300 without FDE. Now FDE from Seagate is a hardware drive encryption capability and its overhead might not be measurable at a subsystem level. Nonetheless, it shows that having data security need not reduce performance.

What is not shown in the above chart is that adding FDE to the base subsystem only cost an additional US$10K (base DS5300 listed at US$722K and FDE version at US$732K). Seems like a small price to pay for data security, which in this case simply means turning it on, generating keys, and forgetting it.
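
To put that US$10K in perspective, here is the premium worked out as a percentage of the base price. The two prices are the figures quoted above; the variable names are mine.

```python
BASE_DS5300_USD = 722_000   # base DS5300 price quoted above
FDE_DS5300_USD = 732_000    # DS5300 with FDE price quoted above

# Absolute and relative cost of adding drive encryption
fde_premium_usd = FDE_DS5300_USD - BASE_DS5300_USD
fde_premium_pct = 100 * fde_premium_usd / BASE_DS5300_USD
```

The premium comes to roughly 1.4% of the base subsystem price, for no measurable change in LRT.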

FDE is a hard drive feature where the drive itself encrypts all data written to, and decrypts all data read from, the drive and requires a subsystem supplied drive key at power on/reset. In this way the data is never in plaintext on the drive itself. If the drive were taken out of the subsystem and attached to a drive tester all one would see is ciphertext. Similar capabilities have been available in enterprise and SMB tape drives in the past but to my knowledge the IBM DS5300 FDE is the first disk storage benchmark with drive encryption.

I believe the key manager for the DS5300 FDE is integrated within the subsystem. Most shops would need a separate, standalone key manager for more extensive data security. I believe the DS5300 can also interface with a standalone (IBM) key manager. In any event, it’s still an easy and simple step towards increased data security for a data center.

The full report on the latest SPC results will be up on my website later this week but if you want to get this information earlier and receive your own copy of our newsletter – email me at SubscribeNews@SilvertonConsulting.com?Subject=Subscribe_to_Newsletter.

XAM and data archives

Vista de la Biblioteca Vasconcelos by Eneas

XAM, a SNIA defined interface standard supporting reference data archives, is starting to become real. EMC and other vendors are starting to supply XAM compliant interfaces. I could not locate any application vendors supporting XAM APIs (my Twitter survey for application vendors came back empty) but it’s only a matter of time. What does XAM mean for your data archive?

The problem

Most IT shops with data archives use special purpose applications that support a vendor defined proprietary interface to store and retrieve data out of a dedicated archive appliance. For example, many email archives support EMC Centera, which has defined a proprietary Centera API to store and retrieve data from their appliance. Most other archive storage vendors have followed suit, leading to proprietary vendor lock-in which slows adoption.

However, some proprietary APIs have been front-ended with something like NFS. The problem with NFS and other standard file interfaces is that they were never meant for reference data (data that does not change). So when one tries to update an archived file, one often gets some sort of weird system error.

Enter XAM

XAM was designed from the start for reference data. Moreover, XAM supports concurrent access to multiple vendor archive storage systems from the same application. As such, an application supplier need only code to one standard API to gain access to multiple vendor archive systems.
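
The “code to one API, reach many vendors” idea can be sketched as follows. To be clear, this is NOT the XAM API: every class, method, and vendor name below is hypothetical, and the in-memory class merely stands in for a vendor’s archive system. It only illustrates the shape of the argument, including the fact that reference data gets stored and retrieved but never updated.

```python
from abc import ABC, abstractmethod


class ArchiveStore(ABC):
    """Hypothetical common archive interface (an illustration, not the XAM API)."""

    @abstractmethod
    def store(self, data: bytes) -> str:
        """Archive data and return an identifier for later retrieval."""

    @abstractmethod
    def retrieve(self, object_id: str) -> bytes:
        """Return the archived data for an identifier."""


class InMemoryArchive(ArchiveStore):
    """Stand-in for one vendor's archive; there is deliberately no update method."""

    def __init__(self, vendor: str):
        self.vendor = vendor
        self._objects = {}

    def store(self, data: bytes) -> str:
        object_id = f"{self.vendor}-{len(self._objects)}"
        self._objects[object_id] = bytes(data)
        return object_id

    def retrieve(self, object_id: str) -> bytes:
        return self._objects[object_id]


def archive_everywhere(stores, data: bytes):
    """Application code written once against the common interface."""
    return [s.store(data) for s in stores]


# Two hypothetical vendor systems used by the same application code
stores = [InMemoryArchive("vendor_a"), InMemoryArchive("vendor_b")]
ids = archive_everywhere(stores, b"email 42")
copies = [s.retrieve(i) for s, i in zip(stores, ids)]
```

The application function never mentions a vendor, which is the whole point: adding a third archive system means adding one more adapter, not touching the application.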

SNIA released the V1.0 XAM interface specification last July, which defines the XAM architecture and C and Java language APIs for both the application and the storage vendor. From the looks of it, though, the C version of the vendor API is more complete.

However, currently I can only locate two archive storage vendors having released support for the XAM interface (EMC Centera and SAND/DNA?). A number of vendors have expressed interest in providing XAM interfaces (HP, HDS HCAP, Bycast StorageGrid and others). How soon their XAM API support will be provided is TBD.

I would guess what’s really needed is for more vendors to start supporting the XAM interface, which would get the application vendors more interested in supporting XAM. It’s sort of a chicken and egg thing but I believe the storage vendors have the first move; the application vendors will take more time to see the need.

Does anyone know what other storage vendors support XAM today? Is there any single place where one could even find out? Ditto for applications supporting XAM today.

Digital Rosetta Stone vs 3d-Barcodes

The BBC reported today on a new way to store digital data for 1000 years coming out of Japan (BBC NEWS | Technology | ‘Rosetta stone’ offers digital lifeline). Personally, I don’t feel that silicon storage is the best answer to this problem, and “wireless” read-back may be problematic over protracted periods of time.

Something more like a 3-dimensional bar code makes a lot more sense to me. Such a recording device could easily record a lot more data than paper does today, be readable via laser scans, microscope, or other light based mechanisms, and by being a physical representation, could be manufactured out of many different materials.

It’s not to say that silicon might not be a good material, lasting for a long time. The article did not go into detail how the data was recorded but presumably this etched storage device somehow trapped a charge in a particular cell that could be read back electronically – not unlike NAND flash does today but with much better reliability. But it is unclear to me why the article states that humidity surrounding the Digital Rosetta Stone device impairs storage longevity. This seems to imply that even though the device is sealed it still can be impacted by external environmental conditions.

That’s why having a recording device that can be made up of many types of materials makes more sense to me. Such a device could conceivably be etched out of marble, ceramics, steel, or any number of other materials. Marble has lasted for millennia in Greece, Italy, and other places. Of course marble is subject to weather and acid rain. But the point is that by having multiple substances that can be used to record data for long periods, all using the same recording format and read-back mechanisms, we can ensure that any number of them can retain data for a long time in the future. Such a 3d barcode could also be sealed in any transparent media such as glass, which also has been known to last centuries.

Today 3d barcodes can be attached to a surface of a cube, but they could just as easily be attached to a plate, disk, or page. Once attached (or printed) they could easily record vast amounts of data.

In my view magnetic storage cannot last for over 50 years, electronic storage will not last over 100 years, and the only thing I know of that can last 1000 years is some physical mechanism. 3D barcodes easily emerge as the answer to this storage problem.