To iPad or not to iPad – part 4

Apple iPad (wi-fi) (from apple.com)

I took the iPad to another conference last month. My experience the last time I did this (see To iPad or not to iPad – part 3) made me much more leery of relying on it, but I was reluctant to lug a laptop for only a two-day trip.

Since that experience, my expectations for iPad use on such trips have become a bit more nuanced and realistic. As you may recall, I have an iPad without 3G networking.

When attending a conference with a laptop, I occasionally take a few notes, do email, twitter, blog and handle other work-related items. With my iPad I often take copious notes – unclear why, other than that it's just easier/quicker to get out of my backpack/briefcase and start typing on. When I take fewer notes, it's usually because I don't have a table/desk to use with the iPad and keyboard.

As for the other items (email, twitter, and blogging), my iPad can do all of them just fine with proper WiFi connectivity. Other work can sometimes be done offline but sometimes requires internet access, probably a ~50:50 split.

iPhone and iPad together

I have found that an iPhone and iPad can make a very usable combination in situations with flaky/inadequate WiFi. While the iPad can attempt to use the room's WiFi, the iPhone can use the 3G data network to access the Internet. Mostly, the iPhone wins in these situations. This works especially well when WiFi is overtaxed at conferences. The other nice thing is that the Bluetooth (BT) keypad can be paired with either the iPad or the iPhone (it does take time, ~2-5 minutes, to make the switch, so I don't change pairing often).

So at the meeting this past month, I was doing most of my note-taking and offline work with the iPad, and blogging, tweeting and emailing with the iPhone.

If the iPad's WiFi were working well enough, I probably wouldn't use the iPhone for most of this. However, I find that at many conferences and most US hotels, WiFi is either not available in the hotel room or doesn't handle conference room demand well enough to depend on. AT&T's 3G network, in contrast, seems to work just fine in most of these situations (probably because no one is downloading YouTube videos to their iPhone).

A couple of minor quibbles

While this combination works well enough, I do have a few suggestions that would make it even better to use:

  • Mouse support – Although I love the touch screen for most tasks, editing is painful without a mouse. Envision this: you are taking notes, see an error a couple of lines back, and need to fix it. With the iPad/iPhone, you move your hand from the keypad to point at the error on the screen to correct it. Finger pointing is not as quick at re-positioning the cursor as a mouse, and until magnification kicks in, your finger obscures the error, leading to poor positioning. Using the BT keypad's arrow keys is more accurate but not much faster. So, due to bad cursor positioning, I end up deleting and retyping many characters that didn't need it. As a result, I don't edit much on the iPad/iPhone. If a BT mouse (Apple's Magic Mouse) could pair with the iPad and iPhone, editing would work much better. Alternatively, something like the old IBM ThinkPad TrackPoint in the middle of a BT keypad would work just fine. Having the arrow keys respond much faster would be better still.
  • iPad to iPhone file transfer capability – Now that I use the iPad offline with an online iPhone, it would be nice if there were some non-Internet way to move data between the two. Perhaps using Bluetooth's GOEP/OBEX capabilities to provide FTP-lite services would work. It wouldn't need high bandwidth, as typical use would only be to move a Pages, Numbers, or Keynote file to the iPhone for email attachment or blog posting. It would be great if this were bi-directional. Another option is a USB port, but that would require more hardware. A BT file transfer makes more sense to me.
  • iPad battery power – Another thing I find annoying at long conferences is that the iPad's battery doesn't last all day. Having BT as well as WiFi active may be hurting battery life. My iPad often starts running out of power around 3pm at conferences. To conserve energy, I power down the display between note-taking sessions, and this seems to work well enough. The display comes back whenever I hit a key on the BT keypad, and often I don't even have to retype the keystrokes used to wake it. More battery capacity would help.

—-

So all of this works just fine domestically, but my next business trip is to Japan. I have been informed that unless I want to spend a small fortune in roaming charges, I should disable the iPhone's 3G data services while out of the country. As such, if I only take my iPad and iPhone, I will have no email/twitter/blog access whenever WiFi is unavailable. If I took a laptop, at least it could attach to an Ethernet cable where one is available. However, I have also been told that WiFi is generally more available overseas. Wish me luck.

Anyone know how prevalent WiFi is in Tokyo hotels and airports and how well it works with iPhone/iPad?

Other comments?

Is cloud computing/storage decentralizing IT, again?

IBM Card Sorter by Pargon (cc) (From Flickr)

Since IT began, computing services have run through massive phases of decentralization out to departments, followed by consolidation back into the data center.  In the early years of computing, the 50s and 60s, the only real distributed alternative to mainframe or big-iron data processing was the sophisticated card sorter.

Consolidation-decentralization Wars

Back in the 70s, the consolidation-decentralization wars were driven by the availability of mini-computers competing with mainframes for applications and users.  During the 80s, the PC emerged as the dominant decentralizer, taking applications away from mainframes and big servers, and in the 90s it was small, off-the-shelf Linux servers and the continuing use of high-powered PCs that took applications out from under data center control.

In those days it seemed that most computing decentralization was driven by the ease of developing applications for these upstarts and the relatively low cost of the new platforms.

Server virtualization, the final solution

Since 2000, another force has come along to solve the consolidation quandary – server virtualization.  With server virtualization, such as that from VMware, Citrix and others, IT has once again driven massive consolidation of outlying departmental computing services to bring them all, once more, under one roof, centralizing IT control.  Virtualization provided an optimum answer to the one issue that decentralization could never seem to address – utilization efficiency.  With most departmental servers running at 5-10% utilization, virtualization offered demonstrable cost savings when those workloads were consolidated onto data center hardware.
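
To make the utilization argument concrete, here's a minimal back-of-the-envelope sketch; the server count and utilization figures below are my own assumptions, purely for illustration:

```python
import math

# A minimal, hypothetical consolidation calculation; the server count and
# utilization figures are assumptions for illustration, not measured data.
dept_servers = 20        # standalone departmental servers
dept_util = 0.08         # ~8% average utilization each (the 5-10% range above)
host_util = 0.65         # utilization a virtualized host is run at (assumed)

work = dept_servers * dept_util        # useful work, in "server equivalents"
hosts = math.ceil(work / host_util)    # virtualized hosts needed to carry it

print(f"{dept_servers} departmental servers -> {hosts} virtualized hosts "
      f"(only {work:.1f} server-equivalents of actual work)")
```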

Cloud computing/storage mutiny

But with the insurrection that is cloud computing and cloud storage, departments can once again easily acquire storage and computing resources on demand, and utilization is no longer an issue because it's a "pay only for what you use" solution. They don't even need to develop their own applications, because SaaS providers can supply most of their application needs using cloud computing and cloud storage resources alone.

Virtualization was a great solution to the poor utilization of systems and storage resources. But with the pooling available in cloud computing and storage, utilization effectiveness now occurs outside the bounds of today's data center.  As such, cloud services' utilization effectiveness in $/MIP or $/GB can be approximately equivalent to that of any highly virtualized data center infrastructure (perhaps even better).  Thus, cloud services can provide these very same utilization benefits, at reduced cost, to any departmental user without the need for centralized data center services.

Other decentralization issues that cloud solves

Traditionally, the other problems with departmental computing services were lack of security and the unmanageability of distributed services, both of which held back some decentralization efforts, but these are partially being addressed with cloud infrastructure today.  Insecurity continues to plague cloud computing, but some cloud storage gateways (see Cirtas Surfaces and other cloud storage gateway posts) are beginning to use encryption and other cryptographic techniques to address these issues.  How this gets solved for cloud computing is another question (see Securing the cloud – Part B).

Cloud computing and storage can be just as diffuse and difficult to manage as a proliferation of PCs or small departmental Linux servers.  However, such unmanageability is a very different issue, one intrinsic to decentralization and much harder to address.  Although it's fairly easy to get a bill for any cloud service, it's unclear whether IT will ever see all of those bills in order to manage them.  Also, nothing seems able to stop a department from signing up for SalesForce.com or from using Amazon EC2 to support an application they need.  The only remedy to this problem, as far as I can see, is adherence to strict corporate policy and practice.  So unmanageability will remain an ongoing issue for decentralized computing for some time to come.

—-

Nonetheless, it seems as if decentralization via the cloud is back, at least until the next wave of consolidation hits.  My guess is that the next driver of consolidation will be whatever makes application development much easier and quicker to accomplish on centralized data center infrastructure – application frameworks, anyone?

Comments?

Whatever happened to holographic storage?

InPhase Technologies Drive & Media (c) 2010 InPhase Technologies, All Rights Reserved (From their website)

Although InPhase Technologies and a few other startups have taken a shot at holographic storage over the years, there has not been any recent innovation here that I can see.

Ecosystems matter

The real problem (which InPhase was trying to address) is building up an ecosystem around the technology.  In magnetic disk storage, you have media companies, head companies, and interface companies; in optical disk (Blu-ray, DVD, CD) you have drive vendors, media vendors, and laser electronics providers; in magnetic tape, you have drive vendors, tape head vendors, and tape media vendors, etc.  All of these corporate ecosystems are driving their respective technologies, with joint and separate R&D funding, as fast as they can, while gaining economies of scale from specialization.

Holographic storage, or any new storage technology for that matter, would have to enter the data storage market with a competitive product, but the real trick is maintaining that competitiveness over time. That's where an ecosystem and all its specialized R&D funding can help.

Market equivalence is fine, but technology trend parity is key

So let's say holographic storage enters the market with a 260GB disk platter to compete against something like Blu-ray. Today, Blu-ray technology supports 26GB of data storage on single-layer media costing about $5 each, with drives costing roughly $60-$190.   So to match today's Blu-ray capabilities, holographic media would need to cost ~$50 and a holographic drive about $600-$1900.  But that's just today: dual-layer Blu-ray is already coming on line, and in the labs a 16-layer Blu-ray recording was demonstrated in 2008.  To keep up with Blu-ray, holographic storage would need to demonstrate more than 4TB of data on a platter in the lab and maintain similar cost multipliers for its media and drives.  That's hard to do with limited R&D funding.

As such, I believe it's not enough to achieve parity with the technologies currently available; any new storage technology really has to be at least (in my estimation) 10x better in cost and performance right from the start in order to gain some sort of foothold that can be sustained.  To do this against Blu-ray, optical holographic storage would need to start at a 260GB platter for $5 with a drive at $60-$190 – it's just not there yet.
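
Here's a small sketch of the parity arithmetic above, using the rough prices quoted in this post (approximations, not current list prices):

```python
# Rough $/GB parity arithmetic using the approximate figures quoted above.
bluray_gb = 26                   # single-layer Blu-ray capacity (as quoted above)
bluray_media_cost = 5.00         # ~$5 per disc
bluray_drive_cost = (60, 190)    # ~$60-$190 per drive
holo_gb = 260                    # hypothetical holographic platter capacity

scale = holo_gb / bluray_gb      # 10x the capacity...
parity_media = bluray_media_cost * scale                   # ...allows ~10x the price
parity_drive = tuple(c * scale for c in bluray_drive_cost)

print(f"$/GB parity: ~${parity_media:.0f} media, "
      f"~${parity_drive[0]:.0f}-${parity_drive[1]:.0f} drive")
print(f"10x-better entry point: {holo_gb}GB platter at ~${bluray_media_cost:.0f}, "
      f"drive at ~${bluray_drive_cost[0]}-${bluray_drive_cost[1]}")
```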

But NAND Flash/SSDs did it!

Yes, but the secret with NAND/SSDs was that they emerged from EPROMs, a small but lucrative market, and later the technology was used in consumer products as a lower-cost, lower-power, more rugged alternative to the extremely small form factor disk drives that were just starting to come online.  We don't hear about extremely small form factor disk drives anymore because NAND flash won out.  Once NAND flash held that market, consumer product volumes were able to drive costs down and entice the creation of a valuable multi-company/multi-continent ecosystem.  From there, it was only a matter of time before NAND technologies became dense and cheap enough to be used in SSDs, addressing the more interesting and potentially more lucrative enterprise data storage domain.

So how can optical holographic storage do it?

Maybe the real problem for holographic storage was its aim at the enterprise data storage market. Perhaps if its vendors had gone after some specialized or consumer market and carved out a niche, they could have created an ecosystem.  Media and entertainment has some pretty serious data storage requirements, which might be a good match; InPhase was making some inroads there but couldn't seem to put it all together.

So what's left for holographic technology to go after? Perhaps medical imaging.  It would play to holographic storage's strengths (the ability to densely record multiple images). It's very niche-like, with a few medical instrument players developing MRI, CAT scan and other imaging technologies that all require lots of data storage, and where long-term retention is a definite plus.  Perhaps, if holographic technology could collaborate with a medical instrument consortium to establish a beachhead and develop some sort of multi-company ecosystem, it could move out from there.  Of course, magnetic disk and tape are also going after this market, so this isn't a certainty, but there may be other markets like this out there, e.g., check imaging, satellite imagery, etc.  Something specialized like this could be just the place to hunker down, build an ecosystem and, in 5-7 years, emerge to attack general data storage again.

Comments?

SOHO backup options

© 2010 RDX Storage Alliance. All Rights Reserved. (From their website)

I must admit, even though I have disparaged DVD archive life (see CDs and DVDs longevity questioned), I still back up my work desktops/family computers to DVD and DVDdl disks.  It's cheap (on sale, 100 DVDs cost about $30 and DVDdls about 2.5x that) and it's convenient (no need for additional software, outside storage fees, or additional drives).  For offsite backups, I take the monthly backups and store them in a safety deposit box.

But my partner (and wife) said “Your time is worth something, every time you have to swap DVDs you could be doing something else.” (… like helping around the house.)

She followed up by saying, "Couldn't you use something where you start it and forget it till it's done?"

Well, this (along with multiple media errors in my latest DVDdl full backup) got me thinking: there's got to be a better way.

The options for SOHO (small office/home office) offsite backups look to be as follows (from sexiest to least sexy):

  • Cloud storage for backup – Mozy, Norton Backup, Gladinet, Nasuni, and no doubt many others can provide secure, cloud-based backup of desktop and laptop data for Mac and Windows systems.  Some of these require a separate VM or server to connect to the cloud while others do not.  Using the cloud might require the office systems to be left on at night, but that would be a small price to pay to get your data backed up offsite.   Benefits of cloud storage approaches are that backups get offsite, can be automatically scheduled/scripted to take place off-hours, and require no (or minimal) user intervention.  Disadvantages of this approach are that the office systems need to be left powered on, backup data is out of your control, and bandwidth and storage fees must be paid.
  • RDX devices – These are removable, NFS-accessed disk storage which supports from 40GB to 640GB per cartridge. The devices claim a 30yr archive life, which should be fine for SOHO purposes.  Cartridge cost is probably RDX's greatest issue, BUT unlike DVDs you can reuse RDX media if you want to.   Benefits are that RDX would require minimal operator intervention for anything less than 640GB of backup data, backups would be faster (45MB/s), and the data stays under your control.  Disadvantages are the cost of the media (a 640GB Imation RDX cartridge is ~$310) and drives (?), data would not be encrypted unless encrypted at the host, and you would need to move the cartridge data offsite.
  • LTO tape – To my knowledge there is only one vendor out there that makes an iSCSI-attached LTO tape drive, and that is my friends at Spectra Logic; they also make a SAS (6Gb/s) attached LTO-5 tape drive.  It's unclear which level of LTO technology is supported on the iSCSI drive, but even one or two generations back would work for many SOHO shops.  Benefits of LTO tape are minimal operator intervention, long archive life, enterprise-class backup technology, faster backups and drive-level data encryption.  Disadvantages are the cost of the media ($27-$30 for LTO-4 cartridges), drive costs (?), interface costs (if any) and the need to move the cartridges offsite.  I like the iSCSI drive because all one would need is iSCSI initiator software, which can be had easily enough for most desktop systems.
  • DAT tape – I thought these were dead, but my good friend John Obeto informed me they are alive and well.  DAT drives support USB 2.0, SAS or parallel SCSI interfaces. Although it's unclear whether they have drivers for Mac OS X, Windows shops could probably use them without problem. Benefits are similar to LTO tape above, but they are not as fast and don't have as long an archive life.  Disadvantages are cartridge cost (a 320GB DAT cartridge is ~$37), drive costs (?) and the need to move the media offsite.
  • (Blu-ray, Blu-ray dl), DVD, or DVDdl – These are OK, but their archive life is miserable (under 2yrs for DVDs at best, see the post linked above). Benefits are that they're very cheap to use, the lowest cost removable media (100GB of data would take ~22 DVDs or ~12 DVDdls, which at $0.30/DVD or $0.75/DVDdl is ~$6.60 to $9 per backup; see the cost sketch after this list), and the lowest cost drive (it comes standard or optional on most desktops today). Disadvantages are high operator intervention (to swap out disks), more complexity in keeping track of each DVD's portion of the backup, more complex media storage (you have a lot more of it), it takes forever (burning 7.7GB to a DVDdl takes around an hour, or ~2.1MB/sec), data encryption would need to be done at the host, and one has to take the media offsite.  I don't have similar performance data for Blu-ray backups, other than that Blu-ray dl media costs about $11.50 each (50GB).
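
Here's the removable-media arithmetic behind the DVD and Blu-ray numbers above, as a quick sketch; capacities and street prices are the approximate figures used in this post, and the single-layer Blu-ray price is my own guess:

```python
import math

# Disc count and media cost for a ~100GB offsite backup, using approximate
# usable capacities and street prices (the Blu-ray SL price is an assumption).
backup_gb = 100
media = {                       # name: (usable GB per disc, $ per disc)
    "DVD":        (4.7,  0.30),
    "DVDdl":      (8.5,  0.75),
    "Blu-ray":    (25.0, 2.50),
    "Blu-ray dl": (50.0, 11.50),
}

for name, (gb, price) in media.items():
    discs = math.ceil(backup_gb / gb)
    print(f"{name:10s}: {discs:3d} discs, ~${discs * price:6.2f} per full backup")
```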

Please note this post only discusses offsite backups. Many SOHOs do no offsite backup at all (risky??), and for online backups I use a spare disk drive attached to every office and family desktop.

Other alternatives probably exist for offsite backups, not the least of which is NAS data replication.  I didn't list it because most SOHO customers are unlikely to have a secondary location where they could host the replicated data copy, and the cost of a 2nd NAS box would need to be added, along with the bandwidth between the primary and secondary sites.  BUT for those sophisticated SOHO customers out there already using a NAS box for onsite shared storage, data replication might make sense. Deduplication backup appliances are another possibility, but they suffer similar disadvantages to NAS box replication and are even less likely to already be in use by SOHO customers.

—-

OK, where to now?  Given all this, I'm hoping to get a Blu-ray dl writer in my next iMac.  Let's see: that would cut my DVDdl swaps down by ~3.2X for single-layer Blu-ray and ~6.5X for dl Blu-ray.  I could easily live with that until I quadruple my data storage, again.

Although an iSCSI LTO-5 tape transport would make a real nice addition to the office…

Comments?

Top 10 storage technologies over the last decade

Aurora's Perception or I Schrive When I See Technology by Wonderlane (cc) (from Flickr)

Some of these technologies were in development prior to 2000, some were available in other domains but not in storage, and some were in a few subsystems but had yet to become as popular as they are today.  In no particular order, here are my top 10 storage technologies of the decade:

  1. NAND based SSDs – DRAM and other solid state drive (SSD) technologies were available last century, but over the last decade NAND flash based devices have come to dominate SSD technology and have altered the storage industry forevermore.  Today, it's nigh impossible to find enterprise-class storage that doesn't support NAND SSDs.
  2. GMR heads – Giant magnetoresistance (GMR) disk heads have become commonplace over the last decade and have allowed disk drive manufacturers to double data density every 18-24 months.  Now GMR heads are starting to transition over to tape storage and will enable that technology to increase data density dramatically as well.
  3. Data deduplication – Deduplication technologies emerged over the last decade as a complement to higher density disk drives and a means to more efficiently back up data.  Deduplication technology can be found in many different forms today, ranging from file and block storage systems and backup storage systems to backup-software-only solutions (see the sketch after this list for the basic idea).
  4. Thin provisioning – No one would argue that thin provisioning emerged last century but it took the last decade to really find its place in the storage pantheon.  One almost cannot find a data center class storage device that does not support thin provisioning today.
  5. Scale-out storage – Last century, if you wanted higher IOPS from a storage subsystem you could add cache or disk drives, but at some point you hit a subsystem performance wall.  With scale-out storage, one can now add more processing elements to a storage system cluster, without replacing the controller, to obtain more IO processing power.  The linked reference talks about the use of commodity hardware to provide added performance, but scale-out storage can also be done with non-commodity hardware (see Hitachi's VSP vs. VMAX).
  6. Storage virtualization – Server virtualization has taken off as the dominant data center paradigm over the last decade, and its counterpart in storage has become more viable as well.  Storage virtualization was originally used to migrate data from old subsystems to new storage, but today it can be used to manage and migrate data across PBs of physical storage, dynamically optimizing data placement for cost and/or performance.
  7. LTO tape – When IBM dominated IT in the middle to late part of last century, the tape format du jour always matched IBM's tape technology.  As this decade dawned, IBM was no longer the dominant player and tape technology was starting to diverge into a babble of differing formats.  As a result, IBM, Quantum, and HP put their technologies together and created a standard tape format, called LTO, which has become the new dominant tape format for the data center.
  8. Cloud storage – It's unclear just when in the last decade cloud storage emerged, but it seemed to arrive as a supplement to cloud computing, which also appeared this past decade.  Storage service providers had existed earlier, but due to bandwidth limitations and storage costs they didn't survive the dotcom bubble. Over this past decade both bandwidth and storage costs have come down considerably, and cloud storage has now become a viable technological solution to many data center issues.
  9. iSCSI – SCSI has taken on many forms over the last couple of decades, but iSCSI has altered the dominant block storage paradigm from a single, pure FC-based SAN to a plurality of technologies.  Nowadays, SMB shops can have block storage, without the cost and complexity of FC SANs, over the LAN networking technology they already use.
  10. FCoE – One could argue that this technology is still maturing today, but once again SCSI has opened up another way to access storage. FCoE has the potential to offer all the robustness and performance of FC SANs over data center Ethernet hardware, simplifying and unifying data center networking onto one technology.
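
Since deduplication (#3 above) is probably the least self-explanatory item here, below is a minimal sketch of the content-hash idea behind it. Real products differ widely in chunking, hashing, indexing and collision handling, so treat this as illustration only.

```python
import hashlib

# Minimal block-level deduplication sketch: split data into fixed-size chunks,
# store each unique chunk once, keyed by its content hash. (Real products use
# variable-size chunking, persistent indexes, and guard against collisions.)
CHUNK = 4096
store = {}    # content hash -> chunk bytes

def dedup_write(data: bytes) -> list:
    """Store data, returning the list of chunk hashes (the 'recipe')."""
    recipe = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)    # only new chunks consume space
        recipe.append(digest)
    return recipe

def dedup_read(recipe: list) -> bytes:
    return b"".join(store[d] for d in recipe)

# Two backups of mostly identical data share almost all of their chunks.
backup1 = dedup_write(b"A" * 100_000)
backup2 = dedup_write(b"A" * 100_000 + b"B" * 4096)
print(len(store), "unique chunks stored for both backups")
```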

No doubt others would differ on their top 10 storage technologies of the last decade, but I strove to find technologies that significantly changed data storage between 2000 and today.  These 10 seemed to me to fit the bill better than most.

Comments?

SCI’s latest SPC-1&-1/E LRT results – chart of the month

(c) 2010 Silverton Consulting, Inc., All Rights Reserved

It’s been a while since we reported on Storage Performance Council (SPC) Least Response Time (LRT) results (see Chart of the month: SPC LRT[TM]).  This is one of the charts we produce for our monthly dispatch on storage performance (quarterly report on SPC results).

Since our last blog post on this subject there have been 6 new entries in the LRT Top 10 (#3-6 & #9-10).  As can be seen in the chart, which combines SPC-1 and SPC-1/E results, response times vary considerably.  Seven of these top 10 LRT results come from subsystems which either are all-SSD (#1-4, #7 & #9) or have a large NAND cache (#5).  The newest members of this chart were the NetApp FAS3270A and the Xiotech Emprise 5000 with 300GB disk drives, both published recently.

The NetApp FAS3270A, a mid-range subsystem with 1TB of NAND cache (512GB in each controller), seemed to do pretty well here, with most all-SSD systems doing better than it but a pair of all-SSD systems doing worse.  Coming in under 1msec LRT is no small feat.  We are certain the NAND cache helped NetApp achieve their superior responsiveness.

What the Xiotech Emprise 5000 (300GB disk drives) storage subsystem is doing here is another question.  They have always done well on an IOPS/drive basis (see SPC-1&-1/E results IOPs/Drive – chart of the month), but being top ten in LRT had not previously been their forte.  How one coaxes a 1.47 msec LRT out of a 20-drive system that costs only ~$41K, 12X lower than the median price (~$509K) of the other subsystems here, is a mystery.  Of course, they were using RAID 1, but so were half of the subsystems on this chart.

It's nice to see some turnover in this top 10 LRT.  I still contend that response time is an important performance metric for many storage workloads (see my IO throughput vs. response time and why it matters post), and its improvement over time validates my thesis.  I also received many comments discussing the merits of database latencies for ESRP v3 (Exchange 2010) results (see my Microsoft Exchange Performance ESRP v3.0 results – chart of the month post).  You can judge the results of that lengthy discussion for yourselves.

The full performance dispatch will be up on our website in a couple of weeks, but if you are interested in seeing it sooner, just sign up for our free monthly newsletter (see upper right) or subscribe by email, and we will send you the current issue along with download instructions for this and other reports.

As always, we welcome any constructive suggestions on how to improve our storage performance analysis.

Comments?

One platform to rule them all – Compellent & EqualLogic & Exanet from Dell

Compellent drive enclosure (c) 2010 Compellent (from Compellent.com)

Dell and Compellent may be a great match because Compellent uses commodity hardware combined with specialized software to create their storage subsystems. If there's any company out there that can take advantage of commodity hardware, it's probably Dell. (Of course, commodity hardware always loses in the end, but that's another story.)

Similarly, Dell's EqualLogic iSCSI storage system uses commodity hardware to provide its iSCSI storage services.  It doesn't take a big leap of imagination to see one storage system that combines the functionality of EqualLogic's iSCSI and Compellent's FC storage capabilities.  Of course, others are already doing this, including Compellent themselves, who have iSCSI support already built into their FC storage system.

Which way to integrate?

Does EqualLogic survive such a merger?  I think so.  It's easy to imagine that EqualLogic may have the bigger market share today. If that's so, the right thing might be to merge Compellent's FC functionality into EqualLogic.  If Compellent has the larger market, the correct approach is the opposite. The answer probably lies with a little of both.  It seems easier to add iSCSI functionality to an FC storage system than the converse, but the FC-into-iSCSI approach may be the optimum path for Dell because of the popularity of their EqualLogic storage.

What about NAS?

The only thing missing from this storage system is NAS.  Of course, the Compellent storage offers a NAS option through the use of a separate Windows Storage Server (WSS) front end.  Dell's EqualLogic does much the same to offer NAS protocols for their iSCSI system.  Neither of these is a bad solution, but they are not fully integrated NAS offerings such as are available from NetApp and others.

However, there is a little-discussed piece here: the Dell-Exanet acquisition, which happened earlier this year. Perhaps the right approach is to integrate Exanet with Compellent first and target this at the high-end enterprise/HPC marketplace, keeping EqualLogic at the SMB end of the market.  It's been a while since I have heard anything about Exanet, and nothing since the acquisition.  Does it make sense to back-end a clustered NAS solution with FC storage? Probably.

—-

Much of this seems doable to me, but it all depends on making the right moves once the purchase is closed.  If I look at where Dell is weakest (barring their OEM agreement with EMC), it's in the high-end storage space.  Compellent probably didn't have as much of a footprint there as it could have, due to its limited distribution and support channels.  A Dell acquisition could easily eliminate these problems and open up this space without having to do much more than start marketing, selling and supporting Compellent.

In the end, a storage solution supporting clustered NAS, FC, and iSCSI that combined functionality equivalent to Exanet, Compellent and EqualLogic, based on commodity hardware (ouch!), could make a formidable competitor to what's out there today, if done properly. Whether Dell could actually pull this off, and in a timely manner, even if they purchase Compellent, is another question.

Comments?

Storage throughput vs. IO response time and why it matters

Fighter Jets at CNE by lifecreation (cc) (from Flickr)

Lost in much of the discussions on storage system performance is the need for both throughput and response time measurements.

  • By IO throughput I generally mean data transfer speed in megabytes per second (MB/s or MBPS); however, another definition of throughput is IO operations per second (IO/s or IOPS).  I prefer the MB/s designation for storage system throughput because it's complementary with respect to response time, whereas IO/s can often be confounded with response time.  Nevertheless, both metrics qualify as storage system throughput.
  • By IO response time I mean the time it takes a storage system to perform an IO operation from start to finish, usually measured in milliseconds, although lately some subsystems have dropped below the 1msec threshold.  (See last year's post on SPC LRT results for some of the top response time results.)  The short sketch after these definitions shows how IOPS, MB/s, and response time interrelate.
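
As promised, here's a small sketch (with made-up numbers) of the relationships: MB/s is just IOPS times the average transfer size, and Little's Law ties IOPS to response time at a given number of outstanding IOs.

```python
# How the metrics relate; all figures below are made up for illustration.
io_size_kb = 8          # average transfer size per IO (assumed)
iops = 50_000           # IO operations per second (assumed)
outstanding_ios = 64    # concurrent IOs the workload keeps in flight (assumed)

# Throughput in MB/s is IOPS times the average transfer size.
mb_per_sec = iops * io_size_kb / 1024
print(f"{iops} IOPS at {io_size_kb}KB each -> {mb_per_sec:.0f} MB/s")

# Little's Law: outstanding IOs = IOPS x response time. At fixed concurrency,
# lower response time is exactly what buys you higher IOPS.
response_time_ms = outstanding_ios / iops * 1000
print(f"{outstanding_ios} outstanding IOs at {iops} IOPS -> "
      f"{response_time_ms:.2f} msec average response time")
```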

Benchmark measurements of response time and throughput

Both Standard Performance Evaluation Corporation’s SPECsfs2008 and Storage Performance Council’s SPC-1 provide response time measurements although they measure substantially different quantities.  The problem with SPECsfs2008’s measurement of ORT (overall response time) is that it’s calculated as a mean across the whole benchmark run rather than a strict measurement of least response time at low file request rates.  I believe any response time metric should measure the minimum response time achievable from a storage system although I can understand SPECsfs2008’s point of view.

On the other hand, SPC-1's measurement of LRT (least response time) is just what I would like to see in a response time measurement.  SPC-1 provides the time it takes to complete an IO operation at very low request rates.

In regard to throughput, once again SPECsfs2008's measurement leaves something to be desired, as it's strictly a measurement of NFS or CIFS operations per second.  Of course, this includes a large share (>40%) of non-data-transfer requests as well as data transfers, which confounds any measurement of how much data can be transferred per second.  But from their perspective, a file system needs to do more than just read and write data, which is why they mix these other requests in with their measurement of NAS throughput; see the hypothetical example below.
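
To see why an ops/sec figure alone can't tell you the data transfer rate, here's a tiny, entirely hypothetical back-of-the-envelope conversion; the ops/sec figure, op mix and average transfer size are assumptions I've picked for illustration.

```python
# Hypothetical conversion from a SPECsfs-style ops/sec result to MB/s; the
# operations-per-second figure, data-op fraction and transfer size are assumed.
ops_per_sec = 100_000
data_op_fraction = 0.60     # >40% of ops are metadata-only, per the discussion above
avg_transfer_kb = 16        # assumed average read/write transfer size

data_mb_per_sec = ops_per_sec * data_op_fraction * avg_transfer_kb / 1024
print(f"{ops_per_sec} ops/s -> ~{data_mb_per_sec:.0f} MB/s of data transfer, "
      "but only under these assumed mix and transfer-size figures")
```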

The Storage Performance Council's SPC-1 reports throughput results as IOPS and provides no direct measure of MB/s unless one looks at the SPC-2 benchmark results.  SPC-2 reports a direct measure of MBPS, which is an average across three data-intensive workloads: large file access, video-on-demand and a large database query workload.

Why response time and throughput matter

Historically, we used to say that OLTP (online transaction processing) performance was entirely dependent on response time – the better the storage system's response time, the better your OLTP systems performed.  Nowadays it's a bit more complex, as some of today's database queries can depend as much on sequential database transfers (or throughput) as on individual IO response time.  Nonetheless, I feel there is still a large set of response-time-critical workloads out there that perform much better with shorter response times.

On the other hand, high throughput has its growing gaggle of adherents as well.  When it comes to high sequential data transfer workloads such as data warehouse queries, video or audio editing/download or large file data transfers, throughput as measured by MB/s reigns supreme – higher MB/s can lead to much faster workloads.

The only question that remains is who needs higher throughput as measured by IO/s rather than MB/s.  I would contend that mixed workloads which contain components of random as well as sequential IOs and typically smaller data transfers can benefit from high IO/s storage systems.  The only confounding matter is that these workloads obviously benefit from better response times as well.   That’s why throughput as measured by IO/s is a much more difficult number to understand than any pure MB/s numbers.

—-

Now, there is a contingent of performance gurus today who believe that IO response times no longer matter.  In fact, if one looks at SPC-1 results, it takes some effort to find the LRT measurement; it's not included in the summary report.

Also, in the post mentioned above there appears to be a definite bifurcation of storage subsystems with respect to response time, i.e., some subsystems are focused on response time while others are not.  I would have liked to see some more of the top enterprise storage subsystems represented in the top LRT subsystems but alas, they are missing.

1954 French Grand Prix - Those Were The Days by Nigel Smuckatelli (cc) (from Flickr)

Call me old-fashioned, but I feel that response time represents a very important performance measure, orthogonal to throughput, for any storage subsystem, and as such it should be much more widely disseminated than it is today.

For example, there is a substantive difference between a fighter jet's or race car's top speed and its maneuverability.  I would compare top speed to storage throughput and maneuverability to IO response time.  Perhaps this doesn't matter as much for a jetliner or family car, but it can matter a lot in the right domain.

Now, do you want your storage subsystem to be a fighter jet or a jetliner? You decide.