Latest SPECsfs2008 results – chart of the month

(SCISFS110318-003) (c) 2011 Silverton Consulting, Inc., All Rights Reserved

The above chart comes from last month’s newsletter on the latest SPECsfs2008 file system performance benchmark results and depicts a scatter plot of system NFS throughput operations per second versus the number of disk drives in the system being tested.  We eliminated from this chart any system that makes use of Flash Cache, SSDs or any other performance use of NAND (see below on why SONAS was still included).

One constant complaint about benchmarks is that system vendors can just throw hardware at the problem to attain better results.  The scatter plot above is one attempt to get at the truth of that complaint.

The regression equation shows that NFS throughput operations per second = 193.68*(number of disk drives) + 23834. The coefficient of determination (R**2) is 0.87, which is pretty good but not exactly perfect. So given these results, one would have to conclude there is some truth to the complaint, but it doesn’t tell the whole story (regardless of how much it pains me to admit it).

A couple of other interesting things about the chart:

  • IBM released a new SONAS benchmark with 1975 disks, 16 interface nodes and 10 storage nodes, to attain its 403K NFS ops/second. Now the SONAS had 512GB of NV Flash, which I assume is being used for redundancy purposes on writes and not as a speedup for read activity. Also, the SONAS system complex had over 2.4TB of cache (including the NV Flash).  So there was a lot of cache to throw at the problem.
  • HP BL860c results were from a system with 1480 drives, 4 nodes (blades) and ~800GB of cache to attain its 333K NFS ops/second.

(Aside: probably need to do a chart like this with amount of cache as the x variable.)

In the same report we talked about the new #1 performing EMC VNX Gateway that used 75TB of SAS SSDs and 4 VNX5700s as its backend. It was able to reach 497K NFS ops/sec.  It doesn’t show up on this chart because of its extensive use of SSDs.  But according to the equation above, one would need ~2500 disk drives to attain similar performance without SSDs and, I believe, a whole lot of cache.
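For the curious, the regression line is easy to play with. A minimal sketch (coefficients taken from the chart above, so treat the outputs as rough estimates, not guarantees):

```python
# Regression line from the scatter plot above (no-NAND systems only):
# NFS ops/sec = 193.68 * (disk drives) + 23834, R**2 = 0.87

def predicted_nfs_ops(disk_drives):
    """Estimate NFS throughput ops/sec for a given drive count."""
    return 193.68 * disk_drives + 23834

def drives_needed(target_ops):
    """Invert the line: drives needed to reach a target ops/sec."""
    return (target_ops - 23834) / 193.68

print(round(predicted_nfs_ops(1975)))  # SONAS's 1975 drives -> ~406K (actual: 403K)
print(round(drives_needed(497000)))    # VNX Gateway's 497K -> ~2450 drives
```

Note the line predicts SONAS within about 1%, which is exactly why the complaint has some truth to it.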

—-

The full performance dispatch will be up on our website after the middle of next month (I promise) but if one is interested in seeing it sooner sign up for our free monthly newsletter (see subscription request, above right) or subscribe by email and we will send the current issue along with download instructions for this and other reports.  If you need an even more in-depth analysis of NAS system performance please consider purchasing SCI’s NAS Buying Guide also available from our website.

As always, we welcome any constructive suggestions on how to improve any of our storage performance analysis.

Comments?

 

Technology innovation

Newton & iPad by mac_ivan (cc) (from Flickr)

A recent post by Mark Lewis on innovation in large companies (see Episode 105: Innovation – a process problem?) brought to mind some ideas that have been intriguing me for quite a while now.  While Mark’s post is only the start of his discussion on the management of innovation, I think the problem goes far beyond what he has outlined there.

Outside of Apple and a few select others, there don’t appear to be many large corporate organizations that continually succeed at technology innovation.  On the other hand, there are a number of large organizations which spend millions, if not billions, of dollars on R&D with, at best, mediocre returns on such investments.

Why do startups innovate so well while corporations do so poorly?

  • Most startup cost is sweat equity and not money, at least until business success is more assured.  Well run companies have a gate review process which provide more resources as new ideas mature over time, but the cost of “fully burdened” resources applied to any project is much higher and more monetary right from the start.  As such, corporate innovation costs, for the exact same product/project, are higher at every stage in the process, hurting ROI.
  • Most successful startups engage with customers very early in the development of a product. Alpha testing is the lifeblood of technical startups. Find a customer that has (hopefully, a hard) problem you want to solve and take small, incremental steps to solve it, giving the customer everything you have the moment you have it, so they can determine if it helped and where to go next.  If their problem is shared by enough other customers, you have a business.  Large companies cannot readily perform alpha tests or, in some cases, even beta tests in real customer environments.  Falling down and taking the many missteps that alpha testing requires might have significant brand repercussions.  So large companies end up funding test labs to do this activity.  Naturally, such testing increases the real and virtual costs of corporate innovation projects versus a startup’s alpha testing.  Also, any “simulated testing” may be far removed from real customer experience, often leading corporate projects down unproductive development paths, increasing development time and costs.
  • Many startups fail, hopefully before monetary investment has been significant. Large corporate innovation activities also fail often but typically much later in the development process and only after encountering higher real and virtual monetary costs.  Thus, the motivation for continuing innovation in major corporations typically diminishes after every failure, as does the ROI on R&D in general.  On the other hand, startup failures, as they generally cost little actual money, typically induce participants to re-examine customer concerns to better target future innovations.  Such failures often lead to an even higher motivation in startup personnel to successfully innovate.

There are probably many other problems with innovation in large corporate organizations but these seem most significant to me.  Solutions to such issues within large corporations are not difficult to imagine, but the cultural changes that may be needed to go along with such solutions may represent the truly harder problem to solve.

Comments?

 

SNIA CDMI plugfest for cloud storage and cloud data services

Plug by Samuel M. Livingston (cc) (from Flickr)

Was invited to the SNIA tech center to witness the CDMI (Cloud Data Management Interface) plugfest going on down in Colorado Springs.

It was somewhat subdued. I always imagine racks of servers with people crawling all over them with logic analyzers, laptops and other electronic probing equipment.  But alas, software plugfests are generally just a bunch of people with laptops and ethernet/WiFi connections, all sitting around a big conference table.

The team was working to define an errata sheet for CDMI v1.0 to be completed prior to ISO submission for official standardization.

What’s CDMI?

CDMI is an interface standard for clients talking to cloud storage servers and provides a standardized way to access all such services.  With CDMI you can create a cloud storage container, define its attributes, and deposit and retrieve data objects within that container.  Mezeo announced support for CDMI v1.0 a couple of weeks ago at SNW in Santa Clara.

CDMI provides for attributes to be defined at the cloud storage server, container or data object level such as: standard redundancy degree (number of mirrors, RAID protection), immediate redundancy (synchronous), infrastructure redundancy (across same storage or different storage), data dispersion (physical distance between replicas), geographical constraints (where it can be stored), retention hold (how soon it can be deleted/modified), encryption, data hashing (having the server provide a hash used to validate end-to-end data integrity), latency and throughput characteristics, sanitization level (secure erasure), RPO, and RTO.
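To make that concrete, here is a hedged sketch of what a CDMI “create container” request might look like. The header names and cdmi_* metadata keys below follow my reading of the v1.0 spec; the endpoint, container name and metadata values are made up for illustration:

```python
import json

# Sketch of a CDMI v1.0 "create container" request body and headers.
# The cdmi_* metadata names map to the attribute categories discussed
# above (redundancy degree, geographic constraints, retention, etc.).

headers = {
    "X-CDMI-Specification-Version": "1.0",
    "Content-Type": "application/cdmi-container",
    "Accept": "application/cdmi-container",
}

body = {
    "metadata": {
        "cdmi_data_redundancy": "2",          # standard redundancy degree
        "cdmi_immediate_redundancy": "true",  # synchronous replication
        "cdmi_geographic_placement": ["US"],  # geographical constraints
        "cdmi_retention_period": "2011-05-01T00:00:00/2012-05-01T00:00:00",
    }
}

# An HTTP PUT of json.dumps(body) with these headers to, e.g.,
# http://cloud.example.com/mycontainer/ would create the container.
print(json.dumps(body, indent=2))
```

Since it is all just HTTP plus JSON, any HTTP client library (or a browser, for reads) can speak to a CDMI server.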

A CDMI client is free to implement compression and/or deduplication as well as other storage efficiency features on top of CDMI server characteristics.  There is probably something I am missing here, but it seems pretty complete at first glance.

SNIA has defined a reference implementation of a CDMI v1.0 server [and I think a client] which can be downloaded from their CDMI website.  [After filling out the “information on me” page, SNIA sent me an email with the download information, but I could only recognize the CDMI server in the download, not the client (although it could have been there). The CDMI v1.0 specification is freely available as well.] The reference implementation can be used to test your own CDMI clients if you wish. It is Java based and apparently runs on Linux systems but shouldn’t be too hard to run elsewhere (one CDMI server at the plugfest was running on a Mac laptop).

Plugfest participants

There were a number of people from both big and small organizations at SNIA’s plugfest.

Mark Carlson from Oracle was there and seemed to be leading the activity. He said I was free to attend but couldn’t say anything about what was and wasn’t working.  Didn’t have the heart to tell him I couldn’t tell what was working or not from my limited time there. But everything seemed to be working just fine.

Carlson said that SNIA’s CDMI reference implementations had been downloaded 164 times, with the majority of the downloads coming from China, the USA, and India, in that order. But he said there were people in just about every geo looking at it.  He also said this was the first annual CDMI plugfest, although they had CDMI v0.8 running at other shows (e.g., SNIA SDC) before.

David Slik, from NetApp’s Vancouver Technology Center, was there showing off his demo CDMI Ajax client and laptop CDMI server.  He was able to use the Ajax client to access all the CDMI capabilities of the cloud data object he was presenting and displayed the binary contents of an object.  Then he showed me that the exact same data object (file) could be easily accessed by just typing the proper URL into any browser; it turned out the binary was a GIF file.

The other thing Slik showed me was a display of a cloud data object which was created via a “cron job” referencing a satellite image website and depositing the data directly into cloud storage, entirely at the server level.  Slik said that CDMI also specifies a cloud-storage-to-cloud-storage protocol which could be used to move cloud data from one cloud storage provider to another without having to retrieve the data back to the user.  Such a capability would be ideal for exporting user data from one cloud provider and importing it to another over their high speed backbone, rather than having to transmit the data to and from the user’s client.

Slik was also instrumental in the SNIA XAM interface standards for archive storage.  He said that CDMI is much more lightweight than XAM, as there is no requirement for a runtime library whatsoever; it depends only on HTTP standards as the underlying protocol.  From his viewpoint, CDMI is almost XAM 2.0.

Gary Mazzaferro from AlloyCloud was talking as if CDMI would eventually take over not just cloud storage management but local data management as well.  He called CDMI a strategic standard that could potentially be implemented in OSs, hypervisors and even embedded systems to provide a standardized interface for all data management – cloud or local storage.  When I asked what happens in this future with SMI-S, he said they would co-exist as independent but cooperative management schemes for local storage.

Not sure how far this goes.  I asked if he envisioned a bootable CDMI driver? He said yes, a BIOS CDMI driver is something that will come once CDMI is more widely adopted.

Other people I talked with at the plugfest consider CDMI as the new web file services protocol akin to NFS as the LAN file services protocol.  In comparison, they see Amazon S3 as similar to CIFS (SMB1 & SMB2) in that it’s a proprietary cloud storage protocol but will also be widely adopted and available.

There were a few people from startups at the plugfest, working on various client and server implementations.  Not sure they wanted to be identified nor for me to mention what they were working on. Suffice it to say the potential for CDMI is pretty hot at the moment as is cloud storage in general.

But what about cloud data consistency?

I had to ask how the CDMI standard deals with eventual consistency – it doesn’t.  The crowd chimed in: relaxed consistency is inherent in any distributed service.  You really have three characteristics for any distributed service: Consistency, Availability and Partition tolerance (CAP).  You can elect to have any two of these, but must give up the third.  Sort of like the Heisenberg uncertainty principle applied to data.

They all said that consistency is mainly a CDMI client issue outside the purview of the standard, associated with server SLAs, replication characteristics and other data attributes.   As such, CDMI does not define any specification for eventual consistency.

Although, Slik said that the standard does guarantee that if you modify an object and then request a copy of it from the same location during the same internet session, it will be the one you last modified.  Seems like long odds in my experience.  It’s unclear how CDMI, with relaxed consistency, can ever take the place of primary storage in the data center, but maybe it’s not intended to.

—–

Nonetheless, what I saw was impressive, cloud storage from multiple vendors all being accessed from the same client, using the same protocols.  And if that wasn’t simple enough for you, just use your browser.

If CDMI can become popular it certainly has the potential to be the new web file system.

Comments?

 

AT&T personal hotspot on iPhone

My iPhone in Settings App

I have been tied to local WiFi hot spots at most hotels and other venues for quite a while, but a recent trip to Japan, where WiFi was less available, got me thinking about getting a Verizon MiFi or other personal internet device.  Some of my friends swear by the Verizon MiFi and others swear at it.

But I am sick and tired of paying $10/day for WiFi at whatever hotel or other venue I happen to be at.  So I decided to go after AT&T’s hotspot for the iPhone.

I have been an iPhone user for a couple of years now and had been grandfathered into an “unlimited data plan” which cost $30/month, but to get the hotspot option I had to give that up for a 4GB/month plan which came with the hotspot option (DataPro 4GB for iPhone).

I looked back at some of my recent AT&T bills and I have been using around 250MB/month, so I thought this wasn’t going to be a problem. But then again, I don’t watch a lot of Netflix or YouTube video on my iPhone (at least not yet).

It turns out the iPhone hotspot has three operating modes:

  • WiFi – which allows up to 5 users to use your password protected WiFi broadcast from the iPhone.  I tried it at home and at a conference center (with lots of other networks active) and was able to find the network without problem.
  • Bluetooth – I especially like this mode, but you have to have Bluetooth on for both the phone and the computers you want to connect with.  Mac OS X seemed to make the Bluetooth connection without problem and it was almost automatic.
  • Tethered – this is where you connect your phone to the computer to which you are supplying internet access.  I found this approach worked great in most situations, and as I looked around a recent conference hall there seemed to be a lot of laptops connected to iPhones, probably doing the same thing.

I was a little worried about AT&T’s signal strength. At home it’s not that great, but at most conferences I attend it seems to be just fine.  (AT&T is offering me a free microcell for the home; all I have to do is supply power and internet…).  I suppose in some major cities this can be a problem, but most places where I sit down to check email and other stuff on my phone, AT&T’s signal strength is OK.

What about usage?

It was so easy to turn on and off (see Settings, 3rd line down from top) that I was using it only when I needed to.  My usage for the last 30 days has been ~350MB received and ~60MB sent (according to the iPhone), so that is something I am going to have to watch a little more, but with 4GB I seem to have room to grow.  It turns out I was at the conference for 2 nights and 3 days, but WiFi at the convention center was free, so I only used the hotspot at night or when the other WiFi was unavailable (sporadically during the day). So I seem to be using about another ~50MB of bandwidth for each night’s (probably a couple of hours) worth of work.  Which seems to say I could do this for the whole month and still have ~2.8X more bandwidth.
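The back-of-the-envelope math, for anyone checking (my usage numbers above are rough, so the headroom figure is approximate):

```python
# Rough hotspot budget check using my (approximate) numbers from above.
nightly_mb = 50             # hotspot use per evening of work
month_mb = nightly_mb * 30  # if I worked like that every night
plan_mb = 4 * 1024          # DataPro 4GB monthly cap

print(month_mb)                      # 1500 MB for a full month of evenings
print(round(plan_mb / month_mb, 1))  # ~2.7x headroom, close to the ~2.8X above
```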

Well, for the extra $15 a month it seems a good deal, and the best part is I don’t have to haul yet another electronic device (like the MiFi) with yet another power cord/adapter. It’s all tied to my iPhone, which I carry around anyway.

—-

All in all, I like the iPhone AT&T personal hotspot option.

Comments?

New file system capacity tool – Microsoft’s FSCT

Filing System by BinaryApe (cc) (from Flickr)

Jose Barreto blogged about a recent report Microsoft did on File Server Capacity Tool (FSCT) results (blog here, report here).  As you may know, FSCT is a free tool from Microsoft, released in September of 2009, that verifies an SMB (CIFS) and/or SMB2 storage server configuration.

FSCT can be used by anyone to verify that an SMB/SMB2 file server configuration can adequately support a particular number of users doing typical Microsoft Office/Windows Explorer work with home folders.

Jetstress for SMB file systems?

FSCT reminds me a little of Microsoft’s Jetstress tool used in the Exchange Solution Reviewed Program (ESRP), which I have discussed extensively in prior blog posts (search my blog) and other reports (search my website).  Essentially, FSCT has a simulated “home folder” workload which can be dialed up or down by the number of users selected.  As such, it can be used to validate any NAS system which supports the SMB/SMB2 or CIFS protocol.

Both Jetstress and FSCT are capacity verification tools.  However, I look at all such tools as a way of measuring system performance for a solution environment and FSCT is no exception.

Microsoft FSCT results

In Jose’s post on the report, he discusses performance for five different storage server configurations running anywhere from 4,500 to 23,000 active home directory users, employing white box servers running Windows (Storage) Server 2008 and 2008 R2 with various server hardware and SAS disk configurations.

Network throughput ranged from 114 to 650 MB/sec. Certainly respectable numbers, and somewhat orthogonal to the NFS and CIFS throughput operations/second reported by SPECsfs2008.  It’s unclear if FSCT reports activity in operations/second.

Microsoft’s FSCT reports did not specifically state what the throughput was, other than at the scenario level.  I assume the network throughput Jose reported was extracted concurrently with the test run from something akin to Perfmon.  FSCT seems to report performance only as the number of home folder scenarios sustainable per second and the number of users.  Perhaps there is an easy way to convert user scenarios to network throughput?

While the results for the file server runs look interesting, I always want more. For whatever reason, I have lately become enamored with ESRP’s log playback results (see my latest ESRP blog post) and it’s not clear whether FSCT reports anything similar.  Something like simulated file server backup performance would suffice from my perspective.

—-

Despite that, another performance tool is always of interest and I am sure my readers will want to take a look as well.  The current FSCT tester can be downloaded here.

Not sure whether Microsoft will be posting vendor results for FSCT similar to what they do for Jetstress via ESRP, but that would be a great next step.  Getting the vendors onboard is another problem entirely.  SPECsfs2008 took almost a year to get its first 12 (NFS) submissions and today, almost 9 months later, there are still only ~40 NFS and ~20 CIFS submissions.

Comments?

Initial impressions on Spring SNW/Santa Clara

I heard storage beers last night was quite the party; sorry I couldn’t make it, but I did end up at the HDS customer reception, which was standing room only and provided all the food and drink I could consume.

Saw quite a lot of old friends too numerous to mention here but they know who they are.

As for technology on display there was some pretty impressive stuff.

Virident card (c) 2011 Silverton Consulting, Inc.


Virident tachIOn SSD

One product that caught my eye was from Virident, their tachIOn SSD. I called it a storage subsystem on a board.  I had never talked with them before but they have been around for a while using NOR storage but now are focused on NAND.

Their product is a fully RAIDed storage device using flash-aware RAID 5 parity placement, their own wear leveling, and other SSD control software and logic, with replaceable NAND modules.

Playing with this device, I felt like I was swapping the drives of the future. Each NAND module stack has a separate controller and supports high parallelism.  Talking with Shridar Subramanian, VP of marketing, he said the product is capable of over 200K IOPS running a 70% read:30% write workload at full capacity.

They have a capacitor-backed DRAM buffer which can flush the memory buffer to NAND after a power failure. It plugs into a PCIe slot and uses less than 25W of power, in capacities of 300-800GB.  It requires a software driver; they currently only support Linux and VMware (a Linux variant), but Windows and other OSs are on the way.

Other SSDs/NAND storage

Their story was a familiar refrain throughout the floor: lots of SSD/NAND technology coming out, in various form factors.  I saw one system using SSDs from Viking Modular Systems that fit into a DRAM DIMM slot and supported a number of SSDs behind a SAS-like controller, also requiring a SW driver.

(c) 2011 Silverton Consulting, Inc.

Of course TMS, Fusion-io, Micron, Pliant and others were touting their latest SSD/NAND based solutions and technology.  For some reason, lots of SSDs at this show.

Naturally, all the other storage vendors were there: Dell, HDS, HP, EMC, NetApp and IBM. IBM was showing off Watson, their new AI engine that won at Jeopardy.

And then there was cloud, …

Cloud was a hot topic as well. Saw one vendor in the corner I have talked about before, StorSimple, which is a cloud gateway provider.  They said they are starting to see some traction in the enterprise. Apparently enterprises are starting to adopt cloud – who knew?

Throw in a few storage caching devices, …

Then of course there were the data caching products, which ranged from the relaunched DataRAM XcelaSAN to Marvell’s new DragonFly card.  DragonFly provides a cache on a PCIe card, while the DataRAM product is an FC caching appliance; all pretty interesting.

… and what’s organic storage?

And finally, Scality came out of the shadows with what they are calling an organic object storage device.  The product reminded me of Bycast (now with NetApp) and Archivas (now with HDS) in that it has a RAIN architecture, with mirrored data behind an object store interface.  I asked them what makes them different, and Jerome Lecat, CEO, said they are relentlessly focused on performance and claims they can retrieve an object in under 40msec.  My kind of product.  I think they deserve a deeper dive sometime later.

—-

Probably missed some other vendors, but these are my initial impressions.  For some reason I felt right at home swapping NAND drive modules…

Comments?

 

Services and products, a match made in heaven

wrench rust by HVargas (cc) (from Flickr)

In all the hoopla about companies’ increasing services revenues, what seems to be missing is that hardware and software sales automatically drive lots of services revenue.

A recent Wikibon post by Doug Chandler (see Can cloud pull services and technology together …) showed a chart of leading IT companies’ percent of revenue from services.  The percentages ranged from a high of 57% for IBM to a low of 12% for Dell, with the median being ~26.5%.

In the beginning, …

It seems to me that services started out being an adjunct to hardware and software sales – i.e., maintenance, help to install the product, provide operational support, etc. Over time, companies like IBM and others went after service offerings as a separate distinct business activity, outside of normal HW and SW sales cycles.

This turned out to be a great revenue booster and practically turned IBM around in the 90s.  However, one problem with hardware and software vendors’ reporting of service revenue is that they also embed break-fix, maintenance and infrastructure revenue streams in these line items.

The Wikibon blog mentioned StorageTek’s great service revenue business when Sun purchased them.  I recall that at the time, this was primarily driven by break-fix, maintenance and infrastructure revenues and not mainly from other non-product related revenues.

Certainly companies like EDS (now with HP), Perot Systems (now with Dell), and other pure service companies generate all their revenue from services not associated with selling HW or SW.  Which is probably why HP and Dell purchased them.

The challenge for analysts is to try to extract the more ongoing maintenance, break-fix and infrastructure revenues from other service activity in order to understand how to delineate portions of service revenue growth:

  • IBM seems to break out their GBS (consulting and application mgmt) from their GTS (outsourcing, infrastructure, and maintenance) revenues (see IBM’s 10K).  However, extracting break-fix and maintenance revenues from the other GTS revenues is impossible outside IBM.
  • EMC has no breakdown whatsoever in their services revenue line item in their 10K.
  • HP similarly, has no breakdown for their service revenues in their 10K.

Some of this may be discussed in financial analyst calls, but I could locate nothing but the above in their annual reports/10Ks.

IBM and Dell to the rescue

So we are all left to wonder how much of reported services revenue is ongoing maintenance and infrastructure business versus other services business.  Certainly IBM, in reporting both GBS and GTS, gives us some inkling of what this might be in their annual report: GBS is $18B and GTS is $38B. So maintenance and break-fix must be some portion of that GTS line item.

Perhaps we could use Dell as a proxy to determine break-fix, maintenance and infrastructure service revenues. Not sure where Wikibon got the reported service revenue % for Dell, but their most recent 10K shows services are more like 19% of annual revenues.

Dell had a note in their “Results from operations” section that said Perot Systems was 7% of this.  Which means previous services, primarily break-fix, maintenance and other infrastructure support revenues, accounted for something like 12% (maybe this is what Wikibon is reporting).

Unclear how representative Dell’s revenue percentages are of the rest of the IT industry, but if we take their ~12% of revenues off the percentages reported by Wikibon, then the new range is from 45% for IBM to 7% for Dell, with a median around 14.5%, for non-break-fix, maintenance and infrastructure service revenues.
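The adjustment is trivial to reproduce; a quick sketch (the 12% proxy is the big assumption here, as noted):

```python
# Subtract Dell's ~12% (pre-Perot break-fix/maintenance/infrastructure
# share of revenue) from the Wikibon service revenue percentages as a
# crude proxy for every vendor's product-driven service annuity.
reported = {"IBM": 57.0, "Dell": 19.0, "median": 26.5}  # % of revenue
proxy_breakfix = 12.0

adjusted = {co: pct - proxy_breakfix for co, pct in reported.items()}
print(adjusted)  # {'IBM': 45.0, 'Dell': 7.0, 'median': 14.5}
```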

Why is this important?

Break-fix, maintenance revenues and most infrastructure revenues are entirely associated with product (HW or SW) sales, representing an annuity once original product sales close.  The remaining service revenues are special purpose contracts (which may last years), much of which are sold on a project basis representing non-recurring revenue streams.

—-

So the next time some company tells you their service revenues are up 25% YoY, ask them how much of this is due to break-fix and maintenance.  This may tell you whether their product footprint expansion or their service offerings success is driving service revenue growth.

Comments?