15PB a year created by CERN

The Large Hadron Collider/ATLAS at CERN by Image Editor (cc) (from flickr)

That’s what CERN produces from its six experiments each year.  How this data is subsequently accessed and replicated around the world is an interesting tale of multi-tier data grids.

When an experiment is run at CERN, the data is captured locally in what’s called Tier 0.  After some initial processing, this data is stored on tape at Tier 0 (CERN) and then replicated to over 10 Tier 1 locations around the world, which then become a second permanent repository for all CERN experiment data.  Tier 2 centers can request data from Tier 1 locations, process it, and return results to Tier 1 for permanent storage.  Tier 3 data centers can request data and processing time from Tier 2 centers to analyze the CERN data.

Each experiment has its own set of Tier 1 data centers that store its results.  According to the latest technical description I could find, the Tier 0 (at CERN) and most Tier 1 data centers provide a permanent tape repository for experimental data, fronted by a disk cache.  Tier 2 centers can have similar resources but are not expected to be a permanent repository for data.

Each Tier 1 data center has its own hierarchical management system (HMS) or mass storage system (MSS) based on any number of software packages such as HPSS, CASTOR, Enstore, dCache, DPM, etc., most of which are open source products.  But regardless of the HMS/MSS implementation, they all provide a set of generic storage management services based on the Storage Resource Manager (SRM) as defined by a consortium of research centers, and they provide a set of file transfer protocols defined by yet another set of standards from Globus or gLite.

Each Tier 1 data center manages its own storage element (SE).  Each experiment’s storage element has disk storage, optionally backed by tape storage (using one or more of the above HMS/MSS packages), provides authentication/security and file transport, and maintains catalogs and local databases.  These catalogs and local databases index the data sets or files available on the grid for each experiment.

Data stored in the grid is considered read-only and can never be modified.  Users that need to process this data are expected to read it from Tier 1 data centers, process it, and create new data which is then stored back in the grid.  New data to be placed in the grid must be registered in the LCG file catalogue and transferred to a storage element, from which it is replicated throughout the grid.

CERN data grid file access

“Files in the Grid can be referred to by different names: Grid Unique IDentifier (GUID), Logical File Name (LFN), Storage URL (SURL) and Transport URL (TURL). While the GUIDs and LFNs identify a file irrespective of its location, the SURLs and TURLs contain information about where a physical replica is located, and how it can be accessed.” (taken from the gLite user guide).

  • GUIDs look like guid:<unique_string>.  Files are given a unique GUID when created, and it can never be changed.  The unique string portion of the GUID is typically a combination of MAC address and time-stamp and is unique across the grid.
  • LFNs look like lfn:<unique_string>.  Files can have many different LFNs, all pointing or linking to the same data.  LFN unique strings typically follow unix-like conventions for file links.
  • SURLs look like srm:<se_hostname>/path and provide a way to access data located at a storage element.  SURLs are immutable, are unique to a storage element, and are transformed into TURLs when the data is accessed.
  • TURLs look like <protocol>://<se_hostname>:<port>/path and are obtained dynamically from a storage element.  TURLs can have any format after the // that uniquely identifies the file to the storage element, but they typically contain a se_hostname, port and file path.

GUIDs and LFNs are used to look up a data set in the global LCG file catalogue.  After file lookup, a set of site-specific replicas is returned (as SURLs), which are used to request file transfer/access from a nearby storage element.  The storage element accepts the file’s SURL and assigns a TURL, which can then be used to transfer the data to wherever it’s needed.  TURLs can specify any file transfer protocol supported across the grid.
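
To make the naming scheme concrete, here’s a minimal sketch in Python (purely illustrative: the GUID, hostname, paths and port are made up, and a real lookup goes through the LCG file catalogue and the storage element’s SRM service rather than simple string rewriting):

    # Illustrative examples of the grid file name forms (all values are invented)
    guid = "guid:38ed3f60-c402-11d7-a6b0-f53ee5a37e1d"
    lfn = "lfn:/grid/atlas/run1234/raw/events.root"
    surl = "srm:se.example-tier1.org/castor/grid/atlas/run1234/events.root"

    def surl_to_turl(surl, protocol="gsiftp", port=2811):
        """Sketch of the SURL -> TURL step a storage element performs.
        In reality the SE's SRM service chooses the protocol, port and
        physical path dynamically; here we just rewrite the string."""
        host, path = surl[len("srm:"):].split("/", 1)
        return f"{protocol}://{host}:{port}/{path}"

    print(surl_to_turl(surl))
    # gsiftp://se.example-tier1.org:2811/castor/grid/atlas/run1234/events.root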

File transfer and access protocols currently supported by the CERN data grid include:

  • GSIFTP – a grid security interface (GSI) enabled subset of the GridFTP interface as defined by Globus
  • gsidcap – the GSI-enabled version of dCache’s native dcap access protocol
  • rfio – remote file I/O supported by DPM. There is both a secure and an insecure version of rfio.
  • file access – local file access protocols used to access the file data locally at the storage element.

While all storage elements provide the GSIFTP protocol, the other protocols supported depend on the underlying HMS/MSS system implemented by the storage element for each experiment.  Most experiments use one type of MSS throughout their worldwide storage elements and, as such, offer the same file transfer protocols everywhere.

If all this sounds confusing, it is.  Imagine 15PB a year of data replicated to over 10 Tier 1 data centers which can then be securely processed by over 160 Tier 2 data centers around the world.  All this supports literally thousands of scientists who have access to every byte of data created by CERN experiments and by the scientists that post-process this data.
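
For a sense of the sustained bandwidth this implies, a back-of-the-envelope calculation (a simple yearly average; real transfers are bursty, and the Tier 1 count is just the “over 10” figure above):

    # Rough average data rates implied by 15PB/year (decimal units)
    bytes_per_year = 15 * 10**15
    seconds_per_year = 365 * 24 * 3600
    avg_MB_per_sec = bytes_per_year / seconds_per_year / 1e6     # ~475 MB/s out of Tier 0
    avg_Gbps = avg_MB_per_sec * 8 / 1000                         # ~3.8 Gb/s per full copy
    tier1_sites = 10                                             # "over 10" Tier 1 centers
    print(f"~{avg_Gbps:.1f} Gb/s per copy, ~{avg_Gbps * tier1_sites:.0f} Gb/s "
          "aggregate if every Tier 1 receives a full copy")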

Just how this data is replicated to the Tier 1 data centers and how a scientist processes such data will have to be the subject of other posts.

Future iBook pricing

Apple's iPad iBook app (from Apple.com)

All the news about the iPad and the iBooks app got me thinking. There’s been much discussion of e-book pricing but no one is looking at what to charge for items other than books.  I look at this as something like what happened to albums when iTunes came out.  Individual songs were now available without having to buy the whole album.

As such, I started to consider what iBooks should charge for items outside of books.  Specifically,

  • Poems – there’s no reason the iBooks app should not offer poems as well as books, but what’s a reasonable price for a poem?  I believe Natalie Goldberg in Writing Down the Bones: Freeing the Writer Within used to charge $0.25 per poem.  So this is a useful lower bound; however, considering inflation (and assuming $0.25 was 1976 pricing), in today’s prices this would be closer to $1.66.  With the iBooks app’s published commission rate (33% for Apple), future poets would walk away with $1.11 per poem (the arithmetic is sketched just after this list).
  • Haiku – As a short-form poem, I would argue that a haiku should cost less than a poem.  So maybe $0.99 per haiku would be a reasonable price.
  • Short stories – As a short-form book, pricing for short stories needs to be somehow proportional to normal e-book pricing.  A typical book has about 10 chapters and, as such, it might be reasonable to consider a short story as equal to a chapter.  So maybe 1/10th the price of an e-book is reasonable.  With the prices being discussed for books, this would be roughly the price we set for poems.  No doubt incurring the wrath of poets forevermore, I am willing to say this undercuts the worth of short stories and would suggest something more on the order of $2.49 for a short story.  (Poets, please forgive my transgression.)
  • Comic books – Comic books seem close to short stories and with their color graphics would do well on the iPad.  It seems to me that these might be priced somewhere in between short stories and poems,  perhaps at $1.99 each.
  • Magazine articles – I see no reason that magazine articles shouldn’t be offered as well as short stories outside the magazine itself. Once again, color graphics found in most high end magazines should do well on the iPad.  I would assume pricing similar to short stories would make sense here.
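
For what it’s worth, the arithmetic behind these numbers is simple (the 33% commission is the published rate mentioned above; the price points are just my proposals):

    commission = 0.33                      # Apple's published iBooks commission rate
    proposed = {"poem": 1.66, "haiku": 0.99, "short story": 2.49,
                "comic book": 1.99, "magazine article": 2.49}
    for item, price in proposed.items():
        print(f"{item}: list ${price:.2f}, author/publisher keep ${price * (1 - commission):.2f}")
    # e.g., a $1.66 poem nets its poet roughly $1.11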

University presses, the prime outlet for short stories today, seem similar to small record labels.  Of course,  the iBooks app could easily offer to sell their production as e-books in addition to selling their stories separately. Similar considerations apply to poetry publishers. Selling poems and short stories outside of book form might provide more exposure for the authors/poets and in the long run, more revenue for them and their publishers.  But record companies will attest that your results may vary.

Regarding magazine articles and comic books, there seems to be a dependence on advertising revenue that may suffer from iBooks publishing.  This could be dealt with by incorporating publisher advertisements in iBooks displays of an article or comic book.  However, significant advertisement revenue comes from ads placed outside of articles, such as in back matter, around the table of contents, in-between articles, etc.  This will need to change with the transition to e-articles – revenues may suffer.

Nonetheless, all these industries can continue to do what they do today.  Record companies still exist, perhaps not doing as well as before iTunes, but they still sell CDs.  So there is life after iTunes/iBooks, but one thing’s for certain – it’s different.

I’m probably missing whole categories of items that could be separated from the book form sold today.  But in my view, anything that can be offered separately probably will be.  Comments?

Intel-Micron new 25nm/8GB MLC NAND chip

Intel and Micron 25nm NAND technology

Intel-Micron Flash Technologies just announced another increase in NAND density. This one manages to put 8GB on a single chip using MLC (2 bits/cell) technology in a 167mm² package, or roughly half an inch per side.

You may recall that Intel-Micron Flash Technologies (IMFT) is a joint venture between Intel and Micron to develop NAND technology chips. IMFT chips can be used by any vendor and typically show up in Intel SSDs as well as other vendors’ systems. MLC technology is more suitable for consumer applications, but at these densities it’s starting to make sense for data centers as well. We have written before about MLC NAND used in enterprise disks by STEC and about Toshiba’s MLC SSDs. But in essence, MLC NAND reliability and endurance will ultimately determine its place in the enterprise.

But at these densities, you can just throw more capacity at the problem to mask MLC endurance concerns. For example, with this latest chip, one could conceivably have a single-layer 2.5″ configuration with almost 200GB of MLC NAND. If you wanted to configure this as a 128GB SSD, you could use the additional 72GB of NAND to replace failing pages. Doing this could conceivably add more than 50% to the life of an SSD.
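
A back-of-the-envelope sketch of that arithmetic (the chip count and board layout are my assumptions; only the 8GB-per-chip figure comes from the announcement):

    chip_gb = 8                     # IMFT's new 25nm MLC chip
    chips_per_layer = 25            # assumed: what might fit on one 2.5" board layer
    raw_gb = chip_gb * chips_per_layer          # ~200GB of raw MLC NAND
    user_gb = 128                               # advertised SSD capacity
    spare_gb = raw_gb - user_gb                 # ~72GB held back for failing pages
    print(f"raw {raw_gb}GB, user {user_gb}GB, spare {spare_gb}GB "
          f"({spare_gb / user_gb:.0%} over-provisioning to absorb worn-out pages)")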

SLC still has better (~10X) endurance but being able to ship 2X the capacity in the same footprint can help.  Of course, MLC and SLC NAND can be combined in a hybrid device to give some approximation of SLC reliability at MLC costs.

IMFT made no mention of SLC NAND chips at the 25nm technology node, but presumably these will be forthcoming shortly.  As such, if we assume the technology can support a 4GB SLC NAND chip in the same 167mm² package, it should be of significant interest to most enterprise SSD vendors.

A couple of things were missing from yesterday’s IMFT press release, namely:

  • read/write performance specifications for the NAND chip
  • write endurance specifications for the NAND chip

SSD performance is normally a function of all the technology that surrounds the NAND chip, but it all starts with the chip.  Also, MLC used to be capable of 10,000 write/erase cycles and SLC of 100,000 w/e cycles, but recent technology from Toshiba (presumably 34nm) shows an MLC NAND write/erase endurance of only 1,400 cycles.  This seems to imply that as NAND density increases, write endurance degrades.  How much is subject to much debate, and with the lack of any standardized w/e endurance specifications and reporting, it’s hard to see how bad it gets.
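
To see why the missing w/e number matters, here’s a hedged lifetime estimate using the standard endurance arithmetic (the cycle counts are the ones quoted above; the drive size, daily write volume and write amplification are assumptions of mine):

    def ssd_lifetime_years(user_gb, pe_cycles, host_writes_gb_per_day,
                           write_amplification=2.0):
        """Rough lifetime: total program/erase budget divided by the
        effective write rate (host writes inflated by write amplification)."""
        total_writes_gb = user_gb * pe_cycles
        return total_writes_gb / (host_writes_gb_per_day * write_amplification * 365)

    # 128GB drive, 50GB of host writes per day, write amplification of 2 -- all assumed
    for name, cycles in [("older MLC", 10000), ("SLC", 100000), ("34nm MLC", 1400)]:
        print(f"{name}: ~{ssd_lifetime_years(128, cycles, 50):.0f} years")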

The bottom line: capacity is great, but we need to know w/e endurance to really see where this new technology fits.  Ultimately, if endurance degrades significantly, such NAND technology will only be suitable for consumer products.  Of course, with that market at ~10X (just guessing) the size of the enterprise market, maybe that’s OK.

Free P2P-Cloud Storage and Computing Services?

FFT_graph from Seti@home

What would happen if somebody came up with a peer-to-peer cloud (P2P-Cloud) storage or computing service?  I see this as:

  • Operating a little like Napster/Gnutella where many people come together and share out their storage/computing resources.
  • It could operate in a centralized or decentralized fashion
  • It would allow access to data/computing resources from anywhere on the internet

Everyone joining the P2P-Cloud would need to set aside computing and/or storage resources they were willing to devote to the cloud.  By doing so, they would gain access to an equivalent amount (minus overhead) of other nodes’ computing and storage resources to use as they see fit.

P2P-Cloud Storage

For cloud storage the P2P-Cloud would create a common cloud data repository spread across all nodes in the network:

  • Data would be distributed across the network in a way that allows reconstruction within a reasonable time frame and tolerates a reasonable number of node outages without loss of data (a minimal sketch follows this list).
  • Data would be encrypted before being sent to the cloud rendering the data unreadable without the key.
  • Data would NOT necessarily be shared, but would be hosted on other users’ systems.
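
A minimal sketch of the distribution idea from the list above (illustrative only: a real service would encrypt the data first and use a stronger erasure code than single XOR parity, and node placement isn’t shown):

    def split_with_parity(data: bytes, k: int = 4):
        """Split (already encrypted) data into k equal chunks plus one XOR
        parity chunk, so any single lost piece can be rebuilt from the rest.
        One parity piece per four data pieces is roughly the ~25% redundancy
        overhead assumed below."""
        chunk_len = -(-len(data) // k)                    # ceiling division
        padded = data.ljust(k * chunk_len, b"\0")
        chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
        parity = bytearray(chunk_len)
        for chunk in chunks:
            for i, byte in enumerate(chunk):
                parity[i] ^= byte
        return chunks + [bytes(parity)]

    pieces = split_with_parity(b"pre-encrypted user data, spread over five nodes")
    lost = pieces.pop(2)                   # any one node (piece) goes offline
    rebuilt = bytearray(len(pieces[0]))
    for piece in pieces:
        for i, byte in enumerate(piece):
            rebuilt[i] ^= byte
    assert bytes(rebuilt) == lost          # recovered from the remaining pieces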

As such, if I were to offer up 100GB of storage to the P2P-Cloud, I would get roughly 100GB (less overhead) of protected storage elsewhere on the cloud to use as I see fit.  Some percentage of this would be lost to administration (say 1-3%) and redundancy protection (say ~25%), but the remaining ~72GB of off-site storage could be very useful for DR purposes.

P2P-Cloud storage would provide a reliable, secure, distributed file repository that could be easily accessible from any internet location.  At a minimum, the service would be free and equivalent to what someone supplies (less overhead) to the P2P-Cloud Storage service.  If storage needs exceeded your commitment, more cloud storage could be provided at a modest cost to the consumer.  Such fees would be shared by all the participants offering excess [= offered – (consumed + overhead)] storage to the cloud.

P2P-Cloud Computing

Cloud computing is definitely more complex, but generally follows the SETI@home/BOINC model:

  • P2P-Cloud computing suppliers would agree to use something like a “new screensaver” which would perform computation while generating a viable screensaver.
  • Whenever the screensaver was invoked, it would start execution on the last assigned processing unit.  Intermediate work results would need to be saved and when completed, the answer could be sent to the requester and a new processing unit assigned.
  • Processing units would be assigned by the P2P-Cloud computing consumer and would be able to time out and be reassigned at will (a toy scheduling sketch follows this list).
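
A toy sketch of the assignment/timeout idea from the last bullet (the class and its parameters are mine, not BOINC’s; a real scheduler is far more involved):

    import time

    class WorkQueue:
        """Toy processing-unit scheduler: assigned units time out and are
        handed out again if the supplying node disappears."""
        def __init__(self, units, timeout_secs=3600):
            self.pending = list(units)          # units not yet handed out
            self.assigned = {}                  # unit -> (node, deadline)
            self.timeout = timeout_secs

        def get_work(self, node):
            self._reclaim_expired()
            if not self.pending:
                return None
            unit = self.pending.pop(0)
            self.assigned[unit] = (node, time.time() + self.timeout)
            return unit

        def complete(self, unit, result):
            self.assigned.pop(unit, None)       # the requesting consumer collects `result`
            return result

        def _reclaim_expired(self):
            now = time.time()
            for unit, (node, deadline) in list(self.assigned.items()):
                if now > deadline:              # node went quiet: reassign its unit
                    del self.assigned[unit]
                    self.pending.append(unit)

    queue = WorkQueue(range(10), timeout_secs=600)
    unit = queue.get_work("node-17")
    queue.complete(unit, result="intermediate answer")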

Computing users won’t gain much if the computing time they consume is <= the computing time they offer (less overhead).  However, computing time offset may be worth something, i.e., computing time now might be more valuable than computing time tonight.  That may offer a slight margin of value to help get this off the ground.  As such, P2P-Cloud computing suppliers would need to be able to specify when computing resources might be mostly available, along with their type, quality and quantity.

It’s unclear how to secure the processing unit, and this makes legal issues more prevalent.  That may not be much of a problem, as a complex distributed computing task makes little sense in isolation. But the (il-)legality of some data processing activities could conceivably put the provider in a precarious position. (Somebody from the legal profession would need to clarify all this, but I would think that some Amazon EC2-like licensing might offer safe harbor here.)

P2P-Cloud computing services wouldn’t necessarily be amenable to the more normal, non-distributed or linear computing tasks but one could view these as just a primitive version of distributed computing tasks.  In either case, any data needed for computation would need to be sent along with the computing software to be run on a distributed node.  Whether it’s worth the effort is something for the users to debate.

BOINC can provide a useful model here.  Also, the Condor(R) project at U. of Wisconsin/Madison can provide a similar framework for scheduling the work of a “less distributed” computing task model.  In my mind, both types of services ultimately need to be provided.

To attract more compute servers, SETI@home and similar BOINC projects rely on doing good deeds.  As such, if you can make your computing task do something of value to most users, then maybe that’s enough. In that case, I would suggest joining up as a BOINC project. For the rest of us, doing more mundane data processing, just offering our compute services to the P2P-Cloud will have to suffice.

Starting up the P2P-Cloud

Bootstrapping the P2P-Cloud might take some effort, but once going it should be self-sustaining (assuming no centralized infrastructure).  I envision an open source solution, taking off from the work done on Napster & Gnutella and/or BOINC & Condor.

I believe the P2P-Cloud Storage service would be the easiest to get started.  BOINC and SETI@home (list of active Boinc projects) have been around a lot longer than cloud storage but their existence suggests that with the right incentives, even the P2P-Cloud Computing service can make sense.

Strategy, as we know it, is dead

Or at least that’s how the WSJ reported it yesterday.

Years back when I was working in corporate strategy we used to have this yearly dance called strategic planning.  Every year we would fan out to all the business units, look at what they were doing and try to figure out what they needed to be doing three to five years down the road.

This process typically lasted the better part of a quarter or so and culminated in a presentation to upper management on a direction to pursue for the business unit.  What would happen next was often the best part.  Some business groups would shelve the work and not look at it again.  Other business units would invest time and effort to incorporate the strategic plan recommendations into what they were doing that year to try to make it happen in 3 to 5 years time.  At the end of this process, annual budgets would be declared “done” and the world would go back to work.

But that was the old, dead strategy.

The “New Strategy”

The new strategy is defined by adaptability and flexibility to take advantage of any opportunity that presents itself.  This results in strategic plans and operating budgets that are updated monthly, just-in-time decision making, and wider ranging planning scenarios.  For example:

  • Strategic plans and budgets updated monthly – as the economy tanked over the last couple of years, baseline assumptions were rendered useless in no time at all.  Budgets updated yearly were no help.  Even budgets updated quarterly were subject to significant tracking error.  The only way to survive was to look at your budgets every month and adjust for cost of capital, inventory, and revenue mix.  This way a company could adjust its product mix immediately to best match what was selling and thus maximize return.
  • Just-in-time decision making – the WSJ used a factory closing example in their article, but I prefer to look at the SSD vs. HDD product mix.  When to get on the SSD bandwagon is a strategic decision.  One can examine this decision yearly, quarterly or monthly to see if it makes sense today, or take the time to identify the trigger points that would make the decision for you.  For SSDs, one could decide what price SLC NAND memory has to drop to, say $X/GB, for SSDs to make sense.  To set this trigger, one must determine how long it would take to create and launch SSD product offerings, look at what SLC NAND pricing trends look like today, and back the trigger point up to take all this into account.  After that, all one need do is monitor NAND pricing daily and, when it hits your trigger point, start the product changeover (a small sketch of such a trigger follows this list).
  • Wider ranging scenarios – old strategic planning used economic variables such as cost of capital, revenue growth, and cost of goods sold; many planners would use a range of +/- 5% on each of these factors to generate operating scenarios that were then fed into the strategic planning process.  The problem with such scenarios is that they didn’t take into account the extreme circumstances of the last couple of years.  Widened to something like +/- 15%, the scenarios become much more useful and would have reflected actual experience.
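
A small sketch of the SSD trigger-point idea from the list above (the target price, decline rate and lead time are invented for illustration):

    def trigger_price(target_price_per_gb, quarterly_decline, lead_time_quarters):
        """Price at which to start the changeover so that, if the decline
        trend holds, NAND reaches the target price about when the new
        SSD product would launch."""
        return target_price_per_gb / ((1 - quarterly_decline) ** lead_time_quarters)

    # Assume SSDs make sense at $3/GB SLC NAND, prices fall ~10% a quarter,
    # and a product changeover takes 4 quarters.
    print(f"start the changeover when SLC NAND hits ${trigger_price(3.00, 0.10, 4):.2f}/GB")
    # -> about $4.57/GB; then monitor pricing daily and pull the trigger there
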
F-15 F-16 F86 Sabre Jet Heritage Flight by TMWolf (cc) (from flickr)

But in the end, most of this speaks to speed and to taking advantage of opportunities as they present themselves.

OODA

All this reminds me of Colonel John R. Boyd (USAF, deceased), who came up with a new military and competitive strategic paradigm called OODA: Observation, Orientation, Decision, and Action.  Observe the competition (or marketplace), orient to (or appreciate) what the market is doing, decide what the most appropriate action will be, and then do it.  Boyd believed that the fastest OODA cycle always wins in the end.  Any OODA cycle takes time to perform; the fastest one will change the marketplace such that by the time your (slower) adversary sees what’s happening and reacts, you have already changed the world out from under them.

There is a good book on Col. Boyd’s life by Robert Coram, Boyd: The Fighter Pilot Who Changed the Art of War.  There is also a bio, Genghis John, written by a close friend, Chuck Spinney.  If you are interested in understanding more of his views on conflict and strategy, I suggest starting with the bio, but the book is an easy read.

How this all applies to a world of 6-18 month product development cycles and 3-month marketing campaigns needs to be the subject of a future post…

Are SSDs an invasive species?

A head assembly on a Seagate disk drive by Robert Scoble (cc) (from flickr)

I was reading about pythons becoming an invasive species in the Florida Everglades and that brought to mind SSDs.  The current ecological niche in data storage has rotating media as the most prolific predator with tape going on the endangered species list in many locales.

So where do SSDs enter the picture?  We have written before about SSD shipments starting to take off, but that was looking at the numbers from another direction. Given recent announcements, it appears that in the enterprise, SSDs are taking over the place formerly held by 15Krpm disk devices.  These were formerly the highest-performing and most costly storage around.  Today, SSDs, as a class of storage, are easily the most costly storage and have the highest performance currently available.

The data

Seagate announced yesterday that they shipped almost 50M disk drives last quarter, up 8% from the prior quarter, or ~96M drives over the past 6 months.  Now Seagate is not the only disk drive provider (Hitachi, Western Digital and others also supply this market), but they probably have the lion’s share.  Nonetheless, Seagate did mention that the last quarter was supply constrained and believed that the total addressable market was 160-165M disk drives.  That puts Seagate’s market share (in unit volume) at ~31%, and at that rate the last 6 months’ total disk drive production should have been ~312M units.

In contrast, IDC reports that SSD shipments last year totaled 11M units. In both the disk and SSD cases we are not just talking about enterprise-class devices; the numbers include PC storage as well.  If we halve that number, we have a comparable figure of 5.5M SSDs for the last 6 months, giving SSDs less than a 2% market share (in units).

Back to the ecosystem.  In the enterprise, there are 15Krpm, 10Krpm and 7.2Krpm rotating media disks.  As speed goes down, capacity goes up.  In Seagate’s last annual report, they stated that approximately 10% of the drives they manufactured were shipped to the enterprise.  At that rate, of the 312M drives, maybe 31M were enterprise class (this probably overstates the number but is usable as an upper bound).

As for SSDs, the IDC report cited above mentioned two primary markets for SSD penetration: PC and enterprise.  In that same Seagate annual report, they said their desktop and mobile markets accounted for around 80% of disk drives shipped.  If we use that proportion for SSDs, then of the 5.5M units shipped last half year, 4.4M were in the PC space and 1.1M were for the enterprise.  Given that, enterprise-class SSDs would represent ~3.4% of the enterprise-class disk drives shipped.  This is over 10X more than my prior estimate of SSDs being <0.2% of enterprise disk drives.  Reality probably lies somewhere between these two estimates.
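
Pulling the numbers above together (all estimates, using the same assumptions as in the text: Seagate at ~31% unit share, ~10% of drives and ~20% of SSDs going to the enterprise; any difference from the ~3.4% figure above is just rounding):

    seagate_6mo_units = 96e6                     # ~50M last quarter + ~46M the quarter before
    seagate_unit_share = 50e6 / 162.5e6          # ~31% of a 160-165M drive TAM
    total_drives_6mo = seagate_6mo_units / seagate_unit_share    # ~312M drives
    enterprise_drives = total_drives_6mo * 0.10                  # ~31M enterprise drives

    ssd_6mo_units = 11e6 / 2                     # half of IDC's 11M/year
    enterprise_ssds = ssd_6mo_units * 0.20       # 80% PC / 20% enterprise split

    print(f"drives ~{total_drives_6mo / 1e6:.0f}M, of which enterprise ~{enterprise_drives / 1e6:.0f}M")
    print(f"enterprise SSDs ~{enterprise_ssds / 1e6:.1f}M = "
          f"{enterprise_ssds / enterprise_drives:.1%} of enterprise drives")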

I wrote a research report a while back which predicted that SSDs would never take off in the enterprise; I was certainly wrong then.  If these numbers are correct, capturing this much of the enterprise disk market in a little under 2 years can only mean that high-end, 15Krpm drives are losing ground faster than anticipated.  Which brings up the analogy of the invasive species: SSDs seem to be winning a significant beachhead in the enterprise market.

In the mean time, drive vendors are fighting back by moving from the 3.5″ to 2.5″ form factor, offering both 15K and 10K rpm drives.   This probably means that the 15Krpm 3.5″ drive’s days are numbered.

I made another prediction almost a decade ago that 2.5″ drives would take over the enterprise around 2005 – wrong again, but only by about 5 years or so. I’ve got to stop making predictions…

Is M and A the only way to grow?

Photograph of Women Working at a Bell System Telephone Switchboard by US National Archives (cc) (from flickr)

Oracle buys Sun, EMC buys Data Domain, Cisco buys Tandberg; it seems like every month another billion-dollar acquisition occurs.  Part of this is because of the recent economic troubles, which now value many companies at the lowest they have been in years, thus making it cheaper to acquire good (and/or failing) companies.  But one has to wonder: is this the only way to grow?

I don’t think so.

Corporate growth can be internally driven (organic) just as well as it can come from acquisition.  But it’s definitely harder to do internally.  Why?

  • Companies are focused on current revenue-producing products – Revolutionary products rarely make it into development in today’s corporations because they take resources away from other (revenue-producing) products.
  • Companies are focused on their current customer base – Products that serve other customers rarely make it out into the market from today’s corporations because such markets are foreign to the company’s current marketing channels.
  • Company personnel understand current customer problems – To be successful, any new product must address its customers’ pain points and offer some sort of unique, differentiated solution to those issues; because this takes understanding other customers’ problems, it seldom happens.
  • New products can sometimes threaten old product revenue streams – It’s a rare new product that doesn’t take market share away from some old way of doing business.  As companies focus on a particular market, any new product development will no doubt focus on those customers as well.  Thus, many new internally developed products will often displace (or eat away at) current product revenue.  Early on, it’s hard to see how any such product can be justified with respect to current corporate revenue.
  • New products often take efforts above and beyond current product activities – To develop, market and sell revolutionary products takes enormous, “all-out” efforts to get off the ground.  Most corporations are unable to sustain this level of effort for long, as their startup phase was long ago and long forgotten.

We now know how hard it can be, but how does Apple do it?  The iPod and iPhone were revolutionary products (at least from Apple’s perspective), and yet both undeniably became great successes and helped redefine their industries.  No one can argue that they haven’t helped Apple grow significantly in the process.  So how can this be done?

  • It takes strong visionary leadership in the company at the highest level – Such management can make the tough decisions to take resources away from current, revenue-producing products and devote time and effort to new ones.
  • It takes marketing genius – Going after new markets, even if they are adjacent, requires in-depth understanding of new market dynamics and total engagement to be successful.
  • It takes development genius – Developing entirely new products, even if based on current technology, takes development expertise above and beyond evolutionary product enhancement.
  • It takes hard work and a dedicated team – Getting new products off the ground takes a level of effort above and beyond current ongoing product activities.
  • It takes a willingness to fail – Most new internally developed products and/or startups fail.  This fact can be hard to live with and makes justifying future products even harder.

In general, all these items are easier to find in startups rather than an ongoing corporation today.  This is why most companies today find it easier and more successful to grow through acquisitions rather than through organic or internal development.

However, it’s not the only way.  AT&T did it for almost a century in the telecom industry, but they owned a monopoly.  IBM and HP did it occasionally over the past 60 years or so, but they had strong visionary leadership for much of that time and stumbled miserably when such leadership was lacking.  Apple has done it over the past couple of decades or so, but that is mainly due to Steve Jobs.  There are others of course, but I would venture to say all had strong leadership at the helm.

But these are the exceptions.  Strong visionary leaders usually don’t make it to the top of today’s corporations.  Why that’s the case needs to be the subject of a future post…

Latest SPECsfs2008 CIFS performance – chart of the month

Above we reproduce a chart from our latest newsletter, the StorInt™ Dispatch, on SPECsfs® 2008 benchmark results.  This chart shows the top 10 CIFS throughput benchmark results as of the end of last year.  As observed in the chart, Apple’s Xserve running Snow Leopard took top performance with over 40K CIFS throughput operations per second.  My problem with this chart is that there are no enterprise-class systems represented in the top 10 or, for that matter (not shown above), in any CIFS result.

Now some would say it’s still early in the life of the 2008 benchmark, but it has been out for 18 months now and still does not have a single enterprise-class system submission.  Possibly CIFS is not considered an enterprise-class protocol, but I can’t believe that given the proliferation of Windows.  So what’s the problem?

I have to believe it’s part tradition, part not wanting to look bad, and part just lack of awareness on the part of CIFS users.

  • Traditionally, NFS benchmarks were supplied by SPECsfs and CIFS benchmarks were supplied elsewhere, i.e., by NetBench. However, there never was a central repository for NetBench results, so comparing system performance was cumbersome at best.  I believe that’s one reason for SPECsfs’s CIFS benchmark: seeing the lack of a central repository for a popular protocol, SPECsfs created their own CIFS benchmark.
  • Performance on system benchmarks is always a mixed bag.  No one wants to look bad, and any top-performing result is temporary until the next vendor comes along.  So most vendors won’t release a benchmark result unless it shows well for them.  It’s not clear whether Apple’s 40K CIFS ops is a hard number to beat, but it’s been up there for quite a while now, and that has to tell us something.
  • CIFS users seem to be aware of and understand NetBench but don’t have similar awareness of the SPECsfs CIFS benchmark yet.  So, given today’s economic climate, any vendor wanting to impress CIFS customers would probably choose to ignore SPECsfs and spend their $s on NetBench.  The fact that comparing results was nigh impossible could be considered an advantage for many vendors.

So the SPECsfs CIFS benchmark just keeps going on.  One way to change this dynamic is to raise awareness: as more IT staff/consultants/vendors discuss SPECsfs CIFS results, its awareness will increase.  I realize some of my analysis of CIFS and NFS performance results doesn’t always agree with the SPECsfs party line, but we all agree that this benchmark needs wider adoption.  Anything that can be done to facilitate that deserves my (and their) support.

So for all my storage admin, CIO and other NAS-purchase-influencer friends out there: you need to start asking about SPECsfs CIFS benchmark results.  All my peers out there in the consultant community, get on the bandwagon.  As for my friends in the vendor community, SPECsfs CIFS benchmark results should be part of any new product introduction.  Whether you want to release results is, and always will be, a marketing question, but you all should be willing to spend the time and effort to see how well new systems perform on this and other benchmarks.

Now if I could just get somebody to define an iSCSI benchmark, …

Our full report on the latest SPECsfs 2008 results, including both NFS and CIFS performance, will be up on our website later this month.  However, you can get this information now and subscribe to future newsletters to receive the full report even earlier; just email us at SubscribeNews@SilvertonConsulting.com?Subject=Subscribe_to_Newsletter.