Micron’s new P300 SSD and SSD longevity

Micron P300 (c) 2010 Micron Technology

Micron just announced a new SSD based on their 34nm SLC NAND technology with some pretty impressive performance numbers.  They used an independent organization, Calypso SSD testing, to generate those numbers:

  • Random Read 44,000 IO/sec
  • Random Writes 16,000 IO/sec
  • Sequential Read 360MB/sec
  • Sequential Write 255MB/sec

These results are even more impressive considering the performance was generated over SATA 6Gb/s and measured after reaching “SNIA test specification – steady state” (see my post on SNIA’s new SSD performance test specification).

The new SATA 6Gb/s interface is a bit of a gamble, but one can always use an interposer to support FC or SAS interfaces.  In addition, many storage subsystems today already support SATA drives, so its interface may not even be an issue.  The P300 can easily fall back to 3Gb/s SATA if that’s what’s available; sequential performance suffers, but random IOPS won’t be much affected by interface speed.

The advantage of SATA 6Gb/s is that it’s a simple interface and costs less to implement than SAS or FC.  The downside is the loss of performance until 6Gb/s SATA takes over enterprise storage.

P300’s SSD longevity

I have written many posts discussing SSDs and their longevity or write endurance, but this is the first time I have heard a vendor describe drive longevity using “total bytes written” to a drive. Presumably this is a new SSD write endurance standard coming out of JEDEC, but I was unable to find any reference to the standard’s definition.

In any case, the P300 comes in 50GB, 100GB and 200GB capacities, and the 200GB drive has a “total bytes written” capability of 3.5PB, with the smaller versions having proportionally lower longevity specs. For the 200GB drive, that’s almost 5 years of 10 complete drive writes a day, every day of the year.  This seems enough from my perspective to put any SSD longevity considerations to rest.  Although at 255MB/sec sequential writes, the P300 could actually sustain roughly 10X that write rate per day – assuming you never read any data back.
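A quick back-of-the-envelope check of those endurance claims; the 3.5PB, 200GB and 255MB/sec figures come from the specs above, the rest is simple arithmetic:

```python
# Rough endurance math for the 200GB P300, using the figures quoted above
total_bytes_written = 3.5e15      # 3.5PB "total bytes written" spec
capacity = 200e9                  # 200GB drive
seq_write_rate = 255e6            # 255MB/sec sequential write

full_drive_writes = total_bytes_written / capacity            # ~17,500 full writes
years_at_10_per_day = full_drive_writes / (10 * 365)          # ~4.8 years
max_writes_per_day = seq_write_rate * 86_400 / capacity       # ~110 full writes/day

print(f"{full_drive_writes:,.0f} full drive writes over the drive's life")
print(f"{years_at_10_per_day:.1f} years at 10 full drive writes per day")
print(f"~{max_writes_per_day:.0f} full drive writes/day possible at 255MB/sec")
```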

I am sure over provisioning, wear leveling and other techniques were used to attain this longevity. Nonetheless, whatever they did, the SSD market could use more of it.  At this level of SSD longevity the P300 could almost be used in a backup dedupe appliance, if there were a need for the performance.

You may recall that Micron and Intel have a joint venture to produce NAND chips.  But the joint venture doesn’t include applications of their NAND technology.  This is why Intel has their own SSD products and why Micron has started to introduce their own products as well.

—–

So which would you rather see for an SSD longevity specification:

  • Drive MTBF
  • Total bytes written to the drive,
  • Total number of Program/Erase cycles, or
  • Total drive lifetime, based on some (undefined) predicted write rate per day?

Personally I like total bytes written because it defines the drive reliability in terms everyone can readily understand but what do you think?

SNIA’s new SSD performance test specification

Western Digital's Silicon Edge Blue SSD SATA drive (from their website)

A couple of weeks ago SNIA released a new version of their SSSI (SSD) performance test specification for public comment. I am not sure if this is the first version out for public comment, but I discussed a prior version in a presentation I did for SNW last October and have blogged before about some of the mystery of measuring SSD performance.  The current version looks a lot more polished than what I had to deal with last year, but the essence of the performance testing remains the same:

  • Purge test – using a vendor-approved process, purge (erase) all the data on the drive.
  • Preconditioning test – write 2X the capacity of the drive using 128KiB blocks, sequentially writing through the whole device’s usable address space (see the sketch after this list).
  • Steady state testing – varying blocksizes, varying read-write ratios, varying block number ranges, looped until device performance reaches steady state.
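As a concrete illustration of that preconditioning step, here's a minimal sketch assuming a raw block device you have write access to. The function name and the choice to reuse one random buffer are mine; a real test tool would also bypass the page cache and handle alignment:

```python
import os

def precondition(device_path, capacity_bytes, block_size=128 * 1024, passes=2):
    """Sequentially write the device's usable address space `passes` times
    using 128KiB blocks, roughly what the preconditioning step calls for."""
    buf = os.urandom(block_size)              # one incompressible buffer, reused
    with open(device_path, "wb", buffering=0) as dev:
        for _ in range(passes):
            dev.seek(0)
            written = 0
            while written + block_size <= capacity_bytes:
                dev.write(buf)
                written += block_size
```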

The steady state testing runs a random I/O mix for a minute’s duration at the currently specified blocksize, R:W ratio and block number range.  Also, according to the specification, the measurements for steady state are taken once 4KiB blocksize, 100% write performance settles down.  This steady state determination must execute over a number of rounds (4?) before the other performance test runs are considered to be at “steady state”.
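Roughly, the steady state determination keeps running one-minute rounds of 4KiB random writes and watches for the per-round result to stop drifting. The sketch below simulates that with a made-up decay curve; the four-round window, the 10% tolerance band and the simulated numbers are all my own placeholders, not the spec's actual convergence criteria:

```python
import random
import statistics

def run_round(rnd):
    """Stand-in for one minute of 4KiB random 100% write IO, returning IOPS.
    Simulated here: performance decays from a fresh-drive peak toward a
    steady value (a real harness would drive actual IO instead)."""
    return 16_000 + 30_000 * (0.5 ** rnd) + random.uniform(-300, 300)

def reach_steady_state(window=4, tolerance=0.10, max_rounds=25):
    """Run rounds until the last `window` results all sit within
    +/- tolerance of their average, then call that steady state."""
    results = []
    for rnd in range(1, max_rounds + 1):
        results.append(run_round(rnd))
        recent = results[-window:]
        if len(recent) == window:
            avg = statistics.mean(recent)
            if all(abs(r - avg) <= tolerance * avg for r in recent):
                return rnd, avg
    return None, None                         # never settled; the run is invalid

rounds, iops = reach_steady_state()
print(f"steady state after {rounds} rounds at ~{iops:,.0f} IOPS")
```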

SNIA’s SSSI performance test benefits

Let’s start by saying no performance test is perfect.  I can always find fault in any performance test, even my own.  Nevertheless, the new SSSI performance test goes a long way towards fixing some intrinsic problems with SSD performance measurement.  Specifically,

  • The need to discriminate between fresh out of the box (FOB) performance and ongoing drive performance.  The preconditioning test is obviously a compromise in attempting to do this: writing double the full capacity of a drive takes a long time but should cause every NAND cell in the user space to be overwritten.  Writing the capacity just once is not enough to overwrite all the device’s write buffers, while writing three times the device’s capacity might still show some variance in performance and would take correspondingly longer.
  • The need to show steady state SSD performance versus some peak value.  SSDs are notorious for showing differing performance over time. Partially this is due to FOB performance (see above) but mostly this is due to the complexity of managing NAND erasure and programming overhead.

The steady state performance problem is not nearly as much an issue with hard disk drives, but even here, with defect skipping, drive performance will degrade over time (though over a much longer time than for SSDs).  My main quibble with the test specification is how they elect to determine steady state – 4KiB blocks with 100% writes seems a bit oversimplified.

Is some proportion of read IO needed to define SSD “steady state” performance?

[Most of the original version of this post centered on the need for some write component in steady state determination.  This was all due to my misreading the SNIA spec.  I now realize that the current spec calls for a 100% WRITE workload with 4KiB blocksizes to settle down to determine steady state.   While this may be overkill, it certainly is consistent with my original feelings that some proportion of write activity needs to be a prime determinant of SSD steady state.]

My main concern with how the test determines SSD steady state performance is the lack of read activity. My other worry with this approach is that the blocksize seems a bit too small, however this is minor in comparison.

Let’s start with the fact that SSDs are by nature asymmetrical devices.  By that I mean their write performance differs substantially from their read performance due to the underlying nature of the NAND technology.  But much of what distinguishes an enterprise SSD from a commercial drive is the sophistication of its write processing, so a 100% write workload certainly exercises that sophistication.

But using 100% writes to test for steady state may be too much.

In addition, it is hard for me to imagine any commercial or enterprise class device in service not having some high portion of ongoing read IO activity.  I can easily be convinced that a normal R:W ratio for an SSD device is somewhere between 90:10 and 50:50.  But I have a difficult time seeing an SSD R:W ratio of 0:100 as realistic.  And I feel any viable interpretation of device steady state performance needs to be based on realistic workloads.

In SNIA’s defense, they had to pick some reproducible way to measure steady state.  Some devices may have difficulty reaching steady state with 100% write activity.  However, most other benchmarks have some sort of cut-off that can be used to invalidate results, and reaching steady state is one such criterion for SNIA’s SSSI performance test.  I just think some mix of read and write activity would be a better measure of SSD stability.

As for the 4KiB blocksize, it’s purely a question of what’s the most probable blocksize in the use of SSDs, and that may vary between enterprise and consumer applications.  But 4KiB seems a bit behind the times, especially with today’s 128GB and higher capacity drives…

What do you think – should SSD steady state determination use some mix of read and write activity or not?

[Thanks to Eden Kim and his team at SSSI for pointing out my spec reading error.]

Reflections on this week’s SNW

SNW hall servers and storage

The crowd seemed more end-user centric, the exhibit floor seemed less intense, and, sigh, the bar less crowded.  But mostly what I heard at this week’s SNW was more interest in SSDs and in cloud computing and storage.

Admittedly, I am a different observer than most at SNW.  I typically do not attend tutorials/sessions unless I speak at them, I focus my time on the exhibit floor looking for new technology and I go out of my way to talk with strangers.

At past SNWs I would mostly meet other vendor personnel.  In contrast, at this SNW, I met many more end-users in these chance encounters.  Vendors were still present on the exhibit floor but not as evident at lunch or the reception.  Perhaps there were fewer auxiliary vendor personnel attending SNW this spring; the economy may be forcing vendors to cut back.  Whether this trend continues will have to wait until the next SNW, but it started at least a year and a half ago and has really taken off over the past two SNWs.

More SNW hall servers and storage

As for the exhibit floor: fewer giveaways, fewer booth babes, and less gambling/magic/raffles to entice customers.  While I was on the exhibit floor there didn’t seem to be any one booth drawing all the traffic, so all vendors seemed to share show participants equally.  The kiosks in the hall were a bit more subdued as well, not capturing people as they walked by as in past shows.

I don’t know the final SNW headcount but it would seem to be about the same as last fall’s SNW except for the minimal vendor personnel.  But I was especially surprised by the lack of Brocade, Cisco, HDS, and Microsoft on the exhibit floor, as well as their not having any executives present to talk with analysts.  This seems a significant departure from prior SNWs.  I am sure the ROI on SNW has changed as its audience mix evolves, but one would think the higher end-user proportion would drive more pressure to be here, not less.  Nevertheless, I believe their participation in tutorial sessions was not diminished as much as their presence on the exhibit floor.

More SNW hall servers and storage

I met a customer that has been to every SNW since the beginning.  He said that the Symantec Vision conference and NAB occurring during the same week made deciding where to go more of a problem than usual.

Future SNWs

Where SNW goes from here is anyone’s guess.  Some people I talked with thought all the information available on the web makes having a place to see equipment and talk to vendors like SNW redundant.  However, from a vendor perspective, there is an ongoing need to talk directly with customers and obtain new leads.  Something like SNW that concentrates this activity in one place and one time represents a significant advantage.  It certainly does for my business.

Email marketing was supposed to be the death of mail solicitation, but my mailbox has seen no end of junk mail.  Similarly, blogging, facebook, and other social media were going to kill offline marketing, but all they did was create other ways to gain my attention.  Perhaps the marketing spend must adjust for new approaches, but old ways never seem to go away entirely.  Each company is different; what makes sense for EMC, NetApp, and IBM may make no sense for Cisco, Brocade, HDS, and Microsoft.

More SNW hall servers and storage

One thing present at this SNW more than the last one was social media.  More tweets, more blogging, and more pod/videocasts were generated daily.  At Monday night’s tweetup there was at least one more person there than at last SNW’s tweetup, and we had at least three different sets of vendors/analysts/customers show up as well.  One difference from last fall’s SNW tweetup was that it was held at a bar.

Another thing I found personally significant: some of my vendor meetings were specifically focused on my role as a blogger versus industry analyst.  This seemed to dictate whether vendors discussed NDA material with me or not.  But the funny thing is I seem to be treated better as a blogger than as an analyst.

I think something like SNW will be around for a long time to come.  Video chats and webcasts have not eliminated meeting face-to-face.  Yes, information is widely available on the web.  But obtaining such information depends on actively searching for it or something similar.  On the other hand, conferences like SNW generate random, spur of the moment contacts.  Such encounters can lead to technology adoption that wasn’t even considered beforehand and can start significant sales conversations just by being in the right place at the right time.  Such randomness is impossible to replicate today with a purely web-based experience; there is just too much information and noise out there.

So yes, SNW and other conference/bazaars will be around for a while longer.  They will change with the times but their essence will remain: providing a venue for customers to meet vendors and see first hand the technology that’s available.

SNIA Tech Center Grand Opening – Mystery Storage Contest

SNIA Technology Center Grand Opening Event - mingling before the show

Yesterday in Colorado Springs SNIA held a grand opening for their new Technology Center.  They have moved their tech center about a half mile closer to Pikes Peak.

The new center has less data center floor space than the old one, but according to Wayne Adams (EMC), Chairman of the SNIA Board, this is a better fit for what SNIA is doing today.  These days SNIA doesn’t do as many plugfests requiring all the equipment to be co-located on the same data center floor.  Most of SNIA’s plugfests today are done over the web, remotely, across continent-wide distances.

SNIA’s new tech center is being leased from LSI, which occupies the other half of the building.  If you were familiar with the old tech center, it was leased from HP, which resided in the other half of that building.  This arrangement has worked well for SNIA in the past, providing access to a large number of technical experts who can be called on to help out when needed.

A couple of things I didn’t realize about SNIA:

  • They have been in Colorado Springs since 2001
  • They have only 14 USA employees but over 4000 volunteers
  • They host a world wide IO trace library which member companies contribute to and can access
  • All the storage equipment in their data center is provided for free by vendor/member companies.
  • SNIA certification training is one of the top 10 certifications as judged by an independent agency

I took a tour of their technology display area highlighting SNIA initiatives:

SNIA Green Storage Initiative display station
  • Green Storage Initiative display had a technician working on getting the power meter working properly – one of only two analyzers I saw in their data center.  The green storage initiative is all about the energy consumption of storage.
  • Solid State Storage Initiative display had a presentation on the SSSI activities and white papers which they have produced on this technology.
  • XAM initiative section had a couple of people talking about the importance of XAM to compliance activities and storage archives.
  • FCIA section had a talk on FC and its impact on storage present and futures.

Other initiatives were on display as well but I spent less time studying them.  In the conference room and 2 training rooms, SNIA had presentations on their Certification activity and storage training opportunities.  Howie Goldstein (HGAI Associates) was in one of the training rooms talking about education he provides through SNIA.

SNIA Tech Center Computer Lab 1

The new tech center has two computer labs.  Lab 1 seemed to have just about every vendor’s storage hardware.  As you can see from the photo, each storage subsystem was dedicated to SNIA initiative activities.  I didn’t see a lot of servers using this storage, but they were probably located in computer lab 2.  In the picture one can see EMC, HDS, APC, and at the end, 3PAR storage.  On the other side of the aisle (not shown) was HP, NetApp, PillarData, and more HDS storage (and I probably missed one or two more).

I don’t recall the SAN switch hardware, but it wouldn’t surprise me if it included a representative selection from all the vendors.  There was more switch hardware in Lab 2, and there we could easily make out (McData, now) Brocade switch hardware.

SNIA Tech Center Computer Lab 2 switching hw

Computer Lab 2 seemed to have most of the server hardware and more storage.  But both labs looked pretty clean from my perspective, probably due to all the press and grand opening celebration.  It ought to look more lived in/worked in over time.  I always like labs to be a bit more chaotic (see my price of quality post for a look at a busy HP EVA lab).

You would think with all SNIA’s focus on plugfests there would be a lot more hardware analyzers floating around the two labs.  But outside of the power meter in the Green Storage Initiative display, the only other analyzer-like equipment I saw was a lone workstation behind some of the storage in lab 2.  It was way too clean to actually be in use.  There ought to be post-it notes all over it, cables hanging all around it, with manuals and other documentation underneath it (but maybe I am a little old school – docs should all be online nowadays).  Also, I didn’t see a single white board in either of the labs, another clear sign of early life.

SNIA Tech Center Lab 2 Lone portable workstation

We didn’t get to see much of the office space but it looked like plenty of windows and decent sized offices. Not sure how many people would be assigned to each but they have to put the volunteers and employees someplace.

Mystery Storage Contest – 1

And now for a new contest. See if you can determine the storage I am showing in the photo below.  Please submit your choice via comment(s) to this post and be sure to supply a valid email address.

Contest participant(s) will all receive a subscription to my (free) monthly Storage Intelligence email newsletter.  One winner will be chosen at random from all correct entries and will earn a free coupon code for 30% off any Silverton Consulting Briefings purchased through the web (once I figure out how to do this).

Correct entries must supply a valid email and identify the storage vendor and product model depicted in the picture.  Bonus points will be awarded for anyone who can tell the raw capacity of the subsystem in the picture.

SNIA volunteers, SNIA employees, SNIA member company employees and family members of any of these are not allowed to submit answers.  The contest will be closed 90 days after this post is published.  (And would someone from SNIA Technology Center please call me at 720-221-7270 and provide the identification and raw capacity of the storage subsystem depicted below in Computer Lab 2 on the NorthWest Wall.)

SNIA Technology Center - Mystery Storage 1

Remember our first Mystery Storage Contest closes in 90 days.

Also if you would like to submit an entry picture for future mystery storage contests please indicate so in your comment and I will be happy to contact you directly.

Why is SSD performance a mystery?

SSDs! :) by gimpbully (cc) (from flickr)

SSD and/or SSS (solid state storage) performance is a mystery to most end-users. The technology is inherently asymmetrical, i.e., it reads much faster than it writes. I have written on some of these topics before (STEC’s new MLC drive, Toshiba’s MLC flash, Tape V Disk V SSD V RAM) but the issue is much more complex when you put these devices behind storage subsystems or in client servers.

Some items that need to be considered when measuring SSD/SSS performance include:

  • Is this a new or used SSD?
  • What R:W ratio will we use?
  • What blocksize should be used?
  • Do we use sequential or random I/O?
  • What block inter-reference interval should be used?

This list is necessarily incomplete but it’s representative of the sort of things that should be considered to measure SSD/SSS performance.
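To make those knobs concrete, below is a small sketch of a parameterized workload generator of the sort a test harness might use. The parameter names and defaults are my own illustration (not from any particular benchmark), and a real tool would issue the IOs rather than just describe them:

```python
import random

def workload(capacity_blocks, block_size=4096, read_pct=67,
             sequential=False, hot_span=None, count=1_000_000):
    """Yield (op, offset, length) tuples describing an IO stream.

    read_pct   - read percentage, e.g. 67 approximates a 2:1 R:W ratio
    sequential - sequential vs. uniformly random block addresses
    hot_span   - restrict addresses to the first N blocks, shortening the
                 block inter-reference interval (and raising cache hits)
    """
    span = hot_span or capacity_blocks
    next_seq = 0
    for _ in range(count):
        op = "read" if random.randrange(100) < read_pct else "write"
        if sequential:
            blk = next_seq % span
            next_seq += 1
        else:
            blk = random.randrange(span)
        yield op, blk * block_size, block_size
```

For example, `workload(capacity_blocks=50_000_000, read_pct=50)` would approximate the 1:1 enterprise mix discussed below, while `read_pct=100` reproduces the 100% read conditions most data sheets quote.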

New device or pre-conditioned

Hard drives show little performance difference whether new or pre-owned, defect skips notwithstanding. In contrast, SSDs/SSSs can perform very differently when they are new versus when they have been used for a short period, depending on their internal architecture. A new SSD can write without erasure throughout its entire memory address space, but sooner or later wear leveling must kick in to equalize the use of the device’s NAND memory blocks. Wear leveling causes both reads and rewrites of data during its processing. Such activity takes bandwidth and controller processing away from normal IO. If you have a new device it may take days or weeks of activity (depending on how fast you write) to attain the device’s steady state, where each write causes some sort of wear leveling activity.

R:W Ratio

Historically, hard drives have had slightly slower write seeks than reads, due to the need to be more accurately positioned to write data than to read it. As such, it might take 0.5msec longer to write than to read 4K bytes. But for SSDs the problem is much more acute, e.g. read times can be in microseconds while write times can approach milliseconds for some SSDs/SSSs. This is due to the nature of NAND flash, having to erase a block before it can be programmed (written), with the programming process taking a lot longer than a read.

So the question for measuring SSD performance is what read to write (R:W) ratio to use. Historically a R:W of 2:1 was used to simulate enterprise environments but most devices are starting to see more like 1:1 for enterprise applications due to the caching and buffering provided by controllers and host memory. I can’t speak as well for desktop environments but it wouldn’t surprise me to see 2:1 used to simulate desktop workloads as well.

SSDs operate a lot faster if their workload is 1000:1 than for 1:1 workloads. Most SSD data sheets tout a significant read I/O rate but only for 100% read workloads. This is like a subsystem vendor quoting a 100% read cache hit rate (which some do), which is unusual in the real world of storage.
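A rough illustration of why that matters: the 50µs read and 800µs effective write latencies below are made-up but plausible round numbers (not any vendor's spec), and the model assumes a single outstanding IO, ignoring the parallelism real devices exploit. Even so, it shows how quickly a headline read number collapses once writes enter the mix:

```python
# Effective IOPS for one IO stream at queue depth 1, using the
# latency-weighted average of assumed read and write service times.
read_lat = 50e-6       # assumed read latency: 50 microseconds
write_lat = 800e-6     # assumed effective write latency: 800 microseconds

def effective_iops(read_fraction):
    avg_latency = read_fraction * read_lat + (1 - read_fraction) * write_lat
    return 1 / avg_latency

print(f"100% read: {effective_iops(1.0):8.0f} IOPS")   # ~20,000
print(f"2:1  R:W : {effective_iops(2/3):8.0f} IOPS")   # ~3,300
print(f"1:1  R:W : {effective_iops(0.5):8.0f} IOPS")   # ~2,350
```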

Blocksize to use

Hard drives are not insensitive to blocksizes, as blocks can potentially span tracks which will require track-to-track seeks to be read or written. However, SSDs can also have some adverse interaction with varying blocksizes. This is dependent on the internal SSD architecture and is due to over optimizing write performance.

With an SSD, you erase a block of NAND and write a page or sector of NAND at a time. As writes take much longer than reads, many SSD vendors add parallelism to improve write throughput. Parallelism writes or programs multiple sectors at the same time. Thus, if your blocksize is an integral multiple of the multi-sector write size, performance is great; if not, performance can suffer.

In all honesty, similar issues exist with hard drive sector sizes. If your blocksize is an integral multiple of the drive sector size then performance is great; if not, too bad. In contrast to SSDs, drive sector size is often configurable at the device level.
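A trivial sketch of the alignment issue described above; the 32KiB "program unit" is just an assumed example of a multi-page parallel write size, since the real value varies with each SSD's internal design and usually isn't published:

```python
def aligned(io_size, program_unit=32 * 1024):
    """True if an IO size is an integral multiple of the device's
    parallel program unit (assumed to be 32KiB here for illustration)."""
    return io_size % program_unit == 0

for size_kib in (4, 16, 32, 64, 96, 100):
    status = "full program units" if aligned(size_kib * 1024) else "partial program unit"
    print(f"{size_kib:>4} KiB write: {status}")
```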

Sequential vs. random IO

Hard drives perform sequential IO much better than random IO. For SSDs this is not much of a problem, as once wear leveling kicks in, it’s all random to the NAND flash. So when comparing hard drives to SSDs the level of sequentiality is a critical parameter to control.

Cache hit rate

The block inter-reference interval simply measures how often the same block is re-referenced. This is important for caching devices and systems because it ultimately determines the cache hit rate (reading data directly from cache instead of the device storage). Hard drives have onboard cache of 8 to 32MB today. SSD drives also have a DRAM cache for data buffering and other uses. SSDs typically publicize their cache size, so in order to ensure zero cache hits one needs a block inter-reference interval close to the device’s capacity. Not a problem today with 146GB devices, but as they move to 300GB and larger it becomes more of a problem to completely characterize device performance.
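As a rough illustration of that last point: under a uniformly random reference pattern, the chance a given IO hits the drive's DRAM cache is roughly the cache size divided by the span of blocks being referenced. The 32MB cache below is just an example figure from the range mentioned above, and this simple model ignores write buffering and prefetch:

```python
def approx_cache_hit_rate(cache_bytes, working_set_bytes):
    """Very rough hit-rate estimate for uniformly random references."""
    return min(1.0, cache_bytes / working_set_bytes)

cache = 32 * 2**20                       # a 32MB onboard DRAM cache
for ws_gb in (1, 16, 146, 300):
    rate = approx_cache_hit_rate(cache, ws_gb * 2**30)
    print(f"{ws_gb:>4}GB reference span: ~{rate:.4%} hit rate")
```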

The future

So how do we get a handle on SSD performance? SNIA and others are working on a specification on how to measure SSD performance that will one day become a standard. When the standard is available we will have benchmarks and service groups that can run these benchmarks to validate SSD vendor performance claims. Until then – caveat emptor.

Of course most end users would claim that device performance is not as important as (sub)system performance which is another matter entirely…

XAM and data archives

Vista de la Biblioteca Vasconcelos by Eneas

XAM, a SNIA-defined interface standard supporting reference data archives, is starting to become real. EMC and other vendors are starting to supply XAM-compliant interfaces.  I could not locate any application vendors supporting XAM APIs (my Twitter survey for application vendors came back empty), but it’s only a matter of time.  What does XAM mean for your data archive?

The problem

Most IT shops with data archives use special purpose applications that support a vendor-defined proprietary interface to store and retrieve data out of a dedicated archive appliance. For example, many email archives support EMC Centera, which has defined a proprietary Centera API to store and retrieve data from their appliance.  Most other archive storage vendors have followed suit, leading to proprietary vendor lock-in which slows adoption.

However, some proprietary APIs have been front-ended with something like NFS. The problem with NFS and other standard file interfaces is that they were never meant for reference data (data that does not change). So when you try to update an archived file, you often get some sort of weird system error.

Enter XAM

XAM was designed from the start for reference data. Moreover, XAM supports concurrent access to multiple vendor archive storage systems from the same application. As such, an application supplier need only code to one standard API to gain access to multiple vendor archive systems.

SNIA released the V1.0 XAM interface specification last July, which defines the XAM architecture and C- and Java-language APIs for both the application and the storage vendor.  Although from the looks of it, the C version of the vendor API is more complete.
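Conceptually, an application stores a set of named fields (an "XSet") into an XAM storage system (an "XSystem") and gets back a globally unique name (an XUID) that it can later use to retrieve the content from whichever vendor archive sits behind the interface. The toy below is only my in-memory illustration of that flow; the class and method names are hypothetical stand-ins, not the actual C or Java bindings defined in the spec:

```python
# Toy, in-memory illustration of the XAM idea (XSystem/XSet/XUID) --
# hypothetical names, not the spec's real C/Java API.
import uuid

class XSet(dict):
    """A container of named fields holding reference data."""

class XSystem:
    def __init__(self):
        self._store = {}                  # XUID -> committed XSet

    def create_xset(self) -> XSet:
        return XSet()

    def commit(self, xset: XSet) -> str:
        xuid = str(uuid.uuid4())          # globally unique, location-independent name
        self._store[xuid] = XSet(xset)    # committed reference data never changes
        return xuid

    def open_xset(self, xuid: str) -> XSet:
        return self._store[xuid]

# One calling pattern, regardless of which vendor archive implements the XSystem.
archive = XSystem()
email = archive.create_xset()
email["com.example.email.sender"] = "someone@example.com"
email["com.example.email.raw"] = b"raw message bytes..."
xuid = archive.commit(email)
print("archived as", xuid)
print("retrieved sender:", archive.open_xset(xuid)["com.example.email.sender"])
```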

However, currently I can only locate two archive storage vendors having released support for the XAM interface (EMC Centera and SAND/DNA?).  A number of vendors have expressed interest in providing XAM interfaces (HP, HDS HCAP, Bycast StorageGrid and others).  How soon their XAM API support will be provided is TBD.

I would guess what’s really needed is for more vendors to start supporting the XAM interface, which would get application vendors more interested in supporting XAM.  It’s sort of a chicken-and-egg thing, but I believe the storage vendors have the first move; the application vendors will take more time to see the need.

Does anyone know what other storage vendors support XAM today? Is there any single place where one could even find out? Ditto for applications supporting XAM today.