Tape vs. Disk, the saga continues

Inside a (Spectra Logic) T950 library by ChrisDag (cc) (from Flickr)

I was on a call late last month where Oracle introduced their latest generation T10000C tape system (media and drive) holding 5TB native (uncompressed) capacity. In the last 6 months I have been hearing about the coming of a 3TB SATA disk drive from Hitachi GST and others. And last month, EMC announced a new Data Domain Archiver, a disk-only archive appliance (see my post on EMC Data Domain products enter the archive market).

Oracle assures me that tape density is keeping up with, if not gaining on, disk density trends and capacity. But density and capacity are not the only issues causing data to move off of tape in today’s enterprise data centers.

“Dedupe Rulz”

A problem with the data density trends discussion is that it’s one dimensional (well, literally it’s two dimensional). With data compression, disk or tape systems can easily double the density on a piece of media. But with data deduplication, the multiples start becoming more like 5X to 30X, depending on the frequency of full backups and the amount of duplicated data. Numbers like those dwarf any discussion of density ratios and as such, get everyone’s attention.
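To put rough numbers on that (the ratios below are hypothetical round figures, not measurements), here’s the arithmetic on what a nominal 5TB cartridge effectively holds under compression versus deduplication:

```python
# Illustrative arithmetic only: effective capacity of a nominal 5TB cartridge
# under 2X compression vs. 5X-30X deduplication (ratios are hypothetical).
NATIVE_TB = 5.0

def effective_capacity(native_tb: float, reduction_ratio: float) -> float:
    """Return how many TB of logical (pre-reduction) data fits on the media."""
    return native_tb * reduction_ratio

for label, ratio in [("compression", 2), ("dedupe (low)", 5), ("dedupe (high)", 30)]:
    print(f"{label:>15}: {effective_capacity(NATIVE_TB, ratio):6.1f} TB logical")
```

Two times 5TB is nice; 150TB of logical backup data on one cartridge is what gets everyone’s attention.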

I can remember talking to an avowed tape engineer years ago as he described deduplication technology at the VTL level as architecturally impure and inefficient. From his perspective it needed to be done much earlier in the data flow. But what he failed to see was the ability of VTL deduplication to be plug-compatible with the tape systems of that time. Such ease of adoption allowed deduplication systems to build a beach-head and economies of scale. From there such systems have now been able to move upstream, into earlier stages of the backup data flow.

Nowadays, with Avamar, Symantec PureDisk and others, source-level deduplication, or close-to-source deduplication, is a reality. But all this came about because those products were able to offer up to 30X the density on a piece of backup storage.

Tape’s next step

Tape could easily fight back. All that would be needed is some system in front of a tape library that provided deduplication capabilities not just to the disk media but the tape media as well. This way the 30X density over non-deduplicated storage could follow through all the way to the tape media.

In the past, this made little sense because restoring a particular set of data from deduplicated tape could potentially require multiple volumes. However, with today’s 5TB of data on a tape, maybe this doesn’t have to be the case anymore. In addition, a deduplication system in front of the tape library could support most of the immediate data restore activity, while data restored from tape would be more like pulling something out of an archive and as such might take longer to perform. In any event, with LTO’s multi-partitioning and the other enterprise class tapes having multiple domains, creating a structure with a meta-data partition and a data partition is easier than ever.
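To make the idea a bit more concrete, here’s a minimal conceptual sketch (my own illustration, not any vendor’s design) of what such a front end would keep in the meta-data partition: a fingerprint index mapping each unique chunk to the tape volume and offset holding it, so a restore becomes a list of volume/offset reads.

```python
import hashlib
from collections import defaultdict

# Conceptual sketch only (not any vendor's actual design): the front end keeps
# a fingerprint index (the meta-data partition) mapping each unique chunk to
# the tape volume and offset where it was written (the data partition).
class TapeDedupeIndex:
    def __init__(self, chunk_size=64 * 1024):
        self.chunk_size = chunk_size
        self.index = {}                       # fingerprint -> (volume, offset)
        self.volume_usage = defaultdict(int)  # bytes written per tape volume

    def store(self, volume: str, data: bytes) -> list:
        """Chunk the backup stream, write only unseen chunks, return a recipe."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in self.index:          # new chunk: append it to this volume
                self.index[fp] = (volume, self.volume_usage[volume])
                self.volume_usage[volume] += len(chunk)
            recipe.append(fp)                 # new or duplicate, the recipe references it
        return recipe

    def volumes_needed(self, recipe: list) -> set:
        """Which tape volumes must be mounted to restore this backup."""
        return {self.index[fp][0] for fp in recipe}

idx = TapeDedupeIndex()
monday = idx.store("VOL001", b"x" * 200_000)
tuesday = idx.store("VOL002", b"x" * 200_000)   # mostly duplicate data
print(idx.volumes_needed(tuesday))              # {'VOL001'} -- restore needs only one volume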

“Got Dedupe”

There are plenty of places where today’s tape vendors can obtain deduplication capabilities. Permabit offers dedupe code for OEM applications for those that have no dedupe systems today. FalconStor, Sepaton and others offer deduplication systems that can be OEMed. IBM, HP, and Quantum already have tape libraries and their own dedupe systems available today, all of which could readily support a deduplicating front-end to their tape libraries, if they don’t already.

Where “Tape Rulz”

There are places where data deduplication doesn’t work very well today, mainly rich media, physics, biopharm and other non-compressible big-data applications. For these situations, tape still has a home, but for the rest of the data center world today, deduplication is taking over, if it hasn’t already. The sooner tape gets on the deduplication bandwagon the better for the IT industry.

—-

Of course there are other problems hurting tape today. I know of at least one large conglomerate that has moved all backup off tape altogether, even data which doesn’t deduplicate well (see my previous Oracle RMAN posts). And I know of at least one rich media conglomerate that is considering the very same move. For now, tape has a safe harbor in big science, but it won’t last long.

Comments?

Recent ESRP v3.0 (Exchange 2010) performance results – chart of the month

SCIESRP110127-003 (c) 2011 Silverton Consulting, Inc., All Rights Reserved

We return to our monthly examination of storage performance, and this month’s topic is Exchange 2010 performance, comparing results from the latest ESRP v3.0 (Exchange Solution Reviewed Program).  This latest batch is for the 1K-and-under mailbox category, and the log playback chart above represents the time it takes to process a 1MB log file and apply it to the mailbox database(s).  Data for this report is taken from results published on Microsoft’s ESRP v3.0 website, and this chart is from our latest storage intelligence performance dispatch sent out in our free monthly newsletter.

Smaller is better on the log playback chart.  As one can see, it takes just under 2.4 seconds for the EMC Celerra NX4 to process a 1MB log file whereas it takes over 7.5 seconds on an EMC Iomega IX12-300r storage subsystem.  To provide some perspective, in the next larger category, storage supporting 1K-to-5K mailboxes, the top 10 log playback times range from ~0.3 to ~4.5 seconds; as such, the Celerra NX4 system and the other top four subsystems here would make the top 10 log playback times for that category as well.

Why log playback

I have come to believe that log playback is an important metric in Exchange performance for mainly one reason: it’s harder to game using Jetstress parameterization.   For instance, with Jetstress one must specify how much IO activity is generated on a per mailbox basis, thus generating more or fewer requests for email database storage. Such specifications can easily confound storage performance metrics such as database accesses/second when comparing storage. But with log playback, that parameter is immaterial and every system has the same 1MB sized log file to process as fast as possible, i.e., it has to be read and applied to the configured Exchange database(s).

One can certainly still use a higher performing storage system, and/or throw SSDs, more drives or more cache at the problem to gain better storage performance, but that also works for any other ESRP performance metric.  With log playback, though, Jetstress parameters are significantly less of a determinant of storage performance.
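A trivial illustration of the difference (all numbers below are hypothetical, not from any submission): the database access rate a Jetstress run drives scales directly with the tester-chosen IO/mailbox parameter, while the log playback input is the same fixed 1MB log for everyone.

```python
# Illustrative arithmetic only (hypothetical numbers): the database access rate
# a Jetstress run drives scales with the tester-chosen IO/mailbox parameter,
# while the log playback workload is the same fixed 1MB log file for every system.
mailboxes = 1000

for io_per_mailbox in (0.10, 0.25, 0.50):          # tester-chosen parameter
    target_db_iops = mailboxes * io_per_mailbox    # what the benchmark drives
    print(f"IO/mailbox={io_per_mailbox:4.2f} -> target DB accesses/sec = {target_db_iops:6.1f}")

log_file_mb = 1.0                                  # fixed by the test, not tunable
print(f"Log playback input is always {log_file_mb:.0f}MB, regardless of IO/mailbox")
```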

In the past I have favored database access latency charts for posts on Microsoft Exchange performance, but there appears to be much disagreement as to the efficacy of that metric in comparing storage performance (e.g., see the 30+ comments on one previous ESRP post).  I still feel that latency is an important metric, and one that doesn’t depend heavily on the Jetstress IO/sec/mailbox parameter, but log playback is even more immune to that parameter and so should be less disputable.

Where are all the other subsystems?

You may notice that there are fewer than 10 subsystems on the chart. These six are the only subsystems that have published results in this 1K-and-under mailbox category.  One hopes that the next time we review this category there will be more subsystem submissions available to discuss here.  Please understand, ESRP v3.0 was only a little over a year old when our briefing came out.

—-

The full performance dispatch will be up on our website after month end but if one needs to see it sooner, please sign up for our free monthly newsletter (see subscription widget, above right) or subscribe by email and we’ll send you the current issue along with download instructions for this and other reports.  Also, if you need an even more in-depth analysis of block storage performance please consider purchasing SCI’s SAN StorInt Briefing also available from our website.

As always, we welcome any constructive suggestions on how to improve any of our storage performance analyses.

Comments?

Oracle RMAN and data deduplication – part 2

Insight01C 0011 by watz (cc) (from Flickr)

I have blogged before about the poor deduplication ratios seen when using Oracle 10G RMAN compression (see my prior post), but not everyone uses compressed backupsets.  As such, the question naturally arises as to how well non-compressed RMAN backupsets deduplicate.

RMAN backup types

Oracle 10G RMAN supports both full and incremental backups.  The main potential for deduplication would come when using full backups.  However, 10G also supports something called RMAN cumulative incremental backups in addition to the more normal differential backups.  A cumulative incremental backs up all changes since the last full backup and as such could readily repeat many changes which occur between full backups, also leading to higher deduplication rates.

RMAN multi-threading

In any event, the other issue with RMAN backups is Oracle’s ability to multi-thread or multiplex backup data. This capability was originally designed to keep tape drives busy and streaming when backing up data.  But the problem with file multiplexing is that file data is intermixed with blocks from other files within a single backup data stream, thus losing all context and potentially reducing deduplication ability.  Luckily, 10G RMAN file multiplexing can be disabled by setting FILESPERSET=1, telling Oracle to provide only a single file per data stream.
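As a rough sketch of what that looks like in practice (my example, not Oracle’s or any dedupe vendor’s recommended script; check the exact syntax against your RMAN release and vendor best practices), a backup script driving RMAN with file multiplexing disabled might look something like this:

```python
import subprocess
import textwrap

# Sketch only: drive RMAN with FILESPERSET 1 so each backupset carries a single
# datafile, keeping file data contiguous for downstream deduplication.
# Verify the syntax against your Oracle release and dedupe vendor best practices.
rman_commands = textwrap.dedent("""
    BACKUP INCREMENTAL LEVEL 0 DATABASE FILESPERSET 1;
""")

result = subprocess.run(
    ["rman", "target", "/"],      # assumes OS authentication to the target database
    input=rman_commands,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```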

Oracle’s use of meta-data in RMAN backups also makes them more difficult to deduplicate, but some vendors provide workarounds to increase RMAN deduplication (see Quantum DXi, EMC Data Domain and others).

—-

So deduplication of RMAN backups will vary depending on vendor capabilities as well as admin RMAN backup specifications.  As such, to obtain the best data deduplication of RMAN backups follow deduplication vendor best practices, use periodic full and/or cumulative incremental backups, don’t use compressed backupsets, and set FILESPERSET=1.

Comments?

DC is back

Power sub distribution board by Tom Raftery (cc) (from Flickr)

I read an article in this month’s IEEE Spectrum on providing direct current (DC) power distribution to consumers.  Apparently various groups around the world are preparing standards to provide 24-V DC and 380-V DC power distribution to home and office.

Why DC is good

It turns out that most things electronic today (with the possible exception of electro-magnetic motors) run off DC power.  In fact, the LED desk lamp I just purchased has a converter card in the plug adapter that converts 120-V alternating current (AC) to 24-V DC to power the lamp.

If you look at any PC, server, data storage, etc., you will find power supplies that convert AC to DC for internal electronic use.  Most data centers take in 480-V AC which is converted to DC to charge up uninterruptible power supply (UPS) batteries that discharge DC power that is converted back to AC, which is then converted internally back to DC for server electronics.  I count 3 conversions there: AC to DC, DC to AC and AC to DC.
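Just to show why the chain matters (the 90% conversion efficiency below is an assumed round number, not a measured figure), the losses compound:

```python
# Illustrative arithmetic only: an assumed 90%-efficient conversion stage, not measured data.
conversion_efficiency = 0.90

# Typical AC data center path: AC->DC (UPS charge), DC->AC (UPS output), AC->DC (server PSU)
ac_path = conversion_efficiency ** 3
# Hypothetical DC distribution path: one AC->DC conversion at the building level
dc_path = conversion_efficiency ** 1

print(f"Power delivered after 3 conversions: {ac_path:.0%}")   # ~73%
print(f"Power delivered after 1 conversion:  {dc_path:.0%}")   # ~90%
```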

But the problem with all this AC-DC conversion going on is that every conversion wastes energy.

The War of Currents or why we have AC power today

Edison was a major proponent of DC power distribution early in the history of electric power distribution. But the issue with DC, or low-voltage AC for that matter, is that voltage is lost over any serious line distance, which required that DC generation stations of the time be located within a mile of consumers.

In contrast, Tesla and Westinghouse proposed distributing AC power because of the ability to convert high voltage AC to low voltage using transformers.  To see why this made a difference, read on…

It turns out that the amount of line loss depends mostly on the current being transmitted (resistive loss grows with the square of the current).  But current is only one factor in the equation that determines electrical power, the other factor being voltage.  You see, any electrical power level can be delivered as high current at low voltage or as low current at high voltage.

Because AC at the time could easily be converted from high to low voltage or vice versa, high voltage-low current AC power lines could be converted (locally) to low voltage-high current power lines.   A high voltage-low current line lost less power, and as AC voltage could be converted more easily, AC won the War of Currents.
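A back-of-the-envelope example (all numbers invented for illustration) shows why: resistive line loss goes as the square of the current, and for a fixed power delivered, raising the voltage lowers the current.

```python
# Back-of-the-envelope only: resistive line loss is I^2 * R, and for a fixed
# power P = V * I, raising the voltage lowers the current. Numbers are invented.
def line_loss_watts(power_w: float, volts: float, line_resistance_ohms: float) -> float:
    current = power_w / volts            # I = P / V
    return current ** 2 * line_resistance_ohms

P = 10_000.0      # 10kW delivered to a neighborhood
R = 0.5           # assumed total line resistance in ohms

for v in (120.0, 2_400.0):
    loss = line_loss_watts(P, v, R)
    print(f"{v:7.0f} V line: {loss:10.1f} W lost ({loss / P:.2%} of delivered power)")
```

Same power, same wire: the low-voltage line throws away roughly a third of the power while the high-voltage line loses a fraction of a percent.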

Move ahead a century or so, and electronics have advanced to the point where converting DC voltage is almost as easy as converting AC.  But more to the point, with today’s AC distribution, the many small, individual AC-to-DC and DC-to-AC converters in each appliance, server, UPS, etc. could be replaced by a few larger AC-to-DC converters at the building or household level, improving energy efficiency.

Where DC does better today

Batteries, solar panels, solid state electronics (pretty much any electronic chip, anywhere today), and LED lighting all operate on DC power alone and in most cases must convert AC to DC to use today’s AC power distribution.  Having 24-V or 380-V DC power in the home or office would allow these devices to operate without converters and be more efficient.  The Spectrum article states that LED lighting infrastructure could save up to 15% of the energy required if it were just powered by DC rather than having to convert AC to DC.

However, with the industry standards coming out of the EMerge Alliance and the European Telecommunications Standards Institute, we may gain one other significant benefit: one worldwide plug/receptacle standard for DC power.  Whether this happens is anyone’s guess, and given today’s nationalism it may not be feasible.  But we can always hope for sanity to prevail…

Comments?

To iPad or not to iPad – part 4

Apple iPad (wi-fi) (from apple.com)

I took the iPad to another conference last month. My experience the last time I did this (see To iPad or not to iPad – part 3) made me much more leery, but I was reluctant to lug the laptop for only a 2-day trip.

Since my recent experience, I have become a bit more nuanced and realistic with my expectations for iPad use on such trips. As you may recall, I have an iPad without 3G networking.

When attending a conference and using a laptop, I occasionally take a few notes, do email, twitter, blog and other work related items. With my iPad I often take copious notes; it’s unclear why, other than it’s just easier/quicker to get out of my backpack/briefcase and start typing on. When I take fewer notes, it’s usually because I don’t have a table/desk to use for the iPad and keyboard.

As for the other items (email, twitter, and blogging), my iPad can do all of them just fine with proper WiFi connectivity. Other work stuff can sometimes be done offline but sometimes requires internet access, probably ~50:50.

iPhone and iPad together

I have found that an iPhone and iPad can make a very usable combination in situations with flaky/inadequate WiFi. While the iPad can attempt to use room WiFi, the iPhone can use the 3G data network to access the Internet. Mostly, the iPhone wins in these situations. This works especially well when WiFi is overtaxed at conferences. The other nice thing is that the Bluetooth (BT) keypad can be paired with either the iPad or the iPhone (it does take time, ~2-5 minutes, to make the switch, so I don’t change pairing often).

So at the meeting this past month, I was doing most of my note taking and offline work items with the iPad and blogging, tweeting and emailing with the iPhone.

If the iPad WiFi were working well enough, I probably wouldn’t use the iPhone for most of this. However, I find that at many conferences and most US hotels, WiFi is either not available in the hotel room or doesn’t handle conference room demand well enough to depend on. Whereas AT&T’s 3G network seems to work just fine for most of these situations (probably because no one is downloading YouTube videos to their iPhone).

A couple of minor quibbles

While this combination works well enough, I do have a few suggestions to make it even better to use:

  • Mouse support – Although I love the touch screen for most tasks, editing is painful without a mouse. Envision this: you are taking notes, see an error a couple of lines back, and need to fix it. With the iPad/iPhone, one moves a hand from the keypad to point at the error on the screen to correct it. Finger pointing is not as quick at re-positioning the cursor as a mouse, and until magnification kicks in the finger obscures the error, leading to poor positioning. Using the BT keypad arrow keys is more accurate but not much faster. So, due to bad cursor positioning, I end up deleting and retyping many more characters than I needed to. As a result, I don’t edit much on the iPad/iPhone. If a BT mouse (Apple’s Magic Mouse) would pair up with the iPad/iPhone, editing would work much better. Alternatively, something like the old IBM ThinkPad TrackPoint in the middle of a BT keypad would work just fine. Having the arrow keys respond much faster would be even better.
  • iPad to iPhone file transfer capability – Now that I use the iPad offline with an online iPhone, it would be nice if there were some non-Internet way to move data between the two. Perhaps using BT’s object exchange (OBEX/GOEP) capabilities to provide FTP-lite services would work. It wouldn’t need high bandwidth, as typical use would be to only move a Pages, Numbers, or Keynote file to the iPhone for email attachment or blog posting. It would be great if this were bi-directional. Another option is supporting a USB port, but that would require more hardware. A BT file transfer makes more sense to me.
  • iPad battery power – Another thing I find annoying at long conferences is that iPad battery power doesn’t last all day. Possibly having BT as well as WiFi active may be hurting battery life. My iPad often starts running out of power around 3pm at conferences. To conserve energy, I power down the display between note taking sessions, and this seems to work well enough. The display comes back alive whenever I hit a key on the BT keypad, and often I don’t even have to retype the keystrokes used to restart the display. More battery power would help.

—-

So great, all this works just fine domestically, but my next business trip is to Japan. To that end, I have been informed that unless I want to spend a small fortune in roaming charges, I should disable iPhone 3G data services while out of country. As such, if I only take my iPad and iPhone, I will have no email/twitter/blog access whenever WiFi is unavailable. If I took a laptop at least it could attach to an Ethernet cable if that were available. However, I have also been told that WiFi is generally more available overseas. Wish me luck.

Anyone know how prevalent WiFi is in Tokyo hotels and airports and how well it works with iPhone/iPad?

Other comments?

Personal medical record archive

MRI of my brain after surgery for Oligodendroglioma tumor by L_Family (cc) (From Flickr)

I was reading a book the other day and it suggested that sometime in the near future we will all have a personal medical record archive. Such an archive would be a formal record of every visit to a healthcare provider, with every x-ray, MRI, CT scan, doctor’s note, blood analysis, etc. that’s ever done to a person.

Such data would be our personal record of our life’s medical history usable by any future medical provider and accessible by us.

Who owns medical records?

Healthcare is unusual.  For any other discipline, like accounting, you provide information to the discipline expert and you get all the information you could possibly want back, to store, send to the IRS or whatever, to do with as you want.  If you decide to pitch it, you can pretty much request a copy (at your cost) of anything for a certain number of years after the information was created.

But in medicine, X-rays are owned and kept by the medical provider, same with MRIs, CT scans, etc., and you hardly ever get a copy.  Occasionally, if the physician deems it useful for explicative reasons, you might get a grainy copy of an X-ray that shows a break or something, but other than that and possible therapeutic instructions, typically nothing.

Getting doctors’ notes is another question entirely.  They’re mostly text records in some sort of database somewhere, online to the medical unit.  But mainly what we get as patients is a verbal diagnosis to take in and mull over.

Personal experience with medical records

I worked for an enlightened company a while back that had its own onsite medical practice providing all sorts of healthcare to its employees.  Over time, new management decided this service was not profitable and terminated it.  As they were winding down the operation, they offered to send patient medical information to any new healthcare provider or to us.  Not having a new provider, I asked that they send mine to me.

A couple of weeks later, a big brown manila envelope was delivered.  Inside was a rather large, multi-page printout of notes taken by every medical provider I had visited throughout my tenure with this facility.  What was missing from this assemblage were the lab reports, x-rays and other ancillary data taken in conjunction with those office visits. I must say the notes were comprehensive and somewhat laden with medical terminology, but they were all there to see.

Printouts were not very useful to me and probably wouldn’t be to any follow-on medical group caring for me. However, the lack of x-rays, blood work, etc. might be a serious deficiency for any follow-on treatment.  But, as far as I was concerned, it was the first time any medical entity had even offered me information like this.

Making personal medical records usable, complete, and retrievable

To take this to the next level, and provide something useful for patients and follow-on healthcare, we need some sort of standardization of medical records across the healthcare industry.  This doesn’t seem that hard given where we are today.  Standards for most medical data already exist, specifically:

  • DICOM or Digital Imaging and Communications in Medicine – a standard file format used to digitally record X-rays, MRIs, CT scans and more.  Most digital medical imaging technology (except for ultrasound) out there today optionally records information in DICOM format.  There just so happens to be an open source DICOM viewer that anyone can use to view these sorts of files if one is interested (see the short sketch after this list).
  • Ultrasound imaging –  is typically rendered and viewed as a sort of movie and is often used for soft tissue imaging and prenatal care.  I don’t know for sure, but I cannot find any standard like DICOM for ultrasound images.  However, if they are truly movies, perhaps HD movie files would suffice as a standard ultrasound imaging file.
  • Audiograms, blood chemistry analysis, etc. – are provided by many technicians or labs and could all be easily represented as PDFs, scanned images, or JPEG/MPEG recordings.  Doctors or healthcare providers often discuss salient items off these reports that are of specific interest to the patient’s condition.  Such affiliated notes could all be in an associated text file, or even a recording of the doctor discussing the results of the analysis that somehow references the other artifact (“Blood chemistry analysis done on 2/14/2007 indicates …”).
  • Other doctor/healthcare provider notes – I find that every time I visit a healthcare provider these days, they either take copious notes using WiFi-connected laptops, record verbal notes to some voice recorder later transcribed into notes, or some combination of these. Any such information could be provided in standard RTF (text files) or MPEG recordings and viewed as is.
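As a small illustration of how usable the DICOM standard already is, here’s a sketch using the open source pydicom library to pull basic metadata out of an imaging file (the file name is hypothetical; an open source viewer does much the same thing with a GUI):

```python
# Sketch using the open source pydicom library (pip install pydicom).
# The file path is hypothetical; any DICOM export from an imaging system would work.
import pydicom

ds = pydicom.dcmread("mri_brain_slice_042.dcm")

# A few of the standard DICOM attributes a conforming file carries.
print("Patient name :", ds.get("PatientName", "<anonymized>"))
print("Study date   :", ds.get("StudyDate", "<unknown>"))
print("Modality     :", ds.get("Modality", "<unknown>"))      # e.g. MR, CT, CR
print("Image size   :", getattr(ds, "Rows", "?"), "x", getattr(ds, "Columns", "?"))
```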

How patients can access medical data

Most voice recordings or text notes could easily be emailed to the patient.  As for DICOM images, ultrasound movies, etc., they could all be readily provided on DVDs or other removable media sent to the patient.

Another, and possibly better, alternative is to have all this data uploaded to a healthcare provider’s designated URL, stored in a medical record cloud someplace, allowing patient access for viewing, downloading and/or copying.   I envision something akin to a photo sharing site, upload-able by any healthcare provider but accessible for downloads by any authorized user/patient.

Medical information security

Any patient data stored in such a medical record cloud would need to be secured and possibly encrypted with a healthcare-provider-supplied passcode, which the patient could then use for downloading/decrypting.  There are plenty of open source cryptographic tools which would suffice to encrypt this data (see GNU Privacy Guard for instance).
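As a minimal sketch of what that could look like (assuming GnuPG 2.x is installed and on the path; the file name and passcode are made up), a provider could symmetrically encrypt a visit’s records with the patient’s passcode before posting them:

```python
import subprocess

# Sketch only: symmetric (passcode-based) encryption with GnuPG 2.x.
# The patient later decrypts with the same passcode, e.g.:
#   gpg --decrypt visit_2011-02-14.tar.gpg
passcode = "example-passcode-sent-to-patient"   # hypothetical; never hard-code in practice

subprocess.run(
    [
        "gpg", "--batch", "--yes",
        "--pinentry-mode", "loopback",          # needed for --passphrase with GnuPG 2.1+
        "--passphrase", passcode,
        "--symmetric", "--cipher-algo", "AES256",
        "--output", "visit_2011-02-14.tar.gpg",
        "visit_2011-02-14.tar",                 # hypothetical bundle of the visit's records
    ],
    check=True,
)
```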

As for access passwords, possibly some form of public key cryptography would suffice, but it need not be that sophisticated.  I prefer open source tools for these security mechanisms, as they would then be readily available to the patient or any follow-on medical provider to access and decrypt the data.

Medical information retention period

The patient would have a certain amount of time to download these files.  I lean towards months, just to ensure it’s done in a timely fashion, but maybe it should be longer; something on the order of 7 years after a patient’s last visit might work.  This would allow the patient sufficient time to retrieve the data and to supply it to any follow-on medical provider or store it in their own, personal medical record archive. There are plenty of cloud storage providers, I know, that would be willing to store such data at a fair, but high, price for any period of time desired.

Medical information access credentials

All the patient would need is an email and/or possibly a letter that provides the accessing URL, access password and encryption passcode information for the files.  Possibly such information could be provided in plaintext, appended to any bill that is cut for the visit, which is sure to find its way to the patient or some financially responsible guardian/parent.

How do we get there

Bootstrapping this personal medical record archive shouldn’t be that hard.  As I understand it, Electronic Medical Record (EMR) legislation in the US and elsewhere has provisions stating that any patient has a legal right to copies of any medical record that a healthcare provider has for them.  If this is true, all we need do then is to institute some additional legislation that requires the healthcare provider to make those records available in a standard format, in a publicly accessible place, access controlled/encrypted via a password/passcode, downloadable by the patient and to provide the access credentials to the patient in a standard form.  Once that is done, we have all the pieces needed to create the personal medical record archive I envision here.

—-

While such legislation may take some time, one thing we could all do now, at least in the US, is to request access to all medical records/information that is legally ours already.  Once the healthcare providers start getting inundated with requests for this data, they might figure that having some easy, standardized way to provide it would make sense.  Then the healthcare organizations could get together and work to finalize a better solution/legislation needed to provide this in some standard way.  I would think university hospitals could lead this endeavor and show us how it could be done.

Am I missing anything here?