Disk density hits new record, 1Tb/sqin with HAMR

Seagate has achieved 1Tb/sqin recording (source: http://www.gizmag.com)

Well, I thought 36TB on my Mac was going to be enough. Then along comes Seagate with this week’s announcement of reaching 1Tb/sqin (1 trillion bits per square inch) using their new HAMR (heat assisted magnetic recording) technology.

Current LFF (large form factor) drive technology runs at about 620Gb/sqin, providing a 3.5″ drive capacity of around 3TB, while 2.5″ drives run at about 500Gb/sqin and support ~750GB. The new 1Tb/sqin drives should roughly double these capacities.

But the exciting part is that with the new HAMR or TAR (thermally assisted recording) heads and media, the long term potential is even brighter. This new technology should be capable of 5 to 10Tb/sqin, which means 3.5″ drives of 30 to 60TB and 2.5″ drives of 10 to 20TB.
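As a rough check on these capacity figures, here is a minimal Python sketch. It assumes capacity scales linearly with areal density for a fixed form factor (same platter count and usable area); that assumption and the function name are mine, not Seagate's. Real drives also change platter counts and formatting overhead, so vendor projections can land above this naive scaling.

```python
def scaled_capacity_tb(base_capacity_tb, base_density_gb_sqin, new_density_gb_sqin):
    """Scale drive capacity linearly with areal density (illustrative assumption)."""
    return base_capacity_tb * new_density_gb_sqin / base_density_gb_sqin


# Baselines from the post: 3.5in at 620Gb/sqin ~ 3TB, 2.5in at 500Gb/sqin ~ 0.75TB
for density in (1000, 5000, 10000):  # areal density in Gb/sqin
    lff = scaled_capacity_tb(3.0, 620, density)
    sff = scaled_capacity_tb(0.75, 500, density)
    print(f"{density / 1000:g}Tb/sqin: 3.5in ~{lff:.1f}TB, 2.5in ~{sff:.1f}TB")
```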

HAMR explained

HAMR uses both lasers and magnetic heads to record data in even smaller spaces than current PMR (perpendicular magnetic recording) or vertical recording heads do today. You may recall that PMR was introduced in 2006 and now, just 6 years later, we are already seeing the next generation head and media technologies in labs.

Denser disks require smaller bits, and with smaller bits disk technology runs into three problems: readability, writeability and stability, AKA the magnetic recording trilemma. Smaller bits require better stability, but better stability makes it much harder to write or change a bit’s magnetic orientation. Enter the laser in HAMR: with laser heating, the bits become much more malleable. These warmed bits can be more easily written, bypassing the stability-writeability problem, at least for now.

However, just as in any big technology transition there are other competing ideas with the potential to win out.  One possibility we have discussed previously is shingled writes using bit patterned media (see my Sequential only disk post) but this requires a rethinking/re-architecting of disk storage.  As such, at best it’s an offshoot of today’s disk technology and at worst, it’s a slight detour on the overall technology roadmap.

Of course PMR is not going away any time soon. Other vendors (and probably Seagate) will continue to push PMR technology as far as it can go. After all, it’s a proven technology, inside millions of spinning disks today. But, according to Seagate, PMR can reach 1Tb/sqin and go no further.

So when can I get HAMR disks?

There was no mention in the press release as to when HAMR disks would be made available to the general public, but typically the drive industry has been doubling densities every 18 to 24 months. Assuming they continue this trend across a head/media technology transition like HAMR, we should have those 6TB hard disk drives sometime around 2014, if not sooner.
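The doubling trend can be sketched in a few lines of Python. The starting point (a ~3TB 3.5″ drive in 2012) is taken from the figures earlier in this post; the function name is just for illustration, and the assumption that capacity tracks density one-for-one is mine.

```python
def projected_capacity_tb(start_tb, start_year, year, months_per_doubling=24):
    """Project capacity assuming one doubling every `months_per_doubling` months."""
    doublings = (year - start_year) * 12 / months_per_doubling
    return start_tb * 2 ** doublings


# Compare the slow (24 month) and fast (18 month) ends of the industry trend
for year in (2014, 2016, 2018):
    slow = projected_capacity_tb(3, 2012, year, months_per_doubling=24)
    fast = projected_capacity_tb(3, 2012, year, months_per_doubling=18)
    print(f"{year}: {slow:.0f}TB to {fast:.1f}TB")
```

At the 24 month pace this reproduces the 6TB-in-2014 estimate; the 18 month pace gets there sooner.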

HAMR technology will likely make its first appearance in 7200rpm drives. Bigger capacities always seem to come out first in slower performing disks (see my Disk trends, revisited post).

HAMR performance wasn’t discussed in the Seagate press release, but with 2Mb per linear track inch and 15Krpm disk drives, the transfer rates would seem to need to be on the order of at least 850MB/sec at the OD (outer diameter) for read data transfers.
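For the curious, here is a back-of-the-envelope version of that transfer rate estimate in Python. The linear density and spindle speed are the ones quoted above; the outer track radius is purely my assumption (it depends on platter size, which the press release did not give), so treat the output as an order-of-magnitude sketch rather than a spec.

```python
import math


def od_transfer_rate_mb_s(bits_per_inch, rpm, od_radius_in):
    """Raw read rate at the outer diameter, in MB/sec (10**6 bytes/sec)."""
    track_length_in = 2 * math.pi * od_radius_in  # circumference of the outermost track
    bits_per_rev = bits_per_inch * track_length_in
    revs_per_sec = rpm / 60
    return bits_per_rev * revs_per_sec / 8 / 1e6


# 2Mb per linear inch and 15Krpm come from the post; the radii are hypothetical
for radius_in in (1.2, 1.8):
    rate = od_transfer_rate_mb_s(2e6, 15000, radius_in)
    print(f"outer radius {radius_in}in: ~{rate:.0f}MB/sec")
```

The result is quite sensitive to the assumed radius, but it stays in the hundreds of MB/sec, the same order as the figure above.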

How quickly HAMR heads can write data is another matter. The fact that the laser heats the media before the magnetic head can write it seems to call for a magnetic-plus-optical head contraption where the laser is in front of the magnetics (see picture above).

How long it takes to heat the media to enable magnetization is one critical question in write performance. But this could potentially be mitigated by the strength of the laser pulse and how far the laser has to be in front of the recording head.

With all this talk of writing, there hasn’t been lots of discussion on read heads. I guess everyone’s assuming the current PMR read heads will do the trick, with a significant speed up of course, to handle the higher linear densities.

What’s next?

As for what comes after HAMR, check out another post I did on using lasers to magnetize (write) data (see Magnetic storage using lasers alone). The advantage of this new “laser-only” technology was a significant speed up in transfer speeds. It seems to me that HAMR could easily be an intermediate step on the path to laser-only recording, having both laser optics and magnetic recording/reading heads in one assembly.

~~~~

Let’s see: 6TB in 2014, 12TB in 2016 and 24TB in 2018. Maybe I won’t need that WD Thunderbolt drive string as quickly as I thought.

Comments?

Posted in Disk storage, Storage density, Strategic Inflection Points

NSA’s huge (YBs) new data center to turn on in 2013

National_Security_Agency_seal

Ran across a story in Wired about the new NSA Utah data center today which is scheduled to be operational in September of 2013.

This new data center is intended to house copies of all communications intercepted by the NSA. We have talked about this data center before and how it’s going to store YB of data (see my Yottabytes by 2015?! post).

One major problem with having a YB of communications intercepts is that you need to have multiple copies of it for protection in case of human or technical error.

Apparently, NSA has a secondary data center in San Antonio to back up its Utah facility. That’s one copy. We also wrote another post on protecting and indexing all this data (see my Protecting the Yottabyte Archive post).

NSA data centers

The Utah facility has enough fuel onsite to power and cool the data center for 3 days.  They have a special power station to supply the 65MW of power needed.   They have two side by side raised floor halls for servers, storage and switches, each with 25K square feet of floor space. That doesn’t include another 900K square feet of technical support and office space to secure and manage the data center.

In order to help collect and temporarily store all this information, the agency has apparently been undergoing a data center building boom, renovating and expanding its data centers throughout the states. The article discusses some of the other NSA information collection points/data centers in Texas, Colorado, Georgia, Hawaii, Tennessee and, of course, Maryland.

New NSA super computers

In addition to the communication intercept storage, the article also talks about a special purpose, decrypting super computer that NSA has invented over the past decade which will also be housed in the Utah data center.  The NSA seems to have created a super powerful computer that dwarfs the current best Cray XT5 super computer clusters that operate at 1.75 petaflops available today.

I suppose what with all the encrypted traffic now being generated, NSA would need some way to decrypt this information in order to understand it.  I was under the impression that they were interested in the non-encrypted communications, but I guess NSA is even more interested in any encrypted traffic.

Decrypting old data

With all this data being stored, the thought is that data now encrypted with currently unbreakable AES-128, -192 or -256 encryption will eventually become decipherable. At that time, foreign government and other secret communications will all be readable.
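To get a feel for why AES counts as unbreakable by brute force today, here is a small Python sketch of exhaustive key search time. The search rate of 10**18 keys per second is a hypothetical figure I picked for illustration, not anything reported about NSA hardware, and the sketch says nothing about cryptanalytic shortcuts, which is what storing the data is really betting on.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_to_search(key_bits, keys_per_second, fraction=0.5):
    """Years to try `fraction` of the keyspace (0.5 = expected case) at a fixed rate."""
    keys_to_try = fraction * 2 ** key_bits
    return keys_to_try / keys_per_second / SECONDS_PER_YEAR


HYPOTHETICAL_RATE = 1e18  # keys per second; an assumption, chosen to be generous
for bits in (128, 192, 256):
    print(f"AES-{bits}: ~{years_to_search(bits, HYPOTHETICAL_RATE):.1e} years")
```

Even at that rate, the smallest key size works out to trillions of years, so any practical break would have to come from something other than trying every key.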

By storing these secret communications now, they can scan this treasure trove for patterns that eventually occur and, once found, such patterns will ultimately lead to decrypting the data. Now we know why they need YB of storage.

So NSA will at least know what was going on in the past. However, how soon they can move that up to do real time decryption of communications today is another question. But knowing the past may help in understanding what’s going on today.

~~~~

So be careful what you say today even if it’s encrypted.  Someone (NSA and its peers around the world) will probably be listening in and someday soon, will understand every word that’s been said.

Comments?

Posted in Data index, data protection, Data security, Distributed computing, Information economy, Storage density, Strategic Inflection Points, System effectiveness

What to do with 36TB on my Mac?

(Back of) Western Digital's Thunderbolt Duo (from their website)

Western Digital (WD) just released their new My Book Thunderbolt Duo the other day. It features two 2TB or 3TB disks and, of course, you can daisy chain up to 6 of these together, just in case, for up to 36TB on a Mac.

I have been happy with my desktop storage, which has been running about 80% full. Plus I have a 1TB Time Machine external drive for online backups, which I use more than I care to admit. But what the heck am I going to do with 36TB?

Enter Apple TV

Well, now that the new Apple TV is out and it supports 1080p video, that problem might be solved. I am starting to think of transferring my entire DVD/Blu-ray collection to digital format and loading it all on iTunes. That way I could use AirPlay and Apple TV to play it to a TV.

This is where the 6 to 36TB of storage could come in handy, especially if I wasn’t interested in streaming everything off of iCloud and wanted a local iTunes repository onsite for all my videos.

Digital video for the iPad

Today, I don’t have a lot of videos on my desktop, mostly ones I wanted to view on my iPad, so they are highly compressed and only take up about 1GB per video (Handbrake encoded from DVDs).

I am thinking the new 1080p iTunes encoded videos would take up more space, at least 4-5GB per video, but that would still be considerably better than 9GB for DVD and ~36GB for high definition Blu-ray videos.
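To put those per-video sizes in perspective, here is a quick Python sketch of how many videos a fully daisy-chained 36TB string would hold. It uses decimal units (1TB = 1000GB) and ignores file system and RAID overhead, both simplifying assumptions on my part; the per-video sizes are the rough ones quoted above.

```python
def videos_that_fit(capacity_tb, gb_per_video):
    """Whole videos that fit, using decimal units and ignoring overhead."""
    return int(capacity_tb * 1000 // gb_per_video)


# Approximate per-video sizes from the post, in GB
sizes_gb = {
    "iPad (Handbrake)": 1,
    "1080p iTunes": 4.5,
    "DVD image": 9,
    "Blu-ray image": 36,
}
for label, size in sizes_gb.items():
    print(f"{label}: {videos_that_fit(36, size)} videos in 36TB")
```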

Given current storage I could probably handle converting my current iPad videos over to the 1080p version (if I actually owned them in hi-def) but if I wanted to put the rest of my video library on my desktop I don’t have enough space.

Bulk storage meet the Mac

Then WD came out with their new Thunderbolt Duo drives. They seem to have it all: Thunderbolt I/O at 10Gbps, with all the storage I could possibly need. Presumably the 2 or 3TB drives are 5400 or 7200rpm SATA 3.0 drives. But they are user swappable, so they could conceivably be changed out for whatever comes out next, though probably in pairs.

Of course with SATA 3.0 they can only go 6Gbps to the disks, but it’s not a bad match to have 2 drives per single bi-directional Thunderbolt channel.  Although whether 6 of these  daisy chained on a single Thunderbolt cable would generate decent performance is another question.  Then again, how much performance can one Mac use?

I suppose my next steps are to upgrade my Mac to hardware that supports Thunderbolt, get Apple TV, buy a Duo drive or two and then start encoding my DVD/Blu-ray library.

But that’s too logical. Instead, maybe I’ll just get Apple TV and give iCloud a try, at least for a while, and save the WD Duo for the next evolution. Maybe by then WD will have come out with their 4TB drives, providing 8TB per Duo.

Comments?

Posted in Data density, Disk storage, File Storage, Strategic Inflection Points

Archeology meets Big Data

Polynya off the Antarctic Coast by NASA Earth Observatory (cc) (From Flickr)

Read an article yesterday about the use of LIDAR (light detection and ranging, Wikipedia) to map the remains of a pre-Columbian civilization in Central America, the little-known Purepecha empire, peers of the Aztecs.

The original study (see LIDAR at Angamuco) cited in the piece above was a result of the Legacies of Resilience project sponsored by Colorado State University (CSU) and goes into some detail about the data processing and archeological use of the LIDAR maps.

Why LIDAR?

LIDAR sends a laser pulse from an airplane/satellite to the ground and measures how long it takes to reflect back to the receiver. With that information and “some” data processing, these measurements can be converted to an X, Y, & Z coordinate system or detailed map of the ground.
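The core of that measurement is simple time-of-flight arithmetic, sketched below in Python. The function name is mine, and the sketch leaves out everything that makes real LIDAR processing hard (aircraft position and attitude, scan angle, atmospheric effects), which is where the “some” data processing comes in.

```python
SPEED_OF_LIGHT_M_S = 299_792_458


def range_from_round_trip(round_trip_seconds):
    """Distance to the reflecting surface: the pulse travels out and back, so halve it."""
    return SPEED_OF_LIGHT_M_S * round_trip_seconds / 2


# A pulse that returns after about 6.67 microseconds reflected off something ~1km away
print(f"{range_from_round_trip(6.67e-6):.1f}m")
```

At these speeds, the 0.25m resolution discussed below corresponds to timing differences of only a couple of nanoseconds.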

The archeologists in the study used LIDAR to create a detailed map of the empire’s main city at a resolution of +/- 0.25m (~10in). They mapped about 207 square kilometers (80 square miles) at this level of detail. In 4 days of airplane LIDAR mapping, they were able to gather more information about the area than they were able to accumulate over 25 years of field work. Seems like digital archeology was just born.

So how much data?

I wanted to find out just how much data this was, but neither the article nor the study told me anything about the size of the LIDAR map. However, assuming this is a flat area (which it wasn’t) and assuming the +/-0.25m resolution represents one point every 625sqcm (a 0.25m by 0.25m cell), the area being mapped above should represent a minimum of ~3.3 billion points of a LIDAR point cloud.

Another paper I found (see Evaluation of MapReduce for Gridding LIDAR Data) said that a LIDAR “grid point” (containing X, Y & Z coordinates) takes 52 bytes of data.

Given the above, I estimate the 207sqkm LIDAR grid point cloud represents a minimum of ~172GB of data. There are LIDAR compression tools available, but even at 50% reduction, it’s still ~86GB for 207sqkm.
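Here is that estimate spelled out as a short Python sketch, using the same simplifications as above: flat terrain, one grid point per 0.25m by 0.25m cell, and 52 bytes per grid point. The function name is just for illustration.

```python
def lidar_grid_estimate(area_sqkm, spacing_m, bytes_per_point):
    """Return (grid points, bytes) for a flat area sampled on a square grid."""
    area_sqm = area_sqkm * 1_000_000
    points = area_sqm / (spacing_m ** 2)  # one point per spacing x spacing cell
    return points, points * bytes_per_point


points, size_bytes = lidar_grid_estimate(207, 0.25, 52)
print(f"{points / 1e9:.2f} billion points, ~{size_bytes / 1e9:.0f}GB")
```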

My understanding is that the raw LIDAR data would be even bigger than this and the study applied a number of filters against the LIDAR map data to extract different types of features which of course would take even more space. And that’s just one ancient city complex.

With all the above, the size of LIDAR raw data, grid point fields, and multiple filtered views is approaching significance (in storage terms). Moving and processing all this data must also be a problem. As evidence, the flights for the LIDAR runs over Angamuco, Mexico occurred in January 2011 and they were able to analyze the data sometime that summer, ~6 months later. That seems a bit long from my perspective; maybe the data processing/analysis could use some help.

Indiana Jones meets Hadoop

That was the main subject of the second paper mentioned above, done by researchers at the San Diego Supercomputer Center (SDSC). They essentially did a benchmark comparing MapReduce/Hadoop running on a relatively small cluster of 4 to 8 commodity nodes against an HPC cluster (28 Sun x4600M2 servers, each an 8 processor, quad core node with anywhere from 256GB to 512GB [only on 8 nodes] of DRAM) running a C++ implementation of the algorithm.

The results of their benchmarks were that the HPC cluster beat the Hadoop cluster only when all of the LIDAR data could fit in memory (on a DRAM per core basis). After that, the Hadoop cluster performed just as well in elapsed wall clock time. Of course, from a cost perspective the Hadoop cluster was much more economical.

The 8-node, Hadoop cluster was able to “grid” a 150M LIDAR derived point cloud at the 0.25m resolution in just a bit over 10 minutes. Now this processing step is just one of the many steps in LIDAR data analysis but it’s probably indicative of similar activity occurring earlier and later down the (data) line.

~~~~

Let’s see: at 172GB per 207sqkm, and with the earth’s surface at 510Msqkm, a similar resolution LIDAR grid point cloud of the entire earth’s surface would be roughly 0.4EB (Exabyte, 10**18 bytes). It’s just great to be in the storage business.
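That whole-earth extrapolation is just an area ratio, sketched here in Python. It assumes the same density of grid points everywhere (oceans included) and decimal units, and the function name is mine.

```python
def scale_by_area(sample_bytes, sample_area_sqkm, target_area_sqkm):
    """Scale a data size linearly from a sample area to a target area."""
    return sample_bytes * target_area_sqkm / sample_area_sqkm


# 172GB for 207sqkm, scaled to the earth's ~510 million sqkm surface
earth_bytes = scale_by_area(172e9, 207, 510e6)
print(f"~{earth_bytes / 1e18:.2f}EB")
```

The result lands close to the rounded figure quoted above.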

 

Posted in Data analytics, Data growth, Data reduction, Distributed computing, Information economy, Strategic Inflection Points

Super Talent releases a 4-SSD, RAIDDrive PCIe card

RAIDDrive UpStream (c) 2012 Super Talent (from their website)

Not exactly sure what is happening, but PCIe cards are coming out containing multiple SSD drives.

For example, the recently announced Super Talent RAIDDrive UpStream card contains 4 SAS embedded SSDs that can push storage capacity up to almost a TB of MLC NAND.   They have an optional SLC version but there were no specs provided on this.

It looks like the card uses an LSI RAID controller and SandForce NAND controller. Unlike the other RAIDDrive cards that support RAID5, the UpStream can be configured with RAID 0, 1 or 1E (sort of RAID 1 only striped across even or odd drive counts) and currently supports capacities of 220GB, 460GB or 960GB total.

Just like the rest of the RAIDDrive product line, the UpStream card is PCIe x8 connected and requires host software (drivers) for Windows, NetWare, Solaris and other OSs but not for “most Linux distributions”.  Once the software is up, the RAIDDrive can be configured and then accessed just like any other “super fast” DAS device.

Super Talent’s data sheet states UpStream performance at 1GB/sec reads and 900MB/sec writes. However, I didn’t see any SNIA SSD performance test results so it’s unclear how well performance holds up over time and whether these performance levels can be independently verified.

It seems just a year ago that I was reviewing Virident’s PCIe SSD along with a few others at Spring SNW. At the time, I thought there were a lot of PCIe NAND cards being shown at the show. Given Super Talent and the many other vendors sporting PCIe SSDs today, there’s probably going to be a lot more this time.

No pricing information was available.

~~~~

Comments?

Posted in SSD storage, Storage, System effectiveness

DNA computing and the end of natural evolution

DNA Molecule Arrangement in the Chip (from http://dnacomputing.design.officelive.com)

Read an article the other day in the Economist on how researchers are now performing computation using DNA.  The intent is to someday come up with small biologic computers that can be inserted into cells/organisms which can cure or kill cells that are in trouble and leave the rest alone.

Computing soup?!

Research in the area of molecular computing has been going on since 1994, when a scientist created a DNA based solution to compute an answer to a specified traveling salesman problem.

In those days the answer was derived from running a centrifuge on the end-product soup of DNA strings and extracting the answer from the resultant gel matrix.

Molecular computing redefined

Since then, there have been significant improvements in DNA computing. Currently, most approaches are based on DNA strand displacement. Today’s molecular computers consist of free floating DNA or RNA snippets. A logic gate is made up of two strands, one of which is the “computational logic” and the other an “output signal”. In addition to the logic gate there is another DNA/RNA strand which is an “input signal”, almost like input data. Input signals are matched up to a specific logic gate and cause the output signal snippet to be detached, creating yet another input signal for other computations cascading down the pipeline.
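The cascading behavior described above can be mimicked with a toy Python model. This is only an abstraction of the logic, with no chemistry in it: strands are plain strings, each gate fires at most once, and (unlike real strand displacement, where the input strand is consumed) signals simply accumulate in a set. The `Gate` class and `run_cascade` function are hypothetical names for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    trigger: str        # input signal this gate's logic strand recognizes
    output: str         # output signal strand released when the gate is triggered
    fired: bool = False


def run_cascade(gates, initial_signals):
    """Release output signals until no untriggered gate matches any free signal."""
    signals = set(initial_signals)
    changed = True
    while changed:
        changed = False
        for gate in gates:
            if not gate.fired and gate.trigger in signals:
                gate.fired = True
                signals.add(gate.output)  # detached output becomes a new input
                changed = True
    return signals


soup = [Gate("B", "C"), Gate("A", "B")]
print(sorted(run_cascade(soup, {"A"})))
```

Adding input A releases B, which in turn releases C, no matter what order the gates happen to be listed in.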

DNA-RNA based digital logic

2-bit_ALU (from wikimedia.org)

By doing all this, researchers have been able to create DNA snippets that perform various logical computing operations such as AND, OR and NOT logic gates, and to produce the signal pathways to connect them in a computational sequence or “program”.

The molecular automata all look like elementary electronic circuits made up of base level logic gates to me but, just as in electronic digital logic, they seem to get the job done. One gets a computation done by adding 1000s of copies of the logic gates and input sequences together and somehow assaying the end result many hours later.

Using these capabilities, they have created DNA programs made up of 74 different DNA strands that could calculate the square roots of 4 digit numbers.

Next, they tied an artificial neuron, set to fire when input signals hit a certain level, together with a soup of 114 different DNA strands to do rudimentary pattern recognition. They then “programmed” their DNA neural net to recognize Yes/No answers provided by different scientists. The report said that the neural net was able to get the correct answer every time but took 8 hours to perform the calculations.

There are a couple of groups working on a programming language and a simulator tool for DNA or molecular computing called the DNA Strand Displacement (DSD) tool.

The report went on to say how another set of researchers were fabricating synthetic genes which, when introduced into a cell, could be used to trick the cell into producing the cellular computer itself.

The end of natural evolution?

The end game for all this is to create a computational device that can somehow be injected into tissue cells which would identify “sick” cells then cure or destroy them.

A couple of years ago, I was waiting in a doctor’s office for something or another and penned a poem on the end of human evolution involving ECC combined with DNA.  (No, you can’t see the poem.)

You see in computers today there is a computational device called an ECC or error correcting code which is a circuit and a special code word that can be appended to a sequence of data that together can then be used to correct for errors in transmission or storage of that data.

Once someone can build digital logic out of DNA-RNA, it’s not a big leap to build an ECC circuit. Once the circuit is ready, anyone could potentially have their DNA modified to have an appropriate ECC codeword appended to it. With DNA + ECC code word and an active ECC circuit in the cell, it’s quite possible that any single, double, or triple mutation could be detected and fixed inside a cell. Of course ECC can go beyond triple error detection if needed. Also, Reed-Solomon and other erasure codes can even go much beyond that.
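For readers who haven’t met ECC before, here is about the smallest example there is: a Hamming(7,4) code in Python, which protects 4 data bits with 3 check bits and can locate and fix any single flipped bit. I am using it purely as an illustration; it corrects only one error per codeword (not the double or triple errors mentioned above), and nothing here suggests how such a code would actually be laid out in DNA.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword laid out as [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (corrected codeword, 1-based error position or 0 if none was found)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # the syndrome spells out the bad position
    if position:
        c[position - 1] ^= 1         # flip the bad bit back
    return c, position


def hamming74_decode(codeword):
    """Extract the 4 data bits from a (corrected) codeword."""
    return [codeword[2], codeword[4], codeword[5], codeword[6]]


word = hamming74_encode([1, 0, 1, 1])
mutated = list(word)
mutated[4] ^= 1                      # simulate a single "mutation" at position 5
fixed, where = hamming74_correct(mutated)
print(where, hamming74_decode(fixed))
```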

After such a device was incorporated into the human genome, it would seem to signal the end to natural evolution, at least for humans.

~~~~

Comments?

Posted in Distributed computing

Forgetting is important and other news from cognitive research

A female student reading a Serbian contract law book, her face is obscured by the book

Study time by Stanković Vlada

It turns out retrieval is more important (at least for the brain) than storage.

Recent research from cognitive scientists such as Robert Bjork at the UCLA Learning & Forgetting lab has shown that most of what we think we know about learning is wrong. (See Learning and Forgetting Lab, Getting it wrong, UCLA Learning and Forgetting Lab for more).

 

The researchers have been testing people to see which approaches are better for recalling information they were trying to study. They found that the key to studying and actually remembering better is working on better retrieval, not better storage.

It’s somewhat interesting that the scientists aren’t talking about learning as much as retrieval of information.  Almost as if learning were actually the equivalent to information retrieval.

Stop studying the same items over and over again, just try something different

It seems that studying a single item over and over again is the wrong way to try to learn something.  A better way is to vary your studying, to examine different but related items, which somehow lets you better classify the information and provides more accessible paths for retrieving that data.

Stop studying in the same place, go someplace else

Further guidance is to vary the location, decor, or any other characteristic of the environment you are studying in when trying to learn something new. The key here is that these other locations add more tags/handles/indexes to the data, and the more indexing, the better for retrieval success.

Stop studying, start testing

An additional way to remember better is trying to retrieve information early and often, even if it doesn’t work. It appears that the more you try to recall some tidbit of information, regardless of success, the stronger the access path is burned into your brain, so that the next time you try to recollect that information, it becomes much easier to do. In fact, the suggestion is to test yourself right away after learning something new, as a sort of retrieval exercise, without studying it. Struggling to recollect something helps?!

Stop taking notes during class, start taking them afterwards

Following on in that vein yet almost unbelievable, is another recommendation to abandon note taking altogether and rather, spend time after class to summarize (exercising that retrieval path again) what you were taught.  The important part is to do this immediately afterwards.  (Don’t tell my kids!)

Stop studying continually, wait before you study again

Moreover, another suggestion is to wait before you study something again. It seems if you study something too soon after having just studied it, you are not exercising that recall path well enough. Rather, they advocate waiting around a couple of days/weeks before studying something again to remember it better.  Struggling to recall information is better for remembering it than having an easy time of it.

With (relatively) infinite storage, forgetting is important

Finally, the cognitive scientists seem to think that forgetting is almost as important as remembering.  From a storage perspective, it appears that the brain has an unlimited capacity to store information.  But the downside is that any retrieval takes time and effort (something akin to searching through a bunch of indexes).

What we really want is to be better able to retrieve information that’s important.  Keeping all that extraneous junk readily recallable just slows down the retrieval of the really good stuff.  So forgetting helps purge un-needed access paths/tags/indexes freeing up space for what needs to be remembered.

~~~~

Gosh, and to think all along all those illegible notes I took in college (and still do) really did help me learn!?

Comments?

Posted in Cognitive science

eBay cools Phoenix data center with hot water from the desert

Two people talking to one another in a data center hallway about one person wide with bunches of racks and cabling on either side

Microsoft Bing Maps' datacenter by Robert Scoble

Read a report today about how eBay was cooling their new data center outside Phoenix with hot water at desert warmed 86F (30C) temperatures (see Breaking new ground on data center efficiency).

And to literally top it all off, they are running data center containers on the roof which they claim have a Green Grid PUE™ (Power Usage Effectiveness) of 1.044 in summer with servers at maximum load. Now this doesn’t count some of the transformers and other power conditioning that is needed, but it is still impressive.

The average for the whole data center, a PUE of 1.35, is not the best in the industry but is considerably better than average. We have talked about green data centers before, with a NetApp data center having an expected PUE of 1.2 (see Building a green data center). One secret to these PUEs is running the servers at hotter than normal temperatures.
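As a reminder of what these numbers mean, PUE is total facility power divided by the power delivered to IT equipment, so anything above 1.0 is overhead (cooling, power conversion, lighting and so on). The small Python sketch below turns the PUE values quoted in this post into overhead per 1000kW of IT load; the 1000kW load is an arbitrary figure chosen for illustration.

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power Usage Effectiveness: total facility power over IT equipment power."""
    return total_facility_kw / it_equipment_kw


def overhead_kw(it_equipment_kw, pue_value):
    """Non-IT power (cooling, power conversion, etc.) implied by a given PUE."""
    return it_equipment_kw * (pue_value - 1)


IT_LOAD_KW = 1000  # hypothetical IT load
for label, value in (("rooftop containers", 1.044),
                     ("whole data center", 1.35),
                     ("NetApp (expected)", 1.2)):
    print(f"{label}: PUE {value} -> ~{overhead_kw(IT_LOAD_KW, value):.0f}kW overhead")
```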

New data center designed, servers and other equipment selected with PUE in mind

This is a data center consolidation project so they were also able to start with a blank sheet of paper. They started by reducing the number of server types down to two, one for high performance computing and the other for big data analytics (Hadoop cluster). Both sets of servers were selected with power efficiency in mind. Another server capability requested by eBay was the ability to dynamically change server clock speed so it could idle or speed up servers as demand dictated. In this way they could turn down servers, shedding power consumption, and/or turn up servers to peak performance, remotely.

The data center cooling was designed with two independent loops: one a traditional air conditioned loop that delivered water at 55F (13C) and the other a hot water loop that delivered water at 86F (30C), using water from a cooling tower exposed to the desert air.

eBay started out thinking they would use the air conditioned loop more often in the summer months and less often in winter. But in the end they found they could get by with just using the hot water loop year round and use the cold water loop for some spot cooling, where necessary.

Data center containers on a hot roof

Also, the building was specially built to be able to support up to 12 data center containers on the roof. There were over 4920 servers deployed in three containers currently on the roof, and one container of 1500 servers was lifted from the truck and put in place in 22 minutes. The containers were designed for direct exposure to the desert environment (up to 122F or 50C) and were cooled using adiabatic cooling.

More details are available in the Green Grid report.

~~~~~~

I wonder what they do when they have to swap out components, especially in the containers – maybe they only do this in winter;)

Comments?

Posted in Distributed computing, Energy efficiency