Read an article today about Safaricom creating a domestic cloud service offering outside Nairobi in Kenya (see Chasing the African Cloud).
But this got me thinking that cloud services may be just like mobile phones: developing countries can use them to skip over older technologies. Mobile phones let those countries gain the advantages of telephone service without the expense and time of building telephone wires across the land, and cloud services could do much the same for IT infrastructure.
Leapfrogging IT infrastructure buildout
In the USA, cloud computing, cloud storage, and SAAS services based in the cloud are essentially taking the place of small business IT infrastructure services today. Many small businesses skip over building their own IT infrastructures, absolutely necessary years ago for email, web services, back office processing, etc., and are moving directly to using cloud service providers for these capabilities.
In some cases, it’s even more than just the IT infrastructure, as the application, data and processing services all can be supplied from SAAS providers.
Today, it’s entirely possible to run a complete, very large business this way without owning a stitch of IT infrastructure (other than desktops, laptops, tablets and mobile phones).
Developing countries can show us the way
Developing countries can do much the same for their economic activity. Rather than have their small businesses spend time building out homegrown IT infrastructure, they can just lease it from one or more domestic (or international) cloud service providers and skip the time, effort and cost of doing it themselves.
Given this dynamic, cloud service vendors ought to be focusing more time and money on developing countries. These countries should adopt such services more rapidly because they don’t have the sunk costs in current, private IT infrastructure and applications.
China moves into the cloud
I probably should have caught on earlier. Earlier this year I was at a vendor analyst meeting, having dinner with a colleague from the China Center for Information Industry Development (CCID) Consulting. He mentioned that cloud was one of a select set of technologies that China was focusing considerable state and industry resources on. At the time, I just thought this was prudent thinking to keep up with industry trends. What I didn’t realize was that the cloud could be a leapfrog technology that would help them avoid a massive IT infrastructure buildout in millions of small companies in their nation.
One can see that early adopter nations have understood that with the capabilities of mobile phones they can create a fully functioning telecommunications infrastructure almost overnight. Much the same can be done with cloud computing, storage and services.
Now if they can only get WiMAX up and running to eliminate cabling their cities for internet access.
Recently a report came out, highlighted in IEEE Spectrum, on a Transistor Made to Run on Protons. It seems that the transistor technology used everywhere today runs on electrons, or negative charges, but bio-neurological systems all run on positive ions and/or protons. This makes a proton transistor especially appealing for a biocompatible electronic interface.
Proton transistors are born
Apparently the chip is made out of “nanofibers of chitosan” originally derived from a squid. Also the device works only in the presence of high humidity and when “loaded with proton-donating acid groups”.
All sounds a bit biological to me but that’s probably the point.
It seems in the past when they tried to provide a bio-electronic interface like this they used micro-fluidics with a flow of positive ions through a pipe. But this new approach does not use flowing liquid at all but rather just a flow of protons across an acid. The proton flow is controlled by an electrostatic potential applied to the transistor’s gate electrode.
Today the proton transistor has a channel width of 3.5 μm. At that size, it’s ~1000X bigger than current transistor technology (maybe even more). Which means it will be some time before they embed a proton based, 12-core Xeon processor in a brain.
Apparently the protons have a mobility of ~5×10⁻³ cm² V⁻¹ s⁻¹ through the transistor, or by my calculations, roughly ~1/2 bit per 100 seconds. Seems like we are going to need a lot more channels, but it’s only a start. For more information on the new transistor read the original article in Nature, A polysaccharide bioprotonic field-effect transistor.
But what can it do for me?
A proton transistor has the potential to interface directly with human neurons and as such, can form a biocompatible electronic interface. Such a bionanoprotonic device can conceivably create a brain-to-electronics interface that can bring digital information directly into a person’s consciousness. Such a capability would be a godsend to the blind, deaf and handicapped.
Of course if information can go in, it can also come out
One can imagine that such an interface can provide a portal to the web, an interface to a desktop or mobile computing device without the burden of displays, keyboards, or speakers. Such a device, when universally available, may make today’s computing paradigm look like using a manual typewriter.
This all sounds like science fiction but it feels like it just got a step closer to reality.
But buried in the comments was one from Michael Hay (HDS) which pointed to another blog post by Andrew Huang in his bunnie’s blog (Why the best days of open hardware are ahead), where he has an almost brilliant discussion on how Moore’s law will eventually peter out (~5nm) and as such, it will take much longer to double transistor density. At that time, hardware customization (by small companies/startups) will once again come to the forefront in new technology development.
Custom hardware, here now and the foreseeable future
Although it would be hard to argue against Andrew’s point, I nevertheless firmly believe there is still plenty of opportunity today to customize hardware that brings true value to the market. The fact is that Moore’s law doesn’t mean that hardware customization cannot still be worthwhile.
But will anyone be around that can do custom chip design?
Nigel Poulton commented on Chris’s post that with custom hardware seemingly going away, the infrastructure, training and people will no longer be around to support any re-invigorated custom hardware movement.
I disagree. Intel, IBM, Samsung, and many other large companies still maintain an active electronics engineering team/chip design capability, any of which are capable of creating state of the art ASICs. These capabilities are what make Moore’s law a reality and will not go away over the long run (the next 20-30 years).
The fact that these competencies are locked up in very large organizations doesn’t mean they cannot be used by small companies/startups as well. It probably does mean that this wherewithal may cost more. But the marketplace will deal with that in the long run, that is if the need continues to exist.
But do we still need custom hardware?
Custom hardware creates capabilities that magnify Moore’s law processing capabilities to do things that standard, off the shelf hardware cannot. The main problem with Moore’s law from a custom hardware perspective is it takes functionality that once took custom hardware yesterday (or 18 months ago) and makes it available on off the shelf components with custom software today.
This dynamic just means that custom hardware needs to keep moving, providing ever more user benefits and functionality to remain viable. When custom hardware cannot provide any real benefit over standard off the shelf components – that’s when it will die.
Andrew talks about the time it takes to develop custom ASICs and the fact that by the time you have one ready, a new standard chip has come out which doubles processor capabilities. Yes, custom ASICs take time to develop, but FPGAs can be created and deployed in much less time. FPGAs, like custom ASICs, also take advantage of Moore’s law with increased transistor density every 18 months. Yes, FPGAs may run slower than custom ASICs, but what they lack in processing power, they make up in time to market.
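To put a rough number on the time-to-market argument, here is a minimal sketch that assumes transistor density doubles every 18 months, as stated above. The 18-month ASIC and 6-month FPGA development times are purely hypothetical figures I picked for illustration.

```python
def density_multiplier(months, doubling_period_months=18):
    """How much off-the-shelf transistor density grows over `months`,
    assuming one doubling every `doubling_period_months`."""
    return 2 ** (months / doubling_period_months)


# Hypothetical development times, for illustration only.
asic_dev_months = 18
fpga_dev_months = 6

# Growth in standard-chip density while each custom design is being built.
print(f"ASIC ships into a market that is {density_multiplier(asic_dev_months):.2f}x denser")
print(f"FPGA ships into a market that is {density_multiplier(fpga_dev_months):.2f}x denser")
```

Under these assumptions the shorter FPGA cycle gives standard hardware much less time to catch up before the custom design ships.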
Custom hardware has a bright future as far as I can see.
We have talked before about ePathology and data growth, but Technology Review recently reported that researchers at Stanford University have used Electronic Medical Records (EMR) from multiple medical institutions to identify a new harmful drug interaction. Apparently, they found that when patients take Paxil (an antidepressant) and Pravachol (a cholesterol reducer) together, the drugs interact to raise blood sugar to levels similar to what diabetics have.
Data analytics to the rescue
The researchers started out looking for new drug interactions which could result in conditions seen by diabetics. Their initial study showed a strong signal that taking both Paxil and Pravachol could be a problem.
Their study used FDA Adverse Event Reports (AERs) data that hospitals and medical care institutions record. Originally, the researchers at Stanford’s Biomedical Informatics group used AERs available at Stanford University School of Medicine but found that although they had a clear signal that there could be a problem, they didn’t have sufficient data to statistically prove the combined drug interaction.
They then went out to Harvard Medical School and Vanderbilt University and asked to access their AERs to add to their data. With the combined data, the researchers were now able to clearly see and statistically prove the adverse interactions between the two drugs.
But how did they analyze the data?
I could find no information about what tools the biomedical informatics researchers used to analyze the set of AERs they amassed, but it wouldn’t surprise me to find out that Hadoop played a part in this activity. It would seem to be a natural fit to use Hadoop and MapReduce to aggregate the AERs together into a semi-structured data set and reduce this data set to extract the AERs which matched their interaction profile.
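To show what I mean by a map and reduce pass over such records, here is a toy sketch in plain Python (no Hadoop). It is not the researchers’ method: the record layout, the `glucose_mg_dl` field and the sample values are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# The drug pair of interest, lower-cased for matching.
PAIR = frozenset({"paxil", "pravachol"})


def map_record(record):
    """Map step: emit (exposure group, glucose reading) for one record."""
    drugs = {d.lower() for d in record["drugs"]}
    if PAIR <= drugs:
        group = "both"
    elif drugs & PAIR:
        group = "one"
    else:
        group = "neither"
    yield group, record["glucose_mg_dl"]


def reduce_group(values):
    """Reduce step: record count and mean glucose for one exposure group."""
    return len(values), mean(values)


def run(records):
    """Group mapped values by key (the shuffle), then reduce each group."""
    shuffled = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            shuffled[key].append(value)
    return {key: reduce_group(values) for key, values in shuffled.items()}


# Made-up sample records, not real patient data.
sample = [
    {"drugs": ["Paxil", "Pravachol"], "glucose_mg_dl": 130},
    {"drugs": ["Paxil"], "glucose_mg_dl": 100},
    {"drugs": ["Pravachol", "Aspirin"], "glucose_mg_dl": 96},
    {"drugs": ["Aspirin"], "glucose_mg_dl": 94},
    {"drugs": ["paxil", "pravachol"], "glucose_mg_dl": 150},
]
print(run(sample))
```

A real study would of course need proper statistics on top of this, not just group means.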
Then again, it’s entirely possible that they used a standard database analytics tool to do the work. After all, we are only talking about 100 to 200K records or so.
Nonetheless, the Technology Review article stated that some large hospitals and medical institutions using EMR are starting to have database analysts (maybe data scientists) on staff to mine their record data and electronic information to help improve healthcare.
Although EMR was originally envisioned as a way to keep better track of individual patients, when a single patient’s data is combined with 1000s more patients one creates something entirely different, something that can be mined to extract information. Such a data repository can be used to ask questions about healthcare inconceivable before.
Digitized medical imagery (X-Rays, MRIs, & CAT scans), E-pathology and now EMR are together giving rise to a new form of electronic medicine or E-Medicine. With everything being digitized, securely accessed and amenable to big data analytics, medical care as we know it is about to undergo a paradigm shift.
Big data and eMedicine combined together are about to change healthcare for the better.
It appears that the system uses 200K disk drives to support the 120PB of storage. The disk drives are packed in a new wider rack and are water cooled. According to the news report the new wider drive trays hold more drives than current drive trays available on the market.
For instance, HP has a hot pluggable, 100 SFF (small form factor 2.5″) disk enclosure that sits in 3U of standard rack space. 200K SFF disks would take up about 154 full racks, not counting the interconnect switching that would be required. Unclear whether water cooling would increase the density much but I suppose a wider tray with special cooling might get you more drives per floor tile.
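The rack count above can be reproduced with a few lines of Python. This is only a sketch of my arithmetic: the 40U of usable space per rack is an assumption on my part, and it ignores switching.

```python
import math


def racks_needed(total_drives, drives_per_enclosure=100,
                 enclosure_height_u=3, usable_rack_u=40):
    """Racks needed to hold `total_drives`, given enclosure size and rack height."""
    enclosures = math.ceil(total_drives / drives_per_enclosure)
    # Only whole enclosures fit in a rack, so round down here.
    enclosures_per_rack = usable_rack_u // enclosure_height_u
    return math.ceil(enclosures / enclosures_per_rack)


print(racks_needed(200_000))  # 2000 enclosures at 13 per rack -> 154 racks
```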
There was no mention of interconnect, but today’s drives use either SAS or SATA. SAS interconnects for 200K drives would require many separate SAS busses. With a SAS expander addressing 255 drives or other expanders, one would need at least 4 SAS busses, but this would have ~64K drives per bus and would not perform well. Something more like 64-128 drives per bus would have much better performance, and each drive would need dual pathing. If we use 100 drives per SAS string, that’s 2000 SAS drive strings or at least 4000 SAS busses (dual port access to the drives).
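Here is the same SAS bus arithmetic as a short Python sketch. I am reading “~64K drives per bus” as two levels of expanders with a fan-out of 255 each, which is my interpretation rather than anything from the news report.

```python
import math

TOTAL_DRIVES = 200_000
EXPANDER_FANOUT = 255  # devices (drives or further expanders) per expander

# Two expander levels: 255 expanders, each addressing 255 drives.
max_drives_per_bus = EXPANDER_FANOUT * EXPANDER_FANOUT
min_busses = math.ceil(TOTAL_DRIVES / max_drives_per_bus)


def sas_plan(total_drives, drives_per_string, paths_per_drive=2):
    """Return (drive strings, SAS busses) for a given string length and pathing."""
    strings = math.ceil(total_drives / drives_per_string)
    return strings, strings * paths_per_drive


print(max_drives_per_bus, min_busses)  # 65025 drives per bus, 4 busses minimum
print(sas_plan(TOTAL_DRIVES, 100))     # (2000, 4000) with dual pathing
```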
Shared storage cluster – where GPFS front end nodes access shared storage across the backend. This is generally SAN storage system(s). But given the requirements for high density, it doesn’t seem likely that the 120PB storage system uses SAN storage in the backend.
Networked based cluster – here the GPFS front end nodes talk over a LAN to a cluster of NSD (network shared disk) servers which can have access to all or some of the storage. My guess is this is what will be used in the 120PB storage system.
Shared Network based clusters – this looks just like a bunch of NSD servers but provides access across multiple NSD clusters.
Given the above, ~100 drives per NSD server means an extra 1U per 100 drives or (given HP drive density) 4U per 100 drives, which works out to 1000 drives and 10 IO servers per 40U rack (not counting switching). At this density it takes ~200 racks, holding 2000 NSD nodes, for 120PB of raw storage.
Unclear how many GPFS front end nodes would be needed on top of this but even if it were 1 GPFS frontend node for every 5 NSD nodes, we are talking another 400 GPFS frontend nodes and at 1U per server, another 10 racks or so (not counting switching).
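The node and rack estimates above can be checked with a small Python sketch. Every default below comes from my guesses in this post (100 drives per NSD server, 4U per drive tray plus server, 40U racks, 1 GPFS node per 5 NSD nodes), not from any IBM specification.

```python
import math


def cluster_footprint(total_drives=200_000, drives_per_nsd=100,
                      u_per_nsd_unit=4, rack_u=40,
                      nsd_per_gpfs=5, gpfs_node_u=1):
    """Estimate NSD/GPFS node counts and rack counts, ignoring switching."""
    nsd_nodes = math.ceil(total_drives / drives_per_nsd)
    # One "unit" is a 3U drive tray plus its 1U NSD server.
    units_per_rack = rack_u // u_per_nsd_unit
    storage_racks = math.ceil(nsd_nodes / units_per_rack)
    gpfs_nodes = math.ceil(nsd_nodes / nsd_per_gpfs)
    gpfs_racks = math.ceil(gpfs_nodes * gpfs_node_u / rack_u)
    return {
        "nsd_nodes": nsd_nodes,
        "storage_racks": storage_racks,
        "gpfs_nodes": gpfs_nodes,
        "gpfs_racks": gpfs_racks,
        "total_racks": storage_racks + gpfs_racks,
    }


print(cluster_footprint())
```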
If my calculations are correct we are talking over 210 racks with switching thrown in to support the storage. According to IBM’s discussion on the Storage challenges for petascale systems, it probably provides ~6TB/sec of data transfer, which should be easy with 200K disks but may require even more SAS busses (maybe ~10K vs. the 2K SAS strings discussed above).
IBM GPFS is used behind the scenes in IBM’s commercial SONAS storage system but has been around as a cluster file system designed for HPC environments for over 15 years now.
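A quick sanity check on the bandwidth, in Python. It uses decimal units (1TB = 1,000,000MB). The comparison against a single 6Gb/s SAS lane at roughly 600MB/s is my assumption about what a “bus” would be here.

```python
def per_unit_bandwidth_mb(aggregate_tb_per_sec, units):
    """MB/s each unit must sustain to reach the aggregate rate (decimal units)."""
    return aggregate_tb_per_sec * 1_000_000 / units


AGGREGATE_TB_PER_SEC = 6

# Per-drive load is modest for a disk doing sequential transfers.
print(per_unit_bandwidth_mb(AGGREGATE_TB_PER_SEC, 200_000))  # 30.0 MB/s per drive

# Per-bus load: 2K busses look tight, 10K fits a ~600 MB/s lane (assumed).
print(per_unit_bandwidth_mb(AGGREGATE_TB_PER_SEC, 2_000))    # 3000.0 MB/s per bus
print(per_unit_bandwidth_mb(AGGREGATE_TB_PER_SEC, 10_000))   # 600.0 MB/s per bus
```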
Given this many disk drives something needs to be done about protecting against drive failure. IBM has been talking about declustered RAID algorithms for their next generation HPC storage system which spreads the parity across more disks and as such, speeds up rebuild time at the cost of reducing effective capacity. There was no mention of effective capacity in the report but this would be a reasonable tradeoff. A 200K drive storage system should have a drive failure every 10 hours, on average (assuming a 2 million hour MTBF). Let’s hope they get drive rebuild time down much below that.
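The failure interval is simple division, but it is worth seeing how it scales. This sketch assumes independent drive failures at a constant rate, which is what a quoted MTBF implies but is not always how real drives behave.

```python
HOURS_PER_YEAR = 24 * 365


def hours_between_failures(drive_count, mtbf_hours):
    """Expected hours between drive failures across the whole population."""
    return mtbf_hours / drive_count


interval = hours_between_failures(200_000, 2_000_000)
print(interval)                   # 10.0 hours between failures
print(HOURS_PER_YEAR / interval)  # 876.0 expected failures per year
```

Any rebuild that takes longer than that interval means the system is, on average, always rebuilding at least one drive.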
The system is expected to hold around a trillion files. Not sure but even at 1024 bytes of metadata per file, this number of files would chew up ~1PB of metadata storage space.
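The metadata estimate works out as follows. The 1024 bytes per file is just my guess from above, not a GPFS figure.

```python
def metadata_bytes(file_count, bytes_per_file=1024):
    """Total metadata footprint for `file_count` files."""
    return file_count * bytes_per_file


total = metadata_bytes(10**12)  # a trillion files
print(total / 10**15)           # petabytes, decimal: ~1.02 PB
print(total / 2**50)            # pebibytes, binary: ~0.91 PiB
```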
GPFS provides ILM (information life cycle management, or data placement based on information attributes) using automated policies and supports external storage pools outside the GPFS cluster storage. ILM within the GPFS cluster supports file placement across different tiers of storage.
All the discussion up to now revolved around homogeneous backend storage but it’s quite possible that multiple storage tiers could also be used. For example, a high density but slower storage tier could be combined with a low density but faster storage tier to provide a more cost effective storage system. However, it’s unclear whether the application (real world modeling) could readily utilize this sort of storage architecture or whether they would care about system cost.
Nonetheless, presumably an external storage pool would be a useful adjunct to any 120PB storage system for HPC applications.
Can it be done?
Let’s see, 400 GPFS nodes, 2000 NSD nodes, and 200K drives. Seems like the hardware would be readily doable (not sure why they needed water cooling but hopefully they obtained better drive density that way).
It would seem that a 20X multiplier times a current Isilon cluster or even a 10X multiple of a currently supported SONAS system would take some software effort to work together, but seems entirely within reason.
Of course, IBM Almaden is working on a project to support Hadoop over GPFS which might not be optimum for real world modeling but would nonetheless support the node count being talked about here.
I wish there was some real technical information on the project out on the web but I could not find any. Much of this is informed conjecture based on current GPFS system and storage hardware capabilities. But hopefully, I haven’t traveled too far astray.
IBM, with the help of Columbia, Cornell, the University of Wisconsin (Madison) and the University of California, has created the first generation of neuromorphic chips (press release and video), which mimic the human brain’s computational architecture implemented via silicon. The chip is a result of Project SyNAPSE (standing for Systems of Neuromorphic Adaptive Plastic Scalable Electronics).
Hardware emulating wetware
Apparently the chip supports two cores one with 65K “learning” synapses and the other with ~256K “programmable” synapses. Not really sure from reading the press release but it seems each core contains 256 neuronal computational elements.
IBM’s goal is to have a 10 billion neuron processing engine with 100 trillion synapses occupy a 2-liter volume (about the size of the brain) while consuming less than one kilowatt of power (about 50X the brain’s roughly 20W power consumption).
The IBM research team has demonstrated some typical AI applications such as simple navigation, machine vision, pattern recognition, associative memory and classification applications with the chip.
Given my history with von Neumann computing it’s kind of hard for me to envision how synapses represent “programming” in the brain. Nonetheless, Wikipedia defines a synapse as a connection between any two neurons which can take two forms, electrical or chemical. A chemical synapse (Wikipedia) can have different levels of strength, plasticity, and receptivity. Sounds like this might be where the programmability lies.
Just what the “learning” synapses do, how they relate to the programmatical synapses and how they do it is another question entirely.
Stay tuned, a new, non-von Neumann computing architecture was born today. Two questions to ponder
I wonder if they will still call it artificial intelligence?
An announcement this week by VMware on their vSphere 5 Virtual Storage Appliance has brought back the concept of shared DAS (see vSphere 5 storage announcements).
Over the years, there have been a few products, such as Seanodes and Condor Storage (may not exist now) that have tried to make a market out of sharing DAS across a cluster of servers.
Arguably, Hadoop HDFS (see Hadoop – part 1), Amazon S3/cloud storage services and most scale out NAS systems all support similar capabilities. Such systems consist of a number of servers with direct attached storage, accessible by other servers or the Internet as one large, contiguous storage/file system address space.
Why share DAS? The simple fact is that DAS is cheap, its capacity is increasing, and it’s ubiquitous.
Shared DAS system capabilities
VMware has limited their DAS virtual storage appliance to a 3 ESX node environment, possibly for lots of reasons. But there is no such restriction for Seanodes Exanodes clusters.
On the other hand, VMware has specifically targeted SMB data centers for this facility. In contrast, Seanodes has focused on both HPC and SMB markets for their shared internal storage which provides support for a virtual SAN on Linux, VMware ESX, and Windows Server operating systems.
Although VMware Virtual Storage Appliance and Seanodes do provide rudimentary SAN storage services, they do not supply advanced capabilities of enterprise storage such as point-in-time copies, replication, data reduction, etc.
But some of these facilities are available outside their systems. For example, VMware with vSphere 5 will support a host based replication service and has had software based snapshots for some time now. Also, similar services exist or can be purchased for Windows and presumably Linux. Cloud storage providers have also provided a smattering of these capabilities from the start in their offerings.
Although distributed DAS storage has the potential for high performance, it seems to me that these systems should perform worse than an equivalent amount of processing power and storage in a dedicated storage array. But my biases might be showing.
On the other hand, Hadoop and scale out NAS systems are capable of screaming performance when put together properly. Recent SPECsfs2008 results for EMC Isilon scale out NAS systems have demonstrated very high performance, and Hadoop’s claim to fame is high performance analytics. But you have to throw a lot of nodes at the problem.
In the end, all it takes is software. Virtualizing servers, sharing DAS, and implementing advanced storage features, any of these can be done within software alone.
However, service levels, high availability and fault tolerance requirements have historically necessitated a physical separation between storage and compute services. Nonetheless, if you really need screaming application performance and software based fault tolerance/high availability will suffice, then distributed DAS systems with co-located applications like Hadoop or some scale out NAS systems are the only game in town.
There was some twitter traffic yesterday on how Facebook was locked into using MySQL (see article here) and as such, was having to shard their MySQL database across 1000s of database partitions and memcached servers in order to keep up with the processing load.
The article indicated that this was painful, costly and time consuming. Also they said Facebook would be better served moving to something else. One answer was to replace MySQL with recently emerging, NewSQL database technology.
One problem with old SQL database systems is they were never architected to scale beyond a single server. As such, multi-server transactional operations were always a short-term fix to the underlying system, not a design goal. Sharding emerged as one way to distribute the data across multiple RDBMS servers.
Relational database tables are sharded by partitioning them via a key. By hashing this key one can partition a busy table across a number of servers and use the hash function to lookup where to process/access table data. An alternative to hashing is to use a search lookup function to determine which server has the table data you need and process it there.
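Both routing approaches can be sketched in a few lines of Python. This is a generic illustration, not how Facebook or any particular product does it: the function name, the `LookupRouter` class and the key format are all hypothetical.

```python
import hashlib


def shard_for(key, num_shards):
    """Hash-based routing: a stable hash of the key picks the shard.

    Uses SHA-256 rather than Python's built-in hash(), which is randomized
    per process for strings and so would route differently on each server.
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class LookupRouter:
    """Lookup-based routing: an explicit directory maps each key to a shard."""

    def __init__(self):
        self._directory = {}

    def assign(self, key, shard):
        self._directory[key] = shard

    def shard_for(self, key):
        return self._directory[key]


# Hash routing needs no state; the same key always lands on the same shard.
print(shard_for("user:42", 8))

# Lookup routing lets you place (or move) keys wherever you like.
router = LookupRouter()
router.assign("user:42", 3)
print(router.shard_for("user:42"))
```

The tradeoff is that the hash approach needs no shared state but reshuffles most keys when the shard count changes, while the directory can move individual keys at the cost of maintaining and consulting the directory.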
In any case, sharding causes a number of new problems. Namely,
Cross-shard joins – anytime you need data from more than one shard server you lose the advantages of distributing data across nodes. Thus, cross-shard joins need to be avoided to retain performance.
Load balancing shards – to spread workload you need to split the data by processing activity. But knowing ahead of time what the table processing will look like is hard, and one week’s processing may vary considerably from the next week’s load. As such, it’s hard to load balance shard servers.
Non-consistent shards – by spreading transactions across multiple database servers and partitions, transactional consistency can no longer be guaranteed. While for some applications this may not be a concern, traditional RDBMS activity is consistent.
These are just some of the issues with sharding and I am certain there are more.
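The first two problems are easy to see in a toy Python model that assumes hash-based shard routing. The key names and access pattern below are made up for illustration.

```python
import hashlib
from collections import Counter


def shard_for(key, num_shards):
    """Stable hash routing of a key to a shard number."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shards_touched(keys, num_shards):
    """Which shards a query must visit to read all of `keys`."""
    return {shard_for(k, num_shards) for k in keys}


def load_per_shard(accessed_keys, num_shards):
    """Access count per shard for a stream of key accesses."""
    counts = Counter(shard_for(k, num_shards) for k in accessed_keys)
    return [counts.get(s, 0) for s in range(num_shards)]


# A join across many keys usually has to visit several shard servers.
join_keys = [f"user:{i}" for i in range(50)]
print(len(shards_touched(join_keys, 4)), "of 4 shards touched")

# One hot key sends most of the traffic to a single shard.
accesses = ["hot"] * 90 + [f"k{i}" for i in range(10)]
print(load_per_shard(accesses, 4))
```

Hashing spreads keys evenly, but it cannot spread the accesses to any one key, which is why skewed workloads are so hard to balance.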
What about Hadoop projects and its alternatives?
One possibility is to use Hadoop and its distributed database solutions. However, Hadoop systems were not intended to be used for transaction processing. Nonetheless, Cassandra and HyperTable (see my post on Hadoop – Part 2) can be used for transaction processing and at least Cassandra can be tailored to any consistency level. But neither Cassandra nor HyperTable is really meant to support high throughput, consistent transaction processing.
Also, the other, non-Hadoop distributed database solutions support data analytics and most are not positioned as transaction processing systems (see Big Data – Part 3). Although Teradata might be considered the lone exception here and can be a very capable transaction oriented database system in addition to its data warehouse operations. But it’s probably not widely distributed or scaleable above a certain threshold.
The problems with most of the Hadoop and non-Hadoop systems above mainly revolve around the lack of support for ACID transactions, i.e., atomic, consistent, isolated, and durable transaction processing. In fact, most of the above solutions relax one or more of these characteristics to provide a scaleable transaction processing model.
NewSQL to the rescue
There are some new emerging database systems that are designed from the ground up to operate in distributed environments called “NewSQL” databases. Specifically,
Clustrix – is a MySQL compatible replacement, delivered as a hardware appliance that can be distributed across a number of nodes that retains fully ACID transaction compliance.
NimbusDB – is a client-cloud based SQL service which distributes copies of data across multiple nodes and offers a majority of SQL99 standard services.
VoltDB – is a fully SQL compatible, ACID compliant, distributed, in-memory database system offered as a software only solution executing on 64bit CentOS systems, but it is compatible with any POSIX-compliant, 64bit Linux platform.
Xeround – is a cloud based, MySQL compatible replacement delivered as a (Amazon, Rackspace and others) service offering that provides ACID compliant transaction processing across distributed nodes.
I might be missing some, but these seem to be the main ones today. All the above seem to take a different tack to offer distributed SQL services. Some of the above relax ACID compliance in order to offer distributed services. But for all of them distributed scale out performance is key and they all offer purpose built, distributed transactional relational database services.
RDBMS technology has evolved over the last several decades and has had at least ~35 years of running major transactional systems. But today’s hardware architecture together with web scale performance requirements stretch these systems beyond their original design envelope. As such, NewSQL database systems have emerged to replace old SQL technology with a new, intrinsically distributed system architecture providing high performing, scaleable transactional database services for today and the foreseeable future.