Peak server, the cloud & NetApp storage for AWS

I was at a conference a month or so ago and one speaker mentioned that the number of x86 servers being sold has peaked and is dropping. I can imagine a number of reasons for this and the main one being server virtualization. But this speaker had a different view and it seemed to be the cloud.

Peak server is here.

He said that three companies were purchasing over 1/2 the x86 servers these days. I feel that there should be at least four Google, Facebook, Amazon & Microsoft and maybe five, if you add in Apple.

Something has happened over the past year or so. Enterprise IT has continued along its merry way but the adoption of cloud services is starting to take off.

I have seen this before, with mainframes, then mini-computers, and now client-server. Minicomputers came out and were so easy to use and develop/deploy applications on, that people stopped creating new apps on the mainframe. Mainframes never died out, and probably have never really stopped shipping increasing MIPS every year. But the share of WW MIP installations for mainframes has been shrinking for decades and have never got going again.

Ultimately, the proprietary minicomputer was just a passing fad and only lasted about 25 years or so. It was wounded by the PC, and then killed off by proprietary Unix workstations.

Then it happened again, the new upstart this time was Windows Server and Linux. Once again it was just easier to build apps on these new and cheaper servers, than any of the older Unix servers. Of course there’s still plenty of business in proprietary Unix servers, but again I would venture to say that their share of WW installed MIPS has been shrinking for a long time.

Nowadays, the cloud is mortally wounding the server market. Server virtualization is helping a lot but it’s also enabling the cloud to eliminate many physical server sales. This is because new applications, new IT environments are being ported/moved/deployed onto the cloud.

Peak server means less enterprise networking, storage and server hardware

In this new, cloud world, customers need less servers, less networking and less enterprise class storage. Yes not every application is suitable to cloud deployment but that’s why there’s still mainframes, still Unix servers, and a continuing need for standalone, physical or virtual x86 servers in the enterprise. But their share of MIPs will start shrinking soon if it hasn’t already.

Ok, so enterprise data center share of MIPs will start shrinking vis a vis cloud data centers. But what happens to networking and storage. My view is that networking becomes software defined and there’s a component of that which operates on special purpose hardware. This will increase in shipments but the more complex, enterprise class networking equipment will flatline and never see any more substantial growth.

And up until yesterday I felt much the same about enterprise class storage. Software defined storage in my future, DAS and SSDs for the capacity and the smarts exist in software if at all. Today, most of the cloud and many service providers have been moving off enterprise class storage and onto DAS.

NetApp’s new enterprise storage in AWS

But yesterday I heard about NetApp private storage for the cloud. This is a configuration of NetApp storage installed in a CoLo facility with a “direct connection” to Amazon compute cloud. In this way, enterprise customers can maintain data stewardship/ownership/governance over their data while at the same time deploying applications onto AWS compute cloud.

This seems to be one of the sticking points to enterprise customers adopting the cloud. By having (data) storage owned lock/stock&barrel by the enterprise it seems much easier and less risky to deploy new and old applications to the cloud.

Whether this pans out and can provide enough value to cover the added expense of the enterprise class storage, only the market can decide. But this is the first time I can remember, where any vendor has articulated a role for enterprise class storage in the cloud. Let’s hope it works.

Image: PDP8/s by ajmexico

Posted in Cloud services, Cloud storage, DAS, Data, data protection, Networking, Server virtualization, Software Defined Network, SSD storage, Storage, Storage Features, Strategic Inflection Points, Strategy, Systems | Tagged , , , , | 9 Comments

Windows Server 2012 R2 storage changes announced at TechEd

Microsoft TechEd Trends driving IT todayMicrosoft TechEd USA is this week and they announced a number of changes to the storage services that come with Windows Server 2012 R2

  • Azure DRaaS - Microsoft is attempting to democratize DR by supporting a new DR-as-a-Service (DRaaS).  They now have an Azure service that operates in conjunction with Windows Server 2012 R2 that provides orchestration and automation for DR site failover and fail back to/from remote sites.  Windows Server 2012 R2 uses Hyper-V Replica to replicate data across to the other site. Azure DRaaS supports DR plans (scripts) to identify groups of Hyper-V VMs which need to be brought up and their sequencing. VMs within a script group are brought up in parallel but different groups are brought up in sequence.  You can have multiple DR plans, just select the one to execute. You must have access to Azure to use this service. Azure DR plans can pause for manual activities and have the ability to invoke PowerShell scripts for more fine tuned control.  There’s also quite a lot of setup that must be done, e.g. configure Hyper-V hosts, VMs and networking at both primary and secondary locations.  Network IP injection is done via mapping primary to secondary site IP addresses. The Azzure DRaaS really just provides the orchestration of failover or fallback activity. Moreover, it looks like Azure DRaaS is going to be offered by service providers as well as private companies. Currently, Azure’s DRaaS has no support for SAN/NAS replication but they are working with vendors to supply an SRM-like API to provide this.
  • Hyper-V Replica changes – Replica support has been changed from a single fixed asynchronous replication interval (5 minutes) to being able to select one of 3 intervals: 15 seconds; 5 minutes; or 30 minutes.
  • Storage Spaces Automatic Tiering - With SSDs and regular rotating disk in your DAS (or JBOD) configuration , Windows Server 2012 R2 supports automatic storage tiering. At Spaces configuration time one dedicates a certain portion of SSD storage to tiering.  There is a scheduled Windows Server 2012 task which is then used to scan the previous periods file activity and identify which file segments (=1MB in size) that should be on SSD and which should not. Then over time file segments are moved to an  appropriate tier and then, performance should improve.  This only applies to file data and files can be pinned to a particular tier for more fine grained control.
  • Storage Spaces Write-Back cache - Another alternative is to dedicate a certain portion of SSDs in a Space to write caching. When enabled, writes to a Space will be cached first in SSD and then destaged out to rotating disk.  This should speed up write performance.  Both write back cache and storage tiering can be enabled for the same Space. But your SSD storage must be partitioned between the two. Something about funneling all write activity to SSDs just doesn’t make sense to me?!
  • Storage Spaces dual parity - Spaces previously supported mirrored storage and single parity but now also offers dual parity for DAS.  Sort of like RAID6 in protection but they didn’t mention the word RAID at all.  Spaces dual parity does have a write penalty (parity update) and Microsoft suggests using it only for archive or heavy read IO.
  • SMB3.1 performance improvements of ~50% – SMB has been on a roll lately and R2 is no exception. Microsoft indicated that SMB direct using a RAM DISK as backend storage can sustain up to a million 8KB IOPS. Also, with an all-flash JBOD, using a mirrored Spaces for backend storage, SMB3.1 can sustain ~600K IOPS.  Presumably these were all read IOPS.
  • SMB3.1 logging improvements - Changes were made to SMB3.1 event logging to try to eliminate the need for detail tracing to support debug. This is an ongoing activity but one which is starting to bear fruit.
  • SMB3.1 CSV performance rebalancing - Now as one adds cluster nodes,  Cluster Shared Volume (CSV) control nodes will spread out across new nodes in order to balance CSV IO across the whole cluster.
  • SMB1 stack can be (finally) fully removed - If you are running Windows Server 2012, you no longer need to install the SMB1 stack.  It can be completely removed. Of course, if you have some downlevel servers or clients you may want to keep SMB1 around a bit longer but it’s no longer required for Server 2012 R2.
  • Hyper-V Live Migration changes - Live migration can now take advantage of SMB direct and its SMB3 support of RDMA/RoCE to radically speed up data center live migration. Also, Live Migration can now optionally compress the data on the current Hyper-V host, send compressed data across the LAN and then decompress it at target host.  So with R2 you have three options to perform VM Live Migration traditional, SMB direct or compressed.
  • Hyper-V IO limits - Hyper-V hosts can now limit the amount of IOPS consumed by each VM.  This can be hierarchically controlled providing increased flexibility. For example one can identify a group of VMs and have a IO limit for the whole group, but each individual VM can also have an IO limit, and the group limit can be smaller than the sum of the individual VM limits.
  • Hyper-V supports VSS backup for Linux VMs – Windows Server 2012 R2 has now added support for non-application consistent VSS backups for Linux VMs.
  • Hyper-V Replica Cascade Replication - In Windows Server 2012, Hyper V replicas could be copied from one data center to another. But now with R2 those replicas at a secondary site can be copied to a third, cascading the replication from the first to the second and then the third data center, each with their own replication schedule.
  • Hyper-V VHDX file resizing – With Windows Server 2012 R2 VHDX file sizes can now be increased or reduced for both data and boot volumes.
  • Hyper-V backup changes - In previous generations of Windows Server, Hyper-V backups took two distinct snapshots, one instantaneously and the other at quiesce time and then the two were merged together to create a “crash consistent” backup. But with R2, VM backups only take a single snapshot reducing overhead and increasing backup throughput substantially.
  • NVME support - Windows Server 2012 R2 now ships with a Non-Volatile Memory Express (NVME) driver for PCIe flash storage.  R2′s new NVME driver has been tuned for low latency and high bandwidth and can be used for non-clustered storage spaces to improve write performance (in a Spaces write-back cache?).
  • CSV memory read-cache - Windows Server 2012 R2 can be configured to set aside some host memory for a CSV read cache.  This is different than the Spaces Write-Back cache.  CSV caching would operate in conjunction with any other caching done at the host OS or elsewhere.

That’s about it. Some of the MVPs had a preview of R2 up in Redmond, but all of this was to be announced in TechEd, New Orleans, this week.

~~~~

Image: Microsoft TechEd by BetsyWeber

Posted in CIFS/SMB, Cloud storage, DAS, desktop virtualization, Ethernet, File Storage, RDMA, RoCE, Server virtualization, SSD storage, Storage performance | Tagged , , , , , , , , , , , , , , , , , , , , , | 8 Comments

Racetrack memory gets rolling

A recent MIT study showed how new technology can be used to control and write magnetized bits in nano-structures, using voltage alone. This new technique also consumes much less power than using magnets or magnetism as well.

They envision a sort of nano-circuit, -wire or -racetrack with a series of transistor-like structures spaced at regular intervals above it.  Nano-bits would be racing around these nano-wires as a series of magnetized domains.  These new transitor-like devices would be a sort of onramp for the bits as well as stop-lights/speed limits for the racetrack.

Magnetic based racetrack memory issues

The problems with using magnets to write the bits in nano-racetrack is that magnetism casts a wide shadow and can impact adjacent race tracks, sort of like shingled writes (we last discussed in Shingled magnetic recording disks).   The other problem has been a way to (magnetically) control the speed of racing bits so they can be isolated and read or written effectively.

Magneto-ionic racetrack memory solutions

But MIT researchers have discovered a way to use voltage to change the magnetic orientation of a bit on a race track.  They also found a way through the use of voltage to precisely control the position of magnetic bits speeding around the track and to electronically isolate and select a bit.

What they have created is sort of a transistor for magnetized domains using ion-rich materials.  Voltages can be used to attract or repel those ions and then those ions can interact with flowing magnetic domains to speed up or slow down the movement of magnetic domains.

Thus, the transistor-like device can  be set to attract (or speed up) magnetized domains, slow down magnetized domains or stop them and also be used to change the magnetic orientation of a domain.  MIT researchers call these devices Magneto-ionic devices.

Racetrack memory redefined

So now we have a way to (electronically) seek to bit data on a race track,  a way to precisely (electronically) select bits on the race track, and a way to precisely (electronically) write data on a race track.  And presumably, with an appropriate (magnetic) read head, a way to read this data.  As an added bonus, apparently data once written on the racetrack requires no additional power to stay magnetized.

So the transistor-like devices are a combination of write heads, motors and brakes for the racetrack memory.  Not sure,  but if they can write, slow down and speedup magnetic domains, why can’t they read them as well that way the transistor-like devices could be a read head as well.

Why do they need more than one write-head per track. It seems to me that one should suffice for a fairly long track, not unlike disk drives. I suppose  more of them would make the track faster to write. But  they would all have to operate in tandem, speeding up or stoping the racing bits on the track all together and then starting them all back up, together again.  Maybe this way they can write a byte or a word or a chunk of data all at the same time.

In any event, it seems that race track memory took a (literally) quantum leap  forward with this new research out of MIT.

Racetrack memory futures

IBM has been talking about race track memory for some time now and this might be the last hurdle to overcome to getting there (we last discussed this in A “few exabytes-a-day” from SKA post).

In addition,  there doesn’t appear to be any write cycle, bit duration or the need for erasing whole page issues with this type of technology.  So as an underlying storage for a new sort of semi-conductor storage device (SSD) this has significant inherent advantages.

Not to mention that is all based on nano-based device sizes which means that it can pack a lot of bits in very little volume or area.  So SSDs based on these racetrack memory technologies will be denser, faster, and require less energy – could you want.

Image: Nürburgring 2012 by Juriën Minke

 

Posted in R&D measures, SSD storage, Strategic Inflection Points | Tagged , , , , | 1 Comment

EMCworld 2013 Day 3

IMG_1431Rich Napolitano, President Unified Storage Division got up and showed some technology demonstrations of what they had working in their labs.  Rich had some of his long time engineers up on the stage to show what was running in their labs.

  • First up was a dual controller, dual processors per controller 8 core processing chips (32cores in all) running against an all SSD backend. The configuration was up for a short time but it seemed like 96 SSDs, so an all-flash VNX array.  They used Iometer, random-8KB IO to drive almost 975K IOPS at sub-msec. response time. They hit 1M IOPS with just slightly above 1 msec. response time. You could see the processor utilization of the 32 cores going up as the workload reached higher levels.  Couldn’t see precisely but all the cores were running at ~70-80% busy at the 1Miops level and it seemed like the system performance was entering the knee-of-the-curve
  • Next up was the new VNX data app store demonstration. Similar to iPhone and Android App stores. EMC has identified a select set of apps that can be run directly on VNX hardware. The current demonstration had two versions of anti-virus, Recover Point Virtual Appliance (vRPA), (v?)VPLEX, CloudAccess and MySQL server.  The engineers showed how AV software could be installed and be running on the VNX as well as how vRPA could be installed and provide onboard replication services.
  • Then, they demonstrated a VNX virtual appliance (vVNX?) which was able to run on white box server which I think was running ESX.  In this case, vVNX was running with onboard DAS storage but had all the advanced functionality of VNX
  • Finally, they showed a vVNX running in a cloud services environment. Not sure if this was VMware vCloud or some other compute cloud but Rich stated that they will support many clouds.  With vVNX running in the cloud accessing storage behind the compute engine it’s unclear what the performance would be and how one would access the storage (file or iSCSI no doubt) but it did open up new possibilities as to where one could run VNX services.

It’s readily apparent that the next iteration of VNX software seems focused on taking advantage of multi-core processing (called MCx) to boost storage system performance, providing a virtualized environment within the VNX engine to run specialized data services and supplying a new vVNX functionality which can be deployed just about anywhere you would want.

That’s all for the public sessions, spent much of the rest of the day in NDA sessions.

I had a good time at EMCworld 2013, seeing old friends again and meeting new ones and thank EMC for inviting me.  For information on previous days at EMCworld 2013 please see my Day 1 and Day 2 posts.

Posted in Block Storage, Disk storage, Distributed computing, File Storage, Server virtualization, SSD storage, Storage architecture, Storage Features, Storage performance, Strategic Inflection Points | Tagged , , , , , , | 1 Comment

EMCworld 2013 Day 2

IMG_1382The first session of the day was with  Joe Tucci EMC Chairman and CEO.  He talked about the trends transforming IT today. These include Mobile, Cloud, Big Data and Social Networking. He then discussed  IDC’s 1st, 2nd and 3rd computing platform framework where the first was mainframe, the second was client-server and the third is mobile. Each of these platforms had winers and losers.  EMC wants definitely to be one of the winners in the coming age of mobile and they are charting multiple paths to get there.

Mainly they will use Pivotal, VMware, RSA and their software defined storage (SDS) product to go after the 3rd platform applications.  Pivotal becomes the main enabler to help companies gain value out of the mobile-social networking-cloud computing data deluge.  SDS helps provide the different pathways for companies to access all that data. VMware provides the software defined data center (SDDC) where SDS, server virtualization and software defined networking (SDN) live, breathe and interoperate to provide services to applications running in the data center.

Joe started talking about the federation of EMC companies. These include EMC, VMware, RSA and now Pivotal. He sees these four brands as almost standalone entities whose identities will remain distinct and seperate for a long time to come.

Joe mentioned the internet of things or the sensor cloud as opening up new opportunities for data gathering and analysis that dwarfs what’s coming from mobile today. He quoted IDC estimates that says by 2020 there will be 200B devices connected to the internet, today there’s just 2 to 3B devices connected.

Pivotal’s debut

Paul Maritz, Pivotal CEO got up and took us through the Pivotal story. Essentially they have three components a data fabric, an application development fabric and a cloud fabric. He believes the mobile and internet of things will open up new opportunities for organizations to gain value from their data wherever it may lie, that goes well beyond what’s available today. These activities center around consumer grade technologies  which 1) store and reason over very large amounts of data; 2) use rapid application development; and 3) operate at scale in an entirely automated fashion.

He mentioned that humans are a serious risk to continuous availability. Automation is the answer to the human problem for the “always on”, consumer grade technologies needed in the future.

Parts of Pivotal come from VMware, Greenplum and EMC with some available today in specific components. However by YE they will come out with Pivotal One which will be the first framework with data, app development and cloud fabrics coupled together.

Paul called Pivotal Labs as the special forces of his service organization helping leading tech companies pull together the awesome apps needed for the technology of tomorrow, consisting of Extreme programming, Agile development and very technically astute individuals.  Also, CETAS was mentioned as an analytics-as-a-service group providing such analytics capabilities to gaming companies doing log analysis but believes there’s a much broader market coming.

IMG_1393Paul also showed some impressive numbers on their new Pivotal HD/HAWQ offering which showed it handled many more queries than Hive and Cloudera/Impala. In essence, parts of Pivotal are available today but later this year the whole cloud-app dev-big data framework will be released for the first time.

IMG_1401Next up was a media-analyst event where David Goulden, EMC President and COO gave a talk on where EMC has come from and where they are headed from a business perspective.

Then he and Joe did a Q&A with the combined media and analyst community.  The questions were mostly on the financial aspects of the company rather than their technology, but there will be a more focused Q&A session tomorrow with the analyst community.

IMG_1403 Joe was asked about Vblock status. He said last quarter they announced it had reached a $1B revenue run rate which he said was the fastest in the industry.  Joe mentioned EMC is all about choice, such as Vblock different product offerings, VSpex product offerings and now with ViPR providing more choice in storage.

Sometime today Joe had mentioned that they don’t really do custom hardware anymore.  He said of the 13,000 engineers they currently have ~500 are hardware engineers. He also mentioned that they have only one internally designed ASIC in current shipping product.

Then Paul got up and did a Q&A on Pivotal.  He believes there’s definitely an opportunity in providing services surrounding big data and specifically mentioned CETAS as offering analytics-as-a-service as well as Pivotal Labs professional services organization.  Paul hopes that Pivotal will be $1B revenue company in 5yrs.  They already have $300M so it’s well on its way to get there.

IMG_1406Next, there was a very interesting media and analyst session that was visually stimulating from Jer Thorp, co-founder of The Office for Creative Research. And about the best way to describe him is he is a data visualization scientist.

IMG_1409He took some NASA Kepler research paper with very dry data and brought it to life. Also he did a number of analyzes of public Twitter data and showed twitter user travel patterns, twitter good morning analysis, twitter NYT article Retweetings, etc.  He also showed a video depicting people on airplanes around the world. He said it is a little known fact but over a million people are in the air at any given moment of the day.

Jer talked about the need for data ethics and an informed data ownership discussion with people about the breadcrumbs they leave around in the mobile connected world of today. If you get a chance, you should definitely watch his session.IMG_1410

Next Juergen Urbanski, CTO T-Systems got up and talked about the importance of Hadoop to what they are trying to do. He mentioned that in 5 years, 80% of all new data will land on Hadoop first.  He showed how Hadoop is entirely different than what went before and will take T-Systems in vastly new directions.

Next up at EMCworld main hall was Pat Gelsinger, VMware CEO’s keynote on VMware.  The story was all about Software Defined Data Center (SDDC) and the components needed to make this happen.   He said data was the fourth factor of production behind land, capital and labor.

Pat said that networking was becoming a barrier to the realization of SDDC and that they had been working on it for some time prior to the Nicera acquisition. But now they are hard at work merging the organic VMware development with Nicera to create VMware NSX a new software defined networking layer that will be deployed as part of the SDDC.

Pat also talked a little bit about how ViPR and other software defined storage solutions will provide the ease of use they are looking for to be able to deploy VMs in seconds.

Pat demo-ed a solution specifically designed for Hadoop clusters and was able to configure a hadoop cluster with about 4 clicks and have it start deploying. It was going to take 4-6 minutes to get it fully provisioned so they had a couple of clusters already configured and they ran a pseudo Hadoop benchmark on it using visual recognition and showed how Vcenter could be used to monitor the cluster in real time operations.

Pat mentioned that there are over 500,000 physical servers running Hadoop. Needless to say VMware sees this as a prime opportunity for new and enhanced server virtualization capabilities.

That’s about it for the major keynotes and media sessions from today.

Tomorrow looks to be another fun day.

Posted in Cloud services, desktop virtualization, Information economy, Internet traffic, Mobile computing, R&D measures, Server virtualization, Software Defined Network, Visionary leadershp | Tagged , , , , , , , | 4 Comments

EMCworld 2013 day 1

Lines for coffee at the Cafe were pretty long this morning and I missed my opportunity to have breakfast to do some work. But eventually made my way to the press room and got some food and coffee.

Spent the morning in Analyst sessions mostly under NDA but it seems safe to say that EMC sees plenty of opportunity ahead.

The first session Q&A with BRS executives and customers was enlightening but the main message from the customers was that data protection is hard, legacy systems often can’t adjust quick enough and sometimes a completely new architecture is warranted. The executives were upbeat about current BRS business and where they were headed in the future.

20130506-142735.jpgRest of the morning was with Jeremy Burton EVP Product, Operations and Marketing and John Roese, the new SVP and CTO of EMC (6 months on the job). Jeremy talked about an IDC insight that there’s a new world emerging so-called 3rd platform applications based on mobile and consumer grade technology  with literally billions of users, millions of apps built on mobile-cloud-bigdata-social infrastructure which complements the 2nd platform built on lan/wan, client server frameworks.

For an example of this environment Jeremy mentioned that AT&T provisions 12PB of storage a month.

What’s needed for this new platform is a new type of storage built for the 3rd platform but taking advantage of current enterprise storage characteristics.  This is ViPR (more on that later)

John comes by way of Huawei, Nortel and myriad others and offers a broad insight to the way forward for EMC. It looks like a bright future ahead if they can do half of what John has outlined.

John talked about the intersections between the carrier market (or services), enterprise IT and consumer market.  There is convergence between these regions and at each of these intersections new technology is going to answer many of the problems which exist. For instance in the carrier space:

  • The amount of information they gather is frightening they know everything about you. Pivotal will be the key here because its good at 1) ability to correlate information across different information sources. Most carriers have a whole bunch of disparate information stores; and 2) It’s not just focused on Big Data as a non-realtime problem but also provides realtime analytics as well.
  • Capital costs are going down but $/bits are going way down.  VMware & Software defined data center is the right way to drive down costs.  Today servers are ~50% virtualized but networking is not virtualized at all.
  • Customers are dissatisfied with service providers (carriers).  Again Pivotal is key here. One carrier customer was focused on customer churn and tried to figure out how to minimize this. They used  Gemfire’ high speed infrastructure that could watchc all transactions on cell tower infrastructure pick out dropped calls, send it to Greenplum and correlate this with the customer attributes (good or bad), and within 100msec supply an interaction with the customer in to apologize and offer some services to make it better.
  • Internet is the new wild west –use at your own risk,  spoofing websites, respond to email could be anyone, chaos to security. RSA can become the trusted internet provider by looking at the internet holistically, combining information from many customers, aggregating and sharing these interactions to deterimine the trust of every transaction. Trust is becoming a new big data problem.
  • Hybrid and public cloud is their biggest opportunity but they don’t know how to attack it. VMware and SDDC will evolve to provide orchestrated movement from private to public and closed to open.

The thinking seems pretty straightforward given what they are trying to accomplish and the framework he applied to EMC’s strategy going forward made a lot of sense.

20130506-172955.jpgBrian Gallagher did a keynote on enterprise storage new functions and features which covered VMAX, VPLEX, RecoverPoint, and XtremIO/SF/SW. Mentioned RecoverPoint virtual appliance and sort of a statement of direction on being able to move application functionality directly on VMAX. He kind of demoed this with VPLEX running on VMAX.

He also talked about FAST speed of reaction versus the competition, mentioned that FAST provides information about the storage tiering to up to 4 different VMAX arrays. Showed a comparison of VMAX 10K against another prime competitor that looked downright embarrassing.  And talked about VMAX cloud edition.

20130506-173022.jpgAfter that 1 on 1 meetings all under strict NDA. But then the big Keynote with Jeremy again and David Goulden President and COO on ViPR. They have implemented software defined storage (SDS).  Last week I did a post on SDS trying to layout some of the problems and promises of SDS (please see The promise of SDS post).

But what I missed was the data path transformation that ViPR can do to provide object and HDFS access to traditional and commodity storage systems.  ViPR starts out primarily in the control layer providing automated provisioning, self management, across heterogeneous storage pools. With ViPR one can define virtual storage arrays and then configure virtual storage pools across those arrays regardless of the physical infrastructure underneath them.

More on ViPR in a separate post but suffice it to say EMC has been working on this for awhile now. But how it’s positioned with VPLEX and the other storage virtualization capabilities in VMAX and other products is another matter. But it seems they are carving out a space for ViPR between and above the current storage solutions.

End of day one is in the Expo and then cocktail parties… stay tuned for day 2.

 

Posted in Block Storage, Cloud services, Cloud storage, DAS, Disk storage, Distributed computing, File Storage, Information economy, Market dynamics, Mobile computing, Object storage, Server virtualization, Software Defined Network, SSD storage, Storage, Storage architecture, Storage virtualization, Strategic Inflection Points, System effectiveness | Tagged , , , , | 2 Comments

The promise of software defined storage

Data hypervisor, software defined storage, data plane, control plane

(c) 2012 Silverton Consulting, Inc. All rights reserved

Not sure why but all the hype around software defined storage seems to be reaching a crescendo.  Possible due to conference season coming up but it started earlier this year.  I attended an SNW analyst session that was talking about software defined storage had on its panel technical people from HDS, IBM, Data Core and VMware.  It seems the distinction between storage virtualization and software defined storage is getting slimmer every time we talk about it.  I have written before about software defined storage (see my Data Hypervisor post).

Server, networking and storage virtualization today

Server virtualization makes an awful lot of sense, has made lots of money and arguably been around for decades now especially in mainframe systems.  Servers have so much power today that dedicating one to a single workload just doesn’t make any sense anymore.

Network virtualization from OpenFlow and others also makes a lot of sense (see OpenFlow the next wave in networking and OpenFlow part 2, Cisco’s response posts). Here we aren’t necessarily boosting network utilization as much as changing resource allocation to deal with altered traffic flows.  That and the fact that provisioning, monitoring and other management characteristics can now be under pragmatic control from the user makes these systems very appealing. Especially, to organizations that exhibit varying network activity over time.

Storage virtualization has been around for a long time too and essentially places a storage system abstraction layer on top of a group of other, heterogeneous storage systems. This provides a number of capabilities such as allowing data to be migrated from one storage system to another without host knowledge or intervention.  Other storage virtualization features include, centralized, management, common storage features, different storage personalities (protocols), etc. But just being able to migrate data from one storage system to another without host intervention or knowledge provides an awful lot of value, especially to large data centers which refresh technology frequently.

Software defined storage compared to server virtualization

Software defined storage seems to imply some ability to marry storage virtualization services to RESTful and other APIs which would allow programatic storage provisioning, monitoring and management.  This would allow data centers to manage and control their storage without involving storage administrators in day-to-day activities.

When I compare this to server virtualization the above described capabilities really don’t increase storage utilization much.  Yes, by automating provisioning or even running thin provisioning one can potentially boost storage capacity utilization but you really haven’t increased the IO utilization much by doing this.

Looking under the covers of most storage systems one might find that CPU cores are pretty idle, but data paths and storage devices are typically running flat out.  One problem is that today’s enterprise storage subsystems are already highly shared across applications and users.  So there is really no barrier to sharing these resources as widely as they can.   As such, storage system IOPS and/or bandwidth utilization is already pretty high.   I would say a typical enterprise application environment storage subsystem performance usually runs above 30% and reaching 50% or more during peak time periods. Increasing IOPS utilization much beyond that risks seriously impacting peak performance periods.

Now if somehow one could migrate slower data around a complex to lower performing storage when there’s no need for high performance and higher performing data to higher performing storage when there is a need then that could help increase performance utilization considerably.   But, many storage systems already do this internally through automated storage tiering and even some can do this across storage systems using storage virtualization.

But the underlying problem here is that in takes a lot of time, resources and effort to move TBs of data around a data center, especially when its doing other work.  So other than something akin to storage tiering across storage systems we are unlikely to see much increase in storage performance utilization with a gaggle of multiple storage systems.  I suppose in the future moving TB of data may take much less time & resources than today but then the problem becomes moving PB of data around.

Software defined storage compared to network virtualization

When I compare the above capabilities to network virtualization it doesn’t look very similar.   There’s really no way to change the storage performance to optimize it for one direction (or application) at this instant and then move storage performance around to another application a couple of hours later.  Yes, again automated storage tiering can do this, and yes some of these systems can tier across storage systems using storage virtualization but in general barring storage tiering there’s nothing like this available today.  

Maybe if inside a storage system the data paths could somehow be programatically reconfigured to offer say more internal bandwidth to the Device-to-Cache path vs. the Cache-to-Frontend path. Changing or reconfiguring data path resources like this could certainly optimize the internal performance of a storage system and this would be a worthwhile feature of any software defined storage.  Knowing which is more important to one application and less important to all the others will take some smarts, across the storage system and host O/S but it’s certainly feasible.  So, with RESTful interfaces, APIs or application hints data paths could be reconfigurations on demand to support applications that are all vieing for IO activity.  

With these sorts of capabilities software defined storage starts to look a little more like software defined networking.

Software defined storage on its own

But in the end we always reach a fundamental limit of IO capabilities in today’s storage systems which is the devices. Yes you can have 2000 or more devices in high-end storage  today and yes you can have all-flash arrays. However, most storage systems are configured to keep whatever devices they have pretty busy as much of the time as possible.

Until we create some sort of storage device that can provide more performance than most applications can ever use, even when they are shared via a storage system, software defined storage capabilities will be limited.  Today’s SSDs have certainly boosted performance considerably but this just means that most applications that warrant all flash arrays are performing faster.  It just so happens that some applications can take all the performance you throw at them and still want more.

I suppose if SSDs cost were to come down to match NL-SAS storage prices and still maintain the 100X faster IOP rate, then maybe a storage system built on such devices could be more “software defined” than others.  And maybe that’s where everyone is headed, believing NAND/SSD price trends will drive costs down so much that everyone can have all the IOPS performance they will ever need out of a single storage system.

Yet, this still just looks like shared storage we have today, only more of it. So we return back to our roots and see that software defined storage is just another way to add more storage sharing. Storage virtualization is nice, new more programmatical storage systems is even better but faster-cheaper storage devices is best of all.

So what we really need is much cheaper SSDs to realize the full promise of software defined storage.   In the mean time opening up APIs and providing RESTful interfaces to provide programatic interfaces to provisioning, monitoring, managing and tuning storage system data paths and other performance characteristics are all we can hope for.

Comments?

 

 

 

Posted in Server virtualization, Software Defined Network, Storage, Storage performance, Storage utilization, Storage virtualization, System effectiveness | Tagged , , , , , | 2 Comments

Cheap phones + big data = better world

Big data visualization, Facebook friend connections, Data science

Facebook friend carrousel by antjeverena (cc) (from flickr)

Read an article today in MIT Technical Review website (Big data from cheap phones) that shows how cheap phones, call detail records (CDRs) and other phone logs can be used to help fight disease and help understand disaster impacts.

Cheap phones generate big data

In one example, researchers took cell phone data from Kenya and used it to plot people movements throughout the country. What they were looking for is people who frequented malaria disease hot spots so that they could try to intervene in the transmission of this disease. Researchers discovered one region (cell tower) that had many people that were frequenting a particular bad location for malaria.  It turned out the region they identified had a large plantation with many migrant workers. These workers moved around a lot.  In order to reduce the transmission of the disease public health authorities could target this region to use more bed nets or try to reduce infestation at source of the disease.  In either case, people mobility was easier to see with cell phone data than actually putting people on the ground and counting where people go or come from.

In another example, researchers took cell phone data from Haiti before and after the earthquake and were able to calculate how many people were in the region hardest hit by the earthquake.  They were also able to identify how many people left the region and where the went to.  As a follow on to this, researchers were able to in real time show how many people had fled the cholera epidemic.

Gaining access to cheap phone data

Most of this call detail record data is limited to specific researchers for very specialized activities requested by the host countries. But recently  Orange released 2.5 billion cell phone call and text data records for five million customers they have in Ivory Coast that occurred during five months time.  They released the data to the public under some specific restrictions in order to see what data scientists could do with it. The papers detailing their activities will be published at a MIT Data for Development conference.

~~~~

Big data’s contribution to a better world is just beginning but from what we see here there’s real value in data that already exists, if only the data were made more widely available.

Comments?

Posted in Crowdsourcing, Data analytics, Data availability, Data science, Information economy, Strategic Inflection Points, Visionary organizations | Tagged , , | 2 Comments