Platform9, a whole new way to run OpenStack

At Tech Field Day 10 (TFD10) in Austin this past week, we had a presentation from Platform9's Shirish Raghuram, Co-founder and CEO, and Bich Le, Co-founder and Chief Architect. Both Shirish and Bich appear to have worked a long time at VMware and, prior to that, at other tech giants.

Platform9 provides a user-friendly approach to running OpenStack in your data center. Their solution is a SaaS-based management portal, or control plane, for running compute, storage and networking infrastructure under OpenStack, the open source cloud software.

Importing running virtualization environments

If you have a current, running VMware vSphere environment, you can onboard or import some or all of your VMs, datastores, NSX nodes, and the rest of the vSphere cluster, and have them come up as OpenStack compute instances and Cinder storage volumes, with NSX serving as a replacement for Neutron networking nodes.

In this case, once your vSphere environment is imported, users can fire up more compute instances, terminate the ones they have, allocate more Cinder volumes, etc., all from an AWS-like management portal. It's as close to an AWS console as I have seen.

Platform9 also works for KVM environments; that is, you can import currently running KVM environments into OpenStack and run them from the Platform9 portal.

Makes OpenStack almost easy to run/use/operate

Historically, the problem with OpenStack was its user interface. Platform9 solves this problem and makes it easy to import, use, and deploy VMware and KVM environments into an OpenStack framework. Once there, users and administrators have the same level of control that AWS and Microsoft Azure users have, i.e., fire up compute instances, allocate storage volumes and attach the two together, terminate the compute activities, detach the volumes, and repeat, all in your very own private cloud.
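
If the Platform9-managed cloud exposes the standard OpenStack APIs (as any OpenStack control plane should), that fire-up/attach/tear-down loop can also be driven from code. Here's a minimal sketch using the OpenStack SDK's cloud layer; the cloud name, image, flavor and network names are placeholders I've assumed, not anything Platform9-specific.

```python
# Minimal sketch of the create/attach/tear-down loop against an OpenStack
# cloud. Assumes the openstacksdk package and a clouds.yaml entry named
# "private"; image/flavor/network names are placeholders.
import openstack

conn = openstack.connect(cloud="private")

# Fire up a compute instance.
server = conn.create_server(
    name="demo-vm",
    image="ubuntu-20.04",
    flavor="m1.small",
    network="tenant-net",
    wait=True,
)

# Allocate a 50GB Cinder volume and attach it to the instance.
volume = conn.create_volume(size=50, name="demo-vol", wait=True)
conn.attach_volume(server, volume, wait=True)

# ... use the instance ...

# Detach the volume, delete it, and terminate the instance.
conn.detach_volume(server, volume, wait=True)
conn.delete_volume(volume.id, wait=True)
conn.delete_server(server.id, wait=True)
```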

Bare metal OpenStack support too

If you don’t have a current KVM or VMware environment, Platform9 will deploy a KVM virtualization environment on bare metal servers and storage and use that for your OpenStack cloud.

Security comes from tenant attributes: certain tenants have access to and control over certain compute/storage/networking instances.

Customers can also use Platform9 as a replacement for vCenter, and once deployed under OpenStack, tenants/users have control over their segments of the private cloud deployment.

It handles multiple vSphere and KVM clusters as well, and can also handle mixed virtualization environments within the same OpenStack cloud.

A few things missing

The only things I found missing from the Platform9 solution were Swift object storage support and support for Hyper-V environments.

The Platform9 team mentioned that multi-region support was scheduled to come out this week, so your users could then fire up compute and storage instances across your worldwide data centers, all from a single Platform9 management portal.

Pricing for the Platform9 service is on a socket basis, with volume pricing available for larger organizations.

If you are interested in a private cloud and are considering OpenStack in order to avoid vendor lock-in, I would find it hard not to give Platform9 a try.

While at Dell


Later in the week at TFD10, we talked with Dell, and they showed off their new VRTX server product. Dell's VRTX server is a very quiet, 4-server, 48TB tower or rackmount enclosure, which would make a very nice 8- or 16-socket private cloud for my home office environment (the picture doesn't do it justice). And with a Platform9 control plane, I could offer OpenStack cloud services out of my home office, to all my neighbors around the world, for a fair but high price…

Comments?

 

Primary data’s path to better data storage presented at SFD8

A couple of weeks ago we met with Primary Data's Lance Smith, CEO; David Flynn, CTO; and Kaycee Lai, SVP Product & Sales, who were presenting at Storage Field Day 8 (SFD8, videos of their sessions available here). Primary Data emerged from stealth late last year and has ~$60M in funding. They also have Steve Wozniak (of Apple fame) as Chief Scientist, but he wasn't at the SFD8 session 🙁

Primary Data seems out to change the world. At first I thought this was just another form of storage virtualization, but they are laser-focused on data virtualization, or what they call data mobility. It differs from pure storage virtualization by being outside the data path. (I have written about data virtualization before, as well as the data hypervisor a long time ago.) Nowadays they seem to be using the tag line of data in motion.

Why move data?

David has a theory behind the proliferation of startup storage companies. The spectrum between capacity and performance has grown immense over time, which has provided an opening for a number of companies to address these widening needs.

David believes that caching at the storage system or in the servers is an attempt to address this issue by "loaning" data from the storage silo to the cache, in an effort to supply a lower $/IOPS cost for that data. Similar considerations are apparent at the other end, where customers use archive or backup services to take advantage of much cheaper $/GB storage.

However, given the difficulty of moving data around in present-day storage environments, customer data has become essentially immobile. Primary Data is trying to bring about a data mobility revolution and allow data to move over this spectrum of storage performance and capacity with ease. Doing so easily will provide significant benefits, as customers can more fully take advantage of the various levels of performance and capacity in their data center storage environments.
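
To put some rough numbers behind that spectrum, here's a back-of-the-envelope sketch. The prices and IOPS figures are made-up placeholders of mine, not quotes from Primary Data or any vendor; the point is only how far apart $/GB and $/IOPS sit on the two ends.

```python
# Back-of-the-envelope comparison of storage tiers on $/TB and $/IOPS.
# All numbers below are illustrative placeholders, not vendor pricing.
tiers = {
    #  name            ($/GB, IOPS per TB)
    "server flash":   (2.50, 100_000),
    "hybrid array":   (0.80,  10_000),
    "capacity disk":  (0.25,   1_000),
    "cloud archive":  (0.03,      10),
}

for name, (cost_per_gb, iops_per_tb) in tiers.items():
    cost_per_tb = cost_per_gb * 1000
    cost_per_iops = cost_per_tb / iops_per_tb
    print(f"{name:>14}: ${cost_per_tb:>8,.0f}/TB  ~${cost_per_iops:,.2f}/IOPS")

# The cheapest $/IOPS and the cheapest $/GB sit at opposite ends of the
# spectrum, which is why data that can't move ends up overpaying for one
# or the other.
```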

Primary Data architecture

Primary Data provides data mobility by using their metadata service, called the DataSphere appliance, and their client software running on host servers, called the Data Portal. Their offering is best explained in three layers:

  • Data virtualization layer – provides continuity of identity and continuity of access across multiple physical storage systems. That is, the same data (identity continuity) can be accessed wherever it resides (access continuity) by server applications. Such access and identity must transcend access protocols and interfaces. The Data Portal client software intercepts server data activity, performs control-plane operations through the DataSphere appliance, and performs IO directly against the physical storage.
  • Objective-based data management – supplies a data affinity service. That is, data can have a temporary location relationship with physical storage depending on its current performance (R:W, IOPS, bandwidth, latency) and protection (durability, availability, disaster recoverability, security, copy-ability, version-ability) objectives. These data objectives are matched to the capabilities, or service catalog, of the storage infrastructure, and data objectives can change over time (a toy sketch of this matching follows the list).
  • Analytics in the loop – detects the performance and other characteristics of the storage and the data in real time. That is, by monitoring storage IO activity, Primary Data can determine the actual performance attributes of the storage. Similarly, by monitoring an application's IO characteristics over time, the system can determine the performance objectives of its data. The system also takes advantage of SMI-S to define some of the other characteristics of the storage systems.
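
To make the objective-matching idea concrete, here's a toy sketch of how data objectives might be matched against a storage service catalog. The objective names, catalog fields and numbers are all mine, not Primary Data's actual schema.

```python
# Toy objective-based placement: pick the cheapest storage service whose
# published capabilities meet a data set's current objectives.
CATALOG = [
    # name,            max_latency_ms, min_iops, durability_nines, $/GB
    ("nvme-tier",      0.5,            200_000,  4,                2.50),
    ("hybrid-tier",    5.0,             20_000,  5,                0.80),
    ("capacity-tier",  20.0,             2_000,  6,                0.25),
]

def place(objectives):
    """Return the cheapest tier meeting the latency, IOPS and durability objectives."""
    candidates = [
        (cost, name)
        for name, lat, iops, nines, cost in CATALOG
        if lat <= objectives["max_latency_ms"]
        and iops >= objectives["min_iops"]
        and nines >= objectives["durability_nines"]
    ]
    return min(candidates)[1] if candidates else None

# A hot database volume and a cold project archive get very different homes.
print(place({"max_latency_ms": 1.0, "min_iops": 50_000, "durability_nines": 4}))  # nvme-tier
print(place({"max_latency_ms": 20.0, "min_iops": 500, "durability_nines": 5}))    # capacity-tier
```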

How does Primary Data work?

Primary Data has taken advantage of the parallel NFS extensions (pNFS) in NFSv4 to externalize and separate the storage control plane from the IO data plane. This works well for native Linux, where the main developer of the Linux file system stack is on their payroll.

In Windows, they put a filter driver in front of SMB to split the control plane off from the IO data plane. Something similar is done for VMware ESX servers to supply the control/data plane split, but in this case a software-defined Data Portal goes along with the DataSphere service client and can do it all within the same ESX server. Another alternative is to use the Data Portal appliance as a storage virtualization service, but then the IO data path goes through the portal.
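
The control/data split is easier to see in a sketch: the client asks the metadata service where the data currently lives (control plane), then does the IO directly against that location (data plane). This is only a conceptual analogy to pNFS-style layouts, with names I made up; it is not Primary Data's wire protocol.

```python
# Conceptual sketch of a control/data plane split: a metadata service hands
# out "layouts" (where the data lives right now); the client then reads or
# writes that location directly, bypassing the metadata service.

class MetadataService:
    """Control plane: knows which physical store holds each file."""
    def __init__(self):
        self.layouts = {"/vm/db01.vmdk": ("flash-array-3", "lun-42")}

    def get_layout(self, path):
        return self.layouts[path]

    def migrate(self, path, new_location):
        # Data mobility: only the layout changes; the file identity does not.
        self.layouts[path] = new_location


class DataPortalClient:
    """Data plane: performs IO directly against the location in the layout."""
    def __init__(self, mds):
        self.mds = mds

    def read(self, path, offset, length):
        store, lun = self.mds.get_layout(path)                   # control-plane round trip
        return f"read {length}B @ {offset} from {store}/{lun}"   # direct IO (stubbed)


mds = MetadataService()
client = DataPortalClient(mds)
print(client.read("/vm/db01.vmdk", 0, 4096))
mds.migrate("/vm/db01.vmdk", ("capacity-nas-1", "export-7"))     # data moved behind the scenes
print(client.read("/vm/db01.vmdk", 0, 4096))                     # same identity, new location
```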

According to their datasheet, they currently support data virtualization services for NetApp cDOT and 7-mode, EMC Isilon OneFS 7.2, and Nexenta 4.x & 5.0, but plan on more.

They are not quite GA yet, but are close.

Comments?


When 64 nodes are not enough

Why would VMware, with years of ESX development behind them, want to develop a whole new virtualization system for Docker and other container frameworks, especially since they already have compatible Docker support in their current product line?

The main reason I can think of is that a 64-node cluster may be limiting to some container services, and VMware ESX/vSphere seems pretty unlikely to support 1000s of nodes in a single cluster. So, given that more and more cloud services are being deployed across 1000s of nodes using container frameworks, VMware had to do something or say goodbye to a potentially lucrative use case for virtualization.

Yes, over time VMware may indeed extend vSphere clusters to 128 or even 256 nodes, but by then the world will have moved beyond VMware for these services, and where will VMware be then? Left behind.

Photon to the rescue

With the new Photon system, VMware has an answer for anyone that needs 1,000- to 10,000-server cluster environments. Now these customers can easily deploy their services on the VMware Photon Platform, which was developed off of ESX but doesn't have ESX's cluster limitations.

Thus, the need for Photon is now. Customers can easily deploy container frameworks that span 1000s of nodes. Of course, it won't be as easy to manage as a 64-node vSphere cluster, but it will be easily automated, easier to deploy, and easier to scale when necessary, especially beyond 64 nodes.

The claim is that the new Photon will be able to support multiple container frameworks without modification.

So what's stopping you from taking on the Amazons, Googles, and Apples of the world's data centers?

  • Maybe storage, but then there's ScaleIO and the other software-defined storage solutions that are there to support local DAS clusters of almost incredible size.
  • Maybe networking. I am not sure just where NSX is in the scheme of things; maybe it's capable of handling 1000s of nodes and maybe not, but networking could be a clear limitation on how many nodes can be deployed in this sort of environment.

Where does this leave vSphere? Probably a continuation of the current trajectory, making VMware clusters easier and more efficient to run and, over time, extending any current limitations. So, for the moment, there are two development streams based off of ESX, each being enhanced for its own market.

How much of ESX survived is an open question, but it's likely that Photon will never see the familiar VMware services and operations that are readily available to vSphere clusters.

Comments?

Photo Credit(s): A first look into Dockerfile system

Springpath SDS springs forth

Springpath presented at SFD7 and has a new Software Defined Storage (SDS) solution that attempts to provide the richness of enterprise storage in an SDS offering running on commodity hardware. I would encourage you to watch the SFD7 video stream if you want to learn more about them.

HALO software

Their core storage architecture is called HALO, which stands for Hardware Agnostic Log-structured Object store. We have discussed log-structured file systems before. They are essentially sequential files that can be randomly accessed (read) but are sequentially written. Springpath HALO was written from scratch, operates in user space and, unlike many SDS solutions, has no dependencies on Linux file systems.

HALO supports both data deduplication and compression to reduce storage footprint. The other unusual feature is that they support both blade servers and standalone (rack) servers as storage/compute nodes.
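
Here's a toy sketch of the log-structured, dedupe-and-compress idea: writes are only ever appended, an index maps object keys to log offsets, and duplicate blocks are detected by content hash before being compressed and written. It's a teaching model of the general technique, not Springpath's HALO internals.

```python
# Toy log-structured object store: sequential appends, random reads via an
# index, with content-hash dedupe and compression at write time.
import hashlib
import zlib

class LogStore:
    def __init__(self):
        self.log = bytearray()      # the append-only log (one "segment")
        self.index = {}             # object key -> (offset, length)
        self.by_hash = {}           # content hash -> (offset, length), for dedupe

    def put(self, key, data: bytes):
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.by_hash:              # dedupe: point at the existing extent
            self.index[key] = self.by_hash[digest]
            return
        compressed = zlib.compress(data)        # compress before appending
        offset = len(self.log)
        self.log += compressed                  # sequential write only
        self.index[key] = self.by_hash[digest] = (offset, len(compressed))

    def get(self, key) -> bytes:
        offset, length = self.index[key]        # random read via the index
        return zlib.decompress(bytes(self.log[offset:offset + length]))

store = LogStore()
store.put("obj-a", b"hello" * 1000)
store.put("obj-b", b"hello" * 1000)             # duplicate: adds no new log bytes
assert store.get("obj-b") == b"hello" * 1000
print(f"log size: {len(store.log)} bytes for two 5000-byte objects")
```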

Tiers of storage

Each storage node can optionally have SSDs as a persistent cache, holding write data and a metadata log. Storage nodes can also hold disk drives used as a persistent, final tier of storage. For blade servers, with limited drive slots, one can configure blades as part of the caching tier by using SSDs or PCIe flash.

All data is written to the (replicated) caching tier before the host is signaled that the operation is complete. Write data is destaged from the caching tier to the capacity tier over time, as the caching tier fills up. Data reduction (compression/deduplication) is done at destage.

The caching tier also holds frequently read data, and it includes a non-persistent segment in server RAM.

Write data is distributed across caching nodes via a hashing mechanism which allocates portions of an address space across nodes. But during cache destage, the data can be independently spread and replicated across any capacity node, based on the free space available on each node. This is made possible by their file system metadata.
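
A minimal sketch of that hashing step, assuming a simple "hash the block address, mod the number of caching nodes" scheme with replication to the next node in line; Springpath didn't detail their exact mechanism, so treat this as illustration only.

```python
# Spread write data across caching nodes by hashing (volume, block address)
# into a node, plus a replica on the next node so a node loss doesn't lose
# acknowledged writes. The modulo scheme here is an assumption, not
# Springpath's actual hashing mechanism.
import hashlib

CACHING_NODES = ["cache-node-0", "cache-node-1", "cache-node-2"]

def cache_nodes_for(volume_id: str, block_addr: int, replicas: int = 2):
    """Return the primary and replica caching nodes for a block address."""
    h = int(hashlib.md5(f"{volume_id}:{block_addr}".encode()).hexdigest(), 16)
    primary = h % len(CACHING_NODES)
    return [CACHING_NODES[(primary + i) % len(CACHING_NODES)] for i in range(replicas)]

print(cache_nodes_for("vol-7", 1048576))   # primary plus one replica for this block
```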

The capacity tier is split up into data and metadata partitions. Metadata is also present in the caching tier. Data is deduplicated and compressed at destage, but when read back into cache it is only decompressed. Capacity tier and caching tier nodes can have different capacities.

HALO has some specific optimizations for flash writing which includes always writing a full SSD/NAND page and using TRIM commands to free up flash pages that are no longer being used.

HALO SDS packaging under different Hypervisors

In Linux and OpenStack environments, they run the whole storage stack in Docker containers, primarily for image management/deployment, including rolling upgrade management.

In VMware and Hyper-V environments, Springpath runs as a VM and uses direct-path IO to access the storage. For VMware, Springpath looks like an NFSv3 datastore with VAAI and VVOL support. In Hyper-V, Springpath's SDS is an SMB storage device.

For KVM it's NFS storage; for OpenStack one can use NFS, or they have a Cinder plugin for volume support.

The nice thing about Springpath is that you can build a cluster of storage nodes consisting of VMware, Hyper-V and bare-metal Linux nodes that supports all of them. (Does this mean it's multi-protocol, supporting SMB for Hyper-V and NFSv3 for VMware?)

HALO internals

Springpath supports (mostly) file, block (via the Cinder driver) and object access protocols. The backend caching and capacity tiers both use a log-structured layout internally to stripe data across all the capacity and caching nodes. Data compression works very well with log-structured file systems.

All customer data is supported internally as objects. HALO has a write-log which is spread across their caching tier and a capacity-log which is spread across the capacity tier.

Data is automatically re-balanced across nodes when new nodes are added or old nodes are deleted from the cluster.

Data is protected via replication. The system needs a minimum of 3 SSD (caching) nodes and 3 drive (capacity) nodes to be fully operational, but these can reside on the same servers. However, the replication factor can be configured to be less than 3 if you're willing to live with the potential loss of data.

Their system supports both snapshots (up to 2^64 per object) and storage clones for test/dev and backup requirements.

Springpath seems to have quite a lot of functionality for an SDS, although native FC and iSCSI support is lacking. For a file-based SDS for hypervisors, it seems to have a lot of the bases covered.

Comments?

Other SFD7 blogger posts on Springpath:

Picture credit(s): Architectural layout (from SpringpathInc.com) 

VMworld 2014 projects Marvin, Mystic, and more

[This post was updated after being published to delete NDA material – sorry, RL] I attended VMworld 2014 in San Francisco this past week. Lots of news, mostly about vSphere 6 beta functionality and how the new AirWatch acquisition will be rolled into VMware's End-User Computing framework.

vSphere 6.0 beta

Virtual Volumes (VVOLs) is in beta and extends VMware’s software-defined storage model to external NAS and SAN storage.  VVOLs transforms SAN/NAS  storage into VM-centric devices by making the virtual disk a native representation of the VM at the array level, and enables app-centric, policy-based automation of SAN and NAS based storage services, somewhat similar to the capabilities used in a more limited fashion by Virtual SAN today.

Storage system features have proliferated and differentiated over time, and being able to specify and register any and all of these functional nuances with VMware's storage policy based management (SPBM) service is a significant undertaking in and of itself. I guess we will have to wait until it comes out of beta to see more. NetApp had a functioning VVOL storage implementation on the show floor.

Virtual SAN 1.0/5.5 currently has 300+ customers, with 30+ ready storage nodes from all major vendors. There are reference architecture documents and system bundles available.

Current enhancements outside of vSphere 6 beta

vRealize Suite extends automation and monitoring support for a broad mix of VMware and non VMware infrastructure and services including OpenStack, Amazon Web Services, Azure, Hyper-V, KVM, NSX, VSAN and vCloud Air (formerly vCloud Hybrid Services), as well as vSphere.

New VMware functionality being released:

  • vCenter Site Recovery Manager (SRM) 5.8 – provides self service DR through vCloud Automation Center (vRealize Automation) integration, with up to 5000 protected VMs per vCenter and up to 2000 VM concurrent recoveries. SRM UI will move to be supported under vSphere’s Web Client.
  • vSphere Data Protection Advanced 5.8 – provides configurable parallel backups (up to 64 streams) to reduce backup duration/shorten backup windows, access and restore backups from anywhere, and provides support for Microsoft Exchange DAGs, and SQL Clusters, as well as Linux LVMs and EXT4 file systems.

VMware NSX 6.1 (in beta) has 150+ customers today and provides micro-segmentation security, which essentially supports fine-grained firewall definitions almost at the VM level.

vCloud Hybrid Cloud Services is being rebranded as vCloud Air, and is currently available globally through data centers in the US, UK, and Japan. vCloud Air is part of the vCloud Air Network, an ecosystem of over 3,800 service providers with presence in 100+ countries that are based on common VMware technology.  VMware also announced a number of new partnerships to support development of mobile applications on vCloud Air.  Some additional functionality for vCloud Air that was announced at VMworld includes:

  • vCloud Air Virtual Private Cloud On Demand beta program supports instant, on demand consumption model for vCloud services based on a pay as you go model.
  • VMware vCloud Air Object Storage based on EMC ViPR is in beta and will be coming out shortly.
  • DevOps/continuous integration as a service, vRealize Air automation as a service, and DB as a service (MySQL/SQL Server) will also be coming out soon.

End-User Computing: VMware is integrating AirWatch's (another acquisition) enterprise mobility management solutions for mobile device management/mobile security/content collaboration (Secure Content Locker) with their current Horizon suite for virtual desktop/laptop support. VMware End User Computing now supports desktop/laptop virtualization, mobile device management and security, and content security and file collaboration. Also, VMware's recent CloudVolumes acquisition supports a lightweight desktop/laptop app deployment solution for Horizon environments. AirWatch already has a similar solution for mobile.

OpenStack, Containers and other collaborations

VMware is starting to expand their footprint into other arenas, with new support, collaboration and joint ventures.

A new VMware OpenStack distribution is in beta now, to be available shortly, which supports VMware as the underlying infrastructure for OpenStack applications that use OpenStack APIs. VMware has become a contributor to OpenStack open source. There are other OpenStack distributions that support VMware infrastructure available from HP, Canonical, Mirantis and one other company I neglected to write down.

VMware has started a joint initiative with Docker and Pivotal to broaden support for Linux containers. Containers are lightweight packaging for applications that strip out the OS, hypervisor, frameworks, etc., and allow an application to be run on mobile, desktop, server and anything else that runs a Linux OS (for Docker, Linux kernel 3.8 or better). Rumor has it that Google launches over 15M Docker containers a day.

VMware container support expands from Pivotal Warden containers to now also include Docker containers. VMware is also working with Google and others on the Kubernetes project, which supports container pod management (logical groups of containers). In addition, Project Fargo, VMware's own lightweight packaging solution for VMs, is in development. Now customers can run VMs, Docker containers, or Pivotal (Warden) containers on the same VMware infrastructure.

AT&T and VMware have a joint initiative to bring enterprise-grade network security, speed and reliability to vCloud Air customers, which essentially allows customers to use AT&T VPNs with vCloud Air. There's more to this but that's all I noted.

VMware EVO, the next evolution in hyper-convergence has emerged.

  • EVO RAIL (formerly known as project Marvin) is an appliance package from VMware hardware partners that runs the vSphere Suite, Virtual SAN and vCenter Log Insight. The hardware supports 4 compute/storage nodes in a 2U-tall rack-mounted appliance. Four of these appliances can be connected together into a cluster. Each compute/storage node supports ~100 VMs or ~150 virtual desktops. VMware states that the goal is to have an EVO RAIL implementation take at most 15 minutes from power-on to running VMs. Current hardware partners include Dell, EMC (formerly named project Mystic), Inspur (China), Net One (Japan), and SuperMicro.
  • EVO RACK is a data center level hardware appliance with the vCloud Suite installed, including Virtual SAN and NSX. The goal is for EVO RACK hardware to support a 2-hour window from power-on to a private cloud environment/datacenter deployed and running VMs. VMware expects a range of hardware partners to support EVO RACK but none were named. They did specifically mention that EVO RACK is intended to support hardware from the Open Compute Project (OCP). VMware is providing contributions to OCP to facilitate EVO RACK deployment.

~~~~

Sorry about the stream of consciousness approach to this. We got a deep dive on what’s in vSphere 6 but it was all under NDA. So this just represents what was discussed openly in keynotes and other public sessions.

Comments?

 

Proximal Data, server SSD caching software

I attended Storage Field Day 4 (SFD4) about a month ago now and had a chance to visit with Rory Bolt, CEO/Founder of Proximal Data, a new server-side caching software solution. Last month the GreyBeards (Howard Marks and I) talked with Satyam Vaghani, Co-founder and CTO of PernixData, another server-side caching solution. You can find that podcast here. But this post is about Proximal Data. These guys could use some better marketing, but when you spend 90% of your funding on engineers this is what you get.

Proximal Data doesn't believe in agent software, because it takes a long time to deploy and could potentially disrupt IT operations when being installed. In contrast, Proximal Data installs their AutoCache solution software into the hypervisor as a VIB (vSphere Installation Bundle). There was some discussion at SFD4 on whether installing the VIB would be disruptive or not to customer operations. Not being a VMware expert, I won't comment on the results of the discussion, but if you want to find out more I suggest viewing the SFD4 video of Proximal Data's presentation.

Of course, being at the hypervisor layer gives them IO activity information at the VM level, and they can use this to control their caching software at VM granularity. In addition, by executing at the hypervisor layer, AutoCache doesn't require any guest OS-specific functionality or hooks. Another nice thing about executing at the hypervisor level is that they can cache RDM devices.

To use AutoCache you will need one or more PCIe SSD(s) or DAS SSD(s) in your ESXi server. Once the PCIe SSD or DAS SSD is installed, and after you have installed/activated the AutoCache software, you will need to partition or dedicate the device to Proximal Data's AutoCache.

AutoCache is managed as a virtual appliance with a web server GUI. With the networking set up and the AutoCache VIB installed, you can access their operator panels via a tab in vCenter. Once the software is installed you don't have to use their GUI ever again.

AutoCache read caching algorithms

Not every read IO for a VM being cached is brought into AutoCache's SSD cache. They are trying to ensure that cached data will be referenced again. As such, they typically wait for two reads before the data is placed into cache.

They support two different read caching algorithms, referred to during the presentation as Algorithm A and Algorithm B. (They really need some marketing – Turbo Boost and Extreme Boost sound better to me.) I'm not sure they ever described the differences between the two, but the fact that they have multiple caching algorithms speaks to some sophistication. They also maintain a "Ghost data list". Ghost data is data whose metadata is still in cache, but whose actual data is no longer in cache.

When a miss occurs, they determine whether the data would have been a hit in the Ghost data list, or under Algorithm A or Algorithm B, had they been active on the VM. If it would have been a hit in Ghost data, then in general you probably need more SSD caching space on this ESXi server for the VMs being cached. If it would have been a hit under Algorithm A or B, you probably should be using that algorithm for this VM's IO.
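
Here's a small sketch of two of these ideas together: a "seen once" filter that only admits a block on its second read, and a ghost list that remembers recently evicted blocks so a miss can be classified as "would have hit with more cache." This illustrates the general technique (ghost lists show up in ARC-style caches), not Proximal Data's AutoCache implementation, and all the names and sizes are mine.

```python
# Sketch of two-touch cache admission plus a ghost list for miss classification.
from collections import OrderedDict

class ReadCache:
    def __init__(self, capacity=4, ghost_capacity=8):
        self.cache = OrderedDict()      # block -> data, kept in LRU order
        self.seen_once = set()          # blocks read once but not yet admitted
        self.ghost = OrderedDict()      # recently evicted blocks (metadata only)
        self.capacity = capacity
        self.ghost_capacity = ghost_capacity
        self.stats = {"hit": 0, "miss": 0, "ghost_hit": 0}

    def read(self, block):
        if block in self.cache:
            self.cache.move_to_end(block)
            self.stats["hit"] += 1
            return "hit"
        self.stats["miss"] += 1
        if block in self.ghost:             # would have hit with a bigger cache
            self.stats["ghost_hit"] += 1
        if block in self.seen_once:         # second read: admit to the cache
            self.seen_once.discard(block)
            self._admit(block)
        else:                               # first read: remember it, don't cache it
            self.seen_once.add(block)
        return "miss"

    def _admit(self, block):
        if len(self.cache) >= self.capacity:
            evicted, _ = self.cache.popitem(last=False)
            self.ghost[evicted] = None      # keep only metadata in the ghost list
            if len(self.ghost) > self.ghost_capacity:
                self.ghost.popitem(last=False)
        self.cache[block] = f"data-{block}"

# A high ghost_hit count relative to misses suggests this VM needs more SSD cache.
```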

Another approach AutoCache supports is called "Glimmer IO". I liken this to sequential read-ahead, where AutoCache keeps track, on a per-VM basis, of all the IO being performed and tries to determine whether it is sequential or random. If the VM is doing sequential IO, AutoCache can start reading ahead of where the VM is currently reading. By doing so, they can stage the data in cache before the VM needs it/reads it. According to Rory, there are policies which can be set on a per-VM basis to limit how much read-ahead is performed. I assume there are policies associated with the use of Algorithm A and B on a per-VM basis as well, but they didn't go into this.
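
And a minimal per-VM sequential detector: if the last few reads line up back-to-back, prefetch the next few blocks. The window and read-ahead depth knobs stand in for the per-VM policies Rory mentioned; the specifics are my assumptions, not AutoCache's actual parameters.

```python
# Minimal per-VM sequential-stream detector with read-ahead, standing in for
# the "Glimmer IO" idea. Window size and read-ahead depth are assumed knobs.
class SequentialDetector:
    def __init__(self, window=4, read_ahead=8):
        self.last_block = None
        self.run_length = 0
        self.window = window            # consecutive blocks before we call it sequential
        self.read_ahead = read_ahead    # how far ahead to prefetch once triggered

    def on_read(self, block):
        if self.last_block is not None and block == self.last_block + 1:
            self.run_length += 1
        else:
            self.run_length = 0
        self.last_block = block
        if self.run_length >= self.window:
            return list(range(block + 1, block + 1 + self.read_ahead))  # prefetch these
        return []

det = SequentialDetector()
for b in [100, 101, 102, 103, 104, 105]:
    prefetch = det.on_read(b)
    if prefetch:
        print(f"after block {b}, prefetch {prefetch[0]}..{prefetch[-1]}")
```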

AutoCache cache warmup for vMotion

The other nice thing AutoCache does is provide a cache warmup for the target ESXi server when moving VMs via vMotion. This is done by registering with the vMotion API and trapping vMotion requests. Once they detect that a VM is being moved, they send the VM's AutoCache metadata over to the target host, at which time the target system's AutoCache can start to fill its cache from the shared storage. Not a bad approach from my perspective. The amount of data that needs to be moved is minimal, and you get the AutoCache code running in the target machine to start preloading blocks that were in cache on the source host. They also mentioned that once they have copied the metadata over to the target host, they can free up (invalidate) all the space in the source host's cache that was being held by the VM being moved.
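
Conceptually the warmup amounts to shipping only the cache metadata (which blocks were hot) and letting the target host re-read those blocks from shared storage on its own. A rough sketch of that hand-off, with all names and structure mine rather than Proximal Data's:

```python
# Rough sketch of a vMotion cache-warmup hand-off: the source host ships only
# cache metadata, the target prefetches those blocks from shared storage, and
# the source invalidates its copies.
class HostCache:
    def __init__(self):
        self.entries = {}                       # (vm, block) -> data

    def cached_blocks(self, vm):
        return [blk for (v, blk) in self.entries if v == vm]

    def insert(self, vm, block, data):
        self.entries[(vm, block)] = data

    def invalidate(self, vm):
        for key in [k for k in self.entries if k[0] == vm]:
            del self.entries[key]

def vmotion_warmup(vm, source: HostCache, target: HostCache, shared_read):
    hot = source.cached_blocks(vm)                       # 1. only metadata crosses the wire
    for block in hot:
        target.insert(vm, block, shared_read(block))     # 2. target refills from shared storage
    source.invalidate(vm)                                # 3. source frees the moved VM's space

src, dst = HostCache(), HostCache()
for blk in (10, 11, 12):
    src.insert("vm-42", blk, f"data-{blk}")
vmotion_warmup("vm-42", src, dst, shared_read=lambda blk: f"data-{blk}")
print(len(dst.entries), len(src.entries))                # 3 0
```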

Proximal Data for Hyper-V

At SFD4, Rory mentioned that a Hyper-V version of AutoCache was coming out shortly. And although they specifically indicated that write-back caching was not a great idea (in contrast to Satyam and PernixData), there was potential for them to look at implementing this as well over time.

The product is sold through resellers, distributors and OEMs.  They claim support for any flash device although they have an approved HCL.

Current pricing is $1000 for the AutoCache software to support an SSD cache of 500GB or less. From what we see in enterprise storage systems, having a cache of 2-5% of your total backend storage is about right. (But see my VM working set inflection points and SSD caching post for another side of this.) So a 500GB SSD cache should be able to support 10-25TB of backend data if all goes well.

~~~~

After the podcast on PernixData's clustered, write-back caching software, Proximal Data didn't seem as complex or useful. But there is a place for read-only caching. The fact that they can help warm the target host's cache for a vMotion is a great feature if you plan on doing a lot of movement of VMs in your shop. The fact that they have distinct support for multiple cache algorithms, understand sequential detection, and have some way of telling you that you could use more SSD caching is also good in my mind.

Comments?

Photo: 20-nanometer NAND flash chip, IntelFreePress’ photostream


Storage changes in vSphere 5.5 announced at VMworld 2013

VMworld 2013 is going on in San Francisco this week. The big news is the roll-out of network virtualization in NSX and the vCloud Hybrid Service (vCHS), but there were a few tidbits in the storage arena worth discussing.

  • Virtual SAN public beta – VSAN was released as a public beta and customers can now download a copy of VSAN from www.vsanbeta.com. VSAN will construct a pool of storage out of locally attached disks and flash across two or more hosts. It uses the flash as a read-write cache for the local disks. With VSAN, customers can elect to have multiple tiers of storage supported within a single VSAN pool, as well as support different availability (replication) levels, and some other, select characteristics. VSAN can easily scale in performance and capacity by just adding more hosts that have local storage. Now all that stranded local storage and flash at the server level can be used as a VM storage pool. VMware stated that they see VSAN as useful for tier 2/tier 3 application storage and/or backup-archive storage uses. However, they showed one chart with a View Planner application simulation using a 3-host VSAN (presumably with lots of SSD and disk storage) compared against an all-flash array (vendor unknown). In this benchmark the VSAN exactly matched the all-flash external storage in performance (VMs supported). [late update] Lots of debate on what VSAN means to enterprise storage, but it appears to be limited in scope and mainly focused on SMB applications. Chad Sakac did a (really) lengthy post on EMC's perspective on VSAN and Software Defined Storage if you want to know more; check it out.
  • Virsto – VMware announced GA of Virsto, which uses any external storage and creates a new global storage pool out of it. Apparently, it maps a log-structured file system across the external SAN storage. By doing this it sequentializes all the random write IO coming off of ESX hosts. It supports thin provisioning, snapshots and read-write clones. One could see this as almost a write cache for VM IO activity, but read IOs are also by definition spread (extremely wide striped) across the storage pool, which should improve read performance as well. You configure external storage as normal and present those LUNs to Virsto, which then converts that storage pool into "vDisks" which can then be configured as VM storage. Probably more to see here, but it's available today. Before the acquisition, one had to install Virsto into each physical host that was going to define VMs using Virsto vDisks. It's unclear how much Virsto has been integrated into the hypervisor, but over time one would assume that, like VSAN, this would be buried underneath the hypervisor and be available to any vSphere host.
  • vSphere Flash Read Cache – customers with PCIe flash cards and vCenter Ops Manager can now use them to support a read cache for data access. vSphere Flash Read Cache is apparently vMotion aware, such that as you move VMs from one ESX host to another, the read cache buffer moves with them. Flash Read Cache is transparent to the VMs and can be assigned on a VMDK basis.
  • vSphere 5.5 low-latency support – unclear what VMware actually did, but they now claim vSphere 5.5 supports low-latency applications, like FinServ apps. They claim to have reduced the "jitter", or variability in IO latency, that was present in previous versions of vSphere. Presumably they shortened the IO and networking paths through the hypervisor, which should help. I suppose if you have a VMDK which ends up on SSD storage someplace, one can have a more predictable response time. But the critical question is how much overhead the hypervisor IO path adds to the base O/S. With all-flash arrays now sporting latencies under 100 µsec, adding another 10 or 100 µsec can make a big difference. In VMware's quest to virtualize any and all mission-critical apps, low-latency apps are one of the last bastions of physical server apps left to conquer. Consider this a step to accommodate them.
  • vVols – VMware keeps talking about vVols as an attempt to extend their VSAN "policy driven control plane" functionality out to networked storage, but there's still no GA yet. The (VASA 2 or vVol) specs seem to have been out for a while now, and I have heard from at least two "major" vendors that they have support in place today, but VMware still isn't announcing formal availability yet. Unclear what the hold-up is, but maybe the specs are more in a state of flux than what's depicted externally.

Most of this week was spent talking about NSX, VMware's network virtualization, and vCloud Hybrid Services. When they flashed the list of NSX partners on the screen, Cisco was absent. Not sure what this means, but perhaps there's some concern that NSX will take revenue away from Cisco.

As for vCHS, apparently this is a VMware-run public cloud, with two (now expanding to three) data centers in the US, that customers can use to support their own hybrid cloud services. VMware announced that SAVVIS is now offering vCHS services as well as VMware, with data centers in NY and Chicago. There was some talk about vCHS offering object storage services like Amazon's S3, but there was nothing specific about when. [Late update] Pat did mention that a future offering will provide DR-as-a-Service using vCHS as a target for SRM. That seems to match what Microsoft seems to be planning for Azure and Hyper-V DR.

That’s about it as far as I can tell. Didn’t hear any other news on storage changes in vSphere 5.5. But this is the year of network virtualization. Can’t wait to see what they roll out next year.

EMCworld 2013 Day 2

The first session of the day was with Joe Tucci, EMC Chairman and CEO. He talked about the trends transforming IT today. These include mobile, cloud, big data and social networking. He then discussed IDC's 1st, 2nd and 3rd computing platform framework, where the first was mainframe, the second was client-server and the third is mobile. Each of these platforms had winners and losers. EMC definitely wants to be one of the winners in the coming age of mobile, and they are charting multiple paths to get there.

Mainly they will use Pivotal, VMware, RSA and their software defined storage (SDS) product to go after the 3rd platform applications.  Pivotal becomes the main enabler to help companies gain value out of the mobile-social networking-cloud computing data deluge.  SDS helps provide the different pathways for companies to access all that data. VMware provides the software defined data center (SDDC) where SDS, server virtualization and software defined networking (SDN) live, breathe and interoperate to provide services to applications running in the data center.

Joe started talking about the federation of EMC companies. These include EMC, VMware, RSA and now Pivotal. He sees these four brands as almost standalone entities whose identities will remain distinct and separate for a long time to come.

Joe mentioned the internet of things, or the sensor cloud, as opening up new opportunities for data gathering and analysis that dwarf what's coming from mobile today. He quoted IDC estimates that say by 2020 there will be 200B devices connected to the internet; today there are just 2 to 3B devices connected.

Pivotal’s debut

Paul Maritz, Pivotal CEO, got up and took us through the Pivotal story. Essentially they have three components: a data fabric, an application development fabric and a cloud fabric. He believes that mobile and the internet of things will open up new opportunities for organizations to gain value from their data wherever it may lie, going well beyond what's available today. These activities center around consumer-grade technologies which 1) store and reason over very large amounts of data; 2) use rapid application development; and 3) operate at scale in an entirely automated fashion.

He mentioned that humans are a serious risk to continuous availability. Automation is the answer to the human problem for the “always on”, consumer grade technologies needed in the future.

Parts of Pivotal come from VMware, Greenplum and EMC, with some available today in specific components. However, by year end they will come out with Pivotal One, which will be the first framework with data, app development and cloud fabrics coupled together.

Paul called Pivotal Labs the special forces of his services organization, helping leading tech companies pull together the awesome apps needed for the technology of tomorrow, using Extreme Programming, Agile development and very technically astute individuals. Also, CETAS was mentioned as an analytics-as-a-service group providing such analytics capabilities to gaming companies doing log analysis, but he believes there's a much broader market coming.

Paul also showed some impressive numbers on their new Pivotal HD/HAWQ offering, which showed it handled many more queries than Hive and Cloudera/Impala. In essence, parts of Pivotal are available today, but later this year the whole cloud-app dev-big data framework will be released for the first time.

Next up was a media-analyst event where David Goulden, EMC President and COO, gave a talk on where EMC has come from and where they are headed from a business perspective.

Then he and Joe did a Q&A with the combined media and analyst community.  The questions were mostly on the financial aspects of the company rather than their technology, but there will be a more focused Q&A session tomorrow with the analyst community.

Joe was asked about Vblock status. He said last quarter they announced it had reached a $1B revenue run rate, which he said was the fastest in the industry. Joe mentioned EMC is all about choice, such as the different Vblock product offerings, VSPEX product offerings and now, with ViPR, more choice in storage.

Sometime today Joe had mentioned that they don’t really do custom hardware anymore.  He said of the 13,000 engineers they currently have ~500 are hardware engineers. He also mentioned that they have only one internally designed ASIC in current shipping product.

Then Paul got up and did a Q&A on Pivotal. He believes there's definitely an opportunity in providing services surrounding big data and specifically mentioned CETAS, offering analytics-as-a-service, as well as the Pivotal Labs professional services organization. Paul hopes that Pivotal will be a $1B revenue company in 5 years. They already have $300M, so it's well on its way to getting there.

Next, there was a very interesting, visually stimulating media and analyst session from Jer Thorp, co-founder of The Office for Creative Research. About the best way to describe him is as a data visualization scientist.

He took a NASA Kepler research paper with very dry data and brought it to life. He also did a number of analyses of public Twitter data and showed Twitter user travel patterns, a Twitter "good morning" analysis, retweets of NYT articles, etc. He also showed a video depicting people on airplanes around the world. He said it is a little-known fact, but over a million people are in the air at any given moment of the day.

Jer talked about the need for data ethics and an informed data ownership discussion with people about the breadcrumbs they leave around in the mobile, connected world of today. If you get a chance, you should definitely watch his session.

Next, Juergen Urbanski, CTO of T-Systems, got up and talked about the importance of Hadoop to what they are trying to do. He mentioned that in 5 years, 80% of all new data will land on Hadoop first. He showed how Hadoop is entirely different from what went before and will take T-Systems in vastly new directions.

Next up in the EMCworld main hall was Pat Gelsinger, VMware CEO, with a keynote on VMware. The story was all about the Software Defined Data Center (SDDC) and the components needed to make this happen. He said data was the fourth factor of production, behind land, capital and labor.

Pat said that networking was becoming a barrier to the realization of SDDC and that they had been working on it for some time prior to the Nicira acquisition. But now they are hard at work merging organic VMware development with Nicira to create VMware NSX, a new software-defined networking layer that will be deployed as part of the SDDC.

Pat also talked a little bit about how ViPR and other software defined storage solutions will provide the ease of use they are looking for to be able to deploy VMs in seconds.

Pat demoed a solution specifically designed for Hadoop clusters and was able to configure a Hadoop cluster with about 4 clicks and have it start deploying. It was going to take 4-6 minutes to get it fully provisioned, so they had a couple of clusters already configured, and they ran a pseudo Hadoop benchmark on one using visual recognition and showed how vCenter could be used to monitor the cluster in real time.

Pat mentioned that there are over 500,000 physical servers running Hadoop. Needless to say VMware sees this as a prime opportunity for new and enhanced server virtualization capabilities.

That’s about it for the major keynotes and media sessions from today.

Tomorrow looks to be another fun day.