IBM 2011 May 24 GDPS/Active-Active announcement

In GDPS, IBM, z/OS by Administrator

IBM announced new DR and BC capabilities for its System z environment based on GDPS and application-specific asynchronous replication services.

IBM GDPS today

The GDPS product family includes a number of solutions, ranging from synchronous offerings such as HyperSwap and PPRC, to asynchronous offerings such as Global Mirror and XRC, to combinations that support both, such as MzGM and MGM. However, most GDPS solutions have an RPO that ranges from 0 to a matter of minutes but an RTO of 30 minutes or more. This is because current data replication capabilities, whether asynchronous or synchronous, leave data in a “power failed” state, which requires application and/or middleware processing to condition the replicated data for further processing.

GDPS/PPRC HyperSwap Manager, combined with base PPRC functionality, supports both active/active and active/standby modes of operation, but only at distances of 20 km or less, mainly due to the need for Coupling Facility (CF) duplexing and the attempt to minimize performance impacts on Sysplex execution. As such, there was a definite gap for something providing active/active recovery at much greater distances.

Moreover, the other problem with disk replication is the granularity of recovery, which is essentially an entire disk storage system. Thus, when a disk-based replication failover occurs, all disk storage must be switched over, and this normally encompasses multiple applications.

IBM GDPS/Active-Active

Given the above and other requirements from IBM’s large customers, the need for an alternative recovery approach had emerged. IBM announced GDPS active/active continuous availability offerings to address these needs.

GDPS/Active-Active is a composite solution. It starts at the top of the system stack with multiple, redundant SASP (Server/Application State Protocol) transaction routers, available from Cisco (CSM), F5 (BIG-IP) and others, which accept transactions from the cloud and route them to servers for execution. Underneath the SASP routers, GDPS/Active-Active uses cloned z/OS® Sysplex server data centers with application-specific asynchronous replication and GDPS control monitoring/failover automation. With all this in place, Sysplex or application failures at a primary site can be detected and new routing instructions can be loaded into the SASP routers so that new transactions execute at the alternate site with up-to-date data.
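The routing switch described above can be pictured with a small sketch. This is only an illustration of the idea, not the SASP protocol or any vendor's router API: the `TransactionRouter` class, its `load_routing` and `route` methods, the integer weights and the site names are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class TransactionRouter:
    """Hypothetical stand-in for a SASP-aware router (not a real SASP API)."""

    # Site name -> routing weight; a weight of 0 means "send nothing here".
    weights: dict[str, int] = field(default_factory=dict)

    def load_routing(self, weights: dict[str, int]) -> None:
        # Replace the routing instructions in one step, as a controller
        # might do when it decides to move work to the alternate site.
        self.weights = dict(weights)

    def route(self, transaction_id: str) -> str:
        # Send each new transaction to the eligible site with the highest weight.
        eligible = {site: w for site, w in self.weights.items() if w > 0}
        if not eligible:
            raise RuntimeError("no site is eligible to receive transactions")
        return max(eligible, key=eligible.get)


router = TransactionRouter()
router.load_routing({"site_a": 1, "site_b": 0})   # normal operation
before = router.route("txn-1")
router.load_routing({"site_a": 0, "site_b": 1})   # after a detected failure
after = router.route("txn-2")
```

Only new transactions are affected by the reload; anything already in flight at the failed site is outside the scope of this sketch.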

In this first iteration, IBM announced support for “Active/Standby” mode. This means the secondary site executes applications in standby mode to support the asynchronous replication and keep the data updated. When a failure is detected, GDPS automation can kick in to fire up more CPUs, perform other housekeeping activities, bring up the secondary site in a matter of seconds and reload the routing tables in the SASP routers.

GDPS/Active-Active supports policies that can either completely automate the failover or issue an alert so that an operator can decide whether to switch over to the alternate site. GDPS also monitors heartbeats from a number of system components and uses a failure detection interval (default: 60 seconds) to determine when a component has failed.
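As a rough illustration of how heartbeat monitoring, a failure detection interval and a failover policy fit together, here is a minimal sketch. It is not GDPS code: the `HeartbeatMonitor` class, the `Policy` names, the component name and the returned action strings are hypothetical, and the only figure taken from the announcement is the 60-second default interval.

```python
from enum import Enum


class Policy(Enum):
    AUTOMATIC = "automatic"        # fail over without operator involvement
    OPERATOR_PROMPT = "prompt"     # raise an alert and let an operator decide


class HeartbeatMonitor:
    """Hypothetical failure detector; timestamps are plain seconds."""

    def __init__(self, detection_interval: float = 60.0,
                 policy: Policy = Policy.OPERATOR_PROMPT) -> None:
        self.detection_interval = detection_interval
        self.policy = policy
        self.last_seen: dict[str, float] = {}

    def record_heartbeat(self, component: str, now: float) -> None:
        # Remember when each monitored component was last heard from.
        self.last_seen[component] = now

    def failed_components(self, now: float) -> list[str]:
        # A component counts as failed once it has been silent for longer
        # than the detection interval.
        return sorted(name for name, seen in self.last_seen.items()
                      if now - seen > self.detection_interval)

    def decide(self, now: float) -> str:
        # Map the detection result onto the configured policy.
        if not self.failed_components(now):
            return "healthy"
        if self.policy is Policy.AUTOMATIC:
            return "failover"
        return "alert_operator"


monitor = HeartbeatMonitor(policy=Policy.AUTOMATIC)
monitor.record_heartbeat("primary_sysplex", now=0.0)
status_at_30s = monitor.decide(now=30.0)
status_at_61s = monitor.decide(now=61.0)
```

A shorter interval would detect failures sooner at the cost of more false alarms from brief stalls, which is presumably why the interval is a tunable setting rather than a fixed value.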

In IBM’s lab, they were able to go from a failed primary site to an operating secondary site in 150 seconds. Given that the failure detection interval takes up the first 60 seconds, bringing up the secondary site and changing transaction routing in the remaining 90 seconds is pretty impressive. However, IBM cautioned that these were lab results and your timing may vary.
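The timing breakdown can be written out as a trivial calculation. The helper name `recovery_window` is made up for this example, and the only inputs are the two lab figures quoted above; it makes no claim about how the remaining time divides between bringing up the site and reloading the routers.

```python
def recovery_window(total_failover_s: float, detection_interval_s: float) -> float:
    """Seconds left for site bring-up and rerouting once the failure is detected."""
    if detection_interval_s > total_failover_s:
        raise ValueError("detection interval cannot exceed the total failover time")
    return total_failover_s - detection_interval_s


# Lab figures from the text: 150 s end to end, 60 s default detection interval.
lab_recovery_s = recovery_window(150, 60)
print(lab_recovery_s)
```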

GDPS/Active-Active application asynchronous replication is provided by IBM InfoSphere Replication Server for z/OS (for DB2) and InfoSphere IMS Replication for z/OS, supporting the two IBM databases available for z/OS. Included in the GDPS/Active-Active solution are one or more GDPS controllers that provide the heartbeat monitoring and failover orchestration, in coordination with the IBM Multi-site Workload Lifeline, needed to detect and recover from failures.

IBM issued a statement of direction to support an Active/Query mode of operation that would allow “query-only” access to DB2 and IMS databases at the secondary site while transactions are processed at the primary site. The time frame for delivery of Active/Query operations is within the next two years.

Announcement significance

IBM continues to improve z/OS capabilities to better support enterprise mission critical applications. In addition to the GDPS/Active-Active solution, IBM released GDPS v3.8 with a number of enhancements such as STP (Server Time Protocol) support, higher availability and new simplified management capabilities.

However, while IBM introduced Active/Standby and will supply Active/Query configurations, where is Active/Active mode? I believe that somewhere, deep inside IBM, there is a roadmap to get to an Active/Active configuration, but it may involve significantly more than mere asynchronous application replication. Whenever Active/Active mode arrives, z/OS will finally have continuous availability and workload balancing in real time across geographically distributed Sysplexes.

[This announcement summary dispatch was originally sent out to our newsletter subscribers in May of 2011.]

---

Silverton Consulting, Inc. is a Storage, Strategy & Systems consulting services company based in the USA, offering products and services to the data storage community.