Improving z/OS WebSphere MQ availability: Structure failure or connectivity loss?
By Barry Dearfield
CICS and Messaging Middleware for z/OS consultant
IBM STG Lab Services
Recently I participated in the IBM WebSphere MQ V7.1 and V7.5 Features and Enhancements IBM Redbooks publication project. It was a great experience and gave me the chance to meet and work with a great bunch of WebSphere MQ experts, including some members of the IBM Hursley development lab.
One of the chapters I wrote revolves around improving z/OS WebSphere MQ resiliency with a new attribute, CFCONLOS, which controls whether a queue manager terminates or tolerates the loss when its connection to a coupling facility (CF) or coupling facility structure is lost.
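As a rough illustration of how the attribute might be set, here is a minimal Python sketch that only formats the MQSC command text. The structure name APP1 and the helper name are hypothetical, and the ASQMGR value is taken from my understanding of the product documentation rather than from the tests described in this post.

```python
# Values accepted for CFCONLOS on a CFSTRUCT, as I understand the MQSC reference.
VALID_CFCONLOS = ("ASQMGR", "TERMINATE", "TOLERATE")


def alter_cfstruct_cfconlos(structure: str, value: str) -> str:
    """Return the MQSC command text that sets CFCONLOS on a CF structure.

    This helper is hypothetical; it only builds a string and does not
    talk to a queue manager.
    """
    value = value.upper()
    if value not in VALID_CFCONLOS:
        raise ValueError(f"unsupported CFCONLOS value: {value}")
    return f"ALTER CFSTRUCT({structure}) CFCONLOS({value})"


# APP1 is a made-up application structure name.
print(alter_cfstruct_cfconlos("APP1", "tolerate"))
```

The resulting command would be issued from whatever MQSC interface you normally use, such as the z/OS console or CSQUTIL.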
I was able to have the testing environment set up so that it included multiple coupling facilities. One CF was configured so that it was connected only to the two LPARs on which I was testing. This allowed me to move one of the application structures used by the queue sharing group into this CF.
In the first test, the coupling facility resource management (CFRM) policy for the application structure had another CF defined as an alternate in which the structure could be rebuilt. The CFSTRUCT CFCONLOS attribute was set to TERMINATE, and INJERROR was used to cause the structure to fail. The queue managers disconnected and a system-managed rebuild was requested. The structure was rebuilt in the second CF, the queue managers reconnected, and a RECOVER CFSTRUCT was automatically requested. I had expected the queue manager to terminate, but in retrospect I realize that connectivity to the CF was never lost.
Next, the application structure was moved back to the original coupling facility and the CFRM policy was changed so that the application structure did not have an alternate CF in which it could rebuild. I then caused the structure to fail again. The result was the same except that the structure
Next, the CF was varied offline to one of the LPARs, and the queue manager on that LPAR abended. When the CF was varied offline to the other LPAR, the queue managers there also abended.
At this point I had confirmed that the CFCONLOS attribute relates to loss of connectivity to the CF, not to the failure of a structure in the CF. When a CF structure fails, the queue manager disconnects and a system-managed rebuild is requested, but that is not the same as losing connectivity to the CF. When connectivity to the CF is lost, for example because the CF itself fails, the CFCONLOS attribute determines whether the queue manager abends or continues processing messages.
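The distinction can be summarized as a small decision table. The Python sketch below is only a simplified model of what I observed in these tests, not of the queue manager's actual logic. The enum and function names are my own, and only the TERMINATE and TOLERATE settings are modeled.

```python
from enum import Enum


class Event(Enum):
    STRUCTURE_FAILURE = "structure failure"
    CONNECTIVITY_LOSS = "connectivity loss"


class CfConLos(Enum):
    TERMINATE = "TERMINATE"
    TOLERATE = "TOLERATE"


def observed_outcome(event: Event, setting: CfConLos) -> str:
    """Return the behavior seen in the tests for an event and CFCONLOS setting."""
    if event is Event.STRUCTURE_FAILURE:
        # CFCONLOS played no part here: the structure was rebuilt
        # and the queue managers reconnected.
        return "disconnect, rebuild requested, reconnect"
    # Only a loss of connectivity to the CF consults CFCONLOS.
    if setting is CfConLos.TERMINATE:
        return "queue manager abends"
    return "queue manager continues processing"


for event in Event:
    for setting in CfConLos:
        print(f"{event.value:18} {setting.value:10} {observed_outcome(event, setting)}")
```

Both structure-failure rows give the same outcome, which is the point: the attribute matters only in the connectivity-loss rows.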
I discuss several tests that we performed in the lab environment in the IBM Redbooks publication IBM WebSphere MQ V7.1 and V7.5 Features and Enhancements. The chapter "Resiliency: Improving availability" includes a working example with instructions on how to improve WebSphere MQ resiliency.