Understanding host error recovery procedures

There can be times when PPRC cannot replicate an update to a primary device onto the secondary device. In these cases, the primary device sends channel end, device end, and unit check status to the host that encountered the error. With the PPRC error recovery procedures (ERP) PTF installed, the host system performs the following actions:

  1. The system stops all application I/O, for that host, to the primary volume in error. The host system also prevents other system images from accessing this volume.
  2. The system writes an IEA49xx message to the system log to indicate that the PPRC volume is now in suspended state or duplex state. (Ensure that the system log is common to both the primary system and the recovery system.)
  3. The system puts information that is related to the specific failure into the SYS1.LOGREC data set for service personnel reference. SYS1.LOGREC can be common to both the primary system and the recovery system.
  4. The system waits long enough for the IEA49xx message to reach the recovery system. You can introduce an automation routine here that receives control before the I/O operation can complete.
  5. The system resumes all host application I/O to the primary volume. If you specified the CRIT(NO) parameter when you established the primary volume, the storage control allows subsequent I/O operations to continue without being unit checked. If you specified the CRIT(YES) parameter on the CESTPAIR command for the primary volume, then the storage control does the following:
    • Unit checks all subsequent write I/O operations.
    • Allows the host application to follow its own recovery actions.
    Note: An "A" in the unformatted CQUERY message ANTP0091I indicates that the volume pair is designated as CRIT(YES-ALL). This means that the pair was established with the CRIT(YES) option and that the storage control maintenance panel is set to inhibit writes on any failure, including secondary device failures.
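
The automation routine mentioned in step 4 needs some way to recognize the IEA49xx message when it arrives in the log. The following is a minimal sketch of that recognition step only, not an IBM-supplied interface. It assumes that "IEA49xx" stands for message identifiers made up of IEA49, one digit, and one type letter, and that the log is available as plain text lines. The function name find_pprc_state_messages and the pattern are hypothetical names introduced for illustration.

```python
import re

# Assumption: "IEA49xx" is read here as IEA49 + one digit + one type letter.
# Adjust the pattern if your installation's message identifiers differ.
IEA49_PATTERN = re.compile(r"\bIEA49\d[A-Z]\b")


def find_pprc_state_messages(log_lines):
    """Return (line_number, message_id, text) for each line with an IEA49xx ID.

    log_lines is any iterable of strings; line numbers start at 1.
    """
    hits = []
    for number, line in enumerate(log_lines, start=1):
        match = IEA49_PATTERN.search(line)
        if match:
            # Keep the matched identifier and the line without trailing whitespace.
            hits.append((number, match.group(0), line.rstrip()))
    return hits
```

A routine built on this sketch could pass the lines read from the common system log to find_pprc_state_messages and act on each returned entry, for example by notifying the recovery system. What action to take, and how the log is read, depend on the installation and are outside the scope of the sketch.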