Recovering from disk failure

If a disk hardware failure causes the loss of an entire unit, you can recover by using the following procedure.

Symptoms

No I/O activity occurs for the affected disk address. Databases and tables that reside on the affected unit are unavailable.

Resolving the problem

Operator response:

  1. Ensure that no incomplete I/O requests exist for the failing device. One way to do this is to force the volume offline by issuing the following z/OS® command, where xxx is the unit address:
    VARY xxx,OFFLINE,FORCE

    To check disk status, issue the following command:

    D U,DASD,ONLINE

    The following console message is displayed after you force a volume offline:

     
      UNIT  TYPE  STATUS   VOLSER  VOLSTATE
      4B1   3390  O-BOX    XTRA02  PRIV/RSDNT 

    The disk unit is now available for service.

    If you previously set the I/O timing interval for the device class, the I/O timing facility terminates all requests that are incomplete at the end of the specified time interval, and you can proceed to the next step without varying the volume offline. You can set the I/O timing interval either through the IECIOSxx z/OS parameter library member or by issuing the following z/OS command:

    SETIOS MIH,DEV=devnum,IOTIMING=mm:ss
  2. Issue (or request that an authorized operator issue) the following Db2 command to stop all databases and table spaces that reside on the affected volume:
    -STOP DATABASE(database-name) SPACENAM(space-name)

    If the disk unit must be disconnected for repair, stop all databases and table spaces on all volumes in the disk unit.

  3. Select a spare disk pack, and use ICKDSF to initialize a disk unit from scratch, using a different unit address (yyy) and the same volume serial number (VOLSER).
      // Job
      //ICKDSF   EXEC PGM=ICKDSF
      //SYSPRINT DD   SYSOUT=*
      //SYSIN    DD   *
           REVAL UNITADDRESS(yyy) VERIFY(volser) 

    If you initialize a 3380 or 3390 volume, use REVAL with the VERIFY parameter to ensure that you initialize the intended volume, or to revalidate the home address of the volume and record 0. Alternatively, use ISMF to initialize the disk unit.

  4. Issue the following z/OS console command, where yyy is the new unit address:
    VARY yyy,ONLINE
  5. To check disk status, issue the following command:
    D U,DASD,ONLINE
    The following console message is displayed:
      UNIT  TYPE  STATUS   VOLSER  VOLSTATE
      7D4   3390  O        XTRA02  PRIV/RSDNT
    
  6. Delete all table spaces (VSAM linear data sets) from the ICF catalog by issuing the following access method services command for each one of them, where y is either I or J:
    DELETE catnam.DSNDBC.dbname.tsname.y0001.Annn CLUSTER NOSCRATCH
    In this command, nnn is the data set or partition number, left-padded with zeros.
  7. For user-managed table spaces, define the VSAM cluster and data components for the new volume by issuing the access method services DEFINE CLUSTER command with the same data set name as in the previous step, in the following format:
    catnam.DSNDBx.dbname.tsname.y0001.znnn
    In this name, y is I or J, x is C (for VSAM clusters) or D (for VSAM data components), and znnn is the data set or partition number, left-padded with zeros. For more information, see Data set naming conventions.
  8. For a user-defined table space, define the new data set before you attempt to recover it. You can recover table spaces that are defined in storage groups without prior definition.
  9. Issue the following Db2 command to start all the appropriate databases and table spaces that were previously stopped:
    -START DATABASE(database-name) SPACENAM(space-name)
  10. Recover the table spaces by using the Db2 RECOVER utility.
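
When many data sets reside on the failed volume, the names used in steps 6 and 7 can be generated rather than typed by hand. The following Python sketch is an illustration only and is not part of Db2 or access method services: it builds a name in the catnam.DSNDBx.dbname.tsname.y0001.znnn format and the matching DELETE command text. The function names, the sample values DSNCAT, DBX, and TS1, the default of A for the z position (taken from the DELETE example in step 6), and the 1 to 999 range check are all assumptions made for this example.

```python
def dataset_name(catnam, dbname, tsname, number,
                 component="C", instance="I", prefix="A"):
    """Build a name of the form catnam.DSNDBx.dbname.tsname.y0001.znnn.

    component is x: C (VSAM cluster) or D (VSAM data component).
    instance is y: I or J.
    prefix is z: assumed to be A here, as in the DELETE example.
    number is nnn: the data set or partition number, zero-padded to 3 digits.
    """
    if component not in ("C", "D"):
        raise ValueError("component must be 'C' or 'D'")
    if instance not in ("I", "J"):
        raise ValueError("instance must be 'I' or 'J'")
    if not 1 <= number <= 999:
        # Assumption for this sketch: only three-digit numbers are handled.
        raise ValueError("number must be between 1 and 999")
    return (f"{catnam}.DSNDB{component}.{dbname}.{tsname}."
            f"{instance}0001.{prefix}{number:03d}")


def delete_command(catnam, dbname, tsname, number, instance="I"):
    """Return the DELETE command text from step 6 for one data set."""
    name = dataset_name(catnam, dbname, tsname, number,
                        component="C", instance=instance)
    return f"DELETE {name} CLUSTER NOSCRATCH"


if __name__ == "__main__":
    # Hypothetical catalog, database, and table space names.
    for part in (1, 2, 3):
        print(delete_command("DSNCAT", "DBX", "TS1", part))
```

The generated lines are only the command text. Review them against the ICF catalog before you submit them, and check whether each data set uses I or J in its name.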