IBM Support

IV96606: UNDETECTED DATA LOSS AFTER STORAGE ERRORS WITH CERTAIN ADAPTERS APPLIES TO AIX 7200-00


APAR status

  • Closed as program error.

Error description

  • **************************************************************
    * USERS AFFECTED:
    * Systems running the AIX 7200-00 Technology Level
    * with devices.pciex.df1060e214103404.com below the 7.2.0.4
    * level,
    * devices.fcp.disk.rte below the 7.2.0.4 level,
    * devices.pci.df1000f7.com below the 7.2.0.4 level, and
    * devices.pci.77102224.com below the 7.2.0.4 level.
      **************************************************************
    * ERROR DESCRIPTION:
    * On an AIX or VIOS LPAR using a physical Fibre Channel
    * adapter or Virtual Fibre Channel (NPIV) adapter, with
    * certain storage devices (see below), if communication
    * between the LPAR and the storage device is severed and
    * there are multiple writes to the same block happening at
    * that time, after the path fails, the driver may retry I/Os
    * down an alternate path too quickly and data may be written
    * to the device in a different order than it is completed to
    * the application, possibly resulting in undetected data loss.
    *
    * We have seen this, for example, when testing a link drop by
    * pulling FC cables between LPARs and storage.
    *
    * We have seen this issue occur when testing the following
    * storage devices:
    * - IBM Flash Systems
    * - IBM SAN Volume Controller (SVC) with caching turned off
    *   for the volume
    * - IBM Storwize family products with caching turned off
    *   for the volume
    *
    * This issue CANNOT occur with the following storage devices:
    * - IBM DS8000 series
    * - IBM SAN Volume Controller (SVC) with caching turned on
    *   for the volume
    * - IBM Storwize family products with caching turned on
    *   for the volume
    * - IBM XIV family
    * - EMC Symmetrix family
    *
    * Storage devices not specifically mentioned above should be
    * assumed to be exposed to this problem.
    *
    * This issue also cannot occur when reserve_policy for the
    * disks is set to single_path.
      **************************************************************
    * RECOMMENDATION:
    * Install APAR IV96606.
    * Prior to fix availability, an interim fix is available from
    * either of the following locations:
    * ftp://aix.software.ibm.com/aix/ifixes/iv96606/
    * https://aix.software.ibm.com/aix/ifixes/iv96606/
    * The ifix can be installed using Live Update (LU).
    * If LU is not used, installation of the ifix requires a
    * reboot.
      **************************************************************
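
The exposure criteria above can be sketched as a small triage helper. This is only an illustration in Python, not an IBM tool: the fileset names and the 7.2.0.4 level come from the text, while the function names, the storage family labels used as lookup keys, and the idea of passing installed levels in as a mapping are assumptions made for the example. Collecting the installed levels (for instance from `lslpp` output) is left out. The text does not say whether one or all of the filesets must be below the fixed level, so the sketch simply lists every fileset that is.

```python
# Filesets and fixed level taken from the USERS AFFECTED text.
FIXED_LEVEL = "7.2.0.4"
FILESETS = (
    "devices.pciex.df1060e214103404.com",
    "devices.fcp.disk.rte",
    "devices.pci.df1000f7.com",
    "devices.pci.77102224.com",
)

# Hypothetical labels for the storage families named in the text.
NOT_EXPOSED = {"IBM DS8000", "IBM XIV", "EMC Symmetrix"}
CACHE_DEPENDENT = {"IBM SVC", "IBM Storwize"}


def parse_level(level):
    """Turn a dotted level such as '7.2.0.3' into a comparable tuple."""
    return tuple(int(part) for part in level.split("."))


def filesets_below_fix(installed):
    """Return the listed filesets whose installed level is below FIXED_LEVEL.

    `installed` maps fileset name -> level string. Filesets that are not
    installed are absent from the mapping and are skipped.
    """
    fixed = parse_level(FIXED_LEVEL)
    return [name for name in FILESETS
            if name in installed and parse_level(installed[name]) < fixed]


def storage_exposed(family, caching_on=False, reserve_policy="no_reserve"):
    """Apply the storage rules from the ERROR DESCRIPTION text."""
    if reserve_policy == "single_path":
        return False              # stated as not exposed for any storage
    if family in NOT_EXPOSED:
        return False
    if family in CACHE_DEPENDENT:
        return not caching_on     # exposed only with volume caching off
    return True                   # unlisted storage is assumed exposed


if __name__ == "__main__":
    levels = {"devices.fcp.disk.rte": "7.2.0.3",
              "devices.pci.df1000f7.com": "7.2.0.4"}
    print(filesets_below_fix(levels))
    print(storage_exposed("IBM SVC", caching_on=False))
```

Treating anything unlisted as exposed mirrors the statement that storage devices not specifically mentioned should be assumed to be exposed.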
    

Local fix

  • LOCAL FIX:
    If possible, changing the reserve_policy to single_path will
    avoid this problem because a LUN RESET will be triggered
    when switching paths.
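
As a rough sketch of the local fix, the Python helper below only builds the AIX commands that would display and change the attribute; it does not run them. The disk names `hdisk2` and `hdisk3` and the helper name are hypothetical, and whether the change can be applied while a disk is in use depends on the environment, so the output should be reviewed before use.

```python
def reserve_policy_commands(disks, policy="single_path"):
    """Build (but do not run) commands to inspect and set reserve_policy."""
    commands = []
    for disk in disks:
        # Show the current value of the attribute for this disk.
        commands.append(["lsattr", "-El", disk, "-a", "reserve_policy"])
        # Change the attribute to the requested policy.
        commands.append(["chdev", "-l", disk, "-a", f"reserve_policy={policy}"])
    return commands


if __name__ == "__main__":
    # hdisk2 and hdisk3 are placeholder device names.
    for cmd in reserve_policy_commands(["hdisk2", "hdisk3"]):
        print(" ".join(cmd))
```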
    

Problem summary

  • On an AIX or VIOS LPAR using certain Fibre Channel adapters,
    if communication between the LPAR and the storage device is
    severed and there are multiple writes to the same block
    happening at that time, after the path fails, the driver may
    retry I/Os down an alternate path too quickly, and data may
    be written to the device in a different order than it is
    completed to the application, possibly resulting in
    undetected data loss.
    

Problem conclusion

  • After certain FC adapter errors, where the host does not know
    if a particular aborted command may still be completed by the
    storage, the host performs additional recovery by sending
    LUN RESET to ensure all aborted commands are flushed from the
    storage.
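
The effect of the extra recovery step can be illustrated with a toy model in Python. It is not the driver's logic: the `ToyLun` class, its methods, and the write values `v1` and `v2` are invented for the example. It assumes one write was in flight when the path failed, that the host retried it on an alternate path, and that the storage might still execute the original command later unless a LUN RESET discards it.

```python
class ToyLun:
    """Toy model of one block on a LUN plus commands stranded by a path failure."""

    def __init__(self):
        self.block = None
        self.stale = []  # commands the host aborted but the storage may still run

    def write(self, data):
        """A write that reaches the storage and completes to the application."""
        self.block = data

    def strand(self, data):
        """A write left inside the storage when the path fails."""
        self.stale.append(data)

    def lun_reset(self):
        """Discard every stranded command."""
        self.stale.clear()

    def drain_stale(self):
        """The storage eventually executes whatever is still stranded."""
        while self.stale:
            self.block = self.stale.pop(0)


def run(reset_before_retry):
    lun = ToyLun()
    lun.strand("v1")       # write of v1 in flight when the link drops
    if reset_before_retry:
        lun.lun_reset()    # recovery step described in the problem conclusion
    lun.write("v1")        # retry of v1 on the alternate path
    lun.write("v2")        # later write to the same block
    lun.drain_stale()      # late execution of any stranded command
    return lun.block


if __name__ == "__main__":
    print("without reset:", run(False))
    print("with reset:   ", run(True))
```

Without the reset, the stranded `v1` lands after `v2` even though the application saw `v2` complete last, which is the undetected data loss described in this APAR. With the reset, the block ends up holding `v2`.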
    

Temporary fix

  •   *********
      * HIPER *
      *********
    

Comments

APAR Information

  • APAR number

    IV96606

  • Reported component name

    AIX V7.2

  • Reported component ID

    5765CD200

  • Reported release

    720

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Submitted date

    2017-05-25

  • Closed date

    2017-05-25

  • Last modified date

    2018-09-21

  • APAR is sysrouted FROM one or more of the following:

    IV96553

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    AIX V7.2

  • Fixed component ID

    5765CD200

Applicable component levels

  • AIX 7.2 Enterprise Edition (product code SSVEF8)

    Version 720; Platform Independent; Business Unit: IBM
    Infrastructure w/TPS; Line of Business: Cognitive Systems

  • AIX 7.2 HIPERS, APARs and Fixes (product code SG11S)

    Version 720; Platform Independent; Business Unit: Systems w/TPS

Document Information

Modified date:
21 September 2018