Coded character sets and CCSIDs

IBM®'s Character Data Representation Architecture (CDRA) deals with the differences in string representation and encoding. The Coded Character Set Identifier (CCSID) is a key element of this architecture. A CCSID is a 2-byte unsigned binary number that uniquely identifies an encoding scheme and one or more pairs of character sets and code pages.

A CCSID is an attribute of strings, just as length is an attribute of strings. All values of the same string column have the same CCSID.
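The idea of a CCSID as a string attribute can be sketched in Python. This is only an illustration, not how DB2 stores strings: the TaggedString class, the tag and ccsid_as_bytes helpers, and the CCSID_TO_CODEC mapping are hypothetical names, and the pairing of CCSIDs 37, 819, and 1208 with the Python codecs cp037, latin-1, and utf-8 is an assumption made for the example.

```python
import struct
from dataclasses import dataclass

# Assumed pairing of a few CCSIDs with Python codec names (illustration only).
CCSID_TO_CODEC = {37: "cp037", 819: "latin-1", 1208: "utf-8"}


@dataclass(frozen=True)
class TaggedString:
    """A byte string that carries its CCSID as an attribute (hypothetical)."""
    data: bytes
    ccsid: int

    def __post_init__(self):
        # A CCSID is a 2-byte unsigned number, so it must be in 0..65535.
        if not 0 <= self.ccsid <= 0xFFFF:
            raise ValueError("CCSID must fit in 2 unsigned bytes")

    def text(self) -> str:
        # Interpret the bytes according to the CCSID they are tagged with.
        return self.data.decode(CCSID_TO_CODEC[self.ccsid])


def tag(text: str, ccsid: int) -> TaggedString:
    """Encode text in the code page assumed for the given CCSID."""
    return TaggedString(text.encode(CCSID_TO_CODEC[ccsid]), ccsid)


def ccsid_as_bytes(ccsid: int) -> bytes:
    """Pack a CCSID into 2 bytes (big-endian order is an assumption here)."""
    return struct.pack(">H", ccsid)


ebcdic = tag("A", 37)
latin1 = tag("A", 819)
# The same character has different byte values under different CCSIDs,
# which is why the CCSID must accompany the string.
print(ebcdic.data.hex(), latin1.data.hex())
print(ebcdic.text() == latin1.text())
```

Without the CCSID, the bytes alone would be ambiguous: the two tagged values above hold different bytes but represent the same character.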

Character conversion is described in terms of CCSIDs of the source and target. With DB2® for z/OS®, two methods are used to identify valid source and target combinations and to perform the conversion from one coded character set to another:

  • DB2 catalog table SYSIBM.SYSSTRINGS

    Each row in the catalog table describes a conversion from one coded character set to another.

  • z/OS support for Unicode

    For more information about the conversion services that are provided, including a complete list of the IBM-supplied conversion tables, see z/OS Support for Unicode: Using Conversion Services.
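The notion that each supported conversion is identified by its source and target CCSID pair can be modeled as a lookup keyed on that pair. The following is a minimal sketch under stated assumptions, not the layout of SYSIBM.SYSSTRINGS or the interface of the z/OS conversion services: CONVERSIONS, build_table, and convert are hypothetical names, and the 256-byte translation tables are derived from Python's cp037 and latin-1 codecs, which are assumed here to stand in for CCSIDs 37 and 819.

```python
def build_table(src_codec: str, dst_codec: str) -> bytes:
    """Build a 256-byte translation table between two single-byte codecs."""
    # Decode every possible byte value, then re-encode it in the target codec.
    return bytes(range(256)).decode(src_codec).encode(dst_codec)


# One entry per supported (source CCSID, target CCSID) combination.
CONVERSIONS = {
    (37, 819): build_table("cp037", "latin-1"),
    (819, 37): build_table("latin-1", "cp037"),
}


def convert(data: bytes, in_ccsid: int, out_ccsid: int) -> bytes:
    """Convert data between CCSIDs, or fail if the pair is not defined."""
    if in_ccsid == out_ccsid:
        return data  # identical CCSIDs: nothing to convert
    try:
        table = CONVERSIONS[(in_ccsid, out_ccsid)]
    except KeyError:
        raise LookupError(
            f"no conversion defined from CCSID {in_ccsid} to {out_ccsid}"
        ) from None
    return data.translate(table)


source = "Hello".encode("cp037")
print(convert(source, 37, 819))
```

A request for a pair that has no entry, such as convert(source, 37, 1208) in this sketch, raises LookupError, which mirrors the idea that only defined source and target combinations are valid.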

In some cases, no conversion is necessary even though the strings involved have different CCSIDs.

Each database manager might support different types of conversions. Round-trip conversions attempt to preserve characters in the source CCSID that are not defined in the target CCSID, so that if the data is subsequently converted back to the original CCSID, the original characters result. Enforced subset match conversions do not attempt to preserve such characters. Which type of conversion is used for a specific source and target CCSID is product-specific.
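The difference between the two conversion types can be illustrated with toy tables. Everything in this sketch is hypothetical: the code points, the substitution value SUB, and the spare target code points are invented for the example and do not correspond to any IBM-supplied conversion table. The sketch also assumes that an enforced subset match conversion replaces unmatched characters with a single substitution code point.

```python
SUB = 0x3F  # hypothetical substitution code point in the target

# Source code points 0x01 and 0x02 have equivalents in the target;
# 0x03 and 0x04 are not defined in the target character set.
SHARED = {0x01: 0x41, 0x02: 0x42}
UNMATCHED = [0x03, 0x04]
SPARE_TARGET = [0x80, 0x81]  # target code points assumed to be otherwise unused

# Enforced subset match: every unmatched character collapses to SUB.
enforced_subset = {**SHARED, **{s: SUB for s in UNMATCHED}}

# Round trip: each unmatched character gets its own spare target code point.
round_trip = {**SHARED, **dict(zip(UNMATCHED, SPARE_TARGET))}


def apply_table(data: bytes, table: dict) -> bytes:
    """Map each byte of data through a byte-to-byte conversion table."""
    return bytes(table[b] for b in data)


def invert(table: dict) -> dict:
    """Reverse a one-to-one table so it converts in the opposite direction."""
    return {target: source for source, target in table.items()}


original = bytes([0x01, 0x02, 0x03, 0x04])

# Round trip: converting there and back restores the original bytes.
there = apply_table(original, round_trip)
back = apply_table(there, invert(round_trip))
print(back == original)

# Enforced subset: both unmatched characters become SUB, so the
# distinction between them cannot be recovered afterward.
lossy = apply_table(original, enforced_subset)
print(lossy)
```

In the round-trip table every source code point maps to a distinct target code point, so the table can be inverted. The enforced subset table maps two source code points to the same value, so no inverse table can restore both.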

For more information about character conversion, see the DB2 Installation Guide.