DB2 Version 10.1 for Linux, UNIX, and Windows

db2ts CREATE INDEX command

The db2ts CREATE INDEX command creates a text search index for a text column. You can then search the column data by using text search functions.

The text search index does not contain any data until you run the text search UPDATE INDEX command, or the DB2® Administrative Task Scheduler runs the UPDATE INDEX command according to the defined update frequency for the index.

For execution, you must prefix the command with db2ts at the command line.

Authorization

The authorization ID of the statement must include the SYSTS_MGR role and CREATETAB authority on the database and one of the following privileges or authority:
  • CONTROL privilege on the table on which the index will be defined
  • INDEX privilege on the table on which the index will be defined and one of the following authorities:
    • IMPLICIT_SCHEMA authority on the database, if the implicit or explicit schema name of the index does not exist
    • CREATEIN privilege on the schema, if the schema name of the index exists
  • DBADM authority

To schedule automatic index updates, the instance owner must have DBADM authority or CONTROL privileges on the administrative task scheduler tables.

Required connection

Database

Command syntax

Read syntax diagramSkip visual syntax diagram
>>-CREATE INDEX--index-name--FOR TEXT--------------------------->

>--ON--schema-name--table-name---------------------------------->

>--+-(--text-column-name--)----------------------+-------------->
   '-(--function-name--(--text-column-name--)--)-'   

>--| text default information |--| update characteristics |----->

>--| storage options |--| index configuration options |--------->

>--| partition options |--| connection options |---------------><

text default information

|--+---------------------+--+------------------+---------------->
   '-CODEPAGE--code-page-'  '-LANGUAGE--locale-'   

>--+----------------+-------------------------------------------|
   '-FORMAT--format-'   

update characteristics

|--+--------------------------------------------+--------------->
   '-UPDATE FREQUENCY--+-NONE-----------------+-'   
                       '-| update frequency |-'     

>--| incremental update characteristics |-----------------------|

update frequency

|--D-(-+-*------------+-)--H-(-+-*------------+-)--------------->
       | .-,--------. |        | .-,--------. |     
       | V          | |        | V          | |     
       '---integer1-+-'        '---integer2-+-'     

       .-,--------.     
       V          |     
>--M-(---integer3-+-)-------------------------------------------|

incremental update characteristics

|--+----------------------------+-------------------------------|
   '-UPDATE MINIMUM--minchanges-'   

storage options

|--+---------------------------------+-------------------------->
   '-COLLECTION DIRECTORY--directory-'   

>--+-------------------------------------------+----------------|
   '-ADMINISTRATION TABLES IN--tablespace-name-'   

index configuration options

|--+-----------------------------------------------+------------|
   |                        .-,----------------.   |   
   |                        V                  |   |   
   '-INDEX CONFIGURATION--(---| option-value |-+-)-'   

option-value

|--+----------------------------------------------+-------------|
   +-COMMENT--text -------------------------------+   
   +-UPDATEAUTOCOMMIT--+-+--------------------+-+-+   
   |                   | '-commitcount_number-' | |   
   |                   '-+------------+---------' |   
   |                     '-commitsize-'           |   
   +-COMMITTYPE--committype-----------------------+   
   +-COMMITCYCLES--commitcycles-------------------+   
   +-INITIALMODE--initialmode---------------------+   
   +-LOGTYPE--ltype-------------------------------+   
   +-AUXLOG--auxlog_value-------------------------+   
   '-CJKSEGMENTATION -cjksegmentation-method------'   

connection options

|--+----------------------------------------------------------------+--|
   '-CONNECT TO--database-name--+---------------------------------+-'   
                                '-USER--username--USING--password-'     

Command parameters

CREATE INDEX index-name
Specifies the name of the index (optionally schema qualified) to be created that uniquely identifies the text search index within the database. The index name must adhere to the naming restrictions for DB2 indexes.
ON table-name
The table name containing the text column. You cannot create text search indexes on the following tables:
  • federated tables
  • materialized query tables
  • views
text-column-name
The column name of the column to be indexed. The column must be of one of the following data types: CHAR, VARCHAR, CLOB, DBCLOB, BLOB, GRAPHIC, VARGRAPHIC, or XML. If the data type of the column is not one of these data types, use a transformation function that is specified with function-schema.function-name to convert the column type to one of the valid types. Alternatively, you can specify a user-defined external function that accesses the text documents that you want to index. You can create only a single text search index for a column.
function-name(text-column-name)
Specifies the schema qualified name, conforming to DB2 naming conventions, of an external scalar function that accesses text documents in a column that is not of a supported type for text searching. Performs a data type conversion of that value and returns the value as one of the supported data types for text searching. Its task is to perform a column type conversion. This function must take only one parameter and return only one value.
CODEPAGE code-page
Specifies the DB2 code page (CODEPAGE) to be used when indexing text documents. The default value is specified by the value in the view SYSIBMTS.TSDEFAULTS, where DEFAULTNAME='CODEPAGE'; (which happens to be the database code page). This argument only applies to binary data types, such as, the column type or return type from a transformation function must be BLOB or character-type FOR BIT DATA.
LANGUAGE locale
Specifies the language to be used by DB2 Text Search for language-specific processing of a document during indexing. If you do not specify a locale, the database territory determines the default setting for LANGUAGE. To have your documents that are automatically scanned to determine the locale, specify locale as AUTO.
FORMAT format
Specifies the format of text documents in the column. The supported formats include: TEXT, XML, HTML, and INSO. DB2 Text Search requires this information when indexing documents. If you do not specify the format, the default value is used. The default value is in the view SYSIBMTS.TSDEFAULTS, where DEFAULTNAME='FORMAT';. For columns of data type XML, the default format 'XML'; is used, regardless of the value of DEFAULTNAME. You must install rich text support to use the INSO format.
UPDATE FREQUENCY
Specifies the frequency with which index updates are made. The index is updated if the number of changes is at least the value that is set for UPDATE MINIMUM. The update frequency NONE indicates that no further index updates are made. An update frequency of NONE can be useful for a text column in a table with data that does not change. It is also useful if you intend to manually update the index (by using the UPDATE INDEX command). You can do automatic updates if you issue the START FOR TEXT command, and the DB2 Text Search instance services are running.

The default frequency value is taken from the view SYSIBMTS.TSDEFAULTS, where DEFAULTNAME=&apos;UPDATEFREQUENCY&apos;.

NONE
No further index updates are made. You must start the update manually.
D
The days of the week when the index is updated.
*
Every day of the week.
integer1
Specific days of the week, from Sunday to Saturday: 0 - 6
H
The hours of the specified days when the index is updated.
*
Every hour of the day.
integer2
Specific hours of the day, from midnight to 11 pm: 0 - 23
M
The minutes of the specified hours when the index is updated.
integer3
Specified as top of the hour (0), or in multiples of 5-minute increments after the hour: 0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 or 55
UPDATE MINIMUM minchanges
Specifies the minimum number of changes to text documents before the index is updated incrementally at the time that is specified in UPDATE FREQUENCY. Positive integer values only are allowed. The default value is taken from the view SYSIBMTS.TSDEFAULTS, where DEFAULTNAME='UPDATEMINIMUM'.
Note: This value is ignored during an UPDATE INDEX command (unless the USING UPDATE MINIMUM option is used there). A small value increases consistency between the table column and the text search index. However, it also increases the load for the system.
COLLECTION DIRECTORY directory
Specifies the directory in which the text search index collection is stored. You must specify the absolute path, where the maximum length of the absolute path name is 215 characters. The process owner of the Text Search server instance service must have read and write access on this directory.

The COLLECTION DIRECTORY parameter is supported only for an integrated text search server setup. Review the usage notes for additional information about collection locations.

ADMINISTRATION TABLES IN tablespace-name
Specifies the name of an existing nontemporary table space for the administration tables that are created for the index.

For a nonpartitioned database, if you do not specify a table space, the table space of the base table for which you are creating the index is used.

For a partitioned database, you must use the ADMINISTRATION TABLES parameter explicitly when text indexes are created. The table space must be in the same partition group as the table space of the base table to ensure that the staging tables for the text search index are distributed in the same manner as the corresponding base table.

INDEX CONFIGURATION (option-value)
Specifies more index-related values as option value string pairs. The following values are supported:
Table 1. Specifications for option-value
Option Value Data type Description
COMMENT text String value less than 512 bytes Adds a string comment value to the REMARKS column in the DB2 Text Search catalog view TSINDEXES. It also adds the string comment value as the description of the collection.
UPDATEAUTOCOMMIT commitsize -
  • This integer option specifies the number of index updates after which a commit is executed to automatically preserve the previous work for either initial or incremental updates.
    • For initial updates, the index update will process batches of documents from a base table after the trigger to capture data updates is activated. After the number of documents updated reaches the COMMITCOUNT number, the server does an interim commit. Log entries generated by unprocessed documents are removed from the staging table. Using the UPDATEAUTOCOMMIT parameter for an initial text index update leads to a significant increase of execution time.
    • For incremental updates, log entries which have been processed are removed correspondingly from the staging table with each interim commit. COMMITCOUNT counts the number of documents updated, not the number of staging table entries.
  • Starting in DB2 Version 10.1 Fix Pack 3 and later fix packs, this string option specifies the number of rows or number of hours after which a commit is run to automatically preserve the previous work for either initial or incremental updates.

    If you specify the number of rows:
    • After the number of documents that are updated reaches the COMMITCOUNT number, the server applies a commit. COMMITCOUNT counts the number of documents that are updated by using the primary key, not the number of staging table entries.
    If you specify the number of hours:
    • The text index is committed after the specified number of hours is reached. The maximum number of hours is 24.

    For initial updates, the index update processes batches of documents from the base table. After the commitsize value is reached, update processing completes a COMMIT operation and the last processed key is saved in the staging table with operational identifier '4'. This key is used to restart update processing either after a failure or after the number of specified commitcycles are completed. If a commitcycles is specified, the update mode is modified to incremental to initiate capturing changes by using the LOGTYPE BASIC option to create triggers on the text table. However, until the initial update is complete, log entries that are generated by documents that are not been processed in a previous cycle are removed from the staging table.

    Using the UPDATEAUTOCOMMIT option for an initial text index update leads to a significant increase of execution time.

    For incremental updates, log entries that are processed are removed correspondingly from the staging table with each interim commit.

COMMITTYPE committype String

Starting in DB2 Version 10.1 Fix Pack 3 and later fix packs, specifies rows or hours for the UPDATEAUTOCOMMIT index configuration option. The default is rows.

COMMITCYCLES commitcycles Integer

Starting in DB2 Version 10.1 Fix Pack 3 and later fix packs, specifies the number of commit cycles. The default is 0 for unlimited cycles.

If cycles are not explicitly specified, the update operation uses as many cycles as required based on the batch size that is specified with the UPDATEAUTOCOMMIT option to finish the update processing.

You can use this option with the UPDATEAUTOCOMMIT setting with a committype.

INITIALMODE initialmode String
Starting in DB2 Version 10.1 Fix Pack 3 and later fix packs, specifies how the updates are processed. The INITIALMODE index configuration option has three possible values:
  • FIRST, where the primary update is the initial update. This is the default value of the INITIALMODE option.
  • SKIP, where the update mode is immediately set to incremental, triggers are added for the LOGTYPE BASIC option, but no initial update is performed.
  • NOW, where the update is started after the index was created as final part of the CREATE INDEX operation. This option is only supported for single-node setups.
LOGTYPE ltype String Starting in DB2 Version 10.1 Fix Pack 3 and later fix packs, specifies whether triggers are added to populate the primary log table. It has two values:
  • BASIC, where the primary staging table is created and triggers added on the text table to recognize any changes. This is the default value.
  • CUSTOM, where the primary staging table is created, but no triggers on the text table. Ensure a mechanism to identify changes for incremental updates, especially if you do not plan to use the ALLROWS option for updates.
AUXLOG auxlog_value String Controls the creation of the additional log infrastructure to capture changes that are not recognized by a trigger. The default setting for range partitioned tables is ON, otherwise it is OFF. You can change the default value in the default table with a setting for AuxLogNorm for nonrange-partitioned tables and AuxLogPart for range-partitioned tables.
CJKSEGMENTATION CJKSEGMENTATION_method String value less than 512 bytes Specifies the segmentation method for documents in Chinese, Japanese, and Korean languages (zh_CN, zh_TW, ja_JP, ko_KR locale sets), including such documents when automatic language detection is enabled (LANGUAGE AUTO). If no option is specified, the value for CJKSEGMENTATION in the defaults table is applied. Supported values are:
  • MORPHOLOGICAL
  • NGRAM

The specified segmentation method is added to the SYSIBMTS. TSCONFIGURATION administrative view and cannot be changed after the text index is created.

Remember: You must enclose non-numeric values, such as comments, in single quotation marks. A single quotation mark character within a string value must be represented by two consecutive single quotation marks.
Example:
INDEX CONFIGURATION (COMMENT 'Index on User''s Guide column')
partition options
Reserved for internal IBM use.
CONNECT TO database-name
This clause specifies the database to which a connection is established. The database must be on the local system. If specified, this clause takes precedence over the environment variable DB2DBDFT. This clause can be omitted if the following statements are all true:
  • The DB2DBDFT environment variable is set to a valid database name.
  • The user running the command has the required authorization to connect to the database server.
USER username USING password
This clause specifies the authorization name and password that are used to establish the connection.

Usage notes

All limits and naming conventions that apply to DB2 database objects and queries, also apply to DB2 Text Search features and queries. DB2 Text Search related identifiers must conform to the DB2 naming conventions. Also, there are some additional restrictions. For example, these identifiers can be of the form:
[A-Za-z][A-Za-z0-9@#$_]*
or
"[A-Za-z ][A-Za-z0-9@#$_ ]*"
With the successful execution of the CREATE INDEX command:
  • The DB2 Text Search server data is updated. A collection of name instance_database-name_index-identifier_number is created per database partition, as in the following example:
    tigertail_MYTSDB_TS250517_0000
    You can retrieve the collection name from the SYSIBMTS.TSCOLLECTIONNAMES view (column COLLECTIONNAME).
  • The DB2 Text Search catalog information is updated. An index staging table is created in the specified table space with appropriate DB2 indexes. In addition, an index event table is created in the specified table space. If the auxlog on option is set, a second staging table is created to capture changes via integrity processing.
  • When DB2 Text Search coexists with DB2 Net Search Extender and an active Net Search Extender index exists for the table column, the new text index is set to inactive.
  • The newly created text search index is not automatically populated. The UPDATE INDEX command must be executed either manually or automatically (as a result of an update schedule being defined for the index through the specification of the UPDATE FREQUENCY option) for the text search index to be populated.
  • If you specify a frequency, a schedule task is created for the DB2 Administrative Scheduler.
Usage restrictions:
  • You must define a primary key for the table. In DB2 Text Search, you can use a multicolumn DB2 primary key without type limitations. The number of primary key columns is limited to two columns less than the number of primary key columns that are allowed by DB2.
  • The total length of all primary key columns for a table with DB2 Text Search indexes is limited to 15 bytes less than the maximum total primary key length allowed by DB2. Refer to the restrictions of the DB2 CREATE INDEX statement.
You cannot issue multiple commands concurrently on a text search index if they might conflict. If you issue this command while a conflicting command is running, an errors occur and the command fails, after which you can try to run the command again. A conflicting command is:
  • DISABLE DATABASE FOR TEXT

You cannot change the auxiliary log property for a text index after the index is created.

The AUXLOG option is supported for nicknames for data columns that support an MQT with deferred refresh. It is not supported for views.

You can specify the UPDATEAUTOCOMMIT index configuration option without type and cycles for compatibility with an earlier version. It is associated by default with the COMMITTYPE rows option and unrestricted cycles.

The UPDATEAUTOCOMMIT, COMMITTYPE, and COMMITSIZE index configuration options are also specified for an UPDATE INDEX operation to override the configured values. Values that you submit for a specific update are applied only once and not persisted.

With theINITIALMODE SKIP option, the text search index manager is responsible to populate the index appropriately. This option allows control of the sequence in which data from the text table will initially be processed.

If you use the LOGTYPE index configuration option, the following rules apply:
  • When you use the LOGTYPE CUSTOM setting, use the SYSIBMTS.TSSTAGING administrative view to insert log entries for new, changed, and deleted documents.
  • Check the value of LOGTYPE option in the SYSIBMTS.TSCONFIGURATION administrative view to view the setting for an index.
  • Check the value of LOGTYPE option in the SYSIBMTS.TSDEFAULTS administrative view to view the default log type that is applied to newly created text indexes,
  • Note: TheLOGTYPE option is not valid with the ALLROWS option in the CREATE INDEX operation, as ALLROWS forces an initial update and no log tables are created in this case.
You must satisfy the following conditions in a partitioned database environment:
  • Text search index-specific administration tables, such as staging tables, and text search indexes are distributed in a manner like the corresponding base table. Use the ADMINISTRATION TABLES IN clause while text search indexes are created, so that the specified table space is in the same partition group as the table space of the base table.

The CJKSEGMENTATION option applies to zh_CN, zh_TW, ja_JP and ko_KR locale sets for Chinese, Japanese, and Korean languages. The MORPHOLOGICAL or NGRAM option that is specified for the segmentation method is added to the SYSIBMTS.TSCONFIGURATION administration view.

Creating an index with the LANGUAGE parameter set to the AUTO option allows CJKSEGMENTATION specification as an option. The specified segmentation method applies to Chinese, Japanese, and Korean language documents. You cannot change the value that is set for the CJKSEGMENTATION_method value after index creation is complete.

If you create a text search index with the LANGUAGE parameter set to AUTO and CJKSEGMENTATON set to MORPHOLOGICAL, then searches for valid strings on a morphological index might not return the expected results. In such a case, use the QUERYLANGUAGE option of the CONTAINS function to obtain the results, as shown in the following sample statement:
select bookname from ngrambooks where contains (story, '军书','QUERYLANGUAGE=zh_CN') = 1

If you use the INITIALMODE SKIP option, combined with the LOGTYPE ON and AUXLOG ON options, you must manually insert the log entries into the staging table, but only for the initial update. All subsequent updates are handled automatically.