IBM FileNet P8, Version 5.2

Content-Based Retrieval

The Content Engine Java™ and .NET APIs include a number of interfaces for content-based retrieval (CBR) administrative functions for IBM® FileNet® Content Search Engine. Using the APIs, you can configure domains and servers, establish and configure index areas, and initiate and manage index jobs.

Indexing and Index Jobs

IBM Content Search Services indexing aggregates data (in the form of indexes) to support full-text searches. Only objects and string properties enabled for CBR are included in full-text searches. Whether an object is CBR-enabled is determined by the value of the IsCBREnabled property on the ClassDefinition and PropertyDefinitionString objects associated with the class of the object (Document, Annotation, Folder, or CustomObject class or subclass). For more information about enabling CBR, see Content Searches. Indexing is done automatically for all CBR-enabled objects and properties. Because the indexing operation is a batched, asynchronous operation, its results are not immediately evident.

The IndexJob class enables you to track the status of an index job, as well as initiate and control the job. Usually, you initiate an index job to rebuild an index that is corrupted, or to accommodate a configuration change. The IndexRequests property on an IndexJob object provides a listing of all index requests associated with the index job. The CmTextSearchIndexRequest classes provide read and update operations, as well as status and tracking information for index requests.

The IndexJobItem base class is subclassed to provide for particular types of index jobs:

Class index job (IndexJobClassItem): All instances of the specified class are full-text indexed. Class index jobs require a table scan on the database, even if the amount of data to be indexed is minimal. A significant amount of time is required to perform a table scan on a large table. The database table scans are performed once for all classes to be indexed. To minimize the number of table scans required, use a single index job operation for all classes to be indexed for the same table.
Single object index job (IndexJobSingleItem): A single object is full-text indexed.
Reindex index job: An existing index is re-indexed.
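
The table-scan advice for class index jobs can be illustrated with a small planning sketch. It is a minimal Python model, not Content Engine API code. The function name plan_class_index_jobs, the class names, and the table names are all hypothetical, and the caller is assumed to already know which database table stores each class.

```python
from collections import defaultdict


def plan_class_index_jobs(class_to_table):
    """Group classes by database table so each table needs only one index job.

    class_to_table maps a class name to the (assumed known) name of the
    database table that stores its instances. The result maps each table
    to the sorted list of classes that a single index job should cover.
    """
    jobs = defaultdict(list)
    for class_name, table in class_to_table.items():
        jobs[table].append(class_name)
    return {table: sorted(names) for table, names in jobs.items()}


# Hypothetical class and table names, for illustration only.
plan = plan_class_index_jobs({
    "Invoice": "docs_table",
    "Contract": "docs_table",
    "CaseFolder": "folders_table",
})
```

With this grouping, the two classes stored in the same table would be submitted in one index job, so that table is scanned once rather than twice.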

During an indexing operation, all currently indexed data is available for use in full-text searches. However, new index data might become available while a full-text query is in progress. In this case, the query returns duplicate matches, because it uses both old and new indexed data. Old copies of indexed data are removed when the index job completes, after which duplicate matches will no longer occur.

Canceling an Index Job

An index job can be canceled by setting its JobAbortRequested property to true, or as a result of an unexpected error. When an index job is canceled, all of its related index requests are deleted. If the index job is canceled by an administrator, the JobStatus property on the IndexJob object is set to CANCELED. If the index job terminates as the result of an unexpected error, the JobStatus property will have a value of TERMINATED_ABNORMALLY. Canceling an index job can leave the affected indexes in an inconsistent state.

If you cancel an index job for a reindex operation or cancel an index job for a root class with subclasses, there are special considerations; these types of index jobs create new indexes that will replace the original indexes. While the new indexes are being populated, the original indexes will maintain their original entries. Therefore, duplicate entries will exist until the index job completes successfully. When all the index requests for the index job have completed, the index job will delete the original indexes. However, if this index job is canceled, the original indexes will remain and contain duplicates of the entries created by the index job in the new indexes.

It is recommended that the administrator resolve this situation by creating index jobs to reindex the original indexes, depending on the type of index job that is canceled:

Index job for a reindex operation: If this type of index job is canceled, the resource state of the original index will remain set to CLOSED. It is recommended that the administrator create another index job on this CLOSED index.
Index job for a root class with subclasses: If this type of index job is canceled, there might be multiple indexes having a resource state set to CLOSED. It is recommended that the administrator either create another index job on the root class with subclasses or create an index job for each of the CLOSED indexes.
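
The two recovery recommendations can be summarized as a small decision sketch. This is a plain Python model under stated assumptions, not Content Engine API code: the function name follow_up_after_cancel, the job_kind labels, and the action names in the returned tuples are hypothetical.

```python
def follow_up_after_cancel(job_kind, indexes, root_class=None):
    """Suggest follow-up index jobs after an index job was canceled.

    job_kind   -- "reindex" or "root_class" (hypothetical labels)
    indexes    -- dict mapping an index name to its ResourceStatus string
    root_class -- name of the root class, if a single job on it is preferred
    Returns a list of (action, target) tuples.
    """
    closed = sorted(name for name, status in indexes.items() if status == "CLOSED")
    if job_kind == "reindex":
        # Create another index job on each index left in the CLOSED state.
        return [("reindex_index", name) for name in closed]
    if job_kind == "root_class":
        # The text allows either option; this sketch prefers one job on the
        # root class when the caller supplies it.
        if root_class is not None:
            return [("index_class_with_subclasses", root_class)]
        return [("reindex_index", name) for name in closed]
    raise ValueError("unknown job kind: " + repr(job_kind))


actions = follow_up_after_cancel("reindex", {"idx1": "CLOSED", "idx2": "OPEN"})
```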

For more information, see Status of Index Areas and Indexes.

Pausing and Resuming an Index Job

If you pause an index job, by setting the JobPauseRequested property of an IndexJob object to true, the JobStatus property of the index job is updated to IndexJobStatus.PAUSED, and the dispatching of new index requests by the index job is halted. Existing index requests will not be paused. If you resume an index job, by setting the JobPauseRequested property of an IndexJob object to false, the JobStatus property of the index job is updated to IndexJobStatus.IN_PROGRESS, and the dispatching of new index requests by the index job is allowed.
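
The pause and resume behavior can be modeled in a few lines. The following is a toy Python model, not the IndexJob class itself: the class name IndexJobModel, its attributes, and the request names are hypothetical, and the job is assumed to start in the IN_PROGRESS state.

```python
class IndexJobModel:
    """Toy model of how JobPauseRequested gates the dispatching of requests."""

    def __init__(self, pending_requests):
        self.job_status = "IN_PROGRESS"        # assumed starting state
        self.job_pause_requested = False
        self.pending = list(pending_requests)  # not yet dispatched
        self.dispatched = []                   # already dispatched; never paused

    def set_job_pause_requested(self, value):
        # True pauses the job; False resumes it.
        self.job_pause_requested = value
        self.job_status = "PAUSED" if value else "IN_PROGRESS"

    def dispatch_next(self):
        """Dispatch one pending request unless the job is paused."""
        if self.job_status == "PAUSED" or not self.pending:
            return None
        request = self.pending.pop(0)
        self.dispatched.append(request)
        return request


job = IndexJobModel(["req1", "req2", "req3"])
job.dispatch_next()                    # req1 is dispatched
job.set_job_pause_requested(True)
while_paused = job.dispatch_next()     # nothing is dispatched while paused
job.set_job_pause_requested(False)
after_resume = job.dispatch_next()     # dispatching continues with req2
```

Note that req1, which was dispatched before the pause, stays in the dispatched list: pausing affects only the dispatching of new index requests.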

Indexing Error Handling

All indexing errors are recorded in the Content Engine log file. Optionally, indexing errors can also be persisted to an object store. The ObjectStore property IndexingFailureRecordingLevel determines whether indexing errors are persisted. By default, indexing errors are logged only and are not persisted.

Indexing Failure Codes

In the case of an indexing failure of a CBR-enabled object, the CmIndexingFailureCode property of the CBR-enabled object is set to an IndexingFailureCode failure code constant and the failure is recorded in the Content Engine log file. If you set the value of the IndexingFailureRecordingLevel property of an ObjectStore object to PROPAGATE_TO_SOURCE, the error information is propagated to the CmIndexingFailureCode property of all CBR-enabled objects.

The indexing failure code is recorded only when the associated object has been processed by the Content Search Engine and some or all of the object's content could not be full-text indexed (due, for example, to text size limits causing truncation, or use of an unsupported format type). An indexing failure code does not necessarily mean that the object was not at least partially indexed; the server will attempt to index as much of the object as it can. If a CBR-enabled object has been successfully indexed, its CmIndexingFailureCode property will have a value of zero.

System failures do not generate an indexing failure code. In such cases, the indexing operation is retained, a description of the error is written to the Content Engine log file, the error is recorded as part of the index request last failure description, and the indexing operation is automatically retried when the system is in a stable state.

Indexing Request Errors

The cause of an indexing request error is recorded in the associated index request object (CmTextSearchIndexRequest). Indexing request errors are recorded in the LastFailureReason property of the index request object. When an index request associated with an object is successfully processed, any failed index requests for that same object are deleted. In other words, a CBR-enabled object will not have any failed index requests (the CmIndexingFailureCode property will be zero) if it has been successfully indexed. An index request is typically retained only when a system failure occurs and the index request needs to be retried.

Indexing Job Errors

In some cases, an error may be related to index job processing, rather than index request processing. For example, at the end of some index jobs the index must be deleted, and it is possible for the deletion to fail. However, this does not mean the index job itself failed: an index job that successfully submits all of the associated index requests will always complete with a JobStatus property value of TERMINATED_NORMALLY (successful completion).

If a failure occurs that is related to an index job, the identified reason for the failure is recorded in the LastFailureReason property on the IndexJob object.

Domain and Server Information

Full-text indexing and searching requires that at least one IBM Content Search Services server configuration be associated with your P8 domain. To create such a configuration, create a CmTextSearchServer object by using one of the Factory.CmTextSearchServer methods and specify the domain object representing your P8 domain. Each CmTextSearchServer object that you create and associate with a given domain is automatically added to the read-only CmTextSearchServerSet collection object returned by the TextSearchServers property of the Domain object. Once you have created an IBM Content Search Services server, you must set its TextSearchServerStatus property to ENABLED for it to be recognized by the Content Engine server. If the Content Engine server cannot communicate with the IBM Content Search Services server, it automatically sets its TextSearchServerStatus property to UNAVAILABLE. If you do not want the IBM Content Search Services server to be available, set its TextSearchServerStatus property to DISABLED. The host, port number, and connection token of the IBM Content Search Services server must be set by using the CmTextSearchServer object properties.

IBM Content Search Services servers and FileNet P8 objects have the following many-to-one relationships:

Multiple object stores can share the same IBM Content Search Services server configuration object.
Multiple IBM Content Search Services server configuration objects can exist for a P8 domain.

The properties of a CmTextSearchConfiguration object allow you to control IBM Content Search Services functions on the Content Engine server. A CmTextSearchConfiguration object is contained in the SubsystemConfiguration property of the Domain, Site, VirtualServer, and ServerInstance classes. The CmTextSearchConfiguration instance to be used is determined by these classes in this order: ServerInstance, VirtualServer, Site, and Domain.
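
The lookup order for the configuration can be sketched as a simple first-match search. This Python model assumes that "in this order" means the first level that carries a configuration wins; the function name and the placeholder configuration values are hypothetical and are not part of the Content Engine API.

```python
def effective_text_search_configuration(server_instance=None, virtual_server=None,
                                        site=None, domain=None):
    """Return (level, configuration) for the first level that has one.

    Each argument stands for the CmTextSearchConfiguration found at that
    level, or None when the level does not define one.
    """
    levels = (
        ("ServerInstance", server_instance),
        ("VirtualServer", virtual_server),
        ("Site", site),
        ("Domain", domain),
    )
    for level, configuration in levels:
        if configuration is not None:
            return level, configuration
    return None


# Only the site and the domain define a configuration here, so Site wins.
chosen = effective_text_search_configuration(site="site_cfg", domain="domain_cfg")
```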

Index Areas and Indexes

An IBM Content Search Services index area is a file system directory containing the information necessary to perform full-text indexing that is updated and queried by IBM Content Search Services. A many-to-one relationship exists between an index area and an object store. A given index area is dedicated to a single object store, but you can have multiple index areas for an object store on a single file system, or you can distribute the indexing information for an object store in multiple index areas across file systems.

Each index area is represented by a CmTextSearchIndexArea object. The file system location of an index area is stored in its RootDirectoryPath property.

Each index area can hold multiple indexes (CmTextSearchIndex objects), which are specified by its TextSearchIndexes property. A many-to-one relationship exists between indexes and an index area; CmTextSearchIndex objects are created automatically in the associated index area, as needed. When an indexable class is instantiated, the CmTextSearchIndex object associated with its base class, and any index partitioning properties defined on the object store, can be used to reference the full-text indexing information. If there is no existing CmTextSearchIndex object associated with the base class and index partitioning properties, a new CmTextSearchIndex object (and the corresponding index maintained by IBM Content Search Services) is created. The index is identified by its IndexName property.

IBM Content Search Services indexing and search servers update and query the indexes. The indexes in an index area are only accessible to the servers that are in the same site as the index area (Site property of the CmTextSearchServer and CmTextSearchIndexArea objects).

To improve indexing efficiency, you can specify which languages are supported in an object store by adding language codes to the string list of its TextSearchIndexingLanguages property. If the IBM Content Search Services server cannot determine the language of a document to be indexed in an index request, the first language code in the string list is used as the default language code for the index request. Ensure that the languages that you specify with this property match the languages of most of the documents in this object store; otherwise, you might experience a performance delay. If you do not set this property to at least one language code, and the deprecated TextSearchIndexingLanguage property has not been previously set, an error will occur during indexing.
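
The default-language rule can be expressed as a short sketch. It is a Python model rather than Content Engine API code, and the function name default_language_code is hypothetical. Falling back to the deprecated TextSearchIndexingLanguage value when the list is empty is an assumption; the text says only that no error occurs in that case.

```python
def default_language_code(indexing_languages, deprecated_language=None):
    """Return the language code used when a document's language is unknown.

    indexing_languages  -- the TextSearchIndexingLanguages string list
    deprecated_language -- the deprecated TextSearchIndexingLanguage value,
                           or None if it was never set
    """
    if indexing_languages:
        # The first code in the list is the default for the index request.
        return indexing_languages[0]
    if deprecated_language is not None:
        # Assumption: the deprecated single value serves as the fallback.
        return deprecated_language
    raise ValueError("no indexing language is configured for the object store")


default_code = default_language_code(["en", "fr"])
```

Because the first entry doubles as the default, it is worth listing the most common document language first.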

Status of Index Areas and Indexes

Index areas (CmTextSearchIndexArea objects) and indexes (CmTextSearchIndex objects) have a ResourceStatus property, which specifies their availability status. This property can have a value of OPEN, CLOSED, or FULL. For CmTextSearchIndexArea objects, ResourceStatus can also have a value of STANDBY. For CmTextSearchIndex objects, ResourceStatus can also have a value of UNAVAILABLE.

Index Area Status

Indexes can only be created in an index area (CmTextSearchIndexArea object) that has a resource status (ResourceStatus property) set to OPEN. If no index area in the object store has a resource status of OPEN, no new indexes can be created in the object store.

The resource status of an index area is automatically set to FULL when the index area has reached its full capacity. This setting indicates that no new objects can be indexed in the index area and no new indexes can be created in the index area. However, existing indexes can be deleted or queried. An index area is considered to be at full capacity when the number of its indexes is equal to the value of its MaxIndexes property and all of its indexes have a resource status of either FULL or CLOSED.
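
The full-capacity condition combines two checks, which the following sketch makes explicit. It is a plain Python model; the function name index_area_is_full is hypothetical, and statuses are represented as simple strings.

```python
def index_area_is_full(index_statuses, max_indexes):
    """Apply the full-capacity rule to one index area.

    index_statuses -- ResourceStatus string of every index in the area
    max_indexes    -- value of the area's MaxIndexes property
    """
    at_index_limit = len(index_statuses) == max_indexes
    none_writable = all(status in ("FULL", "CLOSED") for status in index_statuses)
    return at_index_limit and none_writable


full = index_area_is_full(["FULL", "CLOSED"], max_indexes=2)
still_open = index_area_is_full(["FULL", "OPEN"], max_indexes=2)
room_left = index_area_is_full(["FULL"], max_indexes=2)
```

In the second call one index is still OPEN, and in the third the area has not reached its index limit, so neither area counts as full.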

If an index area is full, and another index area that has a resource status of STANDBY is found, the Content Engine server automatically sets the resource status of the standby index area to OPEN. If there are multiple standby index areas, the Content Engine server chooses the standby index area with the highest priority, according to the value of its CmStandbyActivationPriority property. If two or more index areas exist with the same priority, one of these standby index areas is chosen randomly by the server.
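
The standby selection can be sketched as follows. This is a Python model, not server code: the function name choose_standby_area and the dictionary layout are hypothetical. The text does not say whether a larger or a smaller CmStandbyActivationPriority value means higher priority, so the sketch makes the caller state that explicitly through the higher_number_wins argument.

```python
import random


def choose_standby_area(areas, higher_number_wins, rng=random):
    """Pick a STANDBY index area, mark it OPEN, and return it.

    areas -- list of dicts with "name", "ResourceStatus", and
             "CmStandbyActivationPriority" keys
    higher_number_wins -- True if a larger priority number means higher
             priority (the direction is an assumption supplied by the caller)
    Returns None when no standby area exists.
    """
    standby = [area for area in areas if area["ResourceStatus"] == "STANDBY"]
    if not standby:
        return None
    pick = max if higher_number_wins else min
    best = pick(area["CmStandbyActivationPriority"] for area in standby)
    candidates = [a for a in standby if a["CmStandbyActivationPriority"] == best]
    chosen = rng.choice(candidates)  # random choice among equal priorities
    chosen["ResourceStatus"] = "OPEN"
    return chosen


areas = [
    {"name": "area1", "ResourceStatus": "FULL", "CmStandbyActivationPriority": 9},
    {"name": "area2", "ResourceStatus": "STANDBY", "CmStandbyActivationPriority": 1},
    {"name": "area3", "ResourceStatus": "STANDBY", "CmStandbyActivationPriority": 5},
]
activated = choose_standby_area(areas, higher_number_wins=True)
```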

Index Status

Create index requests can only be written to an index (CmTextSearchIndex object) if its resource status (ResourceStatus property) is set to OPEN. However, existing index entries can be updated, deleted, or queried.

The Content Engine server automatically sets the resource status of an index to FULL when the number of objects in the index is equal to the value of the MaxObjectsPerIndex property or the size of the index reaches the value specified by the MaxSizePerIndexKbytes property. Once the resource status of the index has been set to FULL, no more create index requests can be written to this index. However, existing index entries can be deleted or queried. When an index is full, updated index data is automatically written to another index having a resource status of OPEN, if it exists. If no index can be found having a resource status of OPEN, a new index is created in the same index area as long as the index area's maximum index limit (as specified by its MaxIndexes property) has not been reached. If the index area's maximum index limit is reached and no other index areas are set to OPEN, an index area that has been set to STANDBY, if present, is automatically transitioned to OPEN and the new index is created in it.
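
The FULL condition and the fallback sequence for new index data can be sketched as two small functions. This is a Python model under stated assumptions, not server code: the function names and the returned action labels are hypothetical. Creating the new index in another OPEN index area is implied by the text rather than stated, and the sketch uses >= where the text says "equal to" so that an overshoot also counts as full.

```python
def index_is_full(object_count, size_kbytes, max_objects_per_index,
                  max_size_per_index_kbytes):
    """Return True when either limit of the index has been reached."""
    return (object_count >= max_objects_per_index
            or size_kbytes >= max_size_per_index_kbytes)


def place_create_request(indexes, max_indexes, other_open_area, standby_area):
    """Decide where new index data goes for one index area.

    indexes         -- dict mapping index name to ResourceStatus in this area
    max_indexes     -- the area's MaxIndexes value
    other_open_area -- True if another index area is OPEN
    standby_area    -- True if a STANDBY index area exists
    Returns an (action, index_name) tuple with hypothetical action labels.
    """
    for name, status in indexes.items():
        if status == "OPEN":
            return ("use_index", name)
    if len(indexes) < max_indexes:
        return ("new_index_in_same_area", None)
    if other_open_area:
        # Implied by the text, not stated outright.
        return ("new_index_in_other_open_area", None)
    if standby_area:
        return ("open_standby_area_and_create_index", None)
    return ("no_target", None)


decision = place_create_request({"i1": "FULL", "i2": "CLOSED"}, max_indexes=2,
                                other_open_area=False, standby_area=True)
```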

If an index job closes an index for deletion, the server sets its ResourceStatus property to CLOSED and sets its IndexingStatus property to REPLACING. As long as the IndexingStatus property is set to REPLACING, you cannot change the ResourceStatus property of an index from CLOSED to OPEN by using the API. When the index job is canceled, the server sets the IndexingStatus property of the index to NORMAL; the ResourceStatus property remains set to CLOSED unless it is changed by the API.

An administrator can manually set an index to a resource status of CLOSED. An index that has been set to CLOSED suppresses errors and allows reindexing to complete without generating errors that can cause the entire reindexing operation to fail. A closed index is closed to create index requests; however, existing index entries can be updated, deleted, or queried.

An administrator can manually set an index to a resource status of UNAVAILABLE, typically when the index is found to be corrupted or is otherwise inaccessible. An index that has been set to UNAVAILABLE suppresses errors and allows reindexing to complete without generating errors that can cause the entire reindexing operation to fail. It is recommended that an administrator inform users when an index has been set to UNAVAILABLE because search results might be incomplete. After an index has been set to UNAVAILABLE, it cannot be set to any other state; this state is a final state for the index. It is recommended that an unavailable index be reindexed as soon as possible. The server automatically creates a new index or opens a standby index to handle any pending index requests for the unavailable index. After an unavailable index has been reindexed, the server deletes the unavailable index.
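
Two restrictions on changing the ResourceStatus of an index are stated in this section, and a guard function makes them concrete. The sketch below is a Python model with a hypothetical function name. It checks only those two restrictions and treats every other change as allowed, which is an assumption made for the sake of the example.

```python
def can_set_resource_status(current, requested, indexing_status="NORMAL"):
    """Check a requested ResourceStatus change for an index.

    Only the restrictions described in the text are enforced:
    1. UNAVAILABLE is a final state.
    2. CLOSED cannot become OPEN while IndexingStatus is REPLACING.
    """
    if current == "UNAVAILABLE":
        # Final state: only a no-op "change" to the same value passes.
        return requested == "UNAVAILABLE"
    if (current == "CLOSED" and requested == "OPEN"
            and indexing_status == "REPLACING"):
        return False
    return True


blocked = can_set_resource_status("CLOSED", "OPEN", indexing_status="REPLACING")
allowed = can_set_resource_status("CLOSED", "OPEN", indexing_status="NORMAL")
```

The second call reflects the situation after a canceled index job, when the server has reset IndexingStatus to NORMAL and the index can be reopened through the API.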

Note: Index data is deleted automatically when an index job disables full-text indexing or rebuilds an index (deleting and re-creating the CmTextSearchIndex instance). For more information, see Index Requests.

Index Requests

Each index request is associated with a CBR-enabled object, and is an instance of the CmTextSearchIndexRequest class. Index request objects are created by the indexing process and cannot be created by using the API. You can perform read and update operations on CmTextSearchIndexRequest objects, whose properties record status, failure, and retry information. CmTextSearchIndexRequest objects do not have individual security assignments and receive their security from the default instance security for the class.

The SourceObject property identifies the CBR-enabled object that is the subject of the index request, the IndexJob property identifies the index job associated with the index request, and the IndexRequestStatus property records the current status of the index request.

Information about index request failures is provided by the CmIndexingFailureCode, LastFailureReason and RetryCount properties. By using these and other CmTextSearchIndexRequest properties, you can search for index requests meeting specific criteria; for example, all index requests that have failed and will not be retried, with a description of the last failure for each one, or all index requests that are being retried, with a description of the last failure for each one.

Skip Operation

You can specify a skip operation for an index request. During a skip operation, the CBR-enabled object that is the subject of an index request will not be indexed. To specify a skip operation, set the IndexingOperation property of the CmTextSearchIndexRequest object to SKIP. As a result, the Content Engine server sets the CmIndexingFailureCode property of the index request to MARKED_AS_SKIPPED and sets the IndexationId property of the CBR-enabled object to null.

Index Partitions

An index partition determines which CBR-enabled objects can be indexed into a partitioned IBM Content Search Services index. Only CBR-enabled objects that satisfy the partitioning constraint are stored in the index. When an index partitioning property is specified in a text-search query, only indexes with the same index partition property names and values are searched. By configuring index partitioning in an object store, you can decrease the number of indexes that must be searched as a result of a query, provided that your application uses index partitioning properties in the query.
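
The effect of partitioning on a query can be illustrated with a small pruning sketch. It is a Python model, not the search implementation: the function name, the index names, and the property names are hypothetical. Partition values are compared by simple equality, which is a simplification, because the text does not describe how date values are matched.

```python
def indexes_to_search(indexes, query_partition_values):
    """Return the names of the indexes a query would need to search.

    indexes -- dict mapping index name to a dict of partition property
               name -> value for that index
    query_partition_values -- partition property name -> value taken from
               the query; empty when the query uses no partition property
    """
    if not query_partition_values:
        # Without partition properties in the query, nothing can be pruned.
        return sorted(indexes)
    return sorted(
        name
        for name, constraints in indexes.items()
        if all(constraints.get(prop) == value
               for prop, value in query_partition_values.items())
    )


# Hypothetical indexes partitioned on a custom string property "Region".
catalog = {
    "idx_emea": {"Region": "EMEA"},
    "idx_apac": {"Region": "APAC"},
    "idx_amer": {"Region": "AMER"},
}
pruned = indexes_to_search(catalog, {"Region": "APAC"})
unpruned = indexes_to_search(catalog, {})
```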

Each IBM Content Search Services index maintains a list of zero to two CmIndexPartitionConstraint objects via its IndexPartitionConstraints property. This list is read-only and is maintained by the Content Engine server. Each CmIndexPartitionConstraint object corresponds to an index partitioning property associated with an object store, represented by a CmTextSearchPartitionProperty object. Each property of a CBR-enabled object that is assigned as an index partitioning property must be a custom string- or date-valued property with a settability of SETTABLE_ONLY_ON_CREATE. You can have no more than one string and one date index partitioning property assigned to an object store.
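
The rules for index partitioning properties lend themselves to a small validation sketch. The following Python model is not part of the Content Engine API: the function name, the dictionary keys, and the property names are hypothetical, and data types are represented as the strings "STRING" and "DATE".

```python
def validate_partition_properties(properties):
    """Check candidate index partitioning properties for an object store.

    properties -- list of dicts with "name", "data_type", "is_custom",
                  and "settability" keys
    Returns a list of human-readable problems; an empty list means the
    candidates satisfy the rules described in the text.
    """
    problems = []
    counts = {"STRING": 0, "DATE": 0}
    for prop in properties:
        name = prop["name"]
        if prop["data_type"] in counts:
            counts[prop["data_type"]] += 1
        else:
            problems.append(name + ": must be string- or date-valued")
        if not prop["is_custom"]:
            problems.append(name + ": must be a custom property")
        if prop["settability"] != "SETTABLE_ONLY_ON_CREATE":
            problems.append(name + ": settability must be SETTABLE_ONLY_ON_CREATE")
    for data_type, count in counts.items():
        if count > 1:
            problems.append("more than one " + data_type + " partitioning property")
    return problems


candidates = [
    {"name": "Region", "data_type": "STRING", "is_custom": True,
     "settability": "SETTABLE_ONLY_ON_CREATE"},
    {"name": "FiledOn", "data_type": "DATE", "is_custom": True,
     "settability": "SETTABLE_ONLY_ON_CREATE"},
]
problems = validate_partition_properties(candidates)
```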

Indexing and Special Characters

Special characters are indexed in the same position as the preceding term, unless there is a sequence of special characters. In this case, the special character sequence is indexed as unordered separate tokens (ordering of the characters is ignored). For more information about special characters in queries, see Special Characters.