IBM FileNet P8, Version 5.2            

CBR Queries

A content-based retrieval (CBR) query includes a full-text search function call as part of the WHERE clause for the SQL statement. The full-text search function permits you to search for text in object content and properties.

To specify a full-text search, use the CONTAINS function. All searches are case insensitive. The search must be expressed in IBM® Content Search Services query syntax or an XPath-based syntax.

The search results of a CONTAINS function call are affected by the configuration of IBM Content Search Services. For more information, see Search Results.

A CBR query can potentially run for an extended period. To limit the amount of time a query can run, set a timeout value for the query. The timeout value is the maximum time allowed to fetch a single page of search results for a continuable query. If a fetch exceeds the maximum time allowed, Content Engine stops running the query and returns an error. A single default timeout is probably not sufficient: when writing your applications, allow the user to set a timeout.

The following example shows a search function call and the other main syntactical elements of a CBR query:

SELECT d.This FROM Document d 
    INNER JOIN ContentSearch c ON d.This = c.QueriedObject 
    WHERE CONTAINS(d.*,'lion AND tiger') 
    AND d.Date >= 20101201T000000Z AND d.Date < 20101231T235959Z
    ORDER BY c.Rank
    OPTIONS (FULLTEXTROWLIMIT 500)

CBR query syntax has the following main elements:

Search Results

The indexes for IBM Content Search Services store the text for an object as one logical entity. Consequently, a full-text search returns one row for each indexed object that matches the search criteria. The number of times that an object matches the search criteria does not affect the number of rows returned. For example, the text in the content and properties of an object might have multiple instances of the term lion. A search for lion returns only one row for that object.

Content Engine places the rows returned by a full-text search into the ContentSearch table. The table is then joined with other tables in the query. For more information, see ContentSearch table join.

The following factors affect the search results of a CONTAINS function call:

ContentSearch Table Join

To run a CBR query, Content Engine runs the full-text search separately from the query that is represented by the full SQL statement. The full-text search is expressed in statements such as lion AND tiger. These search expressions are passed directly to IBM Content Search Services, and the returned search results are copied into a temporary database table. The query is then run as a standard relational query.

As with other tables referenced in a query such as Document, the temporary table represents a class of objects. Each row in the table represents a separate instance of the class. You reference the temporary table as the ContentSearch class and join the table to other tables in the query. Only one ContentSearch join is allowed per SQL statement because only one search function call is allowed. For information about ContentSearch properties, see ContentSearch Properties. The properties can be referenced in the query as columns. ContentSearch is an internal class and, consequently, objects for the class cannot be instantiated.

The following is an example of an inner join in a full-text search:

SELECT … FROM Document D INNER JOIN ContentSearch CS ON D.This=CS.QueriedObject 
    WHERE CONTAINS(*,'text') AND D.Creator = 'Frank'

This query is executed by first evaluating the CONTAINS(*, 'text') clause and writing that data to a temporary table. Then the remainder of the query is executed:

SELECT … FROM Document D INNER JOIN ContentSearch CS ON D.This=CS.QueriedObject 
    WHERE CS.QueriedObject IS NOT NULL AND D.Creator = 'Frank'

Note that the CONTAINS clause has been replaced by CS.QueriedObject IS NOT NULL.

In the previous example, the CONTAINS clause is used with an AND operator. Using an OR operator will yield confusing results due to the presence of the inner join (which is an effective AND). For this reason, an AND operator is required with an INNER JOIN, and an OR operator is disallowed.

In this example of an outer join, the outer join first executes the CONTAINS clause and builds the temporary table in the database that holds the full-text search results, then executes the query:

SELECT … FROM Document D LEFT OUTER JOIN ContentSearch CS ON D.This=CS.QueriedObject 
    WHERE CONTAINS(*,'text') OR D.Creator = 'Frank'

When using outer joins, only the ContentSearch class can be joined conditionally. You cannot use a right outer join in the following query example, because the query results can potentially return only ContentSearch data, which is not allowed. Outer joins must also use an OR operator when combining the CONTAINS clause with other conditions; use of an AND operator is disallowed:

SELECT … FROM Document D LEFT OUTER JOIN ContentSearch CS ON D.This=CS.QueriedObject 
    WHERE CS.QueriedObject IS NOT NULL OR D.Creator = 'Frank'
         

CONTAINS Function

Use the CONTAINS function to search for text in object content and properties. The function must be called from the WHERE clause of the SQL statement for the query. At most one function call can be made per SQL statement.

The CONTAINS function has the following grammar:

CONTAINS(<property_spec>, <string_literal>[,<dialect>])

The following function call examples show some of the IBM Content Search Services query syntax:

CONTAINS(d.*, 'lion NOT tiger')
CONTAINS(d.*, 'lion* tiger')
CONTAINS(d.*, 'lion tiger~0.8')
CONTAINS(d.*, '@xmlxp:''/zoo/mammal[ .contains("lion")]''')
CONTAINS(d.*,'lion AND tiger') AND d.Date >= 20101201T000000Z AND d.Date < 20101231T235959Z 
Note: It is recommended that any date partition properties appear after the CONTAINS clause. This applies the date range to the CBR index instead of the full database. It is generally more efficient to apply the date range to the full-text index query than to the database query.

CONTAINS Parameter: <property_spec>

The CONTAINS <property_spec> parameter specifies the class of the objects to be searched and the scope of the search for each object: the choice is between searching a single property or all properties plus object content. The functional effectiveness of the specified search scope depends on whether the relevant class and properties are CBR-enabled. For more information, see Search results.

To search a specific property, specify the property name as shown in the following example:

SELECT d.This FROM Document d 
    INNER JOIN ContentSearch c ON d.This = c.QueriedObject 
    WHERE CONTAINS(d.DocumentTitle,'lion')
        

To search all properties plus the object content, use an asterisk as shown in the following example:

SELECT d.This FROM Document d 
    INNER JOIN ContentSearch c ON d.This = c.QueriedObject 
    WHERE CONTAINS(d.*,'lion')
        

CONTAINS Parameter: <string_literal>

The CONTAINS <string_literal> parameter accepts a full-text search expression. The search expression is sent to IBM Content Search Services for processing when the CBR query is run. The search expression syntax must conform to the syntax described in the following sections. An XPath-based syntax can also be used to search XML documents. For information about this syntax, see XML Searches. You can combine XML searches with fielded searches and text searches.

Enclose the CONTAINS search expression in single quotation marks as shown in the following example:

SELECT d.This FROM Document d 
    INNER JOIN ContentSearch c ON d.This = c.QueriedObject 
    WHERE CONTAINS(d.*,'DocumentTitle:zoo AND lion')

Query terms and phrases

Queries can contain terms and operators. A term is a single word such as database. A phrase is a group of words that are contained in quotation marks, such as "computer software". Phrases are searched as exact expressions.

The following table shows how phrases (exact matches) are handled differently than search terms without quotation marks.

Search query Terms without quotation marks Exact match (phrases)
Lemmatization Done Not done
Stop words Removed Not removed
Synonyms Expanded Not expanded
Boolean operators Supported Not supported (interpreted as search terms)
Operators: - and % Supported Not supported (interpreted as search terms)
Wildcard characters: * and ? Supported Supported

Boolean Searches

Boolean operators allow terms to be combined through logical operators. The following Boolean operators are supported: AND, &&, OR, | |, NOT and "-". The AND operator is the default Boolean operator. In other words, if you do not include a Boolean operator between two terms, the AND operator is used.

Boolean operators are subject to the following restrictions:

Precede a term with the NOT operator or a minus sign (-) to indicate that the term must be absent from a document for a match to occur. These operators, which are equivalent, act as filters to remove documents and must be associated with a query that returns positive results. They can be applied to single terms, phrases, or Boolean expressions. For example, the following queries return documents that include the term computer and not the term hardware:

     computer NOT hardware
     computer –hardware

You can use parentheses to control the Boolean logic in a query. For example, the following query finds documents that contain either WebSphere® or Lotus® and website:

     (WebSphere OR Lotus) AND website
         

The following table shows examples of search queries with Boolean operators.

Search query Results

database AND disk

database && disk

Narrows the search to include documents that must contain both the term database and the term disk anywhere in the document.

database OR "log file"

database || "log file"

Widens the search to include documents that contain either the term database or the phrase log file anywhere in the document.

database NOT "log file"

database - "log file"

Searches for documents that contain database and that do not contain the phrase log file anywhere in the document.

database OR "data base" NOT "log file"

Searches for documents that contain either the term database or the phrase data base and that do not contain the phrase log file anywhere in the document.

Fielded Data

IBM Content Search Services supports indexing fields and searching string-based data in fields. When you perform a search, you can optionally specify the field in which to search.

A field name must correspond to the symbolic name of an indexed object property that is directly or indirectly specified by the <property_spec> parameter. The corresponding property must also be CBR-enabled. You cannot use fielded search within xmlxp expressions.

If you do not specify a field, IBM Content Search Services searches for the query text in the textual content of the entire document. For XML documents, IBM Content Search Services looks for the query text in elements only.

Specify the field in which to search by typing the field name followed by a colon (:). Then, specify the term or phrase that you are searching for. For example, the value of the symbolic name of the CBR-enabled property named "DocumentTitle" can be searched according to the following query:

     DocumentTitle:"Budget Proposal"

Use parentheses to group multiple clauses in a field. For example, the following query searches for titles that contain both the word tutorial and the phrase "computer software":

     DocumentTitle:(tutorial "computer software")

You can also use fields with searches that use Boolean operators by enclosing the search terms in parenthesis. For example, the following queries search for titles that do not contain the word apple:

     DocumentTitle:(-apple)
     DocumentTitle:(-apple banana)

The following query searches for titles that contain the word apple or banana:

     DocumentTitle:(apple OR banana)

The following query searches for titles that contain the word apple and do not contain the word banana:

     DocumentTitle:(apple AND NOT banana)

You can also use fielded data with Proximity Searches and Fuzzy Searches.

Restriction: You cannot use fielded search within xmlxp expressions.

Special Characters

Special characters can serve different functions in the query syntax. For example, a question mark (?) can be used as a wildcard character. Specify special characters in the CONTAINS <string_literal> parameter, and escape them if necessary. To escape most special characters, precede the character with a backslash (\). For example:

To escape a single quotation mark, use another single quotation mark. For example:

The special characters in the following table must be escaped.

Special character Behavior when not escaped
Asterisk (*) Used as a wildcard character.
At sign (@) A syntax error is generated when an at sign is the first character of a query. In xmlxp expressions, the at sign is used to refer to an attribute.
Brackets ([]) Used in xmlxp expressions to search the contents of elements and attributes.
Braces ({}) Generates a syntax error.
Backslash (\)  
Caret (^) Used for weighting (boosting) terms.
Colon (:) Used to search the contents of a field.
Equal sign (=) Generates a syntax error.
Exclamation point (!) A syntax error is generated when an exclamation point is the first character of a query.
Forward slash (/) Used in xmlxp expressions as an element path separator.
Greater than symbol (>) Used in xmlxp expressions to compare the value of an attribute. Otherwise, a syntax error is generated.
Less than symbol (<) Used in xmlxp expressions to compare the value of an attribute. Otherwise, a syntax error is generated.
Minus sign (-) When a minus sign is the first character of a term, only documents that do not contain the term are returned.
Parentheses Used for grouping.
Percent sign (%) Specifies that a search term is optional.
Plus sign (+)  
Question mark (?) Used as a wildcard character.
Semicolon (;)  
Single quotation mark (') Used to contain xmlxp expressions. To escape a single quotation mark, use another single quotation mark instead of a backslash.
Tilde (~) Used for proximity and fuzzy searches.
Vertical bar (|)  

Special characters that do not have a meaning in the query syntax need not be escaped, but it is safe to do so. The following examples are of special characters that do not need to be escaped:

Special characters are queried according to the following criteria:

Restriction: You cannot perform an exact match search on two consecutive, identical special characters. Queries of this type return documents that contain only one of the special characters. For example, searching for -- returns only documents that contain -.

For information about using special characters in wildcard search expressions, see Wildcard Searches.

Indexing and Special Characters

During tokenization and language processing, IBM Content Search Services identifies and indexes special characters as punctuation. Special characters are token delimiters. For example, "jack_jones" is tokenized as three separate tokens: jack,"_, and jones. Emails, URLs, and file paths are broken down into tokens. For example:

Special characters do not occupy a token position in the file. For example, jack_jones is indexed with the underscore in the same token position as jack. Special characters also do not occupy a token position when spaces are included. For example, jack_jones is indexed in the same way as jack _ jones.

The token position is used for exact phrase search and for proximity search. For example, if a document contains the expression jack_jones, searching for the exact phrase ""jack jones"" finds this document.

When a sequence of special characters are indexed separately, they are searched in no particular order. For example, searching for "#$" also finds documents that contain "$#".

Special Characters in Chinese, Japanese, and Korean Languages

To find a sequence of characters in Chinese, Japanese and Korean (CJK) languages that includes special characters, the query expression must include the special characters. If you omit the special characters from the query expression, the character sequence might not be found. In non-CJK languages, the character sequence is always found, even if the query expression omits the special characters. For example, if an indexed document contains john_smith, you can search for john_smith or "john smith" (exact match, without the underscore) and both queries return the document that contains john_smith.

Restriction: You cannot search for the following special characters in CJK documents: ? * \

Wildcard Searches

IBM Content Search Services supports wildcard searches in terms, fielded data, and phrases. You can place wildcard characters before, within, or after a term. Use a question mark (?) to perform a single character wildcard search. For example, the following term query finds documents that contain the terms mare, mere, mire, and more:

     m?re

The following phrase query finds documents that contain "uri requirements" and "url requirements":

     "ur? requirements"

You can use special characters in wildcard search expressions. For example, searching for ja*_ finds documents that contain jack_jones. However, you cannot use wildcard characters to find special characters. For example, searching for "ca*s" finds documents that contain cats, categories, or cas, but not documents that contain ca_s.

To use a single wildcard character search at the beginning of a term, you must use double quotation marks. For example:

     "?more"

Use an asterisk (*) to perform a multiple character wildcard search. A multiple character wildcard search looks for 0 (zero) or more characters. For example, the following term query finds all documents with DocumentTitle property values that begin with tech:

     DocumentTitle:tech*

The following phrase query finds documents with DocumentTitle property values that contain "maintenance contacts" and "maintenance contracts":

     DocumentTitle:"maintenance cont*"

You can use wildcard characters as the values of XML elements and attributes. However, you cannot use wildcard characters in the names of XML elements and attributes. For more information, see XML Searches.

Note: Using a multiple character wildcard (*) at the beginning of a search term can have a negative impact on the performance of a query when many matching terms are found. There is a configurable limit on the number of terms that can be returned; an error is returned when the limit (10,000 terms by default) is exceeded. To change this limit, use the configuration tool to specify a new value for the queryExpansionLimit parameter. For more information, see Configuration tool usage .

Specifying Optional Terms

You can use a percent sign (%) to indicate that a search term is optional. For example, the following query returns all documents that contain the word cat, and optionally contain the word dog. This query provides a higher score on documents that also contain the word dog:

     cat %dog

The optional operator is in effect only if there is more than one search term in an expression. If there is only one term in an expression or within a set of parenthesis, the optional operator is ignored and the query finds documents that include the specified term. For example, each of the following queries is equivalent in both search results and score:

     %dog
     %(dog)
     (%dog)
     dog

If all the search terms in an expression are optional, the query finds documents that include at least one of the specified terms, which is equivalent to an OR query. For example, each of the following queries is equivalent, in both search results and score, and return documents that contain either the words cat or dog:

     %cat %dog
     %cat OR %dog
     cat OR dog

In the following queries, the words dog and horse are marked as optional, but because they are the only terms within a set of parenthesis, at least one of them must be found in a document for it to be returned. For example, each of the following queries is equivalent. Each document that is returned must contain the word cat and must contain either the word dog or horse (or both):

     cat (%dog %horse)
     cat (%dog OR %horse)
     cat (dog OR horse)
         

Proximity Searches

A proximity search finds documents that contain terms within a specified number of words of each other. To perform a proximity search, enclose the terms in quotation marks and follow the terms with a tilde symbol (~) and a positive integer. For example, the following query searches for documents that contain IBM and WebSphere within seven words of each other:

     "IBM WebSphere"~7

Proximity search is supported for individual terms, not for phrases. A word that follows a sentence break is not considered to be adjacent to words in the previous sentence. A word after a sentence break is considered to be 10 positions apart from the last word of the previous sentence. A sentence break is identified by a period (.), a question mark (?), or an exclamation point (!) followed by a space. For example, the period in ibm.com® is not a sentence break. The position of the punctuation is typically the same as the character before it (the punctuation does not take a position by itself).

You can use a proximity search with fielded data. For example, the following query searches for DocumentTitle property values that contain the words apple and banana within two words of each other:

     DocumentTitle:("apple banana"~2)
Restriction: Special characters are not supported in proximity search queries.

Fuzzy Searches

A fuzzy search query searches for character sequences that are not only the same but similar to the query term. Use the tilde symbol (~) at the end of a term to do a fuzzy search. For example, the following query finds documents that include the terms analytics, analyze, analysis, and so on.

     analytics~ 

You can add an optional parameter to specify the required similarity. Specify a value greater than 0 and less than 1. The value must be preceded by a 0 and decimal point, for example, 0.8. A value closer to 1 matches terms with a higher similarity. If the parameter is not specified, the default is 0.5.

     analytics~0.8 

Fuzzy search syntax can be specified per term, but not on a phrase. To apply fuzzy search to multiple words in a query, you must apply a fuzzy search factor for each term. For example, the following query finds documents that include terms that are similar to summer and time.

     summer~0.7 time~0.7

You can use a fuzzy search with fielded data by enclosing the search term in double quotation marks. For example, the following query searches for DocumentTitle property values that contain words that are similar to the word apple:

     DocumentTitle:"apple"~0.3
     DocumentTitle:"apple~"
Restriction:
  • Special characters are not supported in fuzzy search queries.
  • Terms in fuzzy search queries do not go through language processing (lemmatization, synonym expansion, and stop word removal). Therefore, fuzzy search queries do not find terms that are similar to synonyms.
  • If wildcard characters are included in the fuzzy search terms, only the wildcard search is done.

XML Searches

The CONTAINS <string_literal> parameter accepts an XPath-based expression to search for terms in elements and attribute values in XML files.

The following search features are supported by XML search:

The search expression is sent to IBM Content Search Services for processing when the CBR query is run. IBM Content Search Services implements only a subset of the XPath specification as described here.

You can also specify a range of field values. To define a range, specify the lower and upper bounds and indicate whether the range is inclusive (by using square brackets) or exclusive (by using curly brackets). Sorting is lexical. For example, 2 is greater than 10.

XPath-based searches are subject to the following restrictions:

As with any other string literal in a query, you enclose a search expression within single quotation marks. Single quotation marks must also be used as part of the XPath syntax for the search expression. These XPath-required single quotation marks are shown without escape characters in the search expression examples in this topic. When passing an XPath-based search expression to the CONTAINS function, escape the XPath-required single quotation marks by using two consecutive single quotation marks. In the following example, the search expression @xmlxp:'/zoo/mammal[. contains("lion")]' is passed to the CONTAINS function:

SELECT d.This FROM Document d 
    INNER JOIN ContentSearch c ON d.This = c.QueriedObject 
    WHERE CONTAINS(d.*, '@xmlxp:''/zoo/mammal[. contains("lion")]''')

As shown in the example, use a forward slash to separate the XML elements in the path. After the path specification, use brackets with an enclosed period to search the content of XML elements.

By definition, XML element and attribute names are case-sensitive. However, element and attribute values are not case-sensitive.

Specifying element paths in XML queries

You can specify element paths in XML queries by using the following query syntax:

/tag[. contains("word")]

/tag1/tag2[. contains("word")]

//tag[. contains("word")]

/tag1//tag2[. contains("word")]

/*/tag[. contains("word")]

/tag1/*/tag2[. contains("word")]

Searching the content of elements and attributes

You can search the contents of elements in XML documents by using a period (.) in an xmlxp expression as follows:

/tag[.contains ("word")]

You can combine search expressions at different levels of an element path, for example:

/book[. contains ("whale")]/name[. contains("Moby")]

To search the contents of attributes in XML documents, use @attribute_name in an xmlxp expression as follows:

/tag[@attribute contains ("word")]

The following query expressions apply to searching the contents of both elements and attributes. To search attribute values, replace the period (.) with @attribute_name in the square brackets.

Examples of expressions with Boolean operators

Examples of expressions with wildcard characters

For more information about wildcard characters, see Wildcard Searches.

Alternative syntax structure

You can include an element path in the brackets of an xmlxp expression. For example, the following queries are equivalent and return the same results:

     @xmlxp:’/doc/email[recipients contains ("John")]’
     @xmlxp:’/doc/email/recipients[.contains ("John")]’

You can use this syntax to express constraints on different subpaths in the same xmlxp expression, for example:

     @xmlxp:’/doc/email[recipients/to contains  ("John") AND sender contains ("Jack")]’

Numeric comparisons of attribute and element values

You can use the following construct to compare the numeric values of attributes:

/tag[@attribute operator numeric_value]

The following operators are supported: > < >= <= = !=

Numbers can be positive and negative integers (for example, 15 and -15), decimal fractions (for example, 15.1), and scientific notation (for example, 1.510000e+1). For example, the following queries are equivalent:

     @xmlxp:’/book/name[@price > 15]’
     @xmlxp:’/book/name[@price > 15.0]’
     @xmlxp:’/book/name[@price > 1.500000e+1]’

Date and time comparisons

Date and DateTime element and attribute values can be searched. The following operators are supported for comparing Date and DateTime values: > < >= <= = !=

Date and DateTime values are searchable when they are provided in the following ISO 8601 formats:

For more information on supported date and time formats, see http://www.w3.org/TR/xmlschema-2/#dateTime.

The following examples are valid Date and DateTime values:

     2011-07-04
     2011-07-04 16:41:06
     2011-07-04T16:41:06
     2011-07-04T16:41:06.123
         
Restriction:
  • An element or attribute must contain the Date or DateTime value only, without additional characters (other than whitespace).
  • Elements that contain Date or DateTime values cannot have descendent elements.

The following examples show Date and DateTime values in the context of an XML document:

During a search, the Date or DateTime value must be enclosed within an xs:date() or xs:dateTime() function call to be recognized as the correct data type.

An XML DateTime data type in an XML document can specify a timezone value in the form ((’+’ | ’-’) hh ’:’ mm) | ’Z’. However, when a DateTime is indexed, the ECM Text Search server truncates timezone values during indexing. Time zones cannot be specified in an xs:date() or xs:dateTime() value in a query. Therefore, time zones are not considered during XML searches that involve Date or DateTime data types.

A time element of 24:00:00 is treated as a value between the last instant of that day and the first instant of the next day.

The following examples show Date and DateTime comparisons in xmlxp terms:

Boolean expressions in queries

You can use the following Boolean operators to build complex query expressions with the following xmlxp constraints: AND, OR, and XOR. For example, the following query includes AND and OR Boolean operators:

     @xmlxp:’//BBB[.contains("E")]’ AND @xmlxp:’/AAA/BBB[@id  = "2"]’ OR @xmlxp:’/AAA/BBB[@id  = "2"]’

You can also use Boolean operators in the value of an @xmlxp constraint.

The following syntax examples show how you can use Boolean operators with search terms:

Finding elements or attributes

You can check for the existence of elements or attributes without specifying their values.

For example, the following query returns results when an <email> element is found directly under a <doc> element:

     @xmlxp:’/doc/email’

The following query returns results when an <email> element with an id attribute is found directly under a <doc> element:

     @xmlxp:’/doc/email/@id’

Using namespaces in a search

You can refer to XML namespaces during a search. Namespaces in XML documents are mechanisms for scoping element and attribute names to prevent naming conflicts, for example, between XML documents from different sources. For more information about namespaces, see http://www.w3.org/TR/xml-names.

For example, if you want to define an explicit namespace such as http://example.com/ns/abc in an XML document and refer to it in an xmlxp query, format the XML as follows:

     <?xml version=’1.0’?>
       <doc xmlns:ns1="http://example.com/ns/abc">
         <ns1:p>word</ns1:p>
       </doc>

The corresponding xmlxp query is written as follows:

     @xmlxp:’declare namespace ns1 = "http://example.com/ns/abc";
     /doc/ns1:p[. contains ("word")]’

Default namespaces

To define a default namespace in an XML document such as http://example.com/ns/abc and refer to it an xmlxp query, specify the following XML:

     <?xml version=’1.0’?>
       <doc xmlns="http://example.com/ns/abc">
         <p>word</p>
       </doc>

The corresponding xmlxp query is written as follows:

     @xmlxp:’declare default element namespace "http://example.com/ns/abc";
     /doc/p[. contains ("word")]’

Attribute namespaces

An attribute can also have a namespace. In this example, the element and attribute both have a namespace and the namespaces are different:

     <dog xmlns:an="http://example.org/animals" xmlns:sz="http://example.org/sizes">
     <an:breed sz:size="Medium">Mutt</an:breed>
     </dog>

The following xmlxp query uses a namespace for the attribute name:

     @xmlxp:’declare namespace sz = "http://example.org/sizes";
     /dog/breed[@sz:size contains("Medium")]’

The following xmlxp query uses a namespace for both the attribute and element:

     @xmlxp:’declare namespace an = "http://example.org/animals";
     declare namespace sz = "http://example.org/sizes";
     /dog/an:breed[. contains ("Mutt") and @sz:size contains("Medium")]

Combining XML search with fielded search

You can combine xmlxp syntax with searches on metadata fields. For example, the following search finds emails that were sent to a recipient named Paul and the CBR-enabled DocumentTitle property has a value of "IBM":

     @xmlxp:’//recipients/to[.contains("Paul")]’ DocumentTitle: IBM

You can also include Boolean operators as shown in the following examples:

     @xmlxp:’//recipients/to[.contains("Paul")]’ AND DocumentTitle: IBM
     @xmlxp:’//recipients/to[.contains("Paul")]’ OR DocumentTitle: IBM
     @xmlxp:’//recipients/to[.contains("Paul")]’ AND NOT DocumentTitle: IBM
Restriction: You cannot use fielded search within an xmlxp expression.

Combining XML search with textual search

You can combine xmlxp syntax with regular text searches. For example, the following search finds emails that were sent to a recipient named Paul that contain the word important.

     @xmlxp:’//recipients/to[.contains("Paul")]’ important
     

Combining XML search with proximity search

You can combine xmlxp syntax with a proximity search according to the following syntax:

SELECT d.This FROM Document d 
    INNER JOIN ContentSearch c ON d.This = c.QueriedObject 
    WHERE CONTAINS(d.*, '@xmlxp:''/email1/body1[.contains("""body email""~2")]''')

For example, the following search finds an email subject that contains "sun" and "sand" within 2 words of each other:

SELECT d.This FROM Document d 
    INNER JOIN ContentSearch c ON d.This = c.QueriedObject 
    WHERE CONTAINS(d.*, '@xmlxp:''/icc_document/icc_email/icc_subject[.contains("""sun sand""~2")]''')

CONTAINS Parameter: <dialect>

The CONTAINS <dialect> parameter is an optional parameter that identifies the syntax of the search.

Pass one of the following arguments to the parameter:

The following example shows a query with a search function call that specifies the IBM Content Search Services syntax:

SELECT d.This FROM Document d 
    INNER JOIN ContentSearch c ON d.This = c.QueriedObject 
    WHERE CONTAINS(d.*, 'lion' , 'Lucene')

The <dialect> parameter is optional. By default, the object store CBRSearchType property determines the Content Search Engine that runs a search for a CBR query. The intended Content Search Engine determines the presumed search syntax for a query. The <dialect> parameter works in reverse: if you explicitly identify the search syntax, this identification overrides the CBRSearchType property value to determine the Content Search Engine that runs the search. The intended Content Search Engine must be enabled to run the CBR query successfully.

Full-Text Joins

Whenever a CONTAINS clause is used, a join is performed on the (internal) ContentSearch class.

Note: Instances of the ContentSearch class cannot be retrieved. However, values for the properties of this class can be specified, and are defined in the metadata for this class. For more information, see ContentSearch Properties.

The join is necessary because when a query with a full-text search is executed, the full-text search is performed first, then the data from the full-text search is copied to a temporary database table that is referenced by the ContentSearch class name. The remainder of the search statement is then executed against the repository, joining to this temporary table to access the full-text search data.

Only one join to the ContentSearch class is allowed per query statement, because only one CONTAINS clause is allowed.

The following is an example of an inner join in a full-text search:

 SELECT … FROM Document D INNER JOIN ContentSearch CS ON D.This=CS.QueriedObject 
     WHERE CONTAINS(*,'text') AND D.Creator = 'Frank'

This query is executed by first evaluating the CONTAINS(*, 'text') clause and writing that data to a temporary table. Then the remainder of the query is executed:

SELECT … FROM Document D INNER JOIN ContentSearch CS ON D.This=CS.QueriedObject 
      WHERE CS.QueriedObject IS NOT NULL AND D.Creator = 'Frank'

Note that the CONTAINS clause has been replaced by CS.QueriedObject IS NOT NULL.

In the previous example, the CONTAINS clause is used with an AND operator. An OR operator would yield confusing results due to the presence of the inner join (which is an effective AND). For this reason, an AND operator is required with an INNER JOIN, and an OR operator is disallowed.

An example of an outer join is:

SELECT … FROM Document D LEFT OUTER JOIN ContentSearch CS ON D.This=CS.QueriedObject 
WHERE CONTAINS(*,'text') OR D.Creator = 'Frank'

The outer join above first executes the CONTAINS clause and builds the temporary table in the database that holds the full-text search results, then executes the query:

SELECT … FROM Document D LEFT OUTER JOIN ContentSearch CS ON D.This=CS.QueriedObject 
            WHERE CS.QueriedObject IS NOT NULL OR D.Creator = 'Frank'

When using outer joins, only the ContentSearch class can be joined conditionally. The query above would not be allowed to use a right outer join, because that would potentially return only ContentSearch data, which is not allowed. Outer joins must also use an OR operator when combining the CONTAINS clause with other conditions. Use of an AND operator here is disallowed.

ContentSummary Queries

When performing a full-text (CBR) query, avoid referencing the ContentSummary column. The retrieval of this text column can slow the execution of the search.

FullTextRowLimit

FullTextRowLimit indicates the number of rows to pull from the full text index prior to executing the remainder of the query. There is one row for each object, regardless of the number of content elements. As noted in Full-Text Joins, data from the full-text index is copied to a temporary table, and that table is then joined with the remainder of the object store database to execute the query.

A query might have a CONTAINS clause that matches many thousands of rows in the full-text index. However, a user might not want to pull all these rows into the temporary database table prior to running the remainder of the query. In this case, you can set FullTextRowLimit to a lesser value, enabling the user to see a subset of the matches more quickly.

Do not set the FullTextRowLimit too low. For example, suppose a query has the WHERE clause "WHERE color = 'red' and CONTAINS(*, 'blue')". This query might have a thousand rows that match "CONTAINS(*, 'blue')", but only ten rows that match "color = 'red'" and "CONTAINS(*, 'blue')". If FullTextRowLimit is set to 500, then only 500 rows are pulled from the full-text index and written to the temporary table. However, those 500 rows might not be the same rows that also have "color = 'red'" in the database, so all ten rows might not be found as a result of the query.

If a value for FullTextRowLimit is not supplied, the value of the ObjectStore.FullTextRowDefault property (stored in the GCD) is used.

If a value for FullTextRowLimit is supplied and this value is greater than the value of ObjectStore.FullTextRowMaxcodeph>, then the value of ObjectStore.FullTextRowMax is used instead. The FullTextRowMax property is present on the ObjectStore class, enabling the system administrator to prevent queries from using too much processing time. The ObjectStore.FullTextRowMax property defaults to infinity unless changed by the system administrator.

FullTextRowLimit and Optimized Queries

A full-text query is optimized if the following is true:

An optimized full-text query can return results faster in some instances, and will also allow the user to retrieve more rows without hitting the FullTextRowLimit value, as well as possible timeout errors or out of memory conditions.

If a full-text query is optimized, when a request for the first page is generated, the server retrieves the FullTextRowLimit number of rows from the full-text search engine and stores them in the temporary database table, then executes the relational part of the query. However, if an insufficient number of rows are found to fill the current page, the server retrieves the next set of rows from the full-text search engine, repeating the process until enough rows are found. When the server pulls the second and subsequent set of rows from the full-text search engine, the server has the percentage of retrieved rows that are actually used in the resultant data to be returned to the user, and, therefore, it no longer uses the FullTextRowLimit, and instead retrieves the number of rows based on the prior percentage used.

When the second or subsequent page is requested for an optimized full-text query, the server also retrieves the next set of rows from the full-text search engine. Therefore, this type of query can be used to browse through an unlimited number of rows that match a content search.

Optimization might not be suitable for all types of queries: ordering by descending ContentSearch.Rank slows the relational part of the query, and some callers might not want the data to be returned in descending order.

A continuable query that is not optimized will not continue past the number of rows specified by the value of FullTextRowLimit. When the query is not optimized, the server cannot retrieve the next set of full-text hits for subsequent pages past the FullTextRowLimit. Instead, the server retrieves the same amount of rows for each page processed, using the FullTextRowLimit value for the number of rows.

FullTextRowLimit and Non-Optimized Queries

Queries that are not continuable pull only one set of a FullTextRowLimit number of rows from the full-text search engine. The value of FullTextRowLimit must be chosen carefully in this case.

Queries that are not optimized might return a subset of the number of rows matching the query if more than the FullTextRowLimit number of matches exists in the full-text index. If this condition occurs, the server throws a CBR_FULLTEXTROWLIMIT_EXCEEDED exception to notify the client that not all matches have been returned. This exception will be thrown when the caller iterates through the rows returned for the query, after the last row is found.

If a value for FullTextRowLimit is not supplied, the value used is the value of the ObjectStore.FullTextRowDefault property stored in the Global Configuration Data (GCD) database.

If a value for FullTextRowLimit is supplied and this value is greater than the value of the ObjectStore.FullTextRowMax property, the ObjectStore.FullTextRowMax value is used instead. The FullTextRowMax property enables system administrators to prevent queries from using too much processing time.

Query Result Ranking

Use the Rank property in the ORDER BY clause of a query to order the returned objects by query relevance. The Rank property belongs to the ContentSearch class.

Note: A query using the Rank property returns a maximum of 10,000 rows. This limit exists because sorting query results by rank can consume a significant amount of system resources. When larger result sets are needed, use of the Rank property is not recommended.

The following full-text search example orders the search results in descending order of relevance:

SELECT d.This 
    FROM Document d INNER JOIN ContentSearch c 
    ON d.This = c.QueriedObject 
    WHERE CONTAINS(d.DocumentTitle, 'lion tiger^5')
    ORDER BY c.Rank DESCENDING

In this example, the DocumentTitle property is the searched text for each object. The search terms are lion and tiger. (As discussed in this topic, ^5 indicates the boost factor for tiger). The Rank property value for a returned object is calculated based on the searched text and the terms in accordance with the following factors:

Query Result Row Limits

Use the FULLTEXTROWLIMIT option and some related object store properties to limit the number of rows that a CBR query returns.

The following example shows how to specify a value for the FULLTEXTROWLIMIT option:

SELECT d.This FROM Document d 
    INNER JOIN ContentSearch c ON d.This = c.QueriedObject 
    WHERE CONTAINS(d.*,'lion AND tiger')
    ORDER BY c.Rank
    OPTIONS (FULLTEXTROWLIMIT 500)

The following object store properties interact with FULLTEXTROWLIMIT to determine the effective row limit for the query:

CBR Query Optimization

This feature specifies how searches that combine both a content-based retrieval (CBR) search and a relational search on a database (DB) are executed. By default, the Content Engine always performs the CBR search first and the DB search second. The CBR-first approach is most efficient when there are a small number of full text hits. Efficiency decreases, however, when there are many full-text hits, and there are fewer database hits than full-text hits.

To provide control over how combined searches are executed, the CBRQueryOptimization property can be set on the object store. As an alternative to the default CBR-first option, you can set the property to the dynamic switching option. In dynamic switching mode, the Content Engine dynamically determines whether to issue the CBR search first or the DB search first, optimizing performance for these types of searches.

In dynamic switching mode, the Content Engine switches from CBR first to DB first based on an estimated number of CBR hits. The estimate is compared to a threshold value, set in the CBRQueryDynamicThreshold property. If the number of full-text estimated hits is less than or equal to the CBRQueryDynamicThreshold value, the CBR search is executed first (CBR-first search). If the number of full-text estimated hits is larger than the CBRQueryDynamicThreshold value, the database search is executed first (DB-first search).

Dynamic switching operation is impacted by various search options, including requests for rank ordering. The CBRQueryRankOverride property on the object store determines how the server responds to CBR search requests for rank order, and can impact server performance.

There is no CBRQueryOptimization option for the server to always process combined searches as DB first (although query users can specify an SQL override option to force DB first). Nonetheless, you can configure the server for DB first behavior by setting the following combination of object store properties as specified:

However you configure the server to process combined CBR and database searches, follow the best practices to maximize performance. If you are considering the dynamic switching (optimized) mode, be aware of the limitations of running in this mode. In addition, some searches cannot be optimized in this mode; see Unsupported Searches.

SQL Query Overrides

The following SQL query options can be used to override the CBRQueryOptimization setting on the object store:

There are measures that you must take to ensure that DB-first searches execute efficiently. See Best Practices.

Note: DB-first execution is not supported for some types of searches. If you use a CBR_DB_FIRST or CBR_DYNAMIC_THRESHOLD option with an unsupported search, an exception is thrown.

In the following example, the search will be executed as DB first, overriding the object store configured values.

SELECT d.Id, d.DocumentTitle FROM Document d
    WHERE d.DocumentTitle LIKE 'Specification FaultTree%'
    AND CONTAINS(d.*, 'Tennesee AND Contract')
    OPTIONS (CBR_DB_FIRST)

Best Practices

Dynamic Switching: Limitations

While dynamic switching mode protects against inefficient performance due to too many full-text hits, there are limitations in running in this mode:

Dynamic Switching: Unsupported Searches

Some searches that combine both full text and relational query conditions cannot be optimized. The following table lists the searches that cannot be optimized, and states how the server responds to an unsupported search, with or without an SQL query override of the CBRQueryOptimization setting on the object store. As indicated by the dash ( - ), the CBRQueryOptimization setting is not applicable when an SQL query override is used.

Unsupported searches CBRQueryOptimization setting SQL query option override Server behavior
Uses outer join with CBR (ContentSearch class) DYNAMIC_SWITCHING none CBR first only
- CBR_DB_FIRST or CBR_DYNAMIC_THRESHOLD Throws error: RETRIEVE_OPTIMIZED_SEARCH_INVALID_OUTER_JOIN
Uses property conditions OR'd with the CBR Contains condition DYNAMIC_SWITCHING none CBR first only
- CBR_DB_FIRST or CBR_DYNAMIC_THRESHOLD Throws error: RETRIEVE_OPTIMIZED_SEARCH_INVALID_OUTER_JOIN
Uses content property conditions DYNAMIC_SWITCHING none CBR first only
- CBR_DB_FIRST or CBR_DYNAMIC_THRESHOLD Throws error: RETRIEVE_OPTIMIZED_SEARCH_INVALID_PROPERTY_CONDITIONS
Uses multiple join with with an outer join and a content search join that is not the last join on the right DYNAMIC_SWITCHING none CBR first only
- CBR_DB_FIRST or CBR_DYNAMIC_THRESHOLD Throws error: RETRIEVE_OPTIMIZED_SEARCH_INVALID_MULTIPLE_JOIN
Uses distinct with rank in the select list, and rank override is not set to REQUIRED (2) in the object store DYNAMIC_SWITCHING none CBR first only
- CBR_DB_FIRST or CBR_DYNAMIC_THRESHOLD Throws error: RETRIEVE_OPTIMIZED_SEARCH_INVALID_SELECT_DISTINCT
Uses distinct with rank in the select list, and rank override is set to REQUIRED (2) in the object store DYNAMIC_SWITCHING or NONE none Throws error: RETRIEVE_CANT_SELECT_DISTINCT_RANK_WITH_OVERRIDE_REQUIRED
Uses order by rank, and rank override is set to DISABLED (0) DYNAMIC_SWITCHING none CBR first only
Uses order by rank, and rank override is set to ENABLED (1) DYNAMIC_SWITCHING none CBR-first searches will be returned in rank order but DB-first searches will not.
Uses order by rank, and rank override is set to REQUIRED (2) DYNAMIC_SWITCHING none CBR-first and DB-first searches will not be returned in rank order.
Uses order by rank, and rank override is set to any value - CBR_DB_FIRST Throws error: RETRIEVE_OPTIMIZED_SEARCH_INVALID_RANK_ORDER
Uses order by rank, and rank override is set to DISABLED (0) or ENABLED (1) - CBR_DYNAMIC_THRESHOLD CBR-first searches will be returned in rank order but DB-first searches will not.
Uses order by rank, and rank override is set to REQUIRED (2) - CBR_DYNAMIC_THRESHOLD CBR-first and DB-first searches will not be returned in rank order.
Merged scope search: uses order by rank with rank override not set to REQUIRED (2). DYNAMIC_SWITCHING in any object store or NONE in all object stores none CBR first only, in rank order
- CBR_DB_FIRST Throws error: RETRIEVE_OPTIMIZED_SEARCH_INVALID_RANK_ORDER
- CBR_DYNAMIC_THRESHOLD Throws error: RETRIEVE_OPTIMIZED_SEARCH_INVALID_MERGED_SCOPE_RANK_ORDER


Feedback

Last updated: October 2013
query_sql_syntax_cbr.htm

© Copyright IBM Corporation 2014.
This information center is powered by Eclipse technology. (http://www.eclipse.org)