DB2 Version 9.7 for Linux, UNIX, and Windows

Annotated XML schema decomposition and recursive XML documents

XML schemas containing recursion can be registered in the XML schema repository (XSR) and enabled for decomposition, with the restriction that the recursive relationships themselves cannot be decomposed as scalar values into a target table. With appropriate schema annotations, however, the recursive sections can be stored and later retrieved as serialized markup.

Types of recursion

An XML schema is recursive when its type definitions allow an element of a given name and type to appear within its own definition. Recursion can be explicit or implicit.

Explicit recursion
Explicit recursion occurs when an element is defined in terms of itself. In the following example, the element <root> refers to itself in its own definition through the ref attribute of an element declaration:
<xs:element name="root">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="a" type="xs:string"/>
      <xs:element name="b">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="c" type="xs:string"/>
            <xs:element ref="root" minOccurs="0"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
With explicit recursion, a recursive branch is delimited as follows:
  • The start of a recursive branch is a declaration of element Y that has no ancestor declaration of Y. The start of a recursive branch can have multiple branches of descendants; a descendant branch that contains another declaration of Y is a recursive branch.
  • The end of a recursive branch is the highest-level declaration of Y that is a descendant of the start of the branch. Note that the end of the branch is specifically an element reference (an element declaration that uses the ref attribute).
A node that starts a recursive branch can be the starting node for multiple recursive branches. The following example contains two explicitly recursive branches:
  1. <root> (*), <b>, <root> (**)
  2. <root> (*), <b>, <root> (***)
<xs:element name="root"> <!-- * -->
  <xs:complexType>
    <xs:sequence>
      <xs:element name="a" type="xs:string"/>
      <xs:element name="b">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="c" type="xs:string"/>
            <xs:element ref="root" minOccurs="0"/>  <!-- ** -->
            <xs:element ref="root" minOccurs="0"/>  <!-- *** -->
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
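The branch rules above can be checked mechanically. The following Python sketch uses a hypothetical helper, explicit_branches, that is not part of DB2. It parses a schema wrapped in an xs:schema element, starts at a named global element declaration, and follows local declarations and ref attributes until it reaches a reference back to the start element, which ends the branch. It assumes a schema without a target namespace and handles only the constructs used in this topic (xs:complexType, xs:sequence, and xs:element with name or ref).

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # XML Schema namespace, Clark notation


def child_decls(node):
    """Yield element declarations in node's content model without
    descending into the declarations themselves."""
    for sub in node:
        if sub.tag == XS + "element":
            yield sub
        else:
            yield from child_decls(sub)


def explicit_branches(xsd_text, start):
    """Return the explicitly recursive branches that begin at the global
    element declaration `start`, as lists of element names."""
    schema = ET.fromstring(xsd_text)
    global_elems = {e.get("name"): e for e in schema.findall(XS + "element")}
    branches = []

    def walk(decl, path):
        for child in child_decls(decl):
            ref = child.get("ref")
            if ref == start:
                # Highest-level re-declaration of the start element:
                # this element reference ends the branch.
                branches.append(path + [ref])
            elif ref is not None:
                # Follow other references, skipping any that loop back
                # into the current path (a different recursion).
                if ref not in path and ref in global_elems:
                    walk(global_elems[ref], path + [ref])
            else:
                walk(child, path + [child.get("name")])

    walk(global_elems[start], [start])
    return branches


SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="a" type="xs:string"/>
        <xs:element name="b">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="c" type="xs:string"/>
              <xs:element ref="root" minOccurs="0"/>
              <xs:element ref="root" minOccurs="0"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

print(explicit_branches(SCHEMA, "root"))
# [['root', 'b', 'root'], ['root', 'b', 'root']]
```

The two results correspond to the branches marked (**) and (***); stopping at the first reference back to the start element is what makes each one end at the highest-level redeclaration.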

A recursive branch determines how its member elements are decomposed. In the instance document, the occurrence of element Y that corresponds to the start of the recursive branch, together with its descendants up to the occurrence of Y that corresponds to the end of that branch, can be decomposed as scalar values. The occurrence of Y that corresponds to the end of the recursive branch marks the recursive region: the region begins with the start tag of this occurrence of Y and ends with its end tag. All elements and attributes in the instance document that are in this recursive region can be decomposed either as markup or as string values, depending on the value specified for the db2-xdb:contentHandling decomposition annotation.
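The region rule for explicit recursion can be sketched as a tree walk. The hypothetical Python helper below (not DB2's decomposition algorithm) separates the elements that lie outside any recursive region from the occurrences of Y that open one. It assumes Y is the name of the document's root element; the sample is Document 1 from later in this topic, written without indentation for brevity.

```python
import xml.etree.ElementTree as ET


def split_explicit(doc_text, y):
    """Split a document into elements outside any recursive region and the
    occurrences of `y` that open an explicit recursive region."""
    root = ET.fromstring(doc_text)
    outside, regions = [], []

    def walk(elem, below_y):
        if elem.tag == y and below_y:
            # End of the branch: this whole element, start tag, attributes
            # and all, is the recursive region; nothing below is scalar.
            regions.append(elem)
            return
        outside.append(elem)
        for child in elem:
            walk(child, below_y or elem.tag == y)

    walk(root, False)
    return outside, regions


DOC1 = ("<root><a>a str1</a><b><c>c str1</c>"
        "<root><a>a str11</a><b><c>c str11</c></b></root></b></root>")

outside, regions = split_explicit(DOC1, "root")
print([e.tag for e in outside])                   # ['root', 'a', 'b', 'c']
print(ET.tostring(regions[0], encoding="unicode"))
# <root><a>a str11</a><b><c>c str11</c></b></root>
```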

Implicit recursion
Implicit recursion occurs when an element of a complex type contains another element declaration whose type attribute names the complex type definition that this declaration is itself part of. In the following example, the element <beginRecursion> has the type "rootType", and <beginRecursion> is itself part of the definition of "rootType":
<xs:element name="root" type="rootType"/>
<xs:complexType name="rootType">
  <xs:sequence>
    <xs:element name="a" type="xs:string"/>
    <xs:element name="b">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="c" type="xs:string"/>
          <xs:element name="beginRecursion" type="rootType" minOccurs="0"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:sequence>
</xs:complexType>
With implicit recursion, a recursive branch is delimited as follows:
  • The start of a recursive branch is a declaration of element Y, of complex type CT, that has no ancestor element declaration of type CT. The start of a recursive branch can have multiple branches of descendants; a descendant branch that contains a declaration of an element Z of type CT is a recursive branch.
  • The end of a recursive branch is the highest-level element declaration of type CT that is a descendant of the start of the branch.
A node that starts a recursive branch can be the starting node for multiple recursive branches. The following example contains two implicitly recursive branches:
  1. <root>, <b>, <beginRecursion>
  2. <root>, <b>, <anotherRecursion>
<xs:element name="root" type="rootType"/>
<xs:complexType name="rootType">
  <xs:sequence>
    <xs:element name="a" type="xs:string"/>
    <xs:element name="b">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="c" type="xs:string"/>
          <xs:element name="beginRecursion" type="rootType" minOccurs="0"/>
          <xs:element name="anotherRecursion" type="rootType" minOccurs="0"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:sequence>
</xs:complexType>
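For implicit recursion, the branch ends at a declaration whose type attribute names the recursive type CT, not at a reference to the start element. The hypothetical helper below sketches that rule. It assumes unprefixed global type names (no target namespace) and follows each other named complex type at most once per path.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def child_decls(node):
    """Yield element declarations in node's content model without
    descending into the declarations themselves."""
    for sub in node:
        if sub.tag == XS + "element":
            yield sub
        else:
            yield from child_decls(sub)


def implicit_branches(xsd_text, start):
    """Return the implicitly recursive branches that begin at the global
    element `start`, whose type attribute names the recursive type CT."""
    schema = ET.fromstring(xsd_text)
    elems = {e.get("name"): e for e in schema.findall(XS + "element")}
    types = {t.get("name"): t for t in schema.findall(XS + "complexType")}
    ct = elems[start].get("type")  # e.g. "rootType"
    branches = []

    def walk(content, path, seen_types):
        for child in child_decls(content):
            name, typ = child.get("name"), child.get("type")
            if typ == ct:
                # Highest-level declaration of type CT: ends the branch.
                branches.append(path + [name])
            elif typ in types and typ not in seen_types:
                # Another named complex type: follow it once per path.
                walk(types[typ], path + [name], seen_types | {typ})
            else:
                # Anonymous complex type: walk its inline content. Simple or
                # already-visited types have no inline content to walk.
                walk(child, path + [name], seen_types)

    walk(types[ct], [start], {ct})
    return branches


SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root" type="rootType"/>
  <xs:complexType name="rootType">
    <xs:sequence>
      <xs:element name="a" type="xs:string"/>
      <xs:element name="b">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="c" type="xs:string"/>
            <xs:element name="beginRecursion" type="rootType" minOccurs="0"/>
            <xs:element name="anotherRecursion" type="rootType" minOccurs="0"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""

print(implicit_branches(SCHEMA, "root"))
# [['root', 'b', 'beginRecursion'], ['root', 'b', 'anotherRecursion']]
```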

Implicit recursion is decomposed slightly differently from explicit recursion. In the instance document, the occurrence of element Y that corresponds to the start of the recursive branch, together with its descendants up to the occurrence of Z that corresponds to the end of that branch, can be decomposed as scalar values. This occurrence of Z marks the recursive region, which begins immediately after the start tag of Z and ends immediately before the end tag of Z. All element descendants of this occurrence of Z lie in the recursive region. The attributes of this occurrence of Z, however, lie outside the recursive region and can therefore be decomposed as scalar values.
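The difference in where the region begins can be sketched as follows. The hypothetical helper below finds each outermost occurrence of a branch-ending element Z and reports its attributes (outside the region) separately from its child elements (inside the region). The id attribute in the sample is an assumption added for illustration; the rootType schema in this topic declares no attributes.

```python
import xml.etree.ElementTree as ET


def split_implicit(doc_text, z_names):
    """For each outermost occurrence of a branch-ending element (tag in
    z_names), return its tag, its attributes (outside the recursive
    region) and its child elements (inside the recursive region)."""
    results = []

    def walk(elem):
        for child in elem:
            if child.tag in z_names:
                # The region starts after Z's start tag, so Z's attributes
                # stay scalar while its children belong to the region.
                results.append((child.tag, dict(child.attrib), list(child)))
            else:
                walk(child)

    walk(ET.fromstring(doc_text))
    return results


# Document 2 from this topic, compacted; the id attribute is hypothetical.
DOC2 = ('<root><a>a str2</a><b><c>c str2</c>'
        '<beginRecursion id="7"><a>a str22</a><b><c>c str22</c></b>'
        '</beginRecursion></b></root>')

found = split_implicit(DOC2, {"beginRecursion", "anotherRecursion"})
for tag, attrs, region in found:
    print(tag, attrs, [e.tag for e in region])
# beginRecursion {'id': '7'} ['a', 'b']
```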

Decomposition behavior for recursive branches

For both types of recursion, the recursive branch divides the corresponding part of the instance document into non-recursive and recursive regions. Only the non-recursive regions of an XML instance document can be decomposed as scalar values into a target database table, and this restriction also covers non-recursive regions of one branch that lie inside a recursive region of an enclosing branch. That is, if recursive branch RB2 is completely encompassed by recursive branch RB1, then the non-recursive region of some instances of RB2 in the instance document can fall inside the recursive region of an instance of RB1. Such a non-recursive region cannot be decomposed as scalar values; instead, it is part of the markup that is the decomposition result for RB1. For any instance of RB2, the non-recursive region can be decomposed as scalar values only if it is not inside any other recursive region.

For example, the following XML schema contains two recursive branches:
  1. RB1 (<root> (identified with *), <b>, <root> (identified with **))
  2. RB2 (<d>, <d>)
<xs:element name="d">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="d" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:int"/>
  </xs:complexType>
</xs:element>
<xs:element name="root"> <!-- * -->
  <xs:complexType>
    <xs:sequence>
      <xs:element name="a" type="xs:string"/>
      <xs:element ref="d"/>
      <xs:element name="b">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="c" type="xs:string"/>
            <xs:element ref="root" minOccurs="0"/>  <!-- ** -->
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
The recursive regions of an associated instance document are marked with comments below. There are two instances of RB2 (<d>, <d>) in the instance document, but only the non-recursive region of the first instance of RB2 (the <d> element marked #) can be decomposed as scalar values. That is, the attribute id="1" can be decomposed. The non-recursive region of the second instance of RB2 lies completely within recursive region 2, which is a recursive region of the instance of RB1. Therefore, the attribute id="2" cannot be decomposed.
<root>
  <a>a str1</a>
  <d id="1">  <!-- # -->
    <d id="11"> </d>  <!-- recursive region 1 (instance of RB2) -->
  </d>
  <b>
    <c>c str1</c>
    <root>  <!-- recursive region 2 (instance of RB1) begins -->
      <a>a str11</a>
      <d id="2"> <d id="22"> </d> </d>
      <b>
        <c>c str11</c>
      </b>
    </root>  <!-- recursive region 2 ends -->
  </b>
</root>
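The rule that only RB2 instances outside every RB1 recursive region contribute scalar values can be sketched with a single walk that tracks both branches. The helper below is hypothetical and assumes both branches use explicit recursion, as in this example; the sample is the instance document above, compacted and with its comments omitted.

```python
import xml.etree.ElementTree as ET


def decomposable_starts(doc_text, outer_y, inner_y):
    """Return the occurrences of inner_y that start an instance of the inner
    branch (no inner_y ancestor) and lie outside every recursive region of
    the outer branch (an outer_y nested in another outer_y)."""
    found = []

    def walk(elem, below_outer, below_inner, in_outer_region):
        if elem.tag == outer_y and below_outer:
            in_outer_region = True  # recursive region of the outer branch
        if elem.tag == inner_y and not below_inner and not in_outer_region:
            found.append(elem)      # its non-recursive region is scalar
        for child in elem:
            walk(child,
                 below_outer or elem.tag == outer_y,
                 below_inner or elem.tag == inner_y,
                 in_outer_region)

    walk(ET.fromstring(doc_text), False, False, False)
    return found


DOC = ('<root><a>a str1</a><d id="1"><d id="11"/></d>'
       '<b><c>c str1</c><root><a>a str11</a>'
       '<d id="2"><d id="22"/></d><b><c>c str11</c></b></root></b></root>')

print([d.get("id") for d in decomposable_starts(DOC, "root", "d")])  # ['1']
```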

Example: Using the db2-xdb:contentHandling decomposition annotation with both types of recursion

This example demonstrates decomposition behavior for both explicit and implicit recursion, and the results of setting different values for the db2-xdb:contentHandling annotation. It uses the following two XML instance documents.

In Document 1, recursion begins where the <root> element appears below itself; the recursive region is that nested <root> element, from its start tag to its end tag:
<root>
  <a>a str1</a>
  <b>
    <c>c str1</c>
    <root>
      <a>a str11</a>
      <b>
        <c>c str11</c>
      </b>
    </root>
  </b>
</root>
In Document 2, recursion begins below the element <beginRecursion>; the recursive region consists of the child elements of <beginRecursion>, not its start and end tags:
<root>
  <a>a str2</a>
  <b>
    <c>c str2</c>
    <beginRecursion>
      <a>a str22</a>
      <b>
        <c>c str22</c>
      </b>
    </beginRecursion>
  </b>
</root>

In an instance document, the elements and attributes, and their contents, that appear between the beginning and the end of recursion cannot be decomposed as scalar values into table-column pairs. However, a serialized markup version of these items can be obtained by annotating an element of complex type in the recursive branch with the db2-xdb:contentHandling attribute set to "serializeSubtree". A text serialization of all the character data in this part can be obtained instead by setting db2-xdb:contentHandling to "stringValue". In general, the content or markup of the recursive region can be obtained by setting the db2-xdb:contentHandling attribute appropriately on any element of complex type in the recursive branch, or on an element that is an ancestor of the elements in the recursive branch.
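To contrast the two settings, the following Python sketch emulates them on the outer <b> of Document 1 (compacted). It is an approximation, not DB2 behavior: it assumes "serializeSubtree" yields the element's markup and "stringValue" yields the concatenation of all descendant text, like the XPath string value. DB2's exact output (namespace declarations, whitespace) may differ.

```python
import xml.etree.ElementTree as ET


def content_handling(elem, mode):
    """Approximate the value stored for an annotated element under two
    db2-xdb:contentHandling settings (an emulation, not DB2 itself)."""
    if mode == "serializeSubtree":
        # ET.tostring also writes the element's tail text, so clear it.
        tail, elem.tail = elem.tail, None
        try:
            return ET.tostring(elem, encoding="unicode")
        finally:
            elem.tail = tail
    if mode == "stringValue":
        # Concatenate every descendant text node, in document order.
        return "".join(elem.itertext())
    raise ValueError("unsupported mode: " + mode)


DOC1 = ("<root><a>a str1</a><b><c>c str1</c>"
        "<root><a>a str11</a><b><c>c str11</c></b></root></b></root>")
b = ET.fromstring(DOC1).find("b")  # the outer <b>, an ancestor of the region

print(content_handling(b, "serializeSubtree"))
print(content_handling(b, "stringValue"))  # c str1a str11c str11
```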

For instance, annotating element <b> in the following XML schema:
<xs:element name="root">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="a" type="xs:string"/>
      <xs:element name="b"
             db2-xdb:rowSet="TABLEx"
             db2-xdb:column="COLx"
             db2-xdb:contentHandling="serializeSubtree">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="c" type="xs:string"/>
            <xs:element ref="root" minOccurs="0"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
results in the following XML fragment being inserted into column COLx of a row in TABLEx when Document 1 is decomposed:
  <b>
    <c>c str1</c>
    <root>
      <a>a str11</a>
      <b>
        <c>c str11</c>
      </b>
    </root>
  </b>
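Only one row results, because the inner <b> lies inside the recursive region and is already part of the outer fragment. The sketch below models that with an in-memory sqlite3 table standing in for TABLEx; sqlite3 and the helper decompose_serialized are illustrative assumptions, not DB2 decomposition.

```python
import sqlite3
import xml.etree.ElementTree as ET


def decompose_serialized(doc_text, y, annotated, conn):
    """Insert one row into TABLEx.COLx for each `annotated` element that
    lies outside every explicit recursive region of `y`."""

    def walk(elem, below_y):
        if elem.tag == y and below_y:
            return  # recursive region: already inside an ancestor's markup
        if elem.tag == annotated:
            tail, elem.tail = elem.tail, None  # tostring would emit the tail
            markup = ET.tostring(elem, encoding="unicode")
            elem.tail = tail
            conn.execute("INSERT INTO TABLEx (COLx) VALUES (?)", (markup,))
        for child in elem:
            walk(child, below_y or elem.tag == y)

    walk(ET.fromstring(doc_text), False)


conn = sqlite3.connect(":memory:")  # stand-in for the DB2 target table
conn.execute("CREATE TABLE TABLEx (COLx TEXT)")
DOC1 = ("<root><a>a str1</a><b><c>c str1</c>"
        "<root><a>a str11</a><b><c>c str11</c></b></root></b></root>")
decompose_serialized(DOC1, "root", "b", conn)
rows = conn.execute("SELECT COLx FROM TABLEx").fetchall()
print(len(rows))  # 1: the inner <b> is inside the recursive region
```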
Similarly, annotating element <beginRecursion> in the following XML schema:
<xs:element name="root" type="rootType"/>
<xs:complexType name="rootType">
  <xs:sequence>
    <xs:element name="a" type="xs:string"/>
    <xs:element name="b">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="c" type="xs:string"/>
          <xs:element name="beginRecursion"
            type="rootType" minOccurs="0"
            db2-xdb:rowSet="TABLEx"
            db2-xdb:column="COLx"
            db2-xdb:contentHandling="serializeSubtree"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:sequence>
</xs:complexType>
results in the following XML fragment being inserted into column COLx of a row in TABLEx when Document 2 is decomposed:
    <beginRecursion>
      <a>a str22</a>
      <b>
        <c>c str22</c>
      </b>
    </beginRecursion>
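With implicit recursion the annotated <beginRecursion> element itself lies outside the region, so its markup becomes the row, and any further <beginRecursion> nested in its children stays inside that same row. The hypothetical helper below sketches this; the second, nested level in the sample document is an assumption added to show that it does not produce an extra row.

```python
import xml.etree.ElementTree as ET


def serialize_implicit(doc_text, z):
    """Return the serialized markup for each outermost occurrence of the
    branch-ending element `z`; nested occurrences are part of its region."""
    rows = []

    def walk(elem):
        for child in elem:
            if child.tag == z:
                tail, child.tail = child.tail, None  # tostring would emit the tail
                rows.append(ET.tostring(child, encoding="unicode"))
                child.tail = tail
                # Do not descend: the children form the recursive region.
            else:
                walk(child)

    walk(ET.fromstring(doc_text))
    return rows


# Document 2 extended (hypothetically) with a second, nested level.
DOC = ("<root><a>a str2</a><b><c>c str2</c>"
       "<beginRecursion><a>a str22</a><b><c>c str22</c>"
       "<beginRecursion><a>a str222</a><b><c>c str222</c></b></beginRecursion>"
       "</b></beginRecursion></b></root>")

rows = serialize_implicit(DOC, "beginRecursion")
print(len(rows))  # 1: the nested <beginRecursion> stays inside the first row
```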