The quantity of data stored in data warehouse environments is growing at an unprecedented rate. There are several reasons for this growth. For example:
- Database users are retaining enormous amounts of detailed data, such as transaction histories, web search queries, and phone call records.
- As data mining algorithms continue to improve, and as increasing processing power becomes available, organizations are analyzing much older historical data to predict future trends more accurately.
- Stricter regulations and audit standards now require businesses to retain data for longer periods than before.
- Many businesses are eliminating the cost of keeping paper-based records by switching to web-based records.
Given the high cost of enterprise storage, it is a challenge for a database administrator (DBA) to justify moving the entire warehouse to faster leading-edge storage and relegating the slower (but still expensive) hardware to in-house testing. The advent of solid-state drives (SSDs) further accelerates the need for a tiered storage configuration in which some data resides on faster SSDs while other data resides on slower magnetic disks. The DB2 V10 release offers a range of features and tools to support a multi-tiered warehouse in which part of a table's data can reside on SSD media while the remaining historical data in the same table resides on slower, less expensive hardware, thereby improving the overall total cost of ownership (TCO) for the warehouse.
In database systems, a relatively small proportion of the data tends to be hot, while the majority is warm or cold. Managing this multi-temperature data poses a considerable challenge if you want to make the best use of fast storage by keeping cold data off it. As a data warehouse consumes increasing amounts of storage, optimizing the use of fast storage becomes increasingly important in managing storage costs.
DB2 V10 introduces storage groups, where each storage group represents a tier of storage in the warehouse. A storage group is a logical grouping of paths on storage subsystems with similar attributes. By associating automatic storage table spaces with a storage group, database administrators can place data that is accessed more frequently on faster storage, while data that is accessed less often remains available, albeit with slower response times.
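For example, the following DDL sketches a two-tier configuration; the path names, storage group names, and device characteristics are illustrative assumptions, not recommendations:

```sql
-- Fast tier backed by SSD paths (paths and device values are examples)
CREATE STOGROUP SG_HOT ON '/db2/ssd/path1', '/db2/ssd/path2'
  OVERHEAD 0.8 DEVICE READ RATE 512;

-- Slower, less expensive tier on magnetic disk
CREATE STOGROUP SG_WARM ON '/db2/hdd/path1'
  OVERHEAD 6.7 DEVICE READ RATE 100;

-- Place the frequently accessed data on the fast tier
CREATE TABLESPACE TS_CURRENT
  MANAGED BY AUTOMATIC STORAGE
  USING STOGROUP SG_HOT;
```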
As time goes by, newer data is ingested into the warehouse, and the existing hot data, no longer accessed as often, needs to make way for the newly ingested data. The DBA accomplishes this by changing the storage group that the table space is associated with. While the table space data moves from hot storage to warm storage, it remains available to queries. The actual data movement happens asynchronously and can be suspended so that more important query workloads run faster, then resumed later when the reporting workload on the system is light.
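Assuming the storage group and table space names from the previous sketch, the tier change and the rebalance controls look something like this:

```sql
-- Reassign the table space to the warm tier; DB2 moves the extents
-- with an asynchronous rebalance while the data stays online
ALTER TABLESPACE TS_CURRENT USING STOGROUP SG_WARM;

-- Pause the background data movement while important queries run
ALTER TABLESPACE TS_CURRENT REBALANCE SUSPEND;

-- Resume the movement when the reporting workload is light
ALTER TABLESPACE TS_CURRENT REBALANCE RESUME;
```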
InfoSphere Optim Configuration Manager V2.1.1 includes a job manager that you can use to define one or more scheduled data migration jobs, allowing the DBA to define an age-based policy under which data partitions automatically move to a different storage tier.
Using DB2 workload management (WLM), you can now prioritize an activity based on the data that the activity accesses, either before the activity executes (predictively) or while the activity is executing (reactively).
To prioritize an activity, you use a combination of a data tag, which is a numeric identifier applied to a table space or storage group, and WLM controls. For example, if you have a table space IMPORTANT_TS containing critical data that has a data tag assigned to it, you could map any query that reads data from a table in this table space to a service class that is allocated a higher percentage of overall CPU cycles on the system.
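As a minimal sketch, such service classes might be defined as follows; the class names and CPU share values here are hypothetical, and CPU shares take effect only when the DB2 WLM dispatcher is enabled:

```sql
-- A superclass with two subclasses; the critical subclass receives a
-- larger share of CPU when the WLM dispatcher is active
CREATE SERVICE CLASS MAIN_SC;
CREATE SERVICE CLASS CRITICAL_SC UNDER MAIN_SC SOFT CPU SHARES 4000;
CREATE SERVICE CLASS NORMAL_SC UNDER MAIN_SC SOFT CPU SHARES 1000;
```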
You can assign a data tag directly to a table space or assign the data tag to the storage group for the table space and have the table space inherit the data tag from the storage group.
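For example, using IMPORTANT_TS from the previous paragraph and the hypothetical names from the earlier sketches:

```sql
-- Tag the table space directly ...
ALTER TABLESPACE IMPORTANT_TS DATA TAG 1;

-- ... or tag the storage group and let its table spaces inherit the tag
ALTER STOGROUP SG_HOT DATA TAG 1;
ALTER TABLESPACE TS_CURRENT DATA TAG INHERIT;
```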
Predictive prioritization using work class and work action sets uses an estimated data tag list that is obtained for an activity at compile time, similar to cost and cardinality estimates. The estimated data tag list contains the data tags for all table spaces that the compiler believes will be accessed during execution of the activity. You can define work class sets to identify activities that have a particular data tag in their estimated data tag lists. You can then define a work action to map any activities matching a work class set to a specific service class before they begin to execute.
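A sketch of this predictive mapping, reusing the hypothetical service classes defined earlier and treating data tag 1 as the hot tier:

```sql
-- Match any activity whose estimated data tag list contains tag 1
CREATE WORK CLASS SET DATATAG_WCS
  (WORK CLASS HOT_WC WORK TYPE ALL DATA TAG LIST CONTAINS 1);

-- Map matching activities to the high-priority subclass before
-- they begin executing
CREATE WORK ACTION SET DATATAG_WAS FOR SERVICE CLASS MAIN_SC
  USING WORK CLASS SET DATATAG_WCS
  (WORK ACTION MAP_HOT ON WORK CLASS HOT_WC
     MAP ACTIVITY TO CRITICAL_SC);
```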
Reactive prioritization using the new DATATAGINSC threshold maps an activity to a different service class at run time, when the activity accesses data that is assigned a particular data tag. For example, you can specify that an activity is mapped to a different service class when it reads data from a table space with a data tag value of 3. Reactive prioritization is useful when the compiler cannot accurately estimate the list of data tags for an activity. An example of such a case is a query against a range-partitioned table that uses parameter markers: the compiler cannot necessarily determine in advance which table ranges will be accessed.
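A sketch of the reactive approach, again with the hypothetical service classes from the earlier examples:

```sql
-- Demote an activity running in the critical subclass as soon as it
-- reads data from a table space tagged with data tag 3
CREATE THRESHOLD DEMOTE_ON_COLD_READ
  FOR SERVICE CLASS CRITICAL_SC UNDER MAIN_SC
  ACTIVITIES ENFORCEMENT DATABASE PARTITION
  WHEN DATATAGINSC IN (3)
  REMAP ACTIVITY TO NORMAL_SC;
```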