Dr. Jon A. Lind, InfoSphere Warehouse Product Manager, DB2 Warehousing Product Management,
Shamit Bagchi, DB2 & PureData Data Warehouse Product Marketing
The DB2 Technology Preview is an exciting new accelerator for analytics. It features a column store for tables that provides not only fantastic performance but also storage savings, thanks to the column store architecture and extreme compression algorithms. This new technology has been developed by IBM and integrated directly into the DB2 engine.
When IBM began to productize the technology, it had to hit three key value tenets:
- Lower operating costs
- Optimize hardware
- Extreme performance
IBM was able to hit all these marks through the engineering mantra known as “the 7 big ideas”, which are the fundamental breakthroughs showcased in the technology preview.
Lower Operating Costs:
Column Store – by storing table data in column-organized format, we not only save significantly on storage costs but also improve I/O and memory efficiency. This lowers operating costs.
Simple to Implement and Use – you just create the column-organized table and then load and go. That’s it: no tuning, no indexes, etc. This lowers administration and development costs significantly.
Extreme Compression – by using compression and sophisticated encoding algorithms, the technology preview shows how we can save significantly on storage costs, including the power, cooling, and management of that storage.
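To make the column store and compression ideas concrete, here is a minimal sketch in Python. It is not how DB2 stores data internally: the sample rows, the column names, and the choice of simple dictionary encoding are all assumptions made for illustration, since the preview does not name its actual algorithms.

```python
# Hypothetical sample data; the column names are illustrative only.
rows = [
    ("2013-01-05", "EAST", 120),
    ("2013-01-05", "WEST", 75),
    ("2013-01-06", "EAST", 200),
    ("2013-01-06", "EAST", 50),
]

# Column-organized layout: one list per column instead of one tuple per row.
columns = {
    "sale_date": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

def dictionary_encode(values):
    """Replace each value with a small integer code; return (dictionary, codes)."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]

# Assumed encoding scheme: repeated strings shrink to small integers.
region_dict, region_codes = dictionary_encode(columns["region"])

# SUM(amount) WHERE region = 'EAST' touches only two of the three columns,
# and the comparison runs on integer codes rather than on strings.
east = region_dict.index("EAST")
total_east = sum(a for a, c in zip(columns["amount"], region_codes) if c == east)
```

The sale_date column is never read by the query, which is where the I/O and memory savings of a column-organized layout come from.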
Optimize Hardware:
Extreme Compression – in addition to lowering costs, the compression algorithms exploit processor characteristics to improve performance. The compression works with a register-friendly encoding technique to improve processor efficiency.
Deep Hardware Instruction Exploitation – with SIMD processing (defined below), we multiply the performance of the processor by having instructions work on multiple data elements simultaneously.
Core Friendly Parallelism – access plans on column-organized tables leverage all of the cores on the server simultaneously to deliver better analytic query performance.
Optimal Memory Caching – with row-organized tables, a full table scan ends up putting data into the bufferpool that is often not required. For column-organized tables, if some columns are involved in joins or other predicates in many queries, we can pack the bufferpool full of those columns while keeping columns that are not regularly used out of memory. This improves performance and optimizes the available memory.
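The core-friendly parallelism idea can be pictured as splitting a column into partitions, aggregating each partition on its own worker, and then combining the partial results. The sketch below shows only that shape. The function names and the partitioning scheme are assumptions, and because CPython threads share an interpreter lock, it illustrates the structure of the work rather than a real speedup; a database engine would do this with native threads.

```python
from concurrent.futures import ThreadPoolExecutor
import os

def partial_sum(chunk):
    """Aggregate one partition of the column independently of the others."""
    return sum(chunk)

def parallel_sum(column, workers=None):
    """Split the column into one chunk per worker and combine partial sums."""
    workers = workers or os.cpu_count() or 1
    size = max(1, -(-len(column) // workers))  # ceiling division
    chunks = [column[i:i + size] for i in range(0, len(column), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

# Hypothetical column of 1,000 values spread over four workers.
amounts = list(range(1, 1001))
total = parallel_sum(amounts, workers=4)
```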
Extreme Performance:
Optimal Memory Caching – as stated above, this not only helps to optimize hardware but also improves overall workload performance.
Data Skipping – by keeping track of which pages of data contain which column values, we can avoid a lot of I/O and query processing by simply skipping data that we already know would not qualify for the query.
Column Store – in addition to lowering costs, reading only the columns that are part of a query can increase query performance by an order of magnitude in some cases.
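Data skipping is easiest to see with a small example. The sketch below assumes the engine keeps a minimum and maximum value for each block of a column; that is one common way to track which pages can hold which values, not a confirmed description of the preview's internals. The block size, function names, and sample data are hypothetical.

```python
def build_synopsis(column, block_size):
    """Record (start, min, max) for each fixed-size block of a column."""
    synopsis = []
    for start in range(0, len(column), block_size):
        block = column[start:start + block_size]
        synopsis.append((start, min(block), max(block)))
    return synopsis

def scan_range(column, synopsis, block_size, low, high):
    """Return values in [low, high], reading only blocks that might qualify."""
    matches, blocks_read = [], 0
    for start, block_min, block_max in synopsis:
        if block_max < low or block_min > high:
            continue  # no value in this block can satisfy the predicate
        blocks_read += 1
        block = column[start:start + block_size]
        matches.extend(v for v in block if low <= v <= high)
    return matches, blocks_read

# Hypothetical column loaded in date order: four blocks of four values.
order_year = [2009] * 4 + [2010] * 4 + [2011] * 4 + [2012] * 4
synopsis = build_synopsis(order_year, block_size=4)
matches, blocks_read = scan_range(order_year, synopsis, 4, 2011, 2011)
```

Here a query for 2011 reads one block out of four. The benefit depends on how well the data is clustered; values scattered randomly across blocks would give wide min/max ranges and little skipping.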
SIMD: (Single Instruction stream, Multiple Data stream) - A processor capability that performs one operation on multiple sets of data at once. It is typically used to add or multiply eight or more sets of numbers at the same time for multimedia encoding and rendering as well as scientific applications. Hardware registers are each loaded with several numbers, and the mathematical operation is applied to all of them simultaneously.
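Python has no direct access to SIMD instructions, so the sketch below uses a stand-in technique, sometimes called "SIMD within a register", to show the idea: eight small codes are packed into one 64-bit word, and a single sequence of integer operations compares all eight to a target value at once. The 8-bit code width and the function names are assumptions for illustration; real SIMD hardware uses wider registers and dedicated instructions.

```python
LANES = 8                      # eight 8-bit codes per 64-bit word
LOW7 = 0x7F7F7F7F7F7F7F7F      # low seven bits of every lane
HIGH = 0x8080808080808080      # top bit of every lane
ONES = 0x0101010101010101      # the value 1 in every lane

def pack(codes):
    """Pack exactly eight codes (each 0..255) into one integer, lane 0 lowest."""
    word = 0
    for lane, code in enumerate(codes):
        word |= code << (8 * lane)
    return word

def equal_lanes(word, target):
    """Compare all eight lanes to target at once; return matching lane numbers."""
    diff = word ^ (ONES * target)            # a lane becomes 0 where it equals target
    nonzero = ((diff & LOW7) + LOW7) | diff  # top bit set in every nonzero lane
    hits = ~nonzero & HIGH                   # top bit set in every matching lane
    # Unpacking the result is a plain loop; the comparison above was not.
    return [lane for lane in range(LANES) if (hits >> (8 * lane + 7)) & 1]

word = pack([3, 7, 3, 0, 255, 3, 1, 2])
matching = equal_lanes(word, 3)
```

This is also why a register-friendly encoding matters: the narrower the codes, the more of them fit into one register and the more values each instruction can process.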