IBM PureData System for Analytics, Version 7.1

Overview of analytic functions

You use analytic functions to compute cumulative, moving, centered, and reporting aggregates.

IBM® Netezza® SQL analytic functions compute an aggregate value that is based on a group of rows. But unlike aggregate functions, they return multiple rows for each group. Analytic functions answer the following questions:

What is the running total?
What are the percentages within a group?
What are the top n queries?
What is the moving average?

The Netezza SQL functions can be divided into the following families. Some functions are used in more than one family:

Reporting/Window aggregate family

Use reporting functions to compare values at different levels of aggregation because they can do multiple passes over the data in a single query block and then return the same aggregate value for every row in a partition. You can use reporting functions in percent-of-total and market share calculations. For instance, you might want to know regional sales levels as a percent of national sales.

Window functions answer questions such as “what is the 12-week moving average of a stock price?” or “what was the cumulative sum of sales per each region?”

For all SQL aggregate functions, reporting functions provide reporting aggregate processing, while window functions provide moving and cumulative processing. These SQL aggregate functions include: avg, sum, min, max, count, variance, and stddev.

Lag/Lead family

Use these functions to compare different rows of a table by specifying an offset from the current row. You can use these functions to analyze change and variation. The functions include lag and lead.

First/Last family

Use these functions to specify sorted aggregate groups and return the first or last value of each group. For example, you can query bank statements for beginning and ending monthly balances. These functions include first_value and last_value.

Ranking family

These functions help to answer questions such as “what are the top 10 and bottom 10 selling items?” The functions examine the entire output before they display an answer. These functions include ntile, dense_rank, percent_rank, cume_dist, and rank.

Row count family

The row_number function assigns a number to each row that is based on its position within the window partition. It is similar to dense_rank and rank, but unlike the rank functions, it counts ties (peer rows, that is, rows that match on the order by column).

Hypothetical set family

These functions give the rank or percentile that a row would have if inserted into a specified data set. These functions include dense_rank, percent_rank, cume_dist, and rank.

Inverse distribution functions family

These functions give the value in a data set that corresponds to a specified percentile. These functions include percentile_cont and percentile_disc, however, these two functions are not supported as window aggregates.