IBM PureData System for Analytics, Version 7.1

Window aggregation on a grouping select

This example goes through the steps for writing a SQL query that finds the top city in each region that is based on annual sales totals.

The following example ranks the cities that are based on their sales amount:
SELECT *, RANK() OVER (ORDER BY amt DESC) AS ranking FROM sales_tbl 
WHERE region = 'Northeast' or region = 'Central' ORDER BY amt DESC; 
city       | state |  region   | quarter | amt  | ranking 
-----------+-------+-----------+---------+------+---------
 New York  | NY    | Northeast |       4 | 5000 |       1
 New York  | NY    | Northeast |       3 | 4300 |       2
 New York  | NY    | Northeast |       2 | 3700 |       3
 New York  | NY    | Northeast |       1 | 3000 |       4
 Baltimore | MD    | Central   |       2 | 2500 |       5
 Baltimore | MD    | Central   |       4 | 2500 |       5
 Boston    | MA    | Northeast |       4 | 2400 |       7
 Boston    | MA    | Northeast |       1 | 2000 |       8
 Baltimore | MD    | Central   |       1 | 2000 |       8
 Baltimore | MD    | Central   |       3 | 2000 |       8
 Atlanta   | GA    | Central   |       4 | 1700 |      11
 Boston    | MA    | Northeast |       3 | 1700 |      11
 Atlanta   | GA    | Central   |       3 | 1600 |      13
 Boston    | MA    | Northeast |       2 | 1500 |      14
 Atlanta   | GA    | Central   |       2 | 1500 |      14
 Atlanta   | GA    | Central   |       1 | 1500 |      14
(16 rows)
The following example throws an error, as aggregates are not allowed in the WHERE clause:
SELECT *, RANK() OVER (ORDER BY amt DESC) AS ranking FROM sales_tbl 
WHERE ranking <= 2;
ERROR:  Aggregates not allowed in WHERE clause
Get the top two by selecting cities with a rank of less than 2:
SELECT * FROM ( SELECT *, RANK() OVER (ORDER BY amt DESC) AS ranking 
FROM sales_tbl WHERE region = 'Northeast' or region = 'Central') AS 
subset WHERE ranking <= 2; 
CITY     | STATE |  REGION   | QUARTER | AMT  | PROFIT_MARGIN | RANKING 
---------+-------+-----------+---------+------+---------------+---------
New York |    NY | Northeast |       4 | 5000 |            20 |       1
New York |    NY | Northeast |       3 | 4300 |            22 |       2
(2 rows)
Now rank the cities that are based on total sales for the year. So first compute total sales for each city for each year:
SELECT city, state, region, SUM(amt) AS yr_sales FROM sales_tbl WHERE 
region = 'Northeast' or region = 'Central' GROUP BY region, state, 
city; 
CITY      | STATE |  REGION   | YR_SALES 
----------+-------+-----------+----------
Boston    | MA    | Northeast |     7600
New York  | NY    | Northeast |    16000
Baltimore | MD    | Central   |     9000
Atlanta   | GA    | Central   |     6300
(4 rows)
Now rank the cities on total sales.
SELECT city, state, region, SUM(amt) AS yr_sales, RANK() OVER 
(PARTITION BY region ORDER BY SUM(amt)) FROM sales_tbl WHERE region = 
'Northeast' or region = 'Central' GROUP BY region, state, city;
CITY      | STATE |  REGION   | YR_SALES | RANK 
----------+-------+-----------+----------+------
Boston    | MA    | Northeast |     7600 |    1
New York  | NY    | Northeast |    16000 |    2
Atlanta   | GA    | Central   |     6300 |    1
Baltimore | MD    | Central   |     9000 |    2
(4 rows)

When a window aggregate and a GROUP BY clause display in a single query, the GROUP BY is evaluated first, and the window aggregate function is evaluated on the result of the GROUP BY.

The following example puts it all together to find the top two cities that are based on total sales:
SELECT * FROM (SELECT city, state, region, SUM(amt) AS yr_sales, RANK 
() OVER (PARTITION BY region ORDER BY SUM(amt) DESC) AS ranking FROM 
sales_tbl WHERE region = 'Northeast' or region = 'Central' GROUP BY 
region, state, city) subset WHERE ranking = 1 ORDER BY yr_sales DESC;
city      | state |  region   | yr_sales | ranking 
----------+-------+-----------+----------+---------
New York  | NY    | Northeast |    16000 |       1
Baltimore | MD    | Central   |     9000 |       1
(2 rows)


Feedback | Copyright IBM Corporation 2014 | Last updated: 2014-02-28