test=# SELECT country, sum(amount_sold) FROM t_sales GROUP BY 1; country | sum -----------+----- USA | 373 Germany | 104 Argentina | 137 (3 rows)
这里没什么好说的,除了我们将为每个小组得到一个总数。然而,有一点哲学上的讨论正在进行。GROUP BY 1 基本上是指 GROUP BY country,相当于 SELECT 子句中的第一列。因此,GROUP BY country 和 GROUP BY 1 是一回事:
1 2 3 4 5 6 7 8 9 10 11 12 13
test=# SELECT country, product_name, sum(amount_sold) FROM t_sales GROUP BY 1, 2 ORDER BY 1, 2; country | product_name | sum -----------+--------------+----- Argentina | Hats | 111 Argentina | Shoes | 26 Germany | Hats | 41 Germany | Shoes | 63 USA | Hats | 171 USA | Shoes | 202 (6 rows)
当然,这也适用于不止一列。不过,我想指出一点。考虑以下示例:
1 2 3 4 5 6 7 8 9 10 11 12
test=# SELECT CASE WHEN country = 'USA' THEN 'USA' ELSE 'non-US' END, sum(amount_sold) FROM t_sales GROUP BY 1; case | sum --------+----- USA | 373 non-US | 241 (2 rows)
GROUP BY 将把一列中每一个不同的条目变成一个组。有时您可能想一次做更多的分组。为什么有这个必要呢?假设你正在处理一个 10TB 的表。显然,读取这些数据通常是性能方面的限制因素。因此,一次读取数据并一次产生更多的结果是很有吸引力的。这正是您可以用 GROUP BY GROUP SETS 做的事情。假设我们想一次产生两个结果:
GROUP BY country
GROUP BY product_name
这是它如何工作的:
1 2 3 4 5 6 7 8 9 10 11 12
test=# SELECT country, product_name, sum(amount_sold) FROM t_sales GROUP BY GROUPING SETS ((1), (2)) ORDER BY 1, 2; country | product_name | sum -----------+--------------+----- Argentina | | 137 Germany | | 104 USA | | 373 | Hats | 323 | Shoes | 291 (5 rows)
在这种情况下,PostgreSQL 只是简单地附加结果。前三行代表 GROUP BY country。接下来的两行包含 GROUP BY product_name 的结果。从逻辑上讲,它相当于以下查询:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
test=# SELECT NULL AS country , product_name, sum(amount_sold) FROM t_sales GROUP BY 1, 2 UNION ALL SELECT country, NULL, sum(amount_sold) FROM t_sales GROUP BY 1, 2 ORDER BY 1, 2; country | product_name | sum -----------+--------------+----- Argentina | | 137 Germany | | 104 USA | | 373 | Hats | 323 | Shoes | 291 (5 rows)
但是,GROUPING SETS 版本的效率更高,因为它只需要读取一次数据。
ROLLUP: 添加“底线”
在创建报告时,您通常需要“底线”来总结表中显示的内容。在 SQL 中这样做的方法是使用 GROUP BY ROLLUP:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
test=# SELECT country, product_name, sum(amount_sold) FROM t_sales GROUP BY ROLLUP (1, 2) ORDER BY 1, 2; country | product_name | sum -----------+--------------+----- Argentina | Hats | 111 Argentina | Shoes | 26 Argentina | | 137 Germany | Hats | 41 Germany | Shoes | 63 Germany | | 104 USA | Hats | 171 USA | Shoes | 202 USA | | 373 | | 614 (10 rows)
test=# SELECT CASE WHEN country IS NULL THEN 'TOTAL' ELSE country END, CASE WHEN product_name IS NULL THEN 'TOTAL' ELSE product_name END, sum FROM (SELECT country, product_name, sum(amount_sold) FROM t_sales GROUP BY ROLLUP (1, 2) ORDER BY 1, 2 ) AS x; country | product_name | sum -----------+--------------+----- Argentina | Hats | 111 Argentina | Shoes | 26 Argentina | TOTAL | 137 Germany | Hats | 41 Germany | Shoes | 63 Germany | TOTAL | 104 USA | Hats | 171 USA | Shoes | 202 USA | TOTAL | 373 TOTAL | TOTAL | 614 (10 rows)