目录
分组
选择having而不是where
聚集函数
隐式显式分组
count()
分组形式
单列分组
多行分组
表达式分组
对每个分组产生合计数
分组过滤条件
分组
group by子句根据列值对数据进行分组。例如你不想查看雇员和他们所处部门的列表,而是想要获取部门和它所拥有雇员数的清单。下面的查询首先为每个部门计算所含雇员数,然后返回至少包含2个雇员的部门名称。
SELECT ,COUNT(e.emp_id) num_employees FROM department d
-> INNER JOIN employee e ON d.dept_id=e.dept_id
-> GROUP BY HAVING COUNT(e.emp_id)>2;
+----------------+---------------+
| name | num_employees |
+----------------+---------------+
| Operations | 14 |
| Administration | 3 |
+----------------+---------------+
选择having而不是where
我们想要查看每个柜员创建了多少账户。利用聚集函数count()计算每个分组的行数,*表示对分组的所有列计数。若是想过滤数据,并且过滤条件针对分组数据而不是原始数据。由于group by子句在where子句被评估之后运行,所以无法利用where子句增加过滤条件。所以必须使用having子句。
SELECT open_emp_id,COUNT(*) how_many FROM account
-> WHERE COUNT(*)>4 GROUP BY open_emp_id; 在where子句时,分组还未被创建
ERROR 1111 (HY000): Invalid use of group function
SELECT open_emp_id,COUNT(*) how_many FROM account
-> GROUP BY open_emp_id HAVING COUNT(*)>4;
+-------------+----------+
| open_emp_id | how_many |
+-------------+----------+
| 1 | 8 |
| 10 | 7 |
| 16 | 6 |
+-------------+----------+
聚集函数
Max() | 返回集合中最大值 | 忽略遇到的null值 |
Min() | 返回集合中最小值 | |
Avg() | 返回集合中平均值 | |
Sum() | 返回集合中所有值的和 | |
Count() | 返回集合中值的个数 |
SELECT MAX(avail_balance) max_balance,MIN(avail_balance) min_balance,
-> AVG(avail_balance) avg_balance, SUM(avail_balance) sum_balance,
-> COUNT(*) num_accounts FROM account WHERE product_cd = 'CHK';
+-------------+-------------+-------------+-------------+--------------+
| max_balance | min_balance | avg_balance | sum_balance | num_accounts |
+-------------+-------------+-------------+-------------+--------------+
| 38552.05 | 122.37 | 7300.800985 | 73008.01 | 10 |
+-------------+-------------+-------------+-------------+--------------+
除了使用列作为聚集函数的参数外,还可以创建表达式作为参数。
SELECT MAX(pending_balance - avail_balance) max_uncleared FROM account;
+---------------+
| max_uncleared |
+---------------+
| 660.00 |
+---------------+
隐式显式分组
在上一例子中,用过滤条件product_cd = 'CHK'指定分组上的所有行,属于隐式分组。我们想获取所有product_cd的分组结果,即需要额外的一列product_cd 值。
SELECT product_cd MAX(avail_balance) max_balance,MIN(avail_balance) min_balance,
-> AVG(avail_balance) avg_balance, SUM(avail_balance) sum_balance,
-> COUNT(*) num_accounts FROM account;
由于上面命令并没有显式地指定如何对数据分组而导致查询失败,所以要为它增加一个group by子句以指定聚集函数所作用行的分组
SELECT product_cd, MAX(avail_balance) max_balance,MIN(avail_balance) min_balance,
-> AVG(avail_balance) avg_balance, SUM(avail_balance) sum_balance,
-> COUNT(*) num_accounts FROM account GROUP BY product_cd;
+------------+-------------+-------------+--------------+-------------+--------------+
| product_cd | max_balance | min_balance | avg_balance | sum_balance | num_accounts |
+------------+-------------+-------------+--------------+-------------+--------------+
| BUS | 9345.55 | 0.00 | 4672.774902 | 9345.55 | 2 |
| CD | 10000.00 | 1500.00 | 4875.000000 | 19500.00 | 4 |
| CHK | 38552.05 | 122.37 | 7300.800985 | 73008.01 | 10 |
| MM | 9345.55 | 2212.50 | 5681.713216 | 17045.14 | 3 |
| SAV | 767.77 | 200.00 | 463.940002 | 1855.76 | 4 |
| SBL | 50000.00 | 50000.00 | 50000.000000 | 50000.00 | 1 |
+------------+-------------+-------------+--------------+-------------+--------------+
count()
当使用count确定每个分组成员数目时,可以选择对分组中的所有成员计数还是利用distinct关键字只计数某个列的不同值
SELECT account_id,open_emp_id FROM account;
+------------+-------------+
| account_id | open_emp_id |
+------------+-------------+
| 10 | 1 |
| 11 | 1 |
| 12 | 1 |
| 14 | 1 |
| 15 | 1 |
| 21 | 1 |
| 22 | 1 |
| 23 | 1 |
| 1 | 10 |
| 2 | 10 |
| 3 | 10 |
| 4 | 10 |
| 5 | 10 |
| 17 | 10 |
| 27 | 10 |
| 7 | 13 |
| 8 | 13 |
| 29 | 13 |
| 13 | 16 |
| 18 | 16 |
| 19 | 16 |
| 24 | 16 |
| 25 | 16 |
| 28 | 16 |
+------------+-------------+
SELECT COUNT(open_emp_id) FROM account;
+--------------------+
| COUNT(open_emp_id) |
+--------------------+
| 24 |
+--------------------+
SELECT COUNT(DISTINCT open_emp_id) FROM account;
+-----------------------------+
| COUNT(DISTINCT open_emp_id) |
+-----------------------------+
| 4 |
+-----------------------------+
分组形式
单列分组
SELECT product_cd, SUM(avail_balance) prod_balance FROM account GROUP BY product_cd;
+------------+--------------+
| product_cd | prod_balance |
+------------+--------------+
| BUS | 9345.55 |
| CD | 19500.00 |
| CHK | 73008.01 |
| MM | 17045.14 |
| SAV | 1855.76 |
| SBL | 50000.00 |
+------------+--------------+
多行分组
SELECT product_cd, open_branch_id, SUM(avail_balance) tot_balance FROM account
-> GROUP BY product_cd, open_branch_id ORDER BY product_cd;
+------------+----------------+-------------+
| product_cd | open_branch_id | tot_balance |
+------------+----------------+-------------+
| BUS | 2 | 9345.55 |
| BUS | 4 | 0.00 |
| CD | 1 | 11500.00 |
| CD | 2 | 8000.00 |
| CHK | 1 | 782.16 |
| CHK | 2 | 3315.77 |
| CHK | 3 | 1057.75 |
| CHK | 4 | 67852.33 |
| MM | 1 | 14832.64 |
| MM | 3 | 2212.50 |
| SAV | 1 | 767.77 |
| SAV | 2 | 700.00 |
| SAV | 4 | 387.99 |
| SBL | 3 | 50000.00 |
+------------+----------------+-------------+
表达式分组
SELECT EXTRACT(YEAR FROM start_date) year, COUNT(*) how_many FROM employee
-> GROUP BY EXTRACT(YEAR FROM start_date);
+------+----------+
| year | how_many |
+------+----------+
| 2005 | 2 |
| 2006 | 8 |
| 2015 | 1 |
| 2007 | 3 |
| 2008 | 2 |
| 2004 | 2 |
+------+----------+
对每个分组产生合计数
对结果进行分组后,每一个分组的总和也要显示在每组数据的下面。利用with rollup选项来请求数据库服务器完成。
SELECT product_cd, open_branch_id, SUM(avail_balance) tot_balance
-> FROM account GROUP BY product_cd, open_branch_id WITH ROLLUP;
+------------+----------------+-------------+
| product_cd | open_branch_id | tot_balance |
+------------+----------------+-------------+
| BUS | 2 | 9345.55 |
| BUS | 4 | 0.00 |
| BUS | NULL | 9345.55 |
| CD | 1 | 11500.00 |
| CD | 2 | 8000.00 |
| CD | NULL | 19500.00 |
| CHK | 1 | 782.16 |
| CHK | 2 | 3315.77 |
| CHK | 3 | 1057.75 |
| CHK | 4 | 67852.33 |
| CHK | NULL | 73008.01 |
| MM | 1 | 14832.64 |
| MM | 3 | 2212.50 |
| MM | NULL | 17045.14 |
| SAV | 1 | 767.77 |
| SAV | 2 | 700.00 |
| SAV | 4 | 387.99 |
| SAV | NULL | 1855.76 |
| SBL | 3 | 50000.00 |
| SBL | NULL | 50000.00 |
| NULL | NULL | 170754.46 |
+------------+----------------+-------------+
分组过滤条件
SELECT product_cd, SUM(avail_balance) prod_balance FROM account WHERE status = 'ACTIVE'
-> GROUP BY product_cd HAVING SUM(avail_balance) >= 10000;
+------------+--------------+
| product_cd | prod_balance |
+------------+--------------+
| CD | 19500.00 |
| CHK | 73008.01 |
| MM | 17045.14 |
| SBL | 50000.00 |
+------------+--------------+
SELECT product_cd, SUM(avail_balance) prod_balance FROM account WHERE status = 'ACTIVE'
-> GROUP BY product_cd HAVING MAX(avail_balance) <= 10000 AND MIN(avail_balance) >= 1000;
+------------+--------------+
| product_cd | prod_balance |
+------------+--------------+
| CD | 19500.00 |
| MM | 17045.14 |
+------------+--------------+