SQL常见操作
以下学习内容为巩固提升,为后面做案例打下铺垫
聚合函数使用
导入house_price.csv
计算价格的平均值
select avg(Price) as avg_price from house_price
注:avg(Neighborhood) 对字符串进行avg操作返回0,这里的0其实代表无法计算,返回默认的0
如果列中出现"NULL",则跳过对应的行数据,计算其他非空行的数据的均值
计算数据的总量
select count(*) from house_price
以上计算的是所有数据,包含空的行
select count(price) from house_price
以上计算的是price列中非空的行数
其他的聚合函数
SUM(xxx):对非空的数据进行求和
MAX(xxx):求最大的值
MAX(字符串):求出字符串按照从左到右按照字典序排序输出
MIN(xxx):同理
DISTINCT:去重
统计Bedrooms中不同的取值
select DISTINCT Bedrooms from house_price
统计Price中不同的取值的个数
select count(DISTINCT Price) from house_price
##分组
找出house_price中Price不同取值的每个取值的个数
select Price,count(*) from house_price group by Bedrooms
MySQL 版本>5.6的问题
--[Err] 1055 - Expression #1 of ORDER BY clause is not in GROUP BY clause and contains nonaggregated column 'information_schema.PROFILING.SEQ' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
--- 问题原因 sql_mode=only_full_group_by
-- 解决方案
SET sql_mode=(SELECT REPLACE(@@sql_mode,'ONLY_FULL_GROUP_BY',''));
找出house_price中不同取值(例如:取值为2,则为两室,为3则为3室)的Bedrooms的每个取值的房屋总价格
select Bedrooms,sum(price) from house_price group by Bedrooms;
取别名后,可以以别名进行分组
select Bedrooms as br,sum(price) from house_price group by br;
找出house_price中不同取值的Bedrooms的房屋中每个取值对应的行数大于20个的房屋及其对应的总价格
select Bedrooms,sum(price) from house_price group by Bedrooms having count(*)>20;
找出house_price中不同取值的Bedrooms的房屋中每个取值对应的行数大于20个的房屋及其对应的总价格,并将数据按照总价格的升序排序
select Bedrooms,sum(price) from house_price group by Bedrooms having count(*)>20 order by sum(Price);
复杂SQL语句的执行顺序
select Bedrooms,count(*) from house_price where Bedrooms=2 group by Bedrooms having count(*)>20 order by Bedrooms;
对于上述的复杂语句,我们需要知道执行顺序:
From->where->group by->having->select->order by
子查询
找出比平均价格高的物品
select Price from house_price where Price>AVG(Price);
以上为错误写法,avg不能写在where语句后
此时,我们可以用子查询,进行多次select
select Price from house_price where Price>(select AVG(Price) from house_price);
这种写法就可以了,这种子查询叫做标量子查询
标量子查询:子查询只返回一行一列
子查询先执行括号内的运算,再执行外部运算
找出一些房屋信息和房屋的均价
错误写法:
select Price,AVG(Price) from house_price;
正确写法:
select Price,(select AVG(Price) from house_price) as avg_price from house_price;
以下写法也是支持的
select Price,123 as avg_price from house_price;
以Neighborhood和Bedrooms分组,统计分组后均价大于整体价格均价的Neighborhood、Bedrooms和分组后的均价数据
select Neighborhood,Bedrooms,AVG(Price) from
house_price group by Neighborhood,Bedrooms
having AVG(Price)>(select AVG(Price)
from house_price);
找出所有房屋中Bedrooms数量相同的价格大于以Neighborhood为分组的房屋均价的房屋
错误写法1:
select Neighborhood,Bedrooms from house_price where Price>(select AVG(Price) from house_price group by Neighborhood);
错误原因:由于不是标量子查询,则子查询返回了多条数据,无法与Price进行一一比对
错误写法2:
select Neighborhood,Bedrooms from house_price as a where where a.Bedrooms=b.Bathrooms and Price>(select AVG(Price) from house_price as b group by Neighborhood);
错误原因:首先,子查询先执行。子查询执行完后,别名b就消失了,而子查询执行时,别名a已经声明,且没有消失
正确写法:
select Neighborhood,Bedrooms from house_price as a where Price>(select AVG(Price) from house_price as b where a.Bedrooms=b.Bedrooms group by Neighborhood);
此处的子查询叫做关联子查询
EXISTS和NOT EXISTS
手动添加一个info表,表中数据为
id Bedrooms
1 2
2 3
我们观察以下SQL语句:
select * from house_price as a where EXISTS (select * from house_price as b where a.Bedrooms=b.Bathrooms);
执行结果:
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-a33pfVk2-1575559822745)(img/1.png)]
注:将子查询中的"*"改为任意内容,返回结果一致,这是因为EXISTS关键字值关心子查询中返回数据的真假而不是具体的内容
NOT EXISTS和EXIST的返回结果相反
多表联结查询
导入product.csv和shopproduct.csv
内连接
合并两张表,在两张表中分别获取部分数据
select shop_id,shop_name,sp.product_id,quantity,product_name,product_type,sale_price
from product as p inner join shopproduct as sp
on p.product_id=sp.product_id
注:上述操作可以执行,但是可能一不留神就会出现像product_id这样的字段无法界定的问题。比较稳妥的写法如下(起别名):
select sp.shop_id,sp.shop_name,sp.product_id,sp.quantity,p.product_name,p.product_type,p.sale_price
from product as p
inner join
shopproduct as sp
on p.product_id=sp.product_id
左外连接
select sp.shop_id,sp.shop_name,sp.product_id,sp.quantity,p.product_name,p.product_type,p.sale_price
from product as p
left outer join
shopproduct as sp
on p.product_id=sp.product_id
保留左边的表中全部信息,右边表中的对应数据为空则不保留
右外连接
select sp.shop_id,sp.shop_name,sp.product_id,sp.quantity,p.product_name,p.product_type,p.sale_price
from product as p
right outer join
shopproduct as sp
on p.product_id=sp.product_id
保留右边的表中全部信息,左边边表中的对应数据为空则不保留
交叉连接
select sp.shop_id,sp.shop_name,sp.product_id,sp.quantity,p.product_name,p.product_type,p.sale_price
from product as p
cross join
shopproduct as sp
on p.product_id=sp.product_id
将左表和右表中的数据进行一一组合(笛卡尔积)
一种过时的写法(不推荐)
select sp.shop_id,sp.shop_name,sp.product_id,sp.quantity,p.product_name,p.product_type,p.sale_price
from product,shopproduct
where p.product_id=sp.product_id and sp.shop_id="000A"
无法直接看出是内连接还是外连接,条件中无法看出是连接条件还是其他判断条件
组合查询
组合查询的意义:
1,将不同的表进行查询拼接在一起,返回
2,将一个表进行多次查询,返回一个查询的结果
导入Product1.csv文件
首先我们学习一个表的多次查询的情况:
select product_id,product_name,product_type from product where product_id<5
union
select product_id,product_name,product_type from product where product_type="厨房用具";
执行效果:
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-5P73RSg4-1575559822746)(img/2.png)]
其实上述也可以简写为:
select product_id,product_name,product_type from product where product_type="厨房用具" or product_id>2;
使用UNION_ALL的效果:
select product_id,product_name,product_type from product where product_id<5
union all
select product_id,product_name,product_type from product where product_type="厨房用具";
上述使用union和使用union all效果不一致,union all不会将和并后的数据去重
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-0YaBMqoY-1575559822747)(img/3.png)]
注:union或union all前后查询出的字段数量必须一致
以下写法是错误的
select product_id,product_name from product where product_id<5
union all
select product_id,product_name,product_type from product where product_type="厨房用具";
最后,order by关键字只能在最后使用,而不能再union或union all之前使用
select product_id,product_name from product1 where product_id<5
union all
select product_id,product_name,product_type from product1 where product_type="厨房用具"
order by product_id
CASE表达式
简单case表达式
case <表达式>
when <表达式1> then <表达式4>
when <表达式2> then <表达式5>
when <表达式3> then <表达式6>
。。。
else <表达式>
end
搜索case表达式
case
when <求值表达式1> then <表达式4>
when <求值表达式2> then <表达式5>
when <求值表达式3> then <表达式6>
。。。
else <表达式>
end
使用house_price表
简单case表达式写法
select
case Bedrooms
when 2 then 'small'
when 3 then 'middle'
when 4 then 'large'
else 'largest'
end as BedroomsDesc
from house_price
搜索case表达式写法
select
case
when Bedrooms=2 then 'small'
when Bedrooms=3 then 'middle'
when Bedrooms=4 then 'large'
else 'largest'
end as BedroomsDesc
from house_price
建表company_profile添加数据如下:
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-RuMkaWo1-1575559822751)(img/4.png)]
需求:
求出每个公司的男性的总数
select company,sum(num)
from company_profile
where gender="M"
group by company;
求出每个公司的女性的总数
select company,sum(num)
from company_profile
where gender="F"
group by company;
求出每个公司男性和女性的总数,需要将上述两个表进行合并
select M.company,M.num_M,F.num_F from
(select company,sum(num) as num_M
from company_profile
where gender="M"
group by company) as M
inner join
(select company,sum(num) as num_F
from company_profile
where gender="F"
group by company) as F
on M.company=F.company
结果如下:
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-f5oSQRvM-1575559822753)(img/5.png)]
使用case表达式
select company,
sum(case when gender="M" then num
else 0
end) as num_M
,sum(case when gender="F" then num
else 0
end) as num_F
from company_profile
group by company;