SQL常见操作

以下学习内容为巩固提升,为后面做案例打下铺垫

聚合函数使用

导入house_price.csv

计算价格的平均值

select avg(Price) as avg_price from house_price

注:avg(Neighborhood) 对字符串进行avg操作返回0,这里的0其实代表无法计算,返回默认的0

如果列中出现"NULL",则跳过对应的行数据,计算其他非空行的数据的均值

计算数据的总量

select count(*) from house_price

以上计算的是所有数据,包含空的行

select count(price) from house_price

以上计算的是price列中非空的行数

其他的聚合函数

SUM(xxx):对非空的数据进行求和

MAX(xxx):求最大的值

MAX(字符串):求出字符串按照从左到右按照字典序排序输出

MIN(xxx):同理

DISTINCT:去重

统计Bedrooms中不同的取值

select DISTINCT Bedrooms from house_price

统计Price中不同的取值的个数

select count(DISTINCT Price) from house_price

##分组

找出house_price中Price不同取值的每个取值的个数

select Price,count(*) from house_price group by Bedrooms
MySQL 版本>5.6的问题
--[Err] 1055 - Expression #1 of ORDER BY clause is not in GROUP BY clause and contains nonaggregated column 'information_schema.PROFILING.SEQ' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

--- 问题原因 sql_mode=only_full_group_by
-- 解决方案
SET sql_mode=(SELECT REPLACE(@@sql_mode,'ONLY_FULL_GROUP_BY',''));

找出house_price中不同取值(例如:取值为2,则为两室,为3则为3室)的Bedrooms的每个取值的房屋总价格

select Bedrooms,sum(price) from house_price group by Bedrooms;

取别名后,可以以别名进行分组

select Bedrooms as br,sum(price) from house_price group by br;

找出house_price中不同取值的Bedrooms的房屋中每个取值对应的行数大于20个的房屋及其对应的总价格

select Bedrooms,sum(price) from house_price group by Bedrooms having count(*)>20;

找出house_price中不同取值的Bedrooms的房屋中每个取值对应的行数大于20个的房屋及其对应的总价格,并将数据按照总价格的升序排序

select Bedrooms,sum(price) from house_price group by Bedrooms having count(*)>20 order by sum(Price);

复杂SQL语句的执行顺序

select Bedrooms,count(*) from house_price where Bedrooms=2 group by Bedrooms having count(*)>20 order by Bedrooms;

对于上述的复杂语句,我们需要知道执行顺序:

From->where->group by->having->select->order by

子查询

找出比平均价格高的物品

select Price from house_price where Price>AVG(Price);

以上为错误写法,avg不能写在where语句后

此时,我们可以用子查询,进行多次select

select Price from house_price where Price>(select AVG(Price) from house_price);

这种写法就可以了,这种子查询叫做标量子查询

标量子查询:子查询只返回一行一列

子查询先执行括号内的运算,再执行外部运算

找出一些房屋信息和房屋的均价

错误写法:

select Price,AVG(Price) from house_price;

正确写法:

select Price,(select AVG(Price) from house_price) as avg_price from house_price;

以下写法也是支持的

select Price,123 as avg_price from house_price;

以Neighborhood和Bedrooms分组,统计分组后均价大于整体价格均价的Neighborhood、Bedrooms和分组后的均价数据

select Neighborhood,Bedrooms,AVG(Price) from 
house_price group by Neighborhood,Bedrooms 
having AVG(Price)>(select AVG(Price)
                   from house_price);

找出所有房屋中Bedrooms数量相同的价格大于以Neighborhood为分组的房屋均价的房屋

错误写法1:

select Neighborhood,Bedrooms from house_price where Price>(select AVG(Price) from house_price group by Neighborhood);

错误原因:由于不是标量子查询,则子查询返回了多条数据,无法与Price进行一一比对

错误写法2:

select Neighborhood,Bedrooms from house_price as a where where a.Bedrooms=b.Bathrooms and Price>(select AVG(Price) from house_price as b group by Neighborhood);

错误原因:首先,子查询先执行。子查询执行完后,别名b就消失了,而子查询执行时,别名a已经声明,且没有消失

正确写法:

select Neighborhood,Bedrooms from house_price as a where Price>(select AVG(Price) from house_price as b where a.Bedrooms=b.Bedrooms group by Neighborhood);

此处的子查询叫做关联子查询

EXISTS和NOT EXISTS

手动添加一个info表,表中数据为

id Bedrooms

1 2

2 3

我们观察以下SQL语句:

select * from house_price as a where EXISTS (select * from house_price as b where a.Bedrooms=b.Bathrooms);

执行结果:

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-a33pfVk2-1575559822745)(img/1.png)]

注:将子查询中的"*"改为任意内容,返回结果一致,这是因为EXISTS关键字值关心子查询中返回数据的真假而不是具体的内容

NOT EXISTS和EXIST的返回结果相反

多表联结查询

导入product.csv和shopproduct.csv

内连接

合并两张表,在两张表中分别获取部分数据

select shop_id,shop_name,sp.product_id,quantity,product_name,product_type,sale_price
from product as p inner join shopproduct as sp
on p.product_id=sp.product_id

注:上述操作可以执行,但是可能一不留神就会出现像product_id这样的字段无法界定的问题。比较稳妥的写法如下(起别名):

select sp.shop_id,sp.shop_name,sp.product_id,sp.quantity,p.product_name,p.product_type,p.sale_price
from product as p 
inner join 
shopproduct as sp
on p.product_id=sp.product_id

左外连接

select sp.shop_id,sp.shop_name,sp.product_id,sp.quantity,p.product_name,p.product_type,p.sale_price
from product as p 
left outer join 
shopproduct as sp
on p.product_id=sp.product_id

保留左边的表中全部信息,右边表中的对应数据为空则不保留

右外连接

select sp.shop_id,sp.shop_name,sp.product_id,sp.quantity,p.product_name,p.product_type,p.sale_price
from product as p 
right outer join 
shopproduct as sp
on p.product_id=sp.product_id

保留右边的表中全部信息,左边边表中的对应数据为空则不保留

交叉连接

select sp.shop_id,sp.shop_name,sp.product_id,sp.quantity,p.product_name,p.product_type,p.sale_price
from product as p 
cross join 
shopproduct as sp
on p.product_id=sp.product_id

将左表和右表中的数据进行一一组合(笛卡尔积)

一种过时的写法(不推荐)

select sp.shop_id,sp.shop_name,sp.product_id,sp.quantity,p.product_name,p.product_type,p.sale_price
from product,shopproduct
where p.product_id=sp.product_id and sp.shop_id="000A"

无法直接看出是内连接还是外连接,条件中无法看出是连接条件还是其他判断条件

组合查询

组合查询的意义:

1,将不同的表进行查询拼接在一起,返回

2,将一个表进行多次查询,返回一个查询的结果

导入Product1.csv文件

首先我们学习一个表的多次查询的情况:

select product_id,product_name,product_type from product where product_id<5
union
select product_id,product_name,product_type from product where product_type="厨房用具";

执行效果:

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-5P73RSg4-1575559822746)(img/2.png)]

其实上述也可以简写为:

select product_id,product_name,product_type from product where product_type="厨房用具" or product_id>2;

使用UNION_ALL的效果:

select product_id,product_name,product_type from product where product_id<5
union all
select product_id,product_name,product_type from product where product_type="厨房用具";

上述使用union和使用union all效果不一致,union all不会将和并后的数据去重

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-0YaBMqoY-1575559822747)(img/3.png)]

注:union或union all前后查询出的字段数量必须一致

以下写法是错误的

select product_id,product_name from product where product_id<5
union all
select product_id,product_name,product_type from product where product_type="厨房用具";

最后,order by关键字只能在最后使用,而不能再union或union all之前使用

select product_id,product_name from product1 where product_id<5
union all
select product_id,product_name,product_type from product1 where product_type="厨房用具"
order by product_id

CASE表达式

简单case表达式

case <表达式>
	when <表达式1> then <表达式4>
	when <表达式2> then <表达式5>
	when <表达式3> then <表达式6>
	。。。
	else <表达式>
end

搜索case表达式

case
	when <求值表达式1> then <表达式4>
	when <求值表达式2> then <表达式5>
	when <求值表达式3> then <表达式6>
	。。。
	else <表达式>
end

使用house_price表

简单case表达式写法

select 
case Bedrooms
	when 2 then 'small'
	when 3 then 'middle'
	when 4 then 'large'
	else 'largest'
end as BedroomsDesc
from house_price

搜索case表达式写法

select 
case 
	when Bedrooms=2 then 'small'
	when Bedrooms=3 then 'middle'
	when Bedrooms=4 then 'large'
	else 'largest'
end as BedroomsDesc
from house_price

建表company_profile添加数据如下:

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-RuMkaWo1-1575559822751)(img/4.png)]

需求:

求出每个公司的男性的总数

select company,sum(num)
from company_profile
where gender="M"
group by company;

求出每个公司的女性的总数

select company,sum(num)
from company_profile
where gender="F"
group by company;

求出每个公司男性和女性的总数,需要将上述两个表进行合并

select M.company,M.num_M,F.num_F from
(select company,sum(num) as num_M
from company_profile
where gender="M"
group by company) as M
inner join
(select company,sum(num) as num_F
from company_profile
where gender="F"
group by company) as F
on M.company=F.company

结果如下:

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-f5oSQRvM-1575559822753)(img/5.png)]

使用case表达式

select company,
			sum(case when gender="M" then num
								else 0
					end) as num_M
			,sum(case when gender="F" then num
								else 0
					end) as num_F
		from company_profile
		group by company;