今天有小伙伴打电话寻求一个SQL相关的问题,大致是表中存在重复数据,需要删除掉重复数据保留一条的场景,因为没有说明具体删除规则,怎么简单怎么来吧,研究后决定尝试一下,思路是删除表中多余的重复记录(多个字段),只留有id最小的记录。

1.数据准备

MySQL数据库中有一个goods表,表中手动插入一些数据,最终表的内容如下:

MySQL 批量保存去重复数据 mysql去重复 保留一条_SQL


2.思路分析

按照原来设计思路,4,5,6明显是重复数据,想要保留id为4的这行记录。

首先,对数据内容进行简单抽离,先查询出重复字段:

SELECT goods_detail, goods_title FROM goods GROUP BY goods_detail,goods_title HAVING COUNT(*) > 1;

MySQL 批量保存去重复数据 mysql去重复 保留一条_sql_02


其次,再查找出id,让id=4.最后合并的时候需要使用这个条件

MySQL 批量保存去重复数据 mysql去重复 保留一条_mysql_03


最后,对SQL进行归并,因为是删除操作,首先想到的是先按照条件查询出要删除的结果,然后将select改为delete就可以实现删除操作,经过尝试后,写出查询的sql是:

select *  from goods a where ((a.goods_detail,a.goods_title) in(
SELECT  goods_detail, goods_title FROM goods GROUP BY goods_detail,goods_title HAVING COUNT(*) > 1))
and (a.id not in (
SELECT MIN(id) FROM goods GROUP BY goods_detail,goods_title HAVING COUNT(*) > 1));

确实得到了想要的结果:

MySQL 批量保存去重复数据 mysql去重复 保留一条_sql_04


随后尝试真将查询改为删除,来实现最终结果,可是,执行的时候却出现了问题:

DELETE  from goods a where ((a.goods_detail,a.goods_title) in(
SELECT  goods_detail, goods_title FROM goods GROUP BY goods_detail,goods_title HAVING COUNT(*) > 1))
and (a.id not in (
SELECT MIN(id) FROM goods GROUP BY goods_detail,goods_title HAVING COUNT(*) > 1));

报错信息如下:

[SQL]
DELETE  from goods a where ((a.goods_detail,a.goods_title) in(
SELECT  goods_detail, goods_title FROM goods GROUP BY goods_detail,goods_title HAVING COUNT(*) > 1))
and (a.id not in (
SELECT MIN(id) FROM goods GROUP BY goods_detail,goods_title HAVING COUNT(*) > 1));
[Err] 1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'a where ((a.goods_detail,a.goods_title) in(
SELECT  goods_detail, goods_title F' at line 1

不知道为什么会出现这种问题,开始百度,发现原因是MySQL不能将直接查处来的数据当做删除数据的条件,我们应该先把查出来的数据新建一个临时表,然后再把临时表作为条件进行删除功能。于是开始更改调整SQL语句:
第一版:

DELETE  FROM goods WHERE (goods_detail, goods_title) IN 
(SELECT goods_detail, goods_title FROM goods GROUP BY goods_detail, goods_title HAVING COUNT(*)>1) e
AND Id NOT IN 
(SELECT MIN(Id) AS Id FROM goods GROUP BY goods_detail, goods_title HAVING COUNT(*)>1) t;

仍然报错

[SQL]DELETE  FROM goods WHERE (goods_detail, goods_title) IN 
(SELECT goods_detail, goods_title FROM goods GROUP BY goods_detail, goods_title HAVING COUNT(*)>1) 
e
AND Id NOT IN 

(SELECT MIN(Id) AS Id FROM goods GROUP BY goods_detail, goods_title HAVING COUNT(*)>1) 
t;
[Err] 1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'e
AND Id NOT IN 

(SELECT MIN(Id) AS Id FROM goods GROUP BY goods_detail, goo' at line 3

分析可能是子查询中的内容没有变成临时表,还是属于原来的表,所以对SQL进行调整,将第一个子查询中,查出来的goods_detail和good_title两字字段组成一个临时表,第二条子句中的id组成临时表,再进行后续操作,调整如下(先用查询验证):

SELECT *  FROM goods WHERE (goods_detail, goods_title) IN 
(SELECT goods_detail, goods_title FROM 
(SELECT goods_detail, goods_title FROM goods GROUP BY goods_detail, goods_title HAVING COUNT(*)>1) 
e)
AND Id NOT IN 
(SELECT Id FROM 
(SELECT MIN(Id) AS Id FROM goods GROUP BY goods_detail, goods_title HAVING COUNT(*)>1) 
t);

可以查询出想要的结果,第五条和第六条

MySQL 批量保存去重复数据 mysql去重复 保留一条_SQL_05


证明思路无误,将select * 改为delete,进行尝试:

MySQL 批量保存去重复数据 mysql去重复 保留一条_sql_06


用工具进行查询验证,发现id=5和id=6的数据成功删除。

后续验证:
在原有id=4基础上,增加5,6相同数据,id=2基础上增加7,8相同的数据,id=1基础上增加9相同数据。执行SQL语句后,发现新增的重复数据都被删除,只留下最原始的id=1,2,3,4四条纪录。