上周研发提过来说希望对2个表做join,但是这2个表分布在不同的MySQL实例里面。
如果要做JOIN的话,我目前想到了3种方法:
1、使用BI常用的presto工具,配置2个数据源,然后在presto里做join
2、使用datax将依赖到表数据抽到一个MySQL实例里面,这样可以直接做join
3、使用mysql的FEDERATED引擎的表(这种方法也类似PG里面的dblink或者fdw)
对presto感兴趣的,可以自行搜索引擎去了解,它主要基于内存计算,处理性能很强,个人测试可以单机部署,生产一般和离线集群混部(晚上集群资源供离线计算,白天资源供presto即席查询)。
datax数据抽取,也比较简单,这里就不提了。
下面主要介绍下FEDERATED 引擎
官方文档 https://dev.mysql.com/doc/refman/8.0/en/federated-storage-engine.html
# 开启 Federated 存储引擎
vim /etc/my.cnf 在 [mysqld]段内加下面的这行
federated
# 重启mysql数据库
systemctl restart mysqld
# 再次查看是否修改成功
> show engines;
+--------------------+---------+----------------------------------------------------------------------------+--------------+------+------------+
| Engine | Support | Comment | Transactions | XA | Savepoints |
+--------------------+---------+----------------------------------------------------------------------------+--------------+------+------------+
| FEDERATED | YES | Federated MySQL storage engine | NO | NO | NO |
+--------------------+---------+----------------------------------------------------------------------------+--------------+------+------------+
12 rows in set (0.00 sec)
注意事项:
https://dev.mysql.com/doc/refman/8.0/en/federated-usagenotes.html
存储FEDERATED引擎支持 SELECT、 INSERT、 UPDATE、 DELETE、 TRUNCATE TABLE。
它不支持ALTER TABLE、 INDEX 等语法。
也不支持事务。
然后,创建一个fedreated引擎的表(后端的数据源可以是表,也可以是视图),引用其它的mysql实例。
-- 需要注意的是,这个FEDERATED的表的字段数不能超过原表,字段名称也不能和原表出现差异,否则查询这个FEDERATED表会报错。
create table f_tb1 (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(128) DEFAULT NULL,
`school` varchar(32) DEFAULT NULL,
`sex` varchar(32) DEFAULT NULL,
`addr` varchar(32) DEFAULT NULL,
primary key (id),
KEY `idx_name` (`name`)
)ENGINE=FEDERATED CONNECTION='mysql://dts:dts@127.0.0.1:5643/test/tb1';
>explain select * from f_tb1 where name='ccc';
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | f_tb1 | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 25.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)
因为FEDERATED表在创建后,就不支持修改,如果要改动的话,需要删掉再重建FEDERATED表
因为FEDERATED表在创建后,就不支持修改,如果要改动的话,需要删掉再重建FEDERATED表
drop table f_tb1;
create table f_tb1 (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(128) DEFAULT NULL,
`school` varchar(32) DEFAULT NULL,
`sex` varchar(32) DEFAULT NULL,
`addr` varchar(32) DEFAULT NULL,
primary key (id),
KEY `idx_name` (`name`)
)ENGINE=FEDERATED CONNECTION='mysql://dts:dts@127.0.0.1:5643/test/tb1';
explain select * from f_tb1 where name='ccc';
+----+-------------+-------+------------+------+---------------+----------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+----------+---------+-------+------+----------+-------+
| 1 | SIMPLE | f_tb1 | NULL | ref | idx_name | idx_name | 515 | const | 2 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+----------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
此外,还支持在本地建表,把远程的数据抽过来
>create table local_tb1 select * from f_tb1;
Query OK, 6 rows affected (0.02 sec)
Records: 6 Duplicates: 0 Warnings: 0
>insert into local_tb1 select * from f_tb1;
Query OK, 6 rows affected (0.01 sec)
Records: 6 Duplicates: 0 Warnings: 0
跨实例JOIN示例:
create table f_tb2
(
`id` int(11) NOT NULL AUTO_INCREMENT,
`agent_name` varchar(128) DEFAULT NULL,
`school` varchar(32) DEFAULT NULL,
primary key (id),
KEY `idx_name` (`agent_name`)
)ENGINE=FEDERATED CONNECTION='mysql://dts:dts@127.0.0.1:5726/sbtest/tb2';
select a.* from f_tb1 as a inner join f_tb2 as b on a.name=b.agent_name where a.id<=100;
+----+-------+--------+------+------+
| id | name | school | sex | addr |
+----+-------+--------+------+------+
| 5 | intel | NULL | NULL | NULL |
| 6 | amd | NULL | NULL | NULL |
+----+-------+--------+------+------+
2 rows in set (0.00 sec)
执行计划如下:
explain select a.* from f_tb1 as a inner join f_tb2 as b on a.name=b.agent_name where a.id<=100;
+----+-------------+-------+------------+-------+------------------+----------+---------+-------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+------------------+----------+---------+-------------+------+----------+-------------+
| 1 | SIMPLE | a | NULL | range | PRIMARY,idx_name | PRIMARY | 4 | NULL | 2 | 100.00 | Using where |
| 1 | SIMPLE | b | NULL | ref | idx_name | idx_name | 515 | test.a.name | 2 | 100.00 | NULL |
+----+-------------+-------+------------+-------+------------------+----------+---------+-------------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
打开general_log,分析下刚才的这个join查询。
xplain命令,实际上是下发到后端的引擎执行了 SHOW TABLE STATUS LIKE 'xx' 获取表的基础数据(例如表有多少行、表体积之类的)
真实查询命令,实际上就是对2个库的查询下推:
对tb1所在的后端节点:
Query SHOW TABLE STATUS LIKE 'tb1'
Query SELECT `id`, `name`, `school`, `sex`, `addr` FROM `tb1` WHERE (`id` <= 100)
对tb2所在的后端节点:
Query SHOW TABLE STATUS LIKE 'tb2'
Query SELECT `id`, `agent_name`, `school` FROM `tb2` WHERE (`agent_name` = 'aaaa')
Query SELECT `id`, `agent_name`, `school` FROM `tb2` WHERE (`agent_name` = 'bbb')
Query SELECT `id`, `agent_name`, `school` FROM `tb2` WHERE (`agent_name` = 'ccc')
Query SELECT `id`, `agent_name`, `school` FROM `tb2` WHERE (`agent_name` = 'ddd')
Query SELECT `id`, `agent_name`, `school` FROM `tb2` WHERE (`agent_name` = 'intel')
Query SELECT `id`, `agent_name`, `school` FROM `tb2` WHERE (`agent_name` = 'amd')
出现上面这种日志情况是因为: 对于这个SQL,FEDERATED先查询了sql中的where条件,把a.id<=100的a表中的name的值都取出来,然后把name的值传到后端的tb2表中作为查询条件。最后在上层数据库中做数据的拼装。
对于下面这个查询示例
explain select a.* from f_tb1 as a inner join f_tb2 as b on a.name=b.agent_name ;
+----+-------------+-------+------------+------+---------------+----------+---------+-------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+----------+---------+-------------------+------+----------+-------------+
| 1 | SIMPLE | b | NULL | ALL | idx_name | NULL | NULL | NULL | 2 | 100.00 | Using where |
| 1 | SIMPLE | a | NULL | ref | idx_name | idx_name | 515 | test.b.agent_name | 2 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+----------+---------+-------------------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
select a.* from f_tb1 as a inner join f_tb2 as b on a.name=b.agent_name ;
+----+-------+--------+------+------+
| id | name | school | sex | addr |
+----+-------+--------+------+------+
| 6 | amd | NULL | NULL | NULL |
| 5 | intel | NULL | NULL | NULL |
+----+-------+--------+------+------+
2 rows in set (0.00 sec)
从后端日志看(先查询tb2,然后把结果带入到tb1中,最后拼装出结果集)
节点1:
Query SHOW TABLE STATUS LIKE 'tb1'
Query SELECT `id`, `name`, `school`, `sex`, `addr` FROM `tb1` WHERE (`name` = 'amd')
Query SELECT `id`, `name`, `school`, `sex`, `addr` FROM `tb1` WHERE (`name` = 'intel')
节点2:
Query SHOW TABLE STATUS LIKE 'tb2'
Query SHOW TABLE STATUS LIKE 'tb2'
Query SELECT `id`, `agent_name`, `school` FROM `tb2`
FEDERATED 引擎表用的并不多,这里只是简单的介绍下,常用的也就这么点东西。