上周研发提过来说希望对2个表做join,但是这2个表分布在不同的MySQL实例里面。

如果要做JOIN的话,我目前想到了3种方法:

1、使用BI常用的presto工具,配置2个数据源,然后在presto里做join

2、使用datax将依赖到表数据抽到一个MySQL实例里面,这样可以直接做join

3、使用mysql的FEDERATED引擎的表(这种方法也类似PG里面的dblink或者fdw)


对presto感兴趣的,可以自行搜索引擎去了解,它主要基于内存计算,处理性能很强,个人测试可以单机部署,生产一般和离线集群混部(晚上集群资源供离线计算,白天资源供presto即席查询)。

datax数据抽取,也比较简单,这里就不提了。


下面主要介绍下FEDERATED 引擎

官方文档 https://dev.mysql.com/doc/refman/8.0/en/federated-storage-engine.html


# 开启 Federated 存储引擎

vim /etc/my.cnf 在 [mysqld]段内加下面的这行

federated

# 重启mysql数据库

systemctl restart mysqld

# 再次查看是否修改成功

> show engines;
+--------------------+---------+----------------------------------------------------------------------------+--------------+------+------------+
| Engine             | Support | Comment                                                                    | Transactions | XA   | Savepoints |
+--------------------+---------+----------------------------------------------------------------------------+--------------+------+------------+
| FEDERATED          | YES     | Federated MySQL storage engine                                             | NO           | NO   | NO         |
+--------------------+---------+----------------------------------------------------------------------------+--------------+------+------------+
12 rows in set (0.00 sec)


注意事项:

https://dev.mysql.com/doc/refman/8.0/en/federated-usagenotes.html

存储FEDERATED引擎支持 SELECT、 INSERT、 UPDATE、 DELETE、 TRUNCATE TABLE。

它不支持ALTER TABLE、 INDEX 等语法。

也不支持事务。


然后,创建一个fedreated引擎的表(后端的数据源可以是表,也可以是视图),引用其它的mysql实例。


-- 需要注意的是,这个FEDERATED的表的字段数不能超过原表,字段名称也不能和原表出现差异,否则查询这个FEDERATED表会报错。

create table f_tb1 (
 `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(128) DEFAULT NULL,
  `school` varchar(32) DEFAULT NULL,
  `sex` varchar(32) DEFAULT NULL,
  `addr` varchar(32) DEFAULT NULL,
   primary key (id),
   KEY `idx_name` (`name`)
)ENGINE=FEDERATED CONNECTION='mysql://dts:dts@127.0.0.1:5643/test/tb1'; 


>explain select * from f_tb1 where name='ccc';
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | f_tb1 | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    4 |    25.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)



因为FEDERATED表在创建后,就不支持修改,如果要改动的话,需要删掉再重建FEDERATED表

因为FEDERATED表在创建后,就不支持修改,如果要改动的话,需要删掉再重建FEDERATED表
drop table f_tb1;

create table f_tb1 (
	`id` int(11) NOT NULL AUTO_INCREMENT,
	`name` varchar(128) DEFAULT NULL,
	`school` varchar(32) DEFAULT NULL,
	`sex` varchar(32) DEFAULT NULL,
	`addr` varchar(32) DEFAULT NULL,
	primary key (id),
		KEY `idx_name` (`name`)
	)ENGINE=FEDERATED CONNECTION='mysql://dts:dts@127.0.0.1:5643/test/tb1'; 


	explain select * from f_tb1 where name='ccc';
	+----+-------------+-------+------------+------+---------------+----------+---------+-------+------+----------+-------+
	| id | select_type | table | partitions | type | possible_keys | key      | key_len | ref   | rows | filtered | Extra |
	+----+-------------+-------+------------+------+---------------+----------+---------+-------+------+----------+-------+
	|  1 | SIMPLE      | f_tb1 | NULL       | ref  | idx_name      | idx_name | 515     | const |    2 |   100.00 | NULL  |
	+----+-------------+-------+------------+------+---------------+----------+---------+-------+------+----------+-------+
	1 row in set, 1 warning (0.00 sec)


此外,还支持在本地建表,把远程的数据抽过来

	>create table local_tb1 select * from f_tb1;
	Query OK, 6 rows affected (0.02 sec)
	Records: 6  Duplicates: 0  Warnings: 0

	>insert into local_tb1 select * from f_tb1;
	Query OK, 6 rows affected (0.01 sec)
	Records: 6  Duplicates: 0  Warnings: 0



跨实例JOIN示例:


	create table f_tb2
	(
	`id` int(11) NOT NULL AUTO_INCREMENT,
	`agent_name` varchar(128) DEFAULT NULL,
	`school` varchar(32) DEFAULT NULL,
	primary key (id),
		KEY `idx_name` (`agent_name`)
	)ENGINE=FEDERATED CONNECTION='mysql://dts:dts@127.0.0.1:5726/sbtest/tb2'; 

	select a.*  from f_tb1 as a inner join f_tb2 as b on a.name=b.agent_name  where a.id<=100;
	+----+-------+--------+------+------+
	| id | name  | school | sex  | addr |
	+----+-------+--------+------+------+
	|  5 | intel | NULL   | NULL | NULL |
	|  6 | amd   | NULL   | NULL | NULL |
	+----+-------+--------+------+------+
	2 rows in set (0.00 sec)

	执行计划如下:
	explain select a.*  from f_tb1 as a inner join f_tb2 as b on a.name=b.agent_name  where a.id<=100;
	+----+-------------+-------+------------+-------+------------------+----------+---------+-------------+------+----------+-------------+
	| id | select_type | table | partitions | type  | possible_keys    | key      | key_len | ref         | rows | filtered | Extra       |
	+----+-------------+-------+------------+-------+------------------+----------+---------+-------------+------+----------+-------------+
	|  1 | SIMPLE      | a     | NULL       | range | PRIMARY,idx_name | PRIMARY  | 4       | NULL        |    2 |   100.00 | Using where |
	|  1 | SIMPLE      | b     | NULL       | ref   | idx_name         | idx_name | 515     | test.a.name |    2 |   100.00 | NULL        |
	+----+-------------+-------+------------+-------+------------------+----------+---------+-------------+------+----------+-------------+
	2 rows in set, 1 warning (0.00 sec)


打开general_log,分析下刚才的这个join查询。
xplain命令,实际上是下发到后端的引擎执行了	SHOW TABLE STATUS LIKE 'xx' 获取表的基础数据(例如表有多少行、表体积之类的)

真实查询命令,实际上就是对2个库的查询下推:
			对tb1所在的后端节点:
				Query	SHOW TABLE STATUS LIKE 'tb1'
				Query	SELECT `id`, `name`, `school`, `sex`, `addr` FROM `tb1` WHERE  (`id` <= 100)

			对tb2所在的后端节点:
				Query	SHOW TABLE STATUS LIKE 'tb2'
				Query	SELECT `id`, `agent_name`, `school` FROM `tb2` WHERE  (`agent_name` = 'aaaa')
				Query	SELECT `id`, `agent_name`, `school` FROM `tb2` WHERE  (`agent_name` = 'bbb')
				Query	SELECT `id`, `agent_name`, `school` FROM `tb2` WHERE  (`agent_name` = 'ccc')
				Query	SELECT `id`, `agent_name`, `school` FROM `tb2` WHERE  (`agent_name` = 'ddd')
				Query	SELECT `id`, `agent_name`, `school` FROM `tb2` WHERE  (`agent_name` = 'intel')
				Query	SELECT `id`, `agent_name`, `school` FROM `tb2` WHERE  (`agent_name` = 'amd')

			出现上面这种日志情况是因为: 对于这个SQL,FEDERATED先查询了sql中的where条件,把a.id<=100的a表中的name的值都取出来,然后把name的值传到后端的tb2表中作为查询条件。最后在上层数据库中做数据的拼装。


对于下面这个查询示例

 

		explain select a.*  from f_tb1 as a inner join f_tb2 as b on a.name=b.agent_name ;
		+----+-------------+-------+------------+------+---------------+----------+---------+-------------------+------+----------+-------------+
		| id | select_type | table | partitions | type | possible_keys | key      | key_len | ref               | rows | filtered | Extra       |
		+----+-------------+-------+------------+------+---------------+----------+---------+-------------------+------+----------+-------------+
		|  1 | SIMPLE      | b     | NULL       | ALL  | idx_name      | NULL     | NULL    | NULL              |    2 |   100.00 | Using where |
		|  1 | SIMPLE      | a     | NULL       | ref  | idx_name      | idx_name | 515     | test.b.agent_name |    2 |   100.00 | NULL        |
		+----+-------------+-------+------------+------+---------------+----------+---------+-------------------+------+----------+-------------+
		2 rows in set, 1 warning (0.00 sec)

		select a.*  from f_tb1 as a inner join f_tb2 as b on a.name=b.agent_name ;
		+----+-------+--------+------+------+
		| id | name  | school | sex  | addr |
		+----+-------+--------+------+------+
		|  6 | amd   | NULL   | NULL | NULL |
		|  5 | intel | NULL   | NULL | NULL |
		+----+-------+--------+------+------+
		2 rows in set (0.00 sec)

		从后端日志看(先查询tb2,然后把结果带入到tb1中,最后拼装出结果集)
			节点1:
				Query	SHOW TABLE STATUS LIKE 'tb1'
				Query	SELECT `id`, `name`, `school`, `sex`, `addr` FROM `tb1` WHERE  (`name` = 'amd')
				Query	SELECT `id`, `name`, `school`, `sex`, `addr` FROM `tb1` WHERE  (`name` = 'intel')
			节点2:
				Query	SHOW TABLE STATUS LIKE 'tb2'
				Query	SHOW TABLE STATUS LIKE 'tb2'
				Query	SELECT `id`, `agent_name`, `school` FROM `tb2`


FEDERATED 引擎表用的并不多,这里只是简单的介绍下,常用的也就这么点东西。