线程池的说明
MySQL数据库默认是为每一个连接分配一个服务端的线程进行处理,在一个连接数少并且运行稳定的生产系统中,这通常不会有什么问题。但在下面两种情况下,会遇到性能问题
前端发起的连接数过多,造成服务端生成大量的线程,消耗了大量的CPU资源,而这些连接大部分时候处于sleep的状态,后端线程空转。
瞬间增加大量短连接的会话,造成服务端不断生成线程,高峰过后又有大量线程被销毁,这样会因为context-switch造成CPU性能大幅下降。
对于这种情况通常采用线程池的方法进行解决,线程池会预先创建一定数量的线程,当有连接的请求需要处理时,MySQL会从线程池中分配一条线程进行处理,到处理完成后,这条线程又会回到线程池中,下次它可能会被分配处理另外一个连接到请求。
单连接对单线程和线程池的对比
线程类型 | 适用场景 | 线程执行单位 |
one-thread-per-connection | 连接数不多而且稳定 | 连接 |
pool-of-threads | 高并发、短连接 | SQL语句 |
需要说明的是:服务端的线程池和客户端的连接池不是一回事。
连接池和线程池的对比
池的类型 | 优化目标 | 功能 |
Connection Pool | 客户端 | 减少建立连接的时间和连接占用的资源 |
Thread Pool | 服务端 | 减少线程重复建立和context-switch的资源 |
线程池在MySQL的企业版才有,社区版没有,下面的图说明了MySQL企业版通过线程池有效地提高MySQL扩展性
(https://www.mysql.com/products/enterprise/scalability.html)
Oracle声称线程池让MySQL的扩展性提高了60倍。遗憾的是免费的社区版中并没有这个功能,但免费的Percona和MariaDB中都有这个功能。这里以MariaDB为例介绍线程池的作用。
安装MariaDB
在MariaDB的官方网站上可以下载安装包,在Linux上的安装方法如下:
groupadd mysql
useradd -g mysql mysql
cd /usr/local
tar -zxvpf /path-to/mariadb-VERSION-OS.tar.gz
ln -s mariadb-VERSION-OS mysql
cd mysql
./scripts/mysql_install_db --user=mysql
chown -R root .
chown -R mysql data
启动MariaDB:
root@yaoyuan ~# /usr/local/mysql/bin/mysqld_safe --user=mysql &
登陆MariaDB,root或者mysql用户登陆不需要密码
[mysql@yuan ~]$ mysql
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 5
Server version: 10.6.11-MariaDB MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> \s
--------------
mysql Ver 15.1 Distrib 10.6.11-MariaDB, for linux-systemd (x86_64) using readline 5.1
Connection id: 5
Current database:
Current user: mysql@localhost
SSL: Not in use
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server: MariaDB
Server version: 10.6.11-MariaDB MariaDB Server
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: latin1
Db characterset: latin1
Client characterset: utf8mb3
Conn. characterset: utf8mb3
UNIX socket: /tmp/mysql.sock
Uptime: 1 hour 34 min 51 sec
Threads: 1 Questions: 47 Slow queries: 0 Opens: 37 Open tables: 30 Queries per second avg: 0.008
--------------
配置线程池
控制线程池的主要参数是thread_handling,这个参数的默认值one-thread-per-connection,就是一个连接对应一个线程。
MariaDB [information_schema]> show variables like 'thread_h%';
+-----------------+---------------------------+
| Variable_name | Value |
+-----------------+---------------------------+
| thread_handling | one-thread-per-connection |
+-----------------+---------------------------+
1 row in set (0.001 sec)
激活线程池的方法是在参数文件中把这个参数配置成pool-of-threads:
[mariadb]
...
thread_handling=pool-of-threads
启动后可以看到线程池的相关参数的默认值
MariaDB [(none)]> show variables like 'thread%';
+--------------------------------+-----------------+
| Variable_name | Value |
+--------------------------------+-----------------+
| thread_cache_size | 151 |
| thread_handling | pool-of-threads |
| thread_pool_dedicated_listener | OFF |
| thread_pool_exact_stats | OFF |
| thread_pool_idle_timeout | 60 |
| thread_pool_max_threads | 65536 |
| thread_pool_oversubscribe | 3 |
| thread_pool_prio_kickup_timer | 1000 |
| thread_pool_priority | auto |
| thread_pool_size | 1 |
| thread_pool_stall_limit | 500 |
| thread_stack | 299008 |
+--------------------------------+-----------------+
12 rows in set (0.075 sec)
hread_pool_size:线程池的大小,这个参数的默认值由cpu的个数决定,默认是CPU的个数,这里是虚拟机,只有一个虚拟CPU,因此是1。
thread_pool_max_threads:线程池中的最大线程,默认65536。
测试线程池
测试环境是只有一个虚拟CPU的虚拟机。
参数配置 /etc/my.cnf
[mariadb]
thread_handling=pool-of-threads
thread_pool_size=10
max_connections=1000
thread_pool_max_threads=50
创建1万条记录的表
[mysql@yuan ~]$ mysqlslap --create="create table table_a(col1 int primary key auto_increment, col2 varchar(255))" --query="insert into table_a(col2) values(md5(rand()))" --concurrency=10 --number-of-queries=10000 --no-drop
Benchmark
Average number of seconds to run all queries: 1.982 seconds
Minimum number of seconds to run all queries: 1.982 seconds
Maximum number of seconds to run all queries: 1.982 seconds
Number of clients running queries: 10
Average number of queries per client: 1000
并发500个连接进行测试
[mysql@yuan ~]$ mysqlslap --query='select count(*) from table_a where col2<>md5(rand())' --concurrency=500 --iterations=10
Benchmark
Average number of seconds to run all queries: 7.809 seconds
Minimum number of seconds to run all queries: 7.762 seconds
Maximum number of seconds to run all queries: 8.033 seconds
Number of clients running queries: 500
Average number of queries per client: 1
每个SQL语句运行的时间大约是7秒多,实际上这是因为并发造成context-switch而影响了执行的效率,如果是单线程会快很多,例如:
[mysql@yuan ~]$ mysqlslap --query='select count(*) from table_a where col2<>md5(rand())' --concurrency=1 --iterations=10
Benchmark
Average number of seconds to run all queries: 0.015 seconds
Minimum number of seconds to run all queries: 0.015 seconds
Maximum number of seconds to run all queries: 0.016 seconds
Number of clients running queries: 1
Average number of queries per client: 1
观察500个连接运行时的状态参数
[mysql@yuan ~]$ mysqladmin extended-status -i5 -c100|grep Thread
| Threadpool_idle_threads | 24 |
| Threadpool_threads | 50 |
| Threads_cached | 0 |
| Threads_connected | 371 |
| Threads_created | 113 |
| Threads_running | 26 |
| Threadpool_idle_threads | 32 |
| Threadpool_threads | 50 |
| Threads_cached | 0 |
| Threads_connected | 195 |
| Threads_created | 113 |
| Threads_running | 18 |
| Threadpool_idle_threads | 20 |
| Threadpool_threads | 50 |
| Threads_cached | 0 |
| Threads_connected | 465 |
| Threads_created | 113 |
| Threads_running | 30 |
| Threadpool_idle_threads | 24 |
| Threadpool_threads | 50 |
| Threads_cached | 0 |
| Threads_connected | 287 |
| Threads_created | 113 |
| Threads_running | 25 |
| Threadpool_idle_threads | 48 |
| Threadpool_threads | 50 |
| Threads_cached | 0 |
| Threads_connected | 47 |
| Threads_created | 113 |
| Threads_running | 2 |
分析MariaDB的状态参数可以得到以下结论:
- Threadpool_idle_threads+Threads_running=Threadpool_threads
- Threads_created值一直没有变,表示一直没有创建新的线程处理连接。
- Threads_cached是在one-thread-per-connection时缓存的线程,在线程池使用时是零。
实际互联网的应用中,网民登录到网站后大部分时间是浏览网页,执行操作的时间只占很少一部分,可以使用下面的命令模拟网民的每一分钟执行一个SQL:
[mysql@yuan ~]$ mysqlslap --query='select count(*) from table_a where col2<>md5(rand());select sleep(6);' --concurrency=1 --iterations=20
Benchmark
Average number of seconds to run all queries: 6.015 seconds
Minimum number of seconds to run all queries: 6.015 seconds
Maximum number of seconds to run all queries: 6.017 seconds
Number of clients running queries: 1
Average number of queries per client: 1
再次观察MariaDB的状态参数
| Threadpool_idle_threads | 19 |
| Threadpool_threads | 20 |
| Threads_cached | 0 |
| Threads_connected | 5 |
| Threads_created | 133 |
| Threads_running | 2
|
| Threadpool_idle_threads | 19 |
| Threadpool_threads | 20 |
| Threads_cached | 0 |
| Threads_connected | 5 |
| Threads_created | 133 |
| Threads_running | 2 |
| Threadpool_idle_threads | 19 |
| Threadpool_threads | 20 |
| Threads_cached | 0 |
| Threads_connected | 5 |
| Threads_created | 133 |
| Threads_running | 2 |
发现这时线程池里面的线程数下降到20个,大部分时间线程池中只有一个线程在提供服务。
总结
经过前面的测试,我们可以看到,线程池对减少MySQL服务端线程数量的作用非常明显,通过包括一定数量的线程的线程池可以向大量的连接提供服务。
如果连接中执行的SQL运行时间较长,例如数据仓库类的业务,就不适合用线程池。另外非常简单的SQL(例如select 1)也会因为启动了线程池而比直接连接时运行得稍微慢一点。