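In the early days of the Internet, computers were not yet widespread, business logic was simple, and concurrency was relatively low, so a monolithic application was often enough to carry the load. As Internet adoption took off and concurrency grew, monolithic services came under heavy pressure. The usual responses are **scaling up the server** (disk, memory, CPU) and **cluster deployment**. However, a single machine cannot be scaled up indefinitely, and its resource utilization drops sharply as it grows; cluster deployment, in turn, needs a gateway in front to route traffic, and that gateway layer still has to cope with high concurrency and with being a single point of failure. Using the `Nginx` reverse proxy server as the example, this article covers the **load balancing algorithms** of the gateway layer (the traffic gateway, not an application gateway such as `springcloud gateway`) and a high-availability scheme based on **Keepalived+VIP**.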
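## 1. What Is Load Balancing
Load balancing means distributing load (work tasks) evenly across multiple processing units. Put simply, the load balancer acts as the single entry point for traffic and dispatches requests to the application instances deployed on multiple machines behind it.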
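## 2. Types of Load Balancing
By hardware or software:
- **Hardware load balancing**: built on `ASIC`s, high performance but expensive; F5 is a common example.
- **Software load balancing**: for example the reverse proxy server `Nginx`; inexpensive and well suited to small and medium-sized businesses.
By network layer:
- **Layer 4 load balancing**: forwards traffic at the transport layer and keeps it on the same TCP connection; high performance. Example: `LVS`.
- **Layer 7 load balancing**: works on application-layer protocols and offers richer features, but performs worse than layer 4 load balancing. Example: `Nginx`.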
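## 3. Load Balancing Algorithms
The commonly used load balancing algorithms are listed here:
-   **Round Robin**: client requests are handed to the internal servers in turn, cycling continuously. This algorithm suits environments where the servers have roughly the same hardware and software configuration.
-   **Weighted Round Robin**: similar to round robin, but each server is assigned a weight according to its processing capacity, so that requests reach the internal servers in proportion to the weights. For example, if servers A, B, and C have weights 1, 2, and 2, they receive 20%, 40%, and 40% of the requests respectively. This algorithm suits environments where server configurations differ.
-   **Random**: client requests are assigned to the internal servers at random. In theory, with a large enough number of requests the distribution becomes reasonably even.
-   **Consistent Hashing**: a hash ring is built, and some data in the request (a MAC or IP address, or something from a higher layer such as a parameter in the application-layer HTTP message) is used as the key to compute which node the request lands on. To spread requests evenly, and to keep them mapped to a live node after a node goes down, each real node is placed on the ring as multiple virtual nodes.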
- ...
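
The weighted round robin and consistent hashing entries above can be made concrete with a small sketch. The following is an illustrative shell version, not what Nginx or any other product runs internally: the function names (`wrr_sequence`, `hash32`, `build_ring`, `ring_lookup`), the node names, and the use of `cksum` as the hash function are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketches of two algorithms from the list above. All function names are illustrative.

# Smooth weighted round robin: every round each server's running score grows
# by its weight; the highest score is picked and reduced by the total weight.
# Usage: wrr_sequence COUNT name:weight [name:weight ...]
wrr_sequence() {
  local count=$1; shift
  printf '%s\n' "$@" | awk -F: -v n="$count" '
    { name[NR] = $1; weight[NR] = $2; current[NR] = 0; total += $2 }
    END {
      for (r = 0; r < n; r++) {
        best = 1
        for (i = 1; i <= NR; i++) {
          current[i] += weight[i]
          if (current[i] > current[best]) best = i
        }
        current[best] -= total
        print name[best]
      }
    }'
}

# Numeric hash of a string; the POSIX cksum CRC stands in for a real hash function.
hash32() { printf '%s' "$1" | cksum | cut -d" " -f1; }

# Consistent hashing: print the ring as sorted "hash node" lines, with
# VNODES virtual nodes per real node.
# Usage: build_ring VNODES node [node ...]
build_ring() {
  local vnodes=$1 node v; shift
  for node in "$@"; do
    v=0
    while [ "$v" -lt "$vnodes" ]; do
      echo "$(hash32 "$node#$v") $node"
      v=$((v + 1))
    done
  done | sort -n
}

# Look up the node for KEY: the first ring entry whose hash is >= the key's
# hash, wrapping around to the first entry. Reads the ring on stdin.
ring_lookup() {
  local h; h=$(hash32 "$1")
  awk -v h="$h" '
    NR == 1 { first = $2 }
    !found && $1 >= h { print $2; found = 1 }
    END { if (!found) print first }'
}
```

Running `wrr_sequence 10 A:1 B:2 C:2 | sort | uniq -c` counts 2 picks for A and 4 each for B and C, which is the 20%/40%/40% split described above. For the ring, `build_ring 20 A B C | ring_lookup "client-10.0.0.7"` prints the node that key maps to; rebuilding the ring without one node only moves the keys that used to map to the removed node.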
## 4. Keepalived+VIP+DNS Round-Robin Scheme
The deployment environment is as follows:

| **VIP** | **Internal IP** | **Hostname** | **Nginx port** |
| ----------- | ------------ | ----------- | --------------- |
| **192.168.16.11** | **192.168.16.16** | **keepalive-nginx-1** | **8031** |
| **192.168.16.11** | **192.168.16.17** | **keepalive-nginx-2** | **8031** |

Reference architecture diagram:

*[Figure: load balancing with Keepalived and a VIP]*

**1. Install Nginx**
- Add the `nginx` repository to `yum`
```
rpm -Uvh http://nginx.org/packages/centos/7/noarch/RPMS/nginx-release-centos-7-0.el7.ngx.noarch.rpm
```
- Install `nginx`
`yum -y install nginx`
- Verify
```
[root@localhost ~]# nginx -v
nginx version: nginx/1.20.2
```
- Configure the Nginx port

```
vi /etc/nginx/conf.d/default.conf

# 192.168.16.16
server {
    listen       8031; # change the default port to 8031
    server_name  localhost;
    ...
}

# 192.168.16.17
server {
    listen       8031; # change the default port to 8031
    server_name  localhost;
    ...
}
```

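- Start Nginx and enable it at boot
`systemctl start nginx && systemctl enable nginx`

If a permission error is reported, disable SELinux:

`vi /etc/selinux/config`, change `SELINUX=enforcing` to `SELINUX=disabled`, then reboot for the change to take effect.
- Check the Nginx status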
```
[root@localhost ~]# systemctl status nginx
● nginx.service - nginx - high performance web server
   Loaded: loaded (/usr/lib/systemd/system/nginx.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2022-04-27 15:54:19 CST; 1min 40s ago
     Docs: http://nginx.org/en/docs/
 Main PID: 1050 (nginx)
   CGroup: /system.slice/nginx.service
           ├─1050 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
           ├─1051 nginx: worker process
           ├─1052 nginx: worker process
           ├─1053 nginx: worker process
           └─1054 nginx: worker process

Apr 27 15:54:19 localhost.localdomain systemd[1]: Starting nginx - high performance web server...
Apr 27 15:54:19 localhost.localdomain systemd[1]: Started nginx - high performance web server.
```
- Verify the page

```
C:\Users\86189>curl http://192.168.16.11:8031/

<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
```

The `nginx` welcome page is returned successfully.

**2. Install Keepalived**
- Download Keepalived
`wget https://www.keepalived.org/software/keepalived-2.1.5.tar.gz`
- Install Keepalived
```
# Install dependencies
$ yum -y install gcc-c++
$ yum -y install openssl-devel
# Build and install keepalived
$ tar -xvzf keepalived-2.1.5.tar.gz
$ cd keepalived-2.1.5
$ ./configure --prefix=/usr/local/keepalived
$ make && make install
```
- Set up the Keepalived configuration files
```
# Create the /etc/keepalived directory
$ mkdir /etc/keepalived 
$ cp /usr/local/keepalived/etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf
$ cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/keepalived
```
- Edit the EnvironmentFile setting
`vi /lib/systemd/system/keepalived.service`
```
[Unit]
Description=LVS and VRRP High Availability Monitor
After=network-online.target syslog.target
Wants=network-online.target

[Service]
Type=forking
PIDFile=/run/keepalived.pid
KillMode=process
# Point EnvironmentFile at /etc/sysconfig/keepalived
EnvironmentFile=-/etc/sysconfig/keepalived
ExecStart=/usr/local/keepalived/sbin/keepalived $KEEPALIVED_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target
```

- Edit the Keepalived configuration

Run `vi /etc/keepalived/keepalived.conf` on both machines.

Use `ip addr` to look up the network interface information:

```
# 192.168.16.16
[root@localhost app]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:01:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:32:f8:bd brd ff:ff:ff:ff:ff:ff
    inet 192.168.16.16/24 brd 192.168.16.255 scope global noprefixroute ens3
       valid_lft forever preferred_lft forever
    inet 192.168.16.11/32 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::f219:afff:106:5f5f/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
       
# 192.168.16.17
[root@localhost app]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:d0:71:85 brd ff:ff:ff:ff:ff:ff
    inet 192.168.16.17/24 brd 192.168.16.255 scope global noprefixroute ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::d296:dcd5:28ce:e88a/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
```

Configuration of the master node 192.168.16.16:
```
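# Host 192.168.16.16
# Global definitions: global configuration options
global_defs {
  # Addresses that keepalived emails when a state transition occurs
  # (sending mail from keepalived_notify.sh is recommended instead)
  notification_email {
    acassen@firewall.loc
  }
  notification_email_from Alexandre.Cassen@firewall.loc # sender address for notification mail
  smtp_server 192.168.200.1 # SMTP server used to send mail
  smtp_connect_timeout 30 # SMTP connection timeout in seconds
  router_id nginx-16-1 # identifier of this machine, usually the hostname
  vrrp_skip_check_adv_addr # skip the address check when an advert comes from the same router as the previous one (by default the check is not skipped)
  vrrp_garp_interval 0 # delay in seconds between gratuitous ARP messages sent on an interface, default 0
  vrrp_gna_interval 0 # delay in seconds between unsolicited NA messages sent on an interface, default 0
}
# Health check script
vrrp_script checkhaproxy
{
  script "/etc/keepalived/check_nginx.sh" # path of the check script
  interval 5 # check interval in seconds
  weight 0 # priority is adjusted by this weight; 0 leaves the instance priority unchanged
}
# VRRP instance
vrrp_instance VI_1 {
  state BACKUP # initial state is BACKUP
  interface ens3 # interface the VIP is bound to, e.g. ens3
  virtual_router_id 51 # VRID of the cluster; master and backup must use the same value
  nopreempt # non-preemptive mode; only valid on nodes whose state is BACKUP
  priority 100 # priority, range 1~254; the higher value wins and the highest becomes master
  advert_int 1 # VRRP advertisement interval; must be identical on both nodes, default 1 second
  # Authentication; must be identical on both nodes
  authentication {
    auth_type PASS # authentication type, either PASS or AH
    auth_pass 1111 # authentication password
  }
  unicast_src_ip 192.168.16.16 # internal IP address of this machine
  unicast_peer {
    192.168.16.17 # IP address of the peer
  }
  # VIP: added when the node becomes master, removed when it becomes backup
  virtual_ipaddress {
    192.168.16.11 # the highly available virtual IP; on a Tencent Cloud CVM, use the HAVIP address requested in the console
  }
  # Check scripts to run
  track_script {
    checkhaproxy
  }
  notify_master "/etc/keepalived/keepalived_notify.sh MASTER" # run when switching to the master state
  notify_backup "/etc/keepalived/keepalived_notify.sh BACKUP" # run when switching to the backup state
  notify_fault "/etc/keepalived/keepalived_notify.sh FAULT" # run when switching to the fault state
  notify_stop "/etc/keepalived/keepalived_notify.sh STOP" # run when switching to the stop state
  garp_master_delay 1 # delay before the ARP cache is refreshed after becoming master
  garp_master_refresh 5 # interval at which the master resends gratuitous ARP
  # Tracked interfaces: if any of them fails, the instance enters the FAULT state
  track_interface {
    ens3
  }
}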
```
Configuration of the backup node 192.168.16.17:
```

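# Host 192.168.16.17
# Global definitions: global configuration options
global_defs {
  # Addresses that keepalived emails when a state transition occurs
  # (sending mail from keepalived_notify.sh is recommended instead)
  notification_email {
    acassen@firewall.loc
  }
  notification_email_from Alexandre.Cassen@firewall.loc # sender address for notification mail
  smtp_server 192.168.200.1 # SMTP server used to send mail
  smtp_connect_timeout 30 # SMTP connection timeout in seconds
  router_id nginx-17-2 # identifier of this machine, usually the hostname
  vrrp_skip_check_adv_addr # skip the address check when an advert comes from the same router as the previous one (by default the check is not skipped)
  vrrp_garp_interval 0 # delay in seconds between gratuitous ARP messages sent on an interface, default 0
  vrrp_gna_interval 0 # delay in seconds between unsolicited NA messages sent on an interface, default 0
}
# Health check script
vrrp_script checkhaproxy
{
  script "/etc/keepalived/check_nginx.sh" # path of the check script
  interval 5 # check interval in seconds
  weight 0 # priority is adjusted by this weight; 0 leaves the instance priority unchanged
}
# VRRP instance
vrrp_instance VI_1 {
  state BACKUP # initial state is BACKUP
  interface ens3 # interface the VIP is bound to, e.g. ens3
  virtual_router_id 51 # VRID of the cluster; master and backup must use the same value
  nopreempt # non-preemptive mode; only valid on nodes whose state is BACKUP
  priority 50 # priority, range 1~254; the higher value wins and the highest becomes master
  advert_int 1 # VRRP advertisement interval; must be identical on both nodes, default 1 second
  # Authentication; must be identical on both nodes
  authentication {
    auth_type PASS # authentication type, either PASS or AH
    auth_pass 1111 # authentication password
  }
  unicast_src_ip 192.168.16.17 # internal IP address of this machine
  unicast_peer {
    192.168.16.16 # IP address of the peer
  }
  # VIP: added when the node becomes master, removed when it becomes backup
  virtual_ipaddress {
    192.168.16.11 # the highly available virtual IP; on a Tencent Cloud CVM, use the HAVIP address requested in the console
  }
  # Check scripts to run
  track_script {
    checkhaproxy
  }
  notify_master "/etc/keepalived/keepalived_notify.sh MASTER" # run when switching to the master state
  notify_backup "/etc/keepalived/keepalived_notify.sh BACKUP" # run when switching to the backup state
  notify_fault "/etc/keepalived/keepalived_notify.sh FAULT" # run when switching to the fault state
  notify_stop "/etc/keepalived/keepalived_notify.sh STOP" # run when switching to the stop state
  garp_master_delay 1 # delay before the ARP cache is refreshed after becoming master
  garp_master_refresh 5 # interval at which the master resends gratuitous ARP
  # Tracked interfaces: if any of them fails, the instance enters the FAULT state
  track_interface {
    ens3
  }
}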
```

Define the health check script:
`vi /etc/keepalived/check_nginx.sh`

```
#!/usr/bin/env bash

# If the nginx PID file is missing, stop keepalived so the VIP moves to the peer.
NGINXPID="/run/nginx.pid"
if [ ! -f "$NGINXPID" ]; then
   killall keepalived
fi

```
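
The check above only tests whether the PID file exists, so a stale file left behind by a crashed nginx would still pass. A slightly stricter variant is sketched below under stated assumptions: the function name `nginx_alive` is hypothetical, and the script is assumed to run as root (as Keepalived normally does), so that `kill -0` is allowed to probe the nginx master process.

```shell
#!/usr/bin/env bash
# Returns 0 when the PID file exists and the process recorded in it is running.
# Usage: nginx_alive [PIDFILE]   (defaults to /run/nginx.pid)
nginx_alive() {
  local pidfile=${1:-/run/nginx.pid}
  local pid
  [ -s "$pidfile" ] || return 1                  # missing or empty PID file
  pid=$(cat "$pidfile") || return 1
  case "$pid" in ''|*[!0-9]*) return 1 ;; esac   # content is not a number
  kill -0 "$pid" 2>/dev/null                     # signal 0 only tests for existence
}

# In check_nginx.sh this could replace the plain file test, for example:
#   nginx_alive || killall keepalived
```

Whether to stop Keepalived outright, as the original script does, or to try restarting nginx first is a design choice; stopping Keepalived is the simplest way to force the VIP over to the peer.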
Define the notification script:

```
#!/usr/bin/env bash
# Use of this source code is governed by a MIT style
# license that can be found in the LICENSE file.

# /etc/keepalived/keepalived_notify.sh
log_file=/var/log/keepalived.log

iam::keepalived::mail() {
  # Email logic can be added here to alert promptly when keepalived changes state
  :
}

iam::keepalived::log() {
  echo "[$(date '+%Y-%m-%d %T')] $1" >> "${log_file}"
}

[ ! -d /var/keepalived/ ] && mkdir -p /var/keepalived/

case "$1" in
  "MASTER")
    iam::keepalived::log "notify_master"
    ;;
  "BACKUP")
    iam::keepalived::log "notify_backup"
    ;;
  "FAULT")
    iam::keepalived::log "notify_fault"
    ;;
  "STOP")
    iam::keepalived::log "notify_stop"
    ;;
  *)
    iam::keepalived::log "keepalived_notify.sh: state error!"
    ;;
esac

```
- Start Keepalived and enable it at boot

```
$ systemctl start keepalived
$ systemctl enable keepalived
```
- Check the Keepalived status

```
systemctl status keepalived
* keepalived.service - LVS and VRRP High Availability Monitor
   Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-04-28 11:11:52 CST; 3s ago
  Process: 236527 ExecStart=/usr/local/keepalived/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 236528 (keepalived)
    Tasks: 3
   CGroup: /system.slice/keepalived.service
           |-236528 /usr/local/keepalived/sbin/keepalived -D
           |-236529 /usr/local/keepalived/sbin/keepalived -D
           `-236530 /usr/local/keepalived/sbin/keepalived -D

Apr 28 11:11:52 localhost.localdomain Keepalived_vrrp[236530]: (VI_1) Entering ...
Apr 28 11:11:52 localhost.localdomain Keepalived_vrrp[236530]: VRRP sockpool: [...
Apr 28 11:11:52 localhost.localdomain Keepalived_healthcheckers[236529]: Gained...
Apr 28 11:11:52 localhost.localdomain Keepalived_healthcheckers[236529]: Gained...
Apr 28 11:11:52 localhost.localdomain Keepalived_healthcheckers[236529]: Gained...
Apr 28 11:11:52 localhost.localdomain Keepalived_healthcheckers[236529]: Activa...
Apr 28 11:11:52 localhost.localdomain Keepalived_healthcheckers[236529]: Activa...
Apr 28 11:11:52 localhost.localdomain Keepalived_healthcheckers[236529]: Activa...
Apr 28 11:11:52 localhost.localdomain Keepalived_healthcheckers[236529]: Activa...
Apr 28 11:11:52 localhost.localdomain Keepalived_healthcheckers[236529]: Activa...
Hint: Some lines were ellipsized, use -l to show in full.
```
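`Active: active (running)` means the service is running.

- Configuration file overview

The configuration file is roughly divided into the following 4 parts.
1.  global_defs: global definitions, i.e. global configuration options.
2.  vrrp_script checkhaproxy: the health check script.
3.  vrrp_instance VI_1: the VRRP instance.
4.  virtual_server: LVS configuration. It is not needed unless LVS+Keepalived is used.
- Verify the virtual IP
Restart the Keepalived service with `systemctl restart keepalived`, then run `ip addr` on both machines. You should see: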

```
# 192.168.16.16
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:82:f8:bd brd ff:ff:ff:ff:ff:ff
    inet 192.168.16.16/24 brd 192.168.16.255 scope global noprefixroute ens3
       valid_lft forever preferred_lft forever
    inet 192.168.16.11/32 scope global ens3
       valid_lft forever preferred_lft forever

```
The master node has gained a virtual IP, `192.168.16.11`. If anything goes wrong, check the log with `tail -f /var/log/messages`.
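
Checking by eye works for two machines; for scripting the same check, a minimal sketch follows. The function name `holds_vip` is hypothetical, and it simply looks for an `inet <address>/` entry in `ip addr` output supplied on stdin.

```shell
#!/usr/bin/env bash
# Succeeds when the given IPv4 address appears as an "inet" entry in
# `ip addr` output read from stdin.
# Usage: ip -4 addr show dev ens3 | holds_vip 192.168.16.11
holds_vip() {
  local vip=$1
  # Fixed-string match; the trailing "/" keeps 192.168.16.1 from matching 192.168.16.11.
  grep -qF "inet ${vip}/"
}
```

For example, `ip -4 addr show dev ens3 | holds_vip 192.168.16.11 && echo "this node holds the VIP"` prints the message only on the node that currently owns the address.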

**3. Deployment in Practice**

- Deploy the test application on both servers
`nohup java -jar -Dserver.port=8083 -Xms1024m -Xmx1024m springboot-web-demo-1.0-SNAPSHOT.jar &`

- Create the test service
`vi /etc/nginx/conf.d/cqbdri.conf`

```
# 192.168.16.16
server {
    listen       8033;
    server_name  192.168.16.16;
    root         /usr/share/nginx/html;
    location / {
      proxy_set_header X-Forwarded-Host $http_host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_pass  http://test;
      client_max_body_size 5m;
    }

    error_page 404 /404.html;
        location = /40x.html {
    }

    error_page 500 502 503 504 /50x.html;
        location = /50x.html {
    }
}
```
```
# 192.168.16.17
server {
    listen       8033;
    server_name  192.168.16.17;
    root         /usr/share/nginx/html;
    location / {
      proxy_set_header X-Forwarded-Host $http_host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_pass  http://test;
      client_max_body_size 5m;
    }

    error_page 404 /404.html;
        location = /40x.html {
    }

    error_page 500 502 503 504 /50x.html;
        location = /50x.html {
    }
}
```
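- Test the service
Set the server hostname (`nginx1` or `nginx2`, depending on the machine): `hostnamectl set-hostname nginx1/nginx2`

From a Windows command prompt, run: `curl http://192.168.16.11:8033/hello?name=winson`

Response: `Hello winson! I'm Edge controller!`

Nginx access log: `/var/log/nginx/access.log`

- Configure Nginx load balancing
`vi /etc/nginx/nginx.conf`
Add the following inside the `http` block: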

```
# 192.168.16.16
 upstream test {
       server 127.0.0.1:8083 weight=2;
       server 192.168.16.17:8083 weight=1;
   }
   
 # 192.168.16.17
 upstream test {
       server 127.0.0.1:8083 weight=1;
       server 192.168.16.16:8083 weight=2;
   }
```
- Test load balancing
From a Windows command prompt, run: `curl http://192.168.16.11:8033/hello?name=winson`

Following the weights, the responses cycle as:
2 × `Hello winson! I'm Edge controller! nginx-1`

1 × `Hello winson! I'm Edge controller! nginx-2`
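
Rather than repeating `curl` by hand and counting, the 2:1 split can be tallied with a short loop. This is only a sketch: `tally_responses` is a hypothetical helper name, and it assumes it is run from a Linux shell that can reach the VIP.

```shell
#!/usr/bin/env bash
# Runs the given command COUNT times and prints how often each distinct
# output line occurred, most frequent first.
# Usage: tally_responses COUNT command [args ...]
tally_responses() {
  local count=$1 i=0; shift
  while [ "$i" -lt "$count" ]; do
    "$@"
    echo                       # make sure each response ends with a newline
    i=$((i + 1))
  done | grep -v '^$' | sort | uniq -c | sort -rn
}

# Example against the VIP used in this article (needs the test service running):
#   tally_responses 30 curl -s 'http://192.168.16.11:8033/hello?name=winson'
```

With the weights configured above, roughly two thirds of the responses should carry `nginx-1` and one third `nginx-2`.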
- Test Keepalived
1. Stop the nginx process on the master node
`systemctl stop nginx`

2. Check the nginx PID
Run `cat /run/nginx.pid`, which returns:

```
# Master node
[root@localhost keepalived]# cat /run/nginx.pid
cat: /run/nginx.pid: No such file or directory

# Backup node
[root@localhost keepalived]# cat /run/nginx.pid
8873
```

3. Check whether the IP has failed over
Run `ip addr`:

```
# 192.168.16.16
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:01:82:f8:bd brd ff:ff:ff:ff:ff:ff
    inet 192.168.16.16/24 brd 192.168.16.255 scope global noprefixroute ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::f269:aeff:106:5f5f/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
       
# 192.168.16.17
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:d0:71:85 brd ff:ff:ff:ff:ff:ff
    inet 192.168.16.17/24 brd 192.168.16.255 scope global noprefixroute ens3
       valid_lft forever preferred_lft forever
    inet 192.168.16.11/32 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::da96:dcd5:28ca:e88a/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
```
The virtual IP `192.168.16.11` has floated over to machine `192.168.16.17`.

4. Test access through the VIP
From a Windows command prompt, run: `curl http://192.168.16.11:8033/hello?name=winson`

Following the weights, the responses cycle as:

2 × `Hello winson! I'm Edge controller! nginx-1`

1 × `Hello winson! I'm Edge controller! nginx-2`

Nginx access log: `/var/log/nginx/access.log`

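## 5. Summary
This completes the high-availability load balancing scheme based on Keepalived+VIP. Some issues remain, however:
1. With only one VIP, the backup node sits idle all the time. How can utilization be improved?
Configure two virtual IPs and have the upstream **smart DNS** or **HTTPDNS** rotate between them, which improves resource utilization.

Reference architecture diagram:

*[Figure: load balancing with two VIPs and DNS round-robin]*

2. How can high availability be kept if the switch that the Keepalived master and backup are attached to fails?
Use switch **stacking** with the two servers connected to different switches, or keep a cold standby in a different rack and switch over manually.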