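If code hard-codes a single Impala instance, will high concurrency eventually knock that Impala Daemon out of service? Yes, it will.
Approach: software load balancing.
Concrete plan: set up an HAProxy service between the clients and the servers, have it listen on the relevant ports, and let it distribute requests.
What follows is a way to load-balance Impala with HAProxy.
Prerequisites:
The Impala service is running normally
Impala version used in this article: 3.2.0+cdh6.3.2
All operations are performed as root
Kerberos is enabled on the cluster
HAProxy 1.5.18
1. HAProxy installation, start and stop
1. Pick one node in the cluster and install the HAProxy service with yum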
yum -y install haproxy
2. Start the HAProxy service, check its status, and add it to the boot-time service list
# start
service haproxy start
# check status
service haproxy status
# enable at boot
chkconfig haproxy on
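2. Configuring HAProxy for Impala load balancing
1. Back up the haproxy.cfg file in /etc/haproxy, create a new haproxy.cfg, and add the following configuration
mv /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bak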
#---------------------------------------------------------------------
# Example configuration for a possible web application. See the
# full configuration options online.
#
# http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
#
#---------------------------------------------------------------------
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
# to have these messages end up in /var/log/haproxy.log you will
# need to:
#
# 1) configure syslog to accept network log events. This is done
# by adding the '-r' option to the SYSLOGD_OPTIONS in
# /etc/sysconfig/syslog
#
# 2) configure local2 events to go to the /var/log/haproxy.log
# file. A line like the following can be added to
# /etc/sysconfig/syslog
#
# local2.* /var/log/haproxy.log
#
log 127.0.0.1 local2
chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 4000
user haproxy
group haproxy
daemon
# turn on stats unix socket
stats socket /var/lib/haproxy/stats
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
mode http
log global
option httplog
option dontlognull
#option http-server-close
#option forwardfor except 127.0.0.0/8
option redispatch
retries 3
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout http-keep-alive 10s
timeout check 10s
maxconn 3000
listen stats
bind 0.0.0.0:1080
mode http
option httplog
maxconn 5000
stats refresh 30s
stats uri /stats
listen impalashell
bind 0.0.0.0:25003
mode tcp
option tcplog
balance leastconn
server slave06 slave06:21000 check
server slave07 slave07:21000 check
server slave08 slave08:21000 check
server slave09 slave09:21000 check
server slave10 slave10:21000 check
listen impalajdbc
bind 0.0.0.0:25004
mode tcp
option tcplog
balance leastconn
server slave06 slave06:21050 check
server slave07 slave07:21050 check
server slave08 slave08:21050 check
server slave09 slave09:21050 check
server slave10 slave10:21050 check
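This configures the HAProxy HTTP stats page and load balancing for impalashell and impalajdbc.
Note: impalajdbc and impalashell use different ports (21050 and 21000 on the Impala Daemons, exposed as 25004 and 25003 on HAProxy). Both listeners inherit timeout client 1m and timeout server 1m from defaults, so HAProxy closes a connection that stays idle for more than a minute; raise these values in the listen sections if sessions need to sit idle for longer.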
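To make balance leastconn concrete: each new connection goes to the backend that currently has the fewest active connections. The sketch below models only that selection rule in Java. The class and method names are hypothetical, the tie-break (first server in configuration order) is a simplification, and it does not reflect how HAProxy is implemented internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model of least-connections selection; not HAProxy code.
class LeastConnSketch {
    // Active connection count per backend, kept in configuration order.
    private final Map<String, Integer> active = new LinkedHashMap<>();

    LeastConnSketch(String... servers) {
        for (String s : servers) {
            active.put(s, 0);
        }
    }

    // Pick the server with the fewest active connections and count the new one.
    // Ties go to the server listed first (a simplification).
    String open() {
        String best = null;
        for (Map.Entry<String, Integer> e : active.entrySet()) {
            if (best == null || e.getValue() < active.get(best)) {
                best = e.getKey();
            }
        }
        if (best == null) {
            throw new IllegalStateException("no servers configured");
        }
        active.merge(best, 1, Integer::sum);
        return best;
    }

    // Record that a connection to the given server has closed.
    void close(String server) {
        active.computeIfPresent(server, (k, v) -> Math.max(0, v - 1));
    }
}
```

Because Impala sessions can stay open for a long time, counting open connections tends to spread them more evenly than plain rotation would when some sessions are long-lived and others are short.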
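2. Restart the HAProxy service
service haproxy restart
or
systemctl restart haproxy
3. Open http://{hostname}:1080/stats in a browser
If the stats page loads and lists the impalashell and impalajdbc sections with their servers, load balancing for the Impala service has been configured successfully.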
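Besides the stats page, a quick TCP probe can confirm that the three HAProxy listeners are accepting connections. The following is a minimal sketch: the class name PortProbe and the default host master01 are assumptions for illustration, and a successful connect only shows that HAProxy has bound the port, not that the Impala Daemons behind it are healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: reports whether a TCP port accepts connections.
class PortProbe {
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException | RuntimeException e) {
            // Refused, timed out, unresolvable host, or invalid port:
            // treat all of these as "not listening".
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed HAProxy host; pass the real one as the first argument.
        String host = args.length > 0 ? args[0] : "master01";
        // Ports from the haproxy.cfg above: stats, impalashell, impalajdbc.
        for (int port : new int[] {1080, 25003, 25004}) {
            System.out.println(host + ":" + port + " listening=" + isListening(host, port, 2000));
        }
    }
}
```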
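3. Impala configuration
1. Log in to Cloudera Manager with an administrator account and open the Impala service
2. Search for "Load Balancer" and enter the HAProxy settings there:
3. Save the configuration, return to the CM home page, and restart the affected services as prompted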
[impala]
server_host=master01
server_port=25004
server_conn_timeout=60
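4. Impala shell
1. Connect from several terminals at once and run SQL statements to check whether HAProxy spreads the sessions across different Impala Daemon nodes. Because Kerberos is enabled on the cluster, obtain a ticket before running impala-shell.
Run kinit again:
kinit <user>
2. Connect impala-shell to port 25003 of the HAProxy service with the following command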
impala-shell -i master01:25003
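3. Open a first terminal, connect, and run a SQL statement
4. At the same time, open a second terminal, connect, and run a SQL statement
In this test the SQL from the two terminals ran on different Impala Daemons, which shows that load balancing across the Impala Daemons is working.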
5. Impala JDBC test
1. Set the JDBC address to the IP of the host running HAProxy, with port 25004
package com.cloudera.impalajdbc;

import com.cloudera.utils.JDBCUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

import java.io.IOException;
import java.security.PrivilegedAction;
import java.sql.*;

/**
 * package: com.cloudera.impalajdbc
 * describe: This example shows how to connect to Impala in a Kerberos environment over JDBC
 * create_user: Fayson
 * email: htechinfo@163.com
 * WeChat public account: Hadoop实操
 * create_date: 2017/11/21
 * create_time: 7:32 PM
 */
public class KBSimple {

    private static final String JDBC_DRIVER = "com.cloudera.impala.jdbc41.Driver";
    // Host and port point at HAProxy; KrbHostFQDN must be the same host.
    private static final String CONNECTION_URL = "jdbc:impala://ip-172-31-22-86.ap-southeast-1.compute.internal:25004/default;AuthMech=1;KrbRealm=CLOUDERA.COM;KrbHostFQDN=ip-172-31-22-86.ap-southeast-1.compute.internal;KrbServiceName=impala";

    static {
        // Load the Impala JDBC driver once.
        try {
            Class.forName(JDBC_DRIVER);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        System.out.println("Connecting to Impala in a Kerberos environment over JDBC");
        // Log in to the Kerberos account
        try {
            System.setProperty("java.security.krb5.conf", "/Volumes/Transcend/keytab/krb5.conf");
            Configuration configuration = new Configuration();
            configuration.set("hadoop.security.authentication", "Kerberos");
            UserGroupInformation.setConfiguration(configuration);
            UserGroupInformation.loginUserFromKeytab("fayson@CLOUDERA.COM", "/Volumes/Transcend/keytab/fayson.keytab");
            System.out.println(UserGroupInformation.getCurrentUser() + "------" + UserGroupInformation.getLoginUser());

            UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
            // Run the query as the logged-in Kerberos user.
            loginUser.doAs(new PrivilegedAction<Object>() {
                @Override
                public Object run() {
                    Connection connection = null;
                    ResultSet rs = null;
                    PreparedStatement ps = null;
                    try {
                        connection = DriverManager.getConnection(CONNECTION_URL);
                        ps = connection.prepareStatement("select * from test_table");
                        // Execute once; a second executeQuery() would discard the first result set.
                        rs = ps.executeQuery();
                        while (rs.next()) {
                            System.out.println(rs.getInt(1));
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    } finally {
                        JDBCUtils.disconnect(connection, rs, ps);
                    }
                    return null;
                }
            });
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
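The example calls JDBCUtils.disconnect(connection, rs, ps) from com.cloudera.utils, whose source is not shown here. A minimal stand-in might look like the following, assuming the helper simply closes the three resources in reverse order of creation and skips nulls. The class name JdbcCleanupSketch is hypothetical, and the real helper may behave differently.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Hypothetical stand-in for the JDBCUtils helper used in the example.
class JdbcCleanupSketch {
    // Close result set, statement, then connection; nulls are skipped.
    static void disconnect(Connection connection, ResultSet rs, PreparedStatement ps) {
        closeQuietly(rs);
        closeQuietly(ps);
        closeQuietly(connection);
    }

    // Close one resource and report, rather than propagate, any failure,
    // so that one failing close does not prevent the remaining ones.
    static void closeQuietly(AutoCloseable resource) {
        if (resource == null) {
            return;
        }
        try {
            resource.close();
        } catch (Exception e) {
            System.err.println("Failed to close " + resource.getClass().getName() + ": " + e);
        }
    }
}
```

Closing explicitly matters more behind a load balancer, since every connection left open keeps counting against the Impala Daemon that leastconn assigned it to.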
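6. Summary
To load-balance Impala through HAProxy in a Kerberos environment, Impala's Load Balancer setting must be configured.
Once the Load Balancer is configured in a Kerberos environment, clients can no longer connect to an individual Impala Daemon in the usual way and should connect through HAProxy instead.
When connecting to HAProxy over JDBC, KrbHostFQDN in the connection string must match the hostname of the HAProxy service; otherwise authentication fails.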
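One way to keep KrbHostFQDN in step with the HAProxy host is to build the connection string from a single host value instead of writing the hostname twice. The helper below is a sketch with a hypothetical class name; it reuses the property layout of the CONNECTION_URL in the example above, and master01.example.com in the usage line is a placeholder, not a real host.

```java
// Hypothetical helper that derives KrbHostFQDN from the proxy host,
// so the two values cannot drift apart.
class ImpalaProxyUrl {
    static String build(String proxyHost, int port, String realm) {
        return "jdbc:impala://" + proxyHost + ":" + port + "/default"
                + ";AuthMech=1"
                + ";KrbRealm=" + realm
                + ";KrbHostFQDN=" + proxyHost
                + ";KrbServiceName=impala";
    }
}
```

For example, ImpalaProxyUrl.build("master01.example.com", 25004, "CLOUDERA.COM") yields a URL whose host part and KrbHostFQDN are both master01.example.com.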