HBase Coprocessor Usage Summary
Background
Sync data from HBase to Elasticsearch: every time an HBase client sends a put request, a coprocessor is triggered that pushes the data to ES.
Versions
- JDK: 1.8
- HBase: 1.2.0
- ES: 6.8.5
- Hadoop: 2.6.0
Without further ado, straight to the code. For background on coprocessors, see the official documentation and blog links at the bottom.
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>Hbase-Observer-ES</groupId>
<artifactId>Hbase-Observer-ES</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<!--<hadoop.version>2.7.3</hadoop.version>-->
<hadoop.version>2.6.0</hadoop.version>
<fastjson.version>1.2.70</fastjson.version>
<elasticsearch.version>6.8.5</elasticsearch.version>
<hbase.version>1.0.0</hbase.version>
<!--<hbase.version>1.2.6</hbase.version>-->
<http.version>4.5.7</http.version>
<log4j2.version>2.10.0</log4j2.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>${fastjson.version}</version>
</dependency>
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-high-level-client</artifactId>
<version>${elasticsearch.version}</version>
</dependency>
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-client</artifactId>
<version>${elasticsearch.version}</version>
</dependency>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-cli</artifactId>
<version>${elasticsearch.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-hbase-handler</artifactId>
<version>2.3.2</version>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>${http.version}</version>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpcore</artifactId>
<version>4.4.13</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-client -->
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>${hbase.version}</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-common -->
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-common</artifactId>
<version>${hbase.version}</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-server -->
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>${hbase.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-auth</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>${log4j2.version}</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-api -->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>${log4j2.version}</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.google.guava/guava -->
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>24.0-jre</version>
</dependency>
</dependencies>
<build>
<finalName>${artifactId}-${project.version}</finalName>
<sourceDirectory>src/main/java</sourceDirectory>
<testSourceDirectory>src/test/java</testSourceDirectory>
<resources>
<resource>
<directory>src/main/resources/</directory>
<excludes>
<exclude>*.conf</exclude>
<exclude>*.properties</exclude>
</excludes>
</resource>
</resources>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>1.7</source> <!-- JDK version of the source code -->
<target>1.7</target> <!-- bytecode version of the generated class files -->
<encoding>UTF-8</encoding><!-- source file encoding -->
</configuration>
</plugin>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<version>2.15.2</version>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
</plugin>
<!-- Maven resource copy plugin -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-resources-plugin</artifactId>
<version>2.7</version>
<executions>
<execution>
<id>copy-resources</id>
<!-- here the phase you need -->
<phase>package</phase>
<goals>
<goal>copy-resources</goal>
</goals>
<configuration>
<outputDirectory>${project.build.directory}/conf</outputDirectory>
<resources>
<resource>
<directory>src/main/resources/</directory>
<includes>
<include>*.xml</include>
<include>*.conf</include>
<include>*.properties</include>
</includes>
<filtering>true</filtering>
</resource>
</resources>
<encoding>UTF-8</encoding>
</configuration>
</execution>
</executions>
</plugin>
<!-- Packaging plugin for deployment; needed for the first deployment only, later deployments just swap the jar -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.5.5</version>
<executions>
<execution>
<id>make-zip</id>
<!-- bind to the package lifecycle phase -->
<phase>package</phase>
<goals>
<!-- bind to the package lifecycle phase -->
<goal>single</goal>
</goals>
<configuration>
<!-- path to the assembly descriptor -->
<descriptors>
<descriptor>src/assembly/assembly.xml</descriptor>
</descriptors>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Main class: HbaseObserver2ES.java
package com.mlamp;
import com.mlamp.common.ESProperties;
import com.mlamp.utils.ESDataProcess;
import com.mlamp.utils.ElasticsearchConnectionSingleton;
import com.mlamp.utils.HbaseDataProcess;
import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.elasticsearch.client.RestHighLevelClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.util.HashMap;
/**
* @author wushuai
* @version 1.0
* @date 2021-01-13
*/
public class HbaseObserver2ES extends BaseRegionObserver {
private Logger logger = LoggerFactory.getLogger(HbaseObserver2ES.class);
ESProperties es = new ESProperties();
String esNodeHost = es.getEsNodeHost();
String esNodePort = es.getEsNodePort();
String esIndex = es.getEsIndex();
String esUserName = es.getEsUserName();
String esUserPassword = es.getEsUserPassword();
RestHighLevelClient client = null;
@Override
public void start(CoprocessorEnvironment e) throws IOException {
logger.info("初始化ES连接=============>");
client = ElasticsearchConnectionSingleton.getRestHighLevelClient(esUserName, esUserPassword, esNodeHost, esNodePort);
}
@Override
public void stop(CoprocessorEnvironment e) throws IOException {
logger.info("关闭ES连接============>");
if (client != null) {
client.close();
}
}
@Override
public void postPut(ObserverContext<RegionCoprocessorEnvironment> e, Put put, WALEdit edit, Durability durability) throws IOException {
logger.info("postPut同步数据开始==========>");
HashMap<String, String> json = HbaseDataProcess.getHbaseData2Map(put);
logger.info("hbaseData================>"+json);
ESDataProcess.BulkData2ES(json, esIndex, client);
logger.info("postPut同步数据结束<==========");
}
}
HbaseDataProcess.java
package com.mlamp.utils;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
/**
* @author wushuai
* @version 1.0
* @date 2021-01-17
*/
public class HbaseDataProcess {
public static HashMap<String, String> getHbaseData2Map(Put put) {
HashMap<String, String> json = new HashMap<>();
NavigableMap<byte[], List<Cell>> familyCellMap = put.getFamilyCellMap();
for (Map.Entry<byte[], List<Cell>> entry : familyCellMap.entrySet()) {
for (Cell cell : entry.getValue()) {
String key = Bytes.toString(CellUtil.cloneQualifier(cell));
String value = Bytes.toString(CellUtil.cloneValue(cell));
json.put(key, value);
}
}
return json;
}
public static String stringMD5(String input) {
try {
// get an MD5 digester (pass "SHA1" instead if you want SHA-1)
MessageDigest messageDigest = MessageDigest.getInstance("MD5");
// convert the input string to a byte array
byte[] inputByteArray = input.getBytes();
// feed the input bytes into the digester
messageDigest.update(inputByteArray);
// compute the digest; the result is a 16-byte array
byte[] resultByteArray = messageDigest.digest();
// convert the byte array to a hex string and return it
return byteArrayToHex(resultByteArray);
} catch (Exception e) {
return "";
}
}
// converts a byte array into a hexadecimal string
public static String byteArrayToHex(byte[] byteArray) {
// lookup table of hex digit characters
char[] hexDigits = {'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F'};
// each byte is 8 bits, i.e. exactly two hex characters, so the result array is twice as long
char[] resultCharArray = new char[byteArray.length * 2];
// walk the byte array and convert each byte to two hex characters via bit operations
int index = 0;
for (byte b : byteArray) {
resultCharArray[index++] = hexDigits[b >>> 4 & 0xf];
resultCharArray[index++] = hexDigits[b & 0xf];
}
// assemble the characters into the result string
return new String(resultCharArray);
}
}
ESDataProcess.java
package com.mlamp.utils;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.serializer.SerializerFeature;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.util.HashMap;
import static org.elasticsearch.repositories.fs.FsRepository.TYPE;
/**
* @author wushuai
* @version 1.0
* @date 2021-01-20
*/
public class ESDataProcess {
private static final Logger logger = LoggerFactory.getLogger(ESDataProcess.class);
public static void BulkData2ES(HashMap<String, String> json, String esIndex, RestHighLevelClient client) throws IOException {
BulkRequest bulkRequest = new BulkRequest();
String jsonStr = JSON.toJSONString(json, SerializerFeature.PrettyFormat);
logger.info("esData Ready=============>"+jsonStr);
IndexRequest indexRequest = new IndexRequest(esIndex + "_" + getStringFromMap(json, "dt"))
.id(HbaseDataProcess.stringMD5(getStringFromMap(json, "user_id") + getStringFromMap(json, "dt")))
.source(jsonStr, XContentType.JSON)
.type(TYPE);
bulkRequest.add(indexRequest);
client.bulk(bulkRequest, RequestOptions.DEFAULT);
}
public static String getStringFromMap(HashMap<String, String> json, String key) {
String str = "";
try {
str = json.get(key);
} catch (Exception e) {
logger.error("HashMap get数据异常:" + e);
} finally {
return str;
}
}
}
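BulkData2ES above fires the bulk request and ignores the response. A minimal sketch of inspecting it for item-level failures (same client, bulkRequest and logger as above; needs an extra import of org.elasticsearch.action.bulk.BulkResponse):
// hypothetical extension of BulkData2ES: item-level failures do not throw,
// they only show up in the response, so check it explicitly
BulkResponse response = client.bulk(bulkRequest, RequestOptions.DEFAULT);
if (response.hasFailures()) {
    logger.error("bulk indexing had failures: " + response.buildFailureMessage());
}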
ElasticsearchConnectionSingleton.java
package com.mlamp.utils;
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.TargetAuthenticationStrategy;
import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
/**
* @Author wushuai
* @Date: 2020/9/29 11:38 AM
*/
public class ElasticsearchConnectionSingleton {
private static final Logger logger = LoggerFactory.getLogger(ElasticsearchConnectionSingleton.class);
private static volatile RestHighLevelClient restHighLevelClient = null;
/**
* Initialize the singleton Elasticsearch connection
*
* @param userName
* @param passWord
* @param hosts
* @param port
* @return
*/
public static RestHighLevelClient getRestHighLevelClient(String userName, String passWord,
String hosts, String port) {
if (restHighLevelClient == null) {
synchronized (ElasticsearchConnectionSingleton.class) {
if (restHighLevelClient == null) {
logger.info("create es connection");
// RestClientBuilder restClientBuilder = createRestClientBuilder(userName, passWord, true);
RestClientBuilder restClientBuilder = createRestClientBuilder(userName, passWord,
hosts, port, true, true);
restHighLevelClient = new RestHighLevelClient(restClientBuilder);
logger.info("create es connection success");
}
}
}
return restHighLevelClient;
}
private static RestClientBuilder createRestClientBuilder(String userName, String userPassword,
String hosts, String port,
boolean isAuth, final boolean usePreemptiveAuth) {
String[] hostArray = hosts.split(",", -1);
HttpHost[] httpHosts = new HttpHost[hostArray.length];
for (int i = 0; i < hostArray.length; i++) {
httpHosts[i] = new HttpHost(hostArray[i], Integer.valueOf(port), "http");
}
RestClientBuilder restClientBuilder = RestClient.builder(httpHosts);
//restClientBuilder.setMaxRetryTimeoutMillis(5 * 60 * 1000);
restClientBuilder.setRequestConfigCallback(new RestClientBuilder.RequestConfigCallback() {
@Override
public RequestConfig.Builder customizeRequestConfig(RequestConfig.Builder requestConfigBuilder) {
requestConfigBuilder.setConnectTimeout(5000);
requestConfigBuilder.setSocketTimeout(40000);
requestConfigBuilder.setConnectionRequestTimeout(1000);
return requestConfigBuilder;
}
});
if (isAuth) {
final BasicCredentialsProvider credentialsProvider = new BasicCredentialsProvider();
credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(userName, userPassword));
restClientBuilder.setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
@Override
public HttpAsyncClientBuilder customizeHttpClient(final HttpAsyncClientBuilder httpClientBuilder) {
if (usePreemptiveAuth == false) {
// disable preemptive auth by ignoring any authcache
httpClientBuilder.disableAuthCaching();
// don't use the "persistent credentials strategy"
httpClientBuilder.setTargetAuthenticationStrategy(new TargetAuthenticationStrategy());
}
return httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider);
}
});
}
return restClientBuilder;
}
}
StringHandler.java
package com.mlamp.handler;
/**
* @author wushuai
* @version 1.0
* @date 2021-01-21
*/
public class StringHandler {
public Integer toInteger(String str){
int field = 0;
try {
field = Integer.parseInt(str);
} catch (Exception e){
} finally {
return field;
}
}
}
ESProperties.java
package com.mlamp.common;
/**
* @author wushuai
* @version 1.0
* @date 2021-01-20
*/
public class ESProperties {
/**
* cluster settings
*/
String esNodeHost = "";
String esNodePort = "8200";
String esIndex = "";
String esUserName = "";
String esUserPassword = "";
/**
* local test settings
*/
// String esNodeHost = "localhost";
// String esNodePort = "9200";
// String esIndex = "recommend_profile_test";
// String esUserName = "";
// String esUserPassword = "";
public String getEsNodeHost() {
return esNodeHost;
}
public void setEsNodeHost(String esNodeHost) {
this.esNodeHost = esNodeHost;
}
public String getEsNodePort() {
return esNodePort;
}
public void setEsNodePort(String esNodePort) {
this.esNodePort = esNodePort;
}
public String getEsIndex() {
return esIndex;
}
public void setEsIndex(String esIndex) {
this.esIndex = esIndex;
}
public String getEsUserName() {
return esUserName;
}
public void setEsUserName(String esUserName) {
this.esUserName = esUserName;
}
public String getEsUserPassword() {
return esUserPassword;
}
public void setEsUserPassword(String esUserPassword) {
this.esUserPassword = esUserPassword;
}
}
Coprocessor deployment
We use dynamic loading here, the most suitable approach for a production cluster because it does not require a restart. Before deploying, read the pitfalls section below first; it helps you deploy safely.
- Package the coprocessor classes into a jar and upload it to a directory on HDFS
- Disable the target table
disable 'tablename'
- Load the coprocessor (run in the hbase shell; a Java Admin API alternative follows this list)
alter 'tablename', METHOD => 'table_att', 'Coprocessor'=>'hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar|com.mlamp.HbaseObserver2ES|1001|arg1=1,arg2=2'
- Enable the table
enable 'tablename'
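The same attach cycle can also be scripted with the HBase 1.x Java Admin API instead of the shell. A minimal sketch, reusing the placeholder jar path, class name and priority from the shell command above (the AttachCoprocessor class name is just for illustration):
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class AttachCoprocessor {
    public static void main(String[] args) throws Exception {
        TableName table = TableName.valueOf("tablename");
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            admin.disableTable(table);                        // same as: disable 'tablename'
            HTableDescriptor desc = admin.getTableDescriptor(table);
            // jar on HDFS, observer class, priority; null = no coprocessor arguments
            desc.addCoprocessor("com.mlamp.HbaseObserver2ES",
                    new Path("hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar"),
                    1001, null);
            admin.modifyTable(table, desc);                   // same as the alter ... 'table_att' step
            admin.enableTable(table);                         // same as: enable 'tablename'
        }
    }
}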
Coprocessor unloading
- Disable the table
disable 'tablename'
- Unload the coprocessor
alter 'tablename', METHOD=>'table_att_unset', NAME=>'coprocessor$1'
- Enable the table
enable 'tablename'
Pitfalls
- Version issues
- The coprocessor API changed in HBase 2.0: BaseRegionObserver no longer exists, and from 2.0 on a coprocessor has to implement both the RegionObserver and Coprocessor interfaces itself (a 2.x sketch follows this list)
- HBase 2.0.2 only supports Hadoop up to 2.7. In my own tests hbase-2.2.6, elasticsearch-7.10.2 and hadoop-3.1.4 worked fine, but the production cluster runs older versions, so make sure the versions in your pom match the cluster environment
- Check that the JDK version on the cluster matches the JDK used in your IDE; a jar compiled with 1.8 cannot run on a 1.7 cluster
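For reference, a minimal sketch of what the observer skeleton looks like against the HBase 2.x API. The usual 2.x pattern is to implement RegionCoprocessor (which extends Coprocessor) together with RegionObserver and hand the observer back from getRegionObserver(); the class name here is hypothetical:
import java.util.Optional;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessor;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.RegionObserver;
import org.apache.hadoop.hbase.wal.WALEdit;

public class HbaseObserver2ESV2 implements RegionCoprocessor, RegionObserver {
    @Override
    public Optional<RegionObserver> getRegionObserver() {
        // in 2.x the framework asks the coprocessor for its observer instead of you extending BaseRegionObserver
        return Optional.of(this);
    }

    @Override
    public void postPut(ObserverContext<RegionCoprocessorEnvironment> c, Put put,
                        WALEdit edit, Durability durability) {
        // the same ES sync logic as in the 1.x version goes here
    }
}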
- Check that hbase-site.xml contains the configuration below. It keeps a RegionServer from aborting when loading the coprocessor fails, but it does not guarantee the RegionServer stays healthy once the coprocessor is triggered: the sync performs network I/O, and a problem connecting to ES can still bring the RegionServer down (see the defensive sketch after the property below).
<property>
<name>hbase.coprocessor.abortonerror</name>
<value>false</value>
</property>
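Because of that, it is worth making sure no exception from the ES sync ever escapes the coprocessor. A hedged sketch of how the postPut in HbaseObserver2ES above could be wrapped; swallowing the exception trades a lost ES document for RegionServer stability:
@Override
public void postPut(ObserverContext<RegionCoprocessorEnvironment> e, Put put, WALEdit edit, Durability durability) throws IOException {
    try {
        HashMap<String, String> json = HbaseDataProcess.getHbaseData2Map(put);
        ESDataProcess.BulkData2ES(json, esIndex, client);
    } catch (Exception ex) {
        // never let an ES / network failure propagate back into the RegionServer;
        // the put itself has already been applied to HBase at this point
        logger.error("sync to ES failed, data exists in HBase only", ex);
    }
}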
- Error: org.elasticsearch.action.ActionRequestValidationException: Validation Failed: 1: type is missing;
Solution: add the following extra ES dependencies to the pom (see also the note after the snippet):
org.elasticsearch.client:elasticsearch-rest-client
org.elasticsearch:elasticsearch
pom snippet:
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-client</artifactId>
<version>${elasticsearch.version}</version>
</dependency>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
<version>${elasticsearch.version}</version>
</dependency>
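A related note: Elasticsearch 6.x still requires a document type on every index request, which is why ESDataProcess calls .type(...) on the IndexRequest. The listing above borrows FsRepository.TYPE (which is simply the string "fs") for that; a hedged alternative sketch is to set a fixed type explicitly, for example:
// hypothetical variant of the request built in ESDataProcess.BulkData2ES:
// use a fixed 6.x document type instead of reusing FsRepository.TYPE
IndexRequest indexRequest = new IndexRequest(esIndex + "_" + getStringFromMap(json, "dt"))
        .type("_doc")  // any fixed type name works on a 6.x index; "_doc" matches the 7.x default
        .id(HbaseDataProcess.stringMD5(getStringFromMap(json, "user_id") + getStringFromMap(json, "dt")))
        .source(jsonStr, XContentType.JSON);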
- Error: org.apache.hadoop.hbase.DoNotRetryIOException: Class cn.com.newbee.feng.MyRegionObserver cannot be loaded Set hbase.table.sanity.checks to false at conf or table descriptor if you want to bypass sanity checks
Solution:
- Add the following configuration to hbase-site.xml
<property>
<name>hbase.table.sanity.checks</name>
<value>false</value>
</property>
References
HBase coprocessor documentation:
https://hbase.apache.org/2.2/book.html#cp
HBase 1.4 API docs:
https://hbase.apache.org/1.4/devapidocs/index.html
HBase 2.2 API docs:
https://hbase.apache.org/2.2/devapidocs/index.html
Blog:
http://blog.itpub.net/12129601/viewspace-1690668/