HBase Coprocessor Usage Summary


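Background

This setup syncs data from HBase to Elasticsearch (ES): every time an HBase client sends a put request, a coprocessor is triggered and writes the same data to ES.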

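Versions

  • JDK: 1.8
  • HBase: 1.2.0
  • ES: 6.8.5
  • Hadoop: 2.6.0

Without further ado, here is the code. For background on coprocessors, see the official documentation links and the blog post at the bottom.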

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>Hbase-Observer-ES</groupId>
    <artifactId>Hbase-Observer-ES</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
       
        <!--<hadoop.version>2.7.3</hadoop.version>-->
        <hadoop.version>2.6.0</hadoop.version>
        <fastjson.version>1.2.70</fastjson.version>
        <elasticsearch.version>6.8.5</elasticsearch.version>
        <hbase.version>1.0.0</hbase.version>
        <!--<hbase.version>1.2.6</hbase.version>-->
        <http.version>4.5.7</http.version>
        <log4j2.version>2.10.0</log4j2.version>
    </properties>

    <dependencies>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>

        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>${fastjson.version}</version>
        </dependency>



        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>${elasticsearch.version}</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-client</artifactId>
            <version>${elasticsearch.version}</version>
        </dependency>

        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch-cli</artifactId>
            <version>${elasticsearch.version}</version>
        </dependency>


        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-hbase-handler</artifactId>
            <version>2.3.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>${http.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpcore</artifactId>
            <version>4.4.13</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-client -->
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>${hbase.version}</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-common -->
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-common</artifactId>
            <version>${hbase.version}</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-server -->
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-server</artifactId>
            <version>${hbase.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-auth</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>${log4j2.version}</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-api -->
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-api</artifactId>
            <version>${log4j2.version}</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/com.google.guava/guava -->
        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>24.0-jre</version>
        </dependency>
    </dependencies>

    <build>
        <finalName>${project.artifactId}-${project.version}</finalName>
        <sourceDirectory>src/main/java</sourceDirectory>
        <testSourceDirectory>src/test/java</testSourceDirectory>
        <resources>
            <resource>
                <directory>src/main/resources/</directory>
                <excludes>
                    <exclude>*.conf</exclude>
                    <exclude>*.properties</exclude>
                </excludes>
            </resource>
        </resources>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <configuration>
                    <source>1.7</source> <!-- JDK version of the source code -->
                    <target>1.7</target> <!-- target version of the generated class files -->
                    <encoding>UTF-8</encoding><!-- character encoding -->
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <version>2.15.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <!-- Maven plugin that copies resource files -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-resources-plugin</artifactId>
                <version>2.7</version>
                <executions>
                    <execution>
                        <id>copy-resources</id>
                        <!-- here the phase you need -->
                        <phase>package</phase>
                        <goals>
                            <goal>copy-resources</goal>
                        </goals>
                        <configuration>
                            <outputDirectory>${project.build.directory}/conf</outputDirectory>
                            <resources>
                                <resource>
                                    <directory>src/main/resources/</directory>
                                    <includes>
                                        <include>*.xml</include>
                                        <include>*.conf</include>
                                        <include>*.properties</include>
                                    </includes>
                                    <filtering>true</filtering>
                                </resource>
                            </resources>
                            <encoding>UTF-8</encoding>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <!-- Packaging plugin for deployment; used for the first deployment, later deployments only need to swap the jar -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.5.5</version>
                <executions>
                    <execution>
                        <id>make-zip</id>
                        <!-- Bind to the package lifecycle phase -->
                        <phase>package</phase>
                        <goals>
                            <!-- Run the assembly plugin's single goal during the package phase -->
                            <goal>single</goal>
                        </goals>
                        <configuration>
                            <!-- Path to the assembly descriptor file -->
                            <descriptors>
                                <descriptor>src/assembly/assembly.xml</descriptor>
                            </descriptors>
                            <descriptorRefs>
                                <descriptorRef>jar-with-dependencies</descriptorRef>
                            </descriptorRefs>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

</project>

Main class: HbaseObserver2ES.java

package com.mlamp;

import com.mlamp.common.ESProperties;
import com.mlamp.utils.ESDataProcess;
import com.mlamp.utils.ElasticsearchConnectionSingleton;
import com.mlamp.utils.HbaseDataProcess;
import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.elasticsearch.client.RestHighLevelClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.util.HashMap;

/**
 * @author wushuai
 * @version 1.0
 * @date 2021-01-13
 */
public class HbaseObserver2ES extends BaseRegionObserver {
    private Logger logger = LoggerFactory.getLogger(HbaseObserver2ES.class);
    ESProperties es = new ESProperties();
    String esNodeHost = es.getEsNodeHost();
    String esNodePort = es.getEsNodePort();
    String esIndex = es.getEsIndex();
    String esUserName = es.getEsUserName();
    String esUserPassword = es.getEsUserPassword();
    RestHighLevelClient client = null;

    @Override
    public void start(CoprocessorEnvironment e) throws IOException {
        logger.info("Initializing ES connection =============>");
        client = ElasticsearchConnectionSingleton.getRestHighLevelClient(esUserName, esUserPassword, esNodeHost, esNodePort);
    }

    @Override
    public void stop(CoprocessorEnvironment e) throws IOException {
        logger.info("Closing ES connection ============>");
        if (client != null) {
            client.close();
        }
    }

    @Override
    public void postPut(ObserverContext<RegionCoprocessorEnvironment> e, Put put, WALEdit edit, Durability durability) throws IOException {
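        logger.info("postPut sync started ==========>");
        // Flatten the cells of this Put into a qualifier -> value map
        HashMap<String, String> json = HbaseDataProcess.getHbaseData2Map(put);
        logger.info("hbaseData================>"+json);
        // Write the map to ES as a single-document bulk request
        ESDataProcess.BulkData2ES(json, esIndex, client);
        logger.info("postPut sync finished <==========");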
    }


}

HbaseDataProcess.java

package com.mlamp.utils;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

import java.security.MessageDigest;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

/**
 * @author wushuai
 * @version 1.0
 * @date 2021-01-17
 */
public class HbaseDataProcess {




    public static HashMap<String, String> getHbaseData2Map(Put put) {
        HashMap<String, String> json = new HashMap<>();
        NavigableMap<byte[], List<Cell>> familyCellMap = put.getFamilyCellMap();
        for (Map.Entry<byte[], List<Cell>> entry : familyCellMap.entrySet()) {
            for (Cell cell : entry.getValue()) {
                String key = Bytes.toString(CellUtil.cloneQualifier(cell));
                String value = Bytes.toString(CellUtil.cloneValue(cell));
                json.put(key, value);
            }
        }
        return json;
    }
    public static String stringMD5(String input) {
        try {
            // Get an MD5 digest instance (pass "SHA-1" instead to compute SHA-1)
            MessageDigest messageDigest = MessageDigest.getInstance("MD5");
            // Convert the input string to a byte array
            byte[] inputByteArray = input.getBytes();
            // Feed the input bytes into the digest
            messageDigest.update(inputByteArray);
            // Compute the digest; the result is a 16-byte array
            byte[] resultByteArray = messageDigest.digest();
            // Convert the bytes to a hex string and return it
            return byteArrayToHex(resultByteArray);
        } catch (Exception e) {
            return "";
        }
    }

    // Converts a byte array to an upper-case hexadecimal string
    public static String byteArrayToHex(byte[] byteArray) {
        // The 16 hexadecimal digit characters
        char[] hexDigits = {'0','1','2','3','4','5','6','7','8','9', 'A','B','C','D','E','F' };
        // Result buffer: one byte is 8 bits, i.e. two hex digits (2^8 == 16^2)
        char[] resultCharArray = new char[byteArray.length * 2];
        // Walk the bytes and use bit operations to pick the high and the low nibble
        int index = 0;
        for (byte b : byteArray) {
            resultCharArray[index++] = hexDigits[b >>> 4 & 0xf];
            resultCharArray[index++] = hexDigits[b & 0xf];
        }
        // Build the result string from the character array
        return new String(resultCharArray);
    }
}

ESDataProcess.java

package com.mlamp.utils;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.serializer.SerializerFeature;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.util.HashMap;

// FsRepository.TYPE is the constant "fs"; it is reused below as the ES document type name.
import static org.elasticsearch.repositories.fs.FsRepository.TYPE;

/**
 * @author wushuai
 * @version 1.0
 * @date 2021-01-20
 */
public class ESDataProcess {
    private static final Logger logger = LoggerFactory.getLogger(ESDataProcess.class);

    public static void BulkData2ES(HashMap<String, String> json, String esIndex, RestHighLevelClient client) throws IOException {
        BulkRequest bulkRequest = new BulkRequest();
        String jsonStr = JSON.toJSONString(json, SerializerFeature.PrettyFormat);
        logger.info("esData Ready=============>"+jsonStr);
        String dt = getStringFromMap(json, "dt");
        String userId = getStringFromMap(json, "user_id");
        // One index per dt value; the document ID is MD5(user_id + dt)
        IndexRequest indexRequest = new IndexRequest(esIndex + "_" + dt)
                .id(HbaseDataProcess.stringMD5(userId + dt))
                .source(jsonStr, XContentType.JSON)
                .type(TYPE);
        bulkRequest.add(indexRequest);
        client.bulk(bulkRequest, RequestOptions.DEFAULT);
    }

    public static String getStringFromMap(HashMap<String, String> json, String key) {
        String str = "";
        try {
            str = json.get(key);
        } catch (Exception e) {
            logger.error("Failed to read value from HashMap: " + e);
        }
        // Returned outside of finally so that the return cannot mask other errors
        return str;
    }
}
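
The index name and document ID that BulkData2ES builds can be reproduced with the JDK alone, which makes the mapping easier to check before deploying. The following is a minimal sketch, not the project's code: the class EsTargetSketch, its method names, and the sample values are hypothetical, and it assumes the input strings are encoded as UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the original project.
class EsTargetSketch {

    // Upper-case hex MD5, the same output format as HbaseDataProcess.stringMD5.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02X", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Index name: <base index>_<dt>, as built in BulkData2ES.
    static String indexName(String baseIndex, Map<String, String> row) {
        return baseIndex + "_" + row.get("dt");
    }

    // Document ID: MD5(user_id + dt), as built in BulkData2ES.
    static String documentId(Map<String, String> row) {
        return md5Hex(row.get("user_id") + row.get("dt"));
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        row.put("user_id", "u1");      // sample values, invented for illustration
        row.put("dt", "20210120");
        System.out.println(indexName("recommend_profile_test", row));
        System.out.println(documentId(row));
    }
}
```

Because the ID depends only on user_id and dt, a later put for the same user and day targets the same document. Note that a put without a dt column makes the map lookup return null, so the index name ends in "_null"; the same holds for the original code.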

ElasticsearchConnectionSingleton.java

package com.mlamp.utils;

import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.TargetAuthenticationStrategy;
import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;

import org.elasticsearch.client.RestHighLevelClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * @Author wushuai
 * @Date 2020/9/29 11:38 AM
 */
public class ElasticsearchConnectionSingleton {
    private static final Logger logger = LoggerFactory.getLogger(ElasticsearchConnectionSingleton.class);


    private static volatile RestHighLevelClient restHighLevelClient = null;

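    /**
     * Initializes the singleton Elasticsearch connection.
     *
     * @param userName ES user name
     * @param passWord ES password
     * @param hosts    comma-separated list of ES hosts
     * @param port     ES HTTP port
     * @return the shared RestHighLevelClient instance
     */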
    public static RestHighLevelClient getRestHighLevelClient(String userName, String passWord,
                                                             String hosts, String port) {
        if (restHighLevelClient == null) {
            synchronized (ElasticsearchConnectionSingleton.class) {
                if (restHighLevelClient == null) {
                    logger.info("create es connection");
//                    RestClientBuilder restClientBuilder = createRestClientBuilder(userName, passWord, true);
                    RestClientBuilder restClientBuilder = createRestClientBuilder(userName, passWord,
                            hosts, port, true, true);
                    restHighLevelClient = new RestHighLevelClient(restClientBuilder);
                    logger.info("create es connection success");
                }
            }
        }
        return restHighLevelClient;
    }


    private static RestClientBuilder createRestClientBuilder(String userName, String userPassword,
                                                             String hosts, String port,
                                                             boolean isAuth, final boolean usePreemptiveAuth) {
        String[] hostArray = hosts.split(",", -1);
        HttpHost[] httpHosts = new HttpHost[hostArray.length];
        for (int i = 0; i < hostArray.length; i++) {
            httpHosts[i] = new HttpHost(hostArray[i], Integer.valueOf(port), "http");
        }

        RestClientBuilder restClientBuilder = RestClient.builder(httpHosts);
        //restClientBuilder.setMaxRetryTimeoutMillis(5 * 60 * 1000);
        restClientBuilder.setRequestConfigCallback(new RestClientBuilder.RequestConfigCallback() {
            @Override
            public RequestConfig.Builder customizeRequestConfig(RequestConfig.Builder requestConfigBuilder) {
                requestConfigBuilder.setConnectTimeout(5000);
                requestConfigBuilder.setSocketTimeout(40000);
                requestConfigBuilder.setConnectionRequestTimeout(1000);
                return requestConfigBuilder;
            }
        });

        if (isAuth) {
            final BasicCredentialsProvider credentialsProvider = new BasicCredentialsProvider();
            credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(userName, userPassword));
            restClientBuilder.setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
                @Override
                public HttpAsyncClientBuilder customizeHttpClient(final HttpAsyncClientBuilder httpClientBuilder) {
                    if (usePreemptiveAuth == false) {
                        // disable preemptive auth by ignoring any authcache
                        httpClientBuilder.disableAuthCaching();
                        // don't use the "persistent credentials strategy"
                        httpClientBuilder.setTargetAuthenticationStrategy(new TargetAuthenticationStrategy());
                    }
                    return httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider);
                }
            });
        }
        return restClientBuilder;
    }
}

StringHandler.java

package com.mlamp.handler;

/**
 * @author wushuai
 * @version 1.0
 * @date 2021-01-21
 */
public class StringHandler {
    public Integer toInteger(String str) {
        int field = 0;
        try {
            field = Integer.parseInt(str);
        } catch (NumberFormatException e) {
            // Not a valid integer (or null): fall back to the default value 0
        }
        return field;
    }
}

ESProperties.java

package com.mlamp.common;

/**
 * @author wushuai
 * @version 1.0
 * @date 2021-01-20
 */
public class ESProperties {
    /**
     * Cluster test settings
     */
    String esNodeHost = "";
    String esNodePort = "8200";
    String esIndex = "";
    String esUserName = "";
    String esUserPassword = "";

    // Local test settings:
//    String esNodeHost = "localhost";
//    String esNodePort = "9200";
//    String esIndex = "recommend_profile_test";
//    String esUserName = "";
//    String esUserPassword = "";
    public String getEsNodeHost() {
        return esNodeHost;
    }

    public void setEsNodeHost(String esNodeHost) {
        this.esNodeHost = esNodeHost;
    }

    public String getEsNodePort() {
        return esNodePort;
    }

    public void setEsNodePort(String esNodePort) {
        this.esNodePort = esNodePort;
    }

    public String getEsIndex() {
        return esIndex;
    }

    public void setEsIndex(String esIndex) {
        this.esIndex = esIndex;
    }

    public String getEsUserName() {
        return esUserName;
    }

    public void setEsUserName(String esUserName) {
        this.esUserName = esUserName;
    }

    public String getEsUserPassword() {
        return esUserPassword;
    }

    public void setEsUserPassword(String esUserPassword) {
        this.esUserPassword = esUserPassword;
    }


}

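Deploying the coprocessor

This guide uses dynamic loading, the loading method best suited to production clusters because it does not require a cluster restart. Read the pitfalls section below before deploying; it will help you deploy safely.

  • Package the coprocessor classes into a jar and upload it to a directory on HDFS
  • Disable the target table
disable 'tablename'
  • Load the coprocessor
alter 'tablename', METHOD => 'table_att', 'coprocessor'=>'hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar|com.mlamp.HbaseObserver2ES|1001|arg1=1,arg2=2'
  • Enable the table
enable 'tablename'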
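
The value passed to the coprocessor attribute in the alter command is four fields joined by "|": the jar path on HDFS, the fully qualified class name, the priority, and optional key=value arguments. A minimal sketch of assembling that string, assuming a hypothetical helper class CoprocessorSpecSketch (HBase only ever sees the final string; the host, port, and user in the example are placeholders):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration only; not an HBase API.
class CoprocessorSpecSketch {

    // Builds "<jar path>|<class name>|<priority>|k1=v1,k2=v2".
    static String build(String jarPath, String className, int priority, Map<String, String> args) {
        StringJoiner kv = new StringJoiner(",");
        for (Map.Entry<String, String> e : args.entrySet()) {
            kv.add(e.getKey() + "=" + e.getValue());
        }
        return jarPath + "|" + className + "|" + priority + "|" + kv;
    }

    public static void main(String[] unused) {
        Map<String, String> args = new LinkedHashMap<>(); // keeps insertion order
        args.put("arg1", "1");
        args.put("arg2", "2");
        // Placeholder namenode address and user; replace with the real cluster values.
        String spec = build("hdfs://namenode:8020/user/hbase/coprocessor.jar",
                "com.mlamp.HbaseObserver2ES", 1001, args);
        System.out.println("alter 'tablename', METHOD => 'table_att', 'coprocessor'=>'" + spec + "'");
    }
}
```

Generating the string rather than typing it by hand avoids stray spaces and missing separators, which are easy to overlook in a long shell command.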

Unloading the coprocessor

  • Disable the table
disable 'tablename'
  • Unload the coprocessor
alter 'tablename', METHOD => 'table_att_unset', NAME => 'coprocessor$1'
  • Enable the table
enable 'tablename'
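
The name coprocessor$1 in the unload command refers to the first coprocessor attribute on the table. If the table already carries other coprocessors, the observer may sit under a different index, which describe 'tablename' will show. A minimal sketch for picking the right attribute name, assuming the attribute names and values have been copied by hand into a map (the class CoprocessorKeySketch, the method findKey, and the sample entries are hypothetical, not an HBase API):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not an HBase API.
class CoprocessorKeySketch {

    // Attribute names look like coprocessor$1, coprocessor$2, ...
    private static final Pattern KEY =
            Pattern.compile("^coprocessor\\$([0-9]+)$", Pattern.CASE_INSENSITIVE);

    // Returns the attribute name whose value names the given class, or null if none does.
    static String findKey(Map<String, String> tableAttributes, String className) {
        for (Map.Entry<String, String> e : tableAttributes.entrySet()) {
            if (!KEY.matcher(e.getKey()).matches()) {
                continue; // not a coprocessor attribute
            }
            // Value layout: <jar path>|<class name>|<priority>|<arguments>
            String[] fields = e.getValue().split("\\|", -1);
            if (fields.length >= 2 && fields[1].trim().equals(className)) {
                return e.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        // Sample entries invented for illustration.
        attrs.put("coprocessor$1", "|org.example.OtherObserver|1000|");
        attrs.put("coprocessor$2", "hdfs://namenode:8020/user/hbase/coprocessor.jar|com.mlamp.HbaseObserver2ES|1001|arg1=1,arg2=2");
        System.out.println(findKey(attrs, "com.mlamp.HbaseObserver2ES"));
    }
}
```

The returned name is the value to pass as NAME in the table_att_unset command.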

Pitfalls

  1. Version issues
  • The coprocessor API changed in HBase 2.0. The BaseRegionObserver abstract class no longer exists; from 2.0 on, a region observer has to implement both the RegionCoprocessor and RegionObserver interfaces.
  • HBase 2.0.2 only supports Hadoop up to 2.7. My own tests with hbase-2.2.6, elasticsearch-7.10.2 and hadoop-3.1.4 worked fine, but the production cluster runs much older versions, so make sure the versions in your pom file match your cluster environment.
  • Check that the cluster's JDK version matches the JDK configured in IDEA: a jar compiled for 1.8 cannot run on a 1.7 cluster.
  2. Check that hbase-site.xml contains the following setting. It keeps the RegionServer from going down when an error occurs while the coprocessor is being loaded, but it does not guarantee that the RegionServer keeps running normally once the coprocessor is triggered: syncing data involves network I/O, and a problem connecting to ES can still bring the RegionServer down.
<property>
    <name>hbase.coprocessor.abortonerror</name>
    <value>false</value>
</property>
  3. Error: org.elasticsearch.action.ActionRequestValidationException: Validation Failed: 1: type is missing;
    Solution: add the following extra ES dependencies to the pom file:
org.elasticsearch.client:elasticsearch-rest-client
org.elasticsearch:elasticsearch

pom.xml entries

<dependency>
	<groupId>org.elasticsearch.client</groupId>
	<artifactId>elasticsearch-rest-client</artifactId>
	<version>${elasticsearch.version}</version>
</dependency>
<dependency>
	<groupId>org.elasticsearch</groupId>
	<artifactId>elasticsearch</artifactId>
	<version>${elasticsearch.version}</version>
</dependency>
  4. Error: ERROR: org.apache.hadoop.hbase.DoNotRetryIOException: Class cn.com.newbee.feng.MyRegionObserver cannot be loaded Set hbase.table.sanity.checks to false at conf or table descriptor if you want to bypass sanity checks

Solution:

  • Add the following setting to hbase-site.xml
<property>
    <name>hbase.table.sanity.checks</name>
    <value>false</value>
</property>
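
The hbase.coprocessor.abortonerror note above warns that an ES connection problem during a sync can still affect the RegionServer. One possible mitigation, which is not part of the original code, is to catch sync failures inside the hook so that they are logged and counted instead of propagating. A minimal sketch, assuming a hypothetical wrapper class GuardedSyncSketch around the call to ESDataProcess.BulkData2ES:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical wrapper for illustration only; not part of the original project.
class GuardedSyncSketch {

    private final AtomicLong failures = new AtomicLong();

    // Runs the sync action; returns true on success and false if it threw.
    boolean run(Callable<Void> syncAction) {
        try {
            syncAction.call();
            return true;
        } catch (Exception e) {
            // Swallow the failure so it does not escape the coprocessor hook.
            failures.incrementAndGet();
            System.err.println("ES sync failed: " + e);
            return false;
        }
    }

    long failureCount() {
        return failures.get();
    }

    public static void main(String[] args) {
        GuardedSyncSketch guard = new GuardedSyncSketch();
        // Simulated ES outage; in postPut the body would call ESDataProcess.BulkData2ES(...).
        boolean ok = guard.run(() -> {
            throw new java.io.IOException("connection refused");
        });
        System.out.println("synced=" + ok + ", failures=" + guard.failureCount());
    }
}
```

The trade-off is that a failed document is silently missing from ES unless something else retries it, so the failure counter or the log needs to be monitored.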

References

Official HBase coprocessor documentation:

https://hbase.apache.org/2.2/book.html#cp

HBase 1.4 API documentation:

https://hbase.apache.org/1.4/devapidocs/index.html

HBase 2.2 API documentation:

https://hbase.apache.org/2.2/devapidocs/index.html

Blog post:

http://blog.itpub.net/12129601/viewspace-1690668/