hbase cf

Reposted

数据探索家 2024-11-25 10:08:40


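Learning HBase: Installation and Getting Started

Introduction to HBase

HBase is a distributed, scalable NoSQL database built to store massive volumes of data.

Logical and physical storage structure

Logical structure

[Figure: HBase logical structure]

Physical structure

[Figure: HBase physical structure]

Data model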

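1) Namespace
A namespace is similar to a database in a relational system; each namespace holds multiple tables. HBase ships with two namespaces, hbase and default: hbase holds HBase's built-in tables, and default is the namespace used when a user does not specify one.

2) Table
Similar to a table in a relational database. The difference is that an HBase table definition only declares column families, not individual columns. Fields can therefore be specified dynamically, as needed, when data is written, which lets HBase handle changing schemas far more easily than a relational database.

3) Row
Each row in an HBase table consists of one RowKey and multiple columns. Rows are stored in lexicographic order of their RowKey, and data is retrieved by RowKey (a point get or a range scan), so RowKey design is critical.

4) Column
Every column is identified by a column family and a column qualifier, for example info:name and info:age. Only the column families must be specified when the table is created; column qualifiers need not be defined in advance.

5) Timestamp
Identifies the version of a piece of data. Unless the client supplies one, the system adds this field automatically on every write, using the time the data was written to HBase.

6) Cell
The unit uniquely identified by {rowkey, column family:column qualifier, timestamp}. All data in a cell is stored as raw bytes.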
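
Because rows are ordered lexicographically by RowKey, numeric-looking keys do not sort numerically. The following is a minimal sketch in plain Java (no HBase dependency) that imitates this ordering with an unsigned byte-wise comparison; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: imitates the lexicographic (byte-wise) ordering of row keys.
class RowKeyOrderDemo {

    /** Returns a copy of the row keys sorted by unsigned byte-wise comparison. */
    static List<String> sortLikeHBase(List<String> rowKeys) {
        List<String> sorted = new ArrayList<>(rowKeys);
        sorted.sort((a, b) -> Arrays.compareUnsigned(
                a.getBytes(StandardCharsets.UTF_8),
                b.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    public static void main(String[] args) {
        // Bytes are compared left to right, so "10" sorts before "2".
        System.out.println(sortLikeHBase(List.of("2", "10", "1001", "1")));
        // Zero-padding to a fixed width restores the numeric order.
        System.out.println(sortLikeHBase(List.of("0002", "0010", "1001", "0001")));
    }
}
```

This is one reason fixed-width, zero-padded components are a common choice when numbers are embedded in a RowKey.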

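HBase basic architecture

[Figure: HBase basic architecture]

1) Region Server
A Region Server manages Regions; its implementation class is HRegionServer. Its main responsibilities are:
Data operations: get, put, delete;
Region operations: splitRegion, compactRegion.

2) Master
The Master manages all Region Servers; its implementation class is HMaster. Its main responsibilities are:
Table operations: create, delete, alter;
RegionServer operations: assigning regions to each RegionServer, monitoring the state of each RegionServer, load balancing, and failover.

3) ZooKeeper
HBase relies on ZooKeeper for Master high availability, RegionServer monitoring, the entry point to metadata, and maintenance of cluster configuration.

4) HDFS
HDFS provides the underlying storage for HBase data and, in doing so, supports HBase's high availability.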

HBase quick start

Installation and deployment

Before deploying HBase, make sure ZooKeeper and Hadoop are already deployed.

#1. Extract HBase to the target directory:
[atguigu@hadoop102 software]$ tar -zxvf hbase-2.0.5-bin.tar.gz -C /opt/module

#2. Configure environment variables
[atguigu@hadoop102 ~]$ sudo vim /etc/profile.d/my_env.sh
#HBASE_HOME
export HBASE_HOME=/opt/module/hbase-2.0.5
export PATH=$PATH:$HBASE_HOME/bin

#3. Edit the HBase configuration files.
#hbase-env.sh: do not use the bundled ZooKeeper; use the one we installed ourselves
export HBASE_MANAGES_ZK=false

#hbase-site.xml:
<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://hadoop102:8020/hbase</value>
    </property>

    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>

    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>hadoop102,hadoop103,hadoop104</value>
    </property>
</configuration>

#4. Edit the regionservers file:
hadoop102
hadoop103
hadoop104

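#5. Delete slf4j-log4j12-1.7.25.jar from /opt/module/hbase-2.0.5/lib; it conflicts with the copy shipped with Hadoop
#(all.sh and my_rsync appear to be the author's own cluster helper scripts; they are not part of HBase)
[atguigu@hadoop102 lib]$ pwd
/opt/module/hbase-2.0.5/lib
[atguigu@hadoop102 lib]$ ll|grep slf4j
-rw-rw-r-- 1 atguigu atguigu   12244 Apr 14  2020 slf4j-log4j12-1.7.25.jar
[atguigu@hadoop102 lib]$ all.sh rm -rf /opt/module/hbase-2.0.5/lib/slf4j-log4j12-1.7.25.jar

#6. Send the HBase installation directory and the environment variables to hadoop103 and hadoop104
[atguigu@hadoop102 module]$ my_rsync hbase/

#7. Start HBase and open the web UI (one HMaster and three HRegionServers are started): http://hadoop102:16010/
#Note: if the clocks of the cluster nodes are out of sync, the RegionServers fail to start and throw ClockOutOfSyncException.
#The tolerated skew defaults to 30 s and is set with the hbase.master.maxclockskew property.
start-hbase.sh

HBase started successfully.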

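Configuring high availability

In HBase, the HMaster monitors the life cycle of the HRegionServers and balances load across them. If the HMaster goes down, the whole cluster falls into an unhealthy state that cannot be sustained for long, so HBase supports a high-availability configuration for the HMaster.

#1. Stop the HBase cluster (skip this step if it is not running)
[atguigu@hadoop102 hbase]$ bin/stop-hbase.sh

#2. Create a file named backup-masters in the conf directory (keep this exact name; it is the default that hbase-env.sh refers to)
[atguigu@hadoop102 hbase]$ touch conf/backup-masters

#3. List the standby HMaster node in backup-masters
[atguigu@hadoop102 hbase]$ echo hadoop103 > conf/backup-masters
[atguigu@hadoop102 hbase]$ cat conf/backup-masters
hadoop103

#4. Sync the file to the other nodes (copying the whole conf directory also works)
[atguigu@hadoop102 conf]$ my_rsync.sh backup-masters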

HBase shell operations

namespace

DDL

#1. List the namespaces that currently exist in HBase
hbase(main):002:0> list_namespace

NAMESPACE
default (tables created without an explicit namespace go here)
hbase (used by the system to hold its own metadata; do not modify it casually)
#2. Create a namespace
hbase(main):010:0>  create_namespace "test"

hbase(main):010:0> create_namespace "test01", {"author"=>"wyh", "create_time"=>"2020-03-10 08:08:08"}
#3. Describe a namespace
hbase(main):010:0>  describe_namespace "test01"
#4. Alter a namespace (add or modify properties)
hbase(main):010:0> alter_namespace "test01", {METHOD => 'set', 'author' => 'weiyunhui'}
Add or modify a property:
alter_namespace 'ns1', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}
Remove a property:
alter_namespace 'ns1', {METHOD => 'unset', NAME => 'PROPERTY_NAME'}
#5. Drop a namespace
hbase(main):010:0> drop_namespace "test01"
Note: the namespace to be dropped must be empty, with no tables in it.

table

DDL and DML

#0. List the tables in the current database
hbase(main):002:0> list
#1. Create a table
hbase(main):002:0> create 'student','info'
#2. Insert data into the table
hbase(main):003:0> put 'student','1001','info:sex','male'
hbase(main):004:0> put 'student','1001','info:age','18'
hbase(main):005:0> put 'student','1002','info:name','Janna'
hbase(main):006:0> put 'student','1002','info:sex','female'
hbase(main):007:0> put 'student','1002','info:age','20'
#3. Scan the table
hbase(main):008:0> scan 'student'
hbase(main):009:0> scan 'student',{STARTROW => '1001', STOPROW  => '1001'}
hbase(main):010:0> scan 'student',{STARTROW => '1001'}
#4. Describe the table structure
hbase(main):011:0> describe 'student'
#5. Update the value of a given field
hbase(main):012:0> put 'student','1001','info:name','Nick'
hbase(main):013:0> put 'student','1001','info:age','100'
#6. Get the data of a given row, or of a given "column family:column"
hbase(main):014:0> get 'student','1001'
hbase(main):015:0> get 'student','1001','info:name'
#7. Count the rows in the table
hbase(main):021:0> count 'student'
#8. Delete data
Delete all data for a rowkey:
hbase(main):016:0> deleteall 'student','1001'
Delete one column of a rowkey:
hbase(main):017:0> delete 'student','1002','info:sex'
#9. Truncate the table
hbase(main):018:0> truncate 'student'
Note: truncate disables the table first and then truncates it.
#10. Drop the table
The table must first be put into the disabled state:
hbase(main):019:0> disable 'student'
Only then can it be dropped:
hbase(main):020:0> drop 'student'
Note: dropping an enabled table directly fails with: ERROR: Table student is enabled. Disable it first.
#11. Alter the table
Keep 3 versions of the data in the info column family:
hbase(main):022:0> alter 'student',{NAME=>'info',VERSIONS=>3}
hbase(main):022:0> get 'student','1001',{COLUMN=>'info:name',VERSIONS=>3}

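Advanced HBase

RegionServer architecture

The Block Cache evicts entries using an LRU (least recently used) policy.

[Figure: RegionServer architecture]

#1) StoreFile
The physical file that holds the actual data. StoreFiles are stored on HDFS in HFile format. Each Store has one or more StoreFiles (HFiles), and the data within each StoreFile is sorted.

#2) MemStore
The write cache. Because data in an HFile must be sorted, writes are first stored and sorted in the MemStore and are flushed to an HFile only when a flush is triggered. Every flush produces a new HFile.

#3) WAL
Data must pass through the MemStore before it can be flushed to an HFile, but data held only in memory is lost if the server fails. To guard against this, each write is first appended to a file called the write-ahead log (WAL) and only then written to the MemStore, so that after a failure the data can be rebuilt from the log.

#4) BlockCache
The read cache. Data returned by a query is cached in the BlockCache to speed up subsequent queries.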

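Write flow

[Figure: HBase write flow]

#Write flow:
1) The client first contacts ZooKeeper to find out which Region Server hosts the hbase:meta table.

2) It contacts that Region Server and reads hbase:meta to determine, from the namespace:table/rowkey of the write request, which Region on which Region Server holds the target data. The table's region information and the location of the meta table are cached in the client's meta cache for later requests.

3) It communicates with the target Region Server.

4) The data is appended sequentially to the WAL.

5) The data is written to the corresponding MemStore, where it is sorted.

6) An ack is sent to the client.

7) When a MemStore flush is triggered, the data is flushed to an HFile.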
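
Steps 4 to 7 can be pictured with a toy, in-memory model. The sketch below is not HBase code: ToyWritePath and its fields are invented for illustration, the flush threshold counts entries instead of bytes, and the flush runs inline, whereas a real RegionServer flushes in the background.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy in-memory model of steps 4 to 7; none of these types are HBase classes.
class ToyWritePath {
    final List<String> wal = new ArrayList<>();                      // append-only log, arrival order
    TreeMap<String, String> memStore = new TreeMap<>();              // write buffer, sorted by row key
    final List<TreeMap<String, String>> hFiles = new ArrayList<>();  // one sorted map per flush
    final int flushThreshold;                                        // entry count, stands in for bytes

    ToyWritePath(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    /** Appends to the WAL, inserts into the MemStore, flushes when due, and returns the ack. */
    boolean put(String rowKey, String value) {
        wal.add(rowKey + "=" + value);   // step 4: sequential append
        memStore.put(rowKey, value);     // step 5: the TreeMap keeps keys sorted
        if (memStore.size() >= flushThreshold) {
            flush();                     // step 7 (done inline here for simplicity)
        }
        return true;                     // step 6: ack to the client
    }

    /** Turns the current MemStore into a new "HFile" and starts an empty MemStore. */
    void flush() {
        hFiles.add(memStore);
        memStore = new TreeMap<>();
    }

    public static void main(String[] args) {
        ToyWritePath region = new ToyWritePath(2);
        region.put("1002", "Janna");
        region.put("1001", "Nick");
        System.out.println("WAL order:   " + region.wal);
        System.out.println("HFile order: " + region.hFiles.get(0).keySet());
    }
}
```

The WAL keeps arrival order while the flushed file is sorted by row key, which is why the data has to pass through the MemStore first.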

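MemStore Flush

[Figure: MemStore flush]

# When a MemStore flush is triggered:
1. When a single memstore reaches hbase.hregion.memstore.flush.size (default 128 MB), all memstores of its region are flushed.
When a memstore reaches
hbase.hregion.memstore.flush.size (default 128 MB)
* hbase.hregion.memstore.block.multiplier (default 4),
further writes to that memstore are blocked.

2. When the total size of all memstores on a region server reaches
java_heapsize
* hbase.regionserver.global.memstore.size (default 0.4)
* hbase.regionserver.global.memstore.size.lower.limit (default 0.95),
regions are flushed one after another in descending order of their total memstore size until the region server's total memstore size drops below that value.
When the total size of all memstores on a region server reaches
java_heapsize
* hbase.regionserver.global.memstore.size (default 0.4),
writes to all memstores are blocked.

3. A memstore flush is also triggered when the periodic flush interval elapses. The interval is configured with hbase.regionserver.optionalcacheflushinterval (default 1 hour).

4. When the number of WAL files exceeds hbase.regionserver.maxlogs, regions are flushed in chronological order until the number of WAL files falls below hbase.regionserver.maxlogs (this property is deprecated and no longer needs to be set manually; the limit is 32).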
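
To make the size-based thresholds in items 1 and 2 concrete, here is a small arithmetic sketch. It only multiplies the configuration values quoted above; the class name and the 16 GB heap are assumptions chosen for illustration, and nothing here reads a real HBase configuration.

```java
// Arithmetic sketch of the MemStore thresholds; the heap size in main is hypothetical.
class MemStoreThresholds {
    static final long MB = 1024L * 1024L;

    /** Per-memstore size at which writes are blocked: flush.size * block.multiplier. */
    static long blockingSize(long flushSize, int blockMultiplier) {
        return flushSize * blockMultiplier;
    }

    /** RegionServer-wide size at which all writes are blocked: heap * global.memstore.size. */
    static long globalBlockingSize(long heapBytes, double globalMemstoreSize) {
        return (long) (heapBytes * globalMemstoreSize);
    }

    /** RegionServer-wide size at which forced flushing starts: heap * global size * lower limit. */
    static long globalFlushSize(long heapBytes, double globalMemstoreSize, double lowerLimit) {
        return (long) (heapBytes * globalMemstoreSize * lowerLimit);
    }

    public static void main(String[] args) {
        long heap = 16L * 1024L * MB; // assumed 16 GB RegionServer heap
        System.out.println("per-memstore block (MB): " + blockingSize(128L * MB, 4) / MB);
        System.out.println("global flush start (MB): " + globalFlushSize(heap, 0.4, 0.95) / MB);
        System.out.println("global write block (MB): " + globalBlockingSize(heap, 0.4) / MB);
    }
}
```

With the defaults, a single memstore blocks writes at 128 MB * 4 = 512 MB, regardless of heap size; the two global thresholds scale with the heap.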

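Read flow

[Figure: HBase read flow]

Merge details

[Figure: merge details of the read path]

1) The client first contacts ZooKeeper to find out which Region Server hosts the hbase:meta table.
2) It contacts that Region Server and reads hbase:meta to determine, from the namespace:table/rowkey of the read request, which Region on which Region Server holds the target data. The table's region information and the location of the meta table are cached in the client's meta cache for later requests.
3) It communicates with the target Region Server.
4) The target data is looked up in both the MemStore and the StoreFiles (HFiles), and everything found is merged. "Everything" here means the different versions (timestamps) and different types (Put/Delete) of the same piece of data.
5) Newly read data blocks (a Block is the HFile storage unit, 64 KB by default) are cached in the Block Cache.
6) The merged final result is returned to the client.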
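
The merge in step 4 can be sketched as follows for a single cell. This is a deliberately simplified model, not HBase's implementation: ToyReadMerge and Version are invented names, only the newest version is returned, and a Delete marker is assumed to win when timestamps tie.

```java
import java.util.List;
import java.util.Optional;

// Simplified model of merging versions of one cell from the MemStore and several HFiles.
class ToyReadMerge {
    enum Type { PUT, DELETE }

    /** One version of a cell as found in one source. */
    static final class Version {
        final long timestamp;
        final Type type;
        final String value; // null for DELETE markers

        Version(long timestamp, Type type, String value) {
            this.timestamp = timestamp;
            this.type = type;
            this.value = value;
        }
    }

    /** Returns the visible value: the newest version wins, and a newest DELETE hides the cell. */
    static Optional<String> read(List<List<Version>> sources) {
        Version newest = null;
        for (List<Version> source : sources) {
            for (Version v : source) {
                boolean newer = newest == null || v.timestamp > newest.timestamp;
                boolean deleteOnTie = newest != null
                        && v.timestamp == newest.timestamp && v.type == Type.DELETE;
                if (newer || deleteOnTie) {
                    newest = v;
                }
            }
        }
        if (newest == null || newest.type == Type.DELETE) {
            return Optional.empty();
        }
        return Optional.of(newest.value);
    }

    public static void main(String[] args) {
        List<Version> memStore = List.of(new Version(3L, Type.PUT, "Nick"));
        List<Version> hFile = List.of(
                new Version(1L, Type.PUT, "Tom"),
                new Version(2L, Type.DELETE, null));
        System.out.println(read(List.of(memStore, hFile)));
    }
}
```

The point of the sketch is that no single source can answer the read on its own: the newest Put, or the Delete that hides it, may sit in any of them.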

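StoreFile Compaction

[Figure: StoreFile compaction]

Every memstore flush creates a new HFile, and the different versions (timestamps) and types (Put/Delete) of the same field may end up in different HFiles, so a query has to go through all of them. StoreFile Compaction reduces the number of HFiles and cleans up expired and deleted data.
There are two kinds of compaction: Minor Compaction and Major Compaction. A Minor Compaction merges several adjacent small HFiles into one larger HFile and cleans up some of the expired and deleted data. A Major Compaction merges all HFiles of a Store into a single large HFile and cleans up all expired and deleted data.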

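Region Split

[Figure: Region split]

By default every table starts with a single Region, and Regions split automatically as data keeps being written. Immediately after a split both child Regions live on the current Region Server, but for load-balancing reasons the HMaster may move a Region to another Region Server.
When a Region splits:
1. Before version 0.94: a Region splits when the total size of all StoreFiles in one of its Stores exceeds hbase.hregion.max.filesize.
2. From version 0.94 on: a Region splits when the total size of all StoreFiles in one of its Stores exceeds Min(initialSize * R^3, hbase.hregion.max.filesize), where initialSize defaults to 2 * hbase.hregion.memstore.flush.size and R is the number of Regions of that table on the current Region Server.
With the defaults (flush size 128 MB, max file size 10 GB) the thresholds are:
First split: 1^3 * 256 = 256 MB
Second split: 2^3 * 256 = 2048 MB
Third split: 3^3 * 256 = 6912 MB
Fourth split: 4^3 * 256 = 16384 MB > 10 GB, so the smaller value, 10 GB, is used
Every later split uses 10 GB.

3. HBase 2.0 introduced a new split policy: if the table has only one Region on the current RegionServer, it splits at 2 * hbase.hregion.memstore.flush.size; otherwise it splits at hbase.hregion.max.filesize.
#The third policy is the one most commonly used: the first split happens at 256 MB and every later one at 10 GB.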
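
The thresholds of policies 2 and 3 can be reproduced with a few lines of arithmetic. The sketch below only encodes the formulas as stated in this article; the class and method names are illustrative and are not HBase's split policy classes.

```java
// Arithmetic sketch of the split thresholds described above; not HBase code.
class SplitThresholds {
    static final long MB = 1024L * 1024L;

    /** Policy 2: min(initialSize * R^3, maxFileSize) with initialSize = 2 * flushSize. */
    static long increasingToUpperBound(int regionCount, long flushSize, long maxFileSize) {
        long initialSize = 2L * flushSize;
        long r = regionCount;
        return Math.min(initialSize * r * r * r, maxFileSize);
    }

    /** Policy 3: 2 * flushSize while the table has one region on this server, else maxFileSize. */
    static long stepping(int regionCount, long flushSize, long maxFileSize) {
        return regionCount == 1 ? 2L * flushSize : maxFileSize;
    }

    public static void main(String[] args) {
        long flushSize = 128L * MB;
        long maxFileSize = 10L * 1024L * MB;
        for (int r = 1; r <= 5; r++) {
            System.out.println("R=" + r
                    + " policy 2: " + increasingToUpperBound(r, flushSize, maxFileSize) / MB + " MB"
                    + ", policy 3: " + stepping(r, flushSize, maxFileSize) / MB + " MB");
        }
    }
}
```

Under policy 2 the threshold grows through 256, 2048 and 6912 MB before it is capped at 10240 MB; under policy 3 it jumps straight from 256 MB to the cap.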

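HBase API

Getting a connection

// Needs java.io.IOException, org.apache.hadoop.conf.Configuration and the HBase client classes
private static Connection connection;
static {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum","hadoop102,hadoop103,hadoop104");
    try {
        // createConnection throws a checked IOException, which a static initializer must handle
        connection = ConnectionFactory.createConnection(conf);
    } catch (IOException e) {
        throw new ExceptionInInitializerError(e);
    }
}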

Namespace operations

/**
  * Create a namespace.
  * @param nameSpace name of the namespace to create
  * @throws IOException if the request to the cluster fails
  */
public static void createNameSpace(String nameSpace) throws IOException {
    // Get an Admin object; try-with-resources closes it afterwards
    try (Admin admin = connection.getAdmin()) {
        NamespaceDescriptor namespaceDescriptor = NamespaceDescriptor.create(nameSpace).build();
        // Create the namespace
        admin.createNamespace(namespaceDescriptor);
    }
}

Table operations

/**
 * Create a table.
 * @param nameSpace namespace of the table
 * @param tableName name of the table
 * @param cfs       column families to create
 */
public static void createTable(String nameSpace, String tableName, String... cfs) throws IOException {
    try (Admin admin = connection.getAdmin()) {
        TableDescriptorBuilder tableDescriptorBuilder =
                TableDescriptorBuilder.newBuilder(TableName.valueOf(nameSpace, tableName));

        // Add one descriptor per column family
        for (String cf : cfs) {
            ColumnFamilyDescriptor columnFamilyDescriptor =
                    ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes(cf)).build();
            tableDescriptorBuilder.setColumnFamily(columnFamilyDescriptor);
        }

        // Build the TableDescriptor and create the table
        TableDescriptor tableDescriptor = tableDescriptorBuilder.build();
        admin.createTable(tableDescriptor);
    }
}

DML operations

put, delete, get, scan

put

/**
 * Put data; an update is performed the same way.
 *
 * @param nameSpace namespace of the table
 * @param tableName name of the table
 * @param rowKey    row key to write
 * @param cf        column family
 * @param c1        column qualifier
 * @param value     value to store
 */
public static void putData(String nameSpace, String tableName, String rowKey, String cf, String c1, String value) throws IOException {
    // try-with-resources closes the table even if the put fails
    try (Table table = connection.getTable(TableName.valueOf(nameSpace, tableName))) {
        Put put = new Put(Bytes.toBytes(rowKey));
        put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(c1), Bytes.toBytes(value));
        table.put(put);
    }
}

delete

/**
 * Delete data.
 * @param nameSpace namespace of the table
 * @param tableName name of the table
 * @param rowKey    row key to delete from
 * @param cf        column family
 * @param c1        column qualifier
 * @throws IOException if the request to the cluster fails
 */
public static void deleteData(String nameSpace, String tableName, String rowKey, String cf, String c1) throws IOException {
    try (Table table = connection.getTable(TableName.valueOf(nameSpace, tableName))) {
        // With only the rowkey specified, the whole row is deleted
        Delete delete = new Delete(Bytes.toBytes(rowKey));
        // delete.addFamily(Bytes.toBytes(cf));                      // delete a whole column family (DeleteFamily)
        // delete.addColumns(Bytes.toBytes(cf), Bytes.toBytes(c1));  // delete all versions of a column (DeleteColumn)
        delete.addColumn(Bytes.toBytes(cf), Bytes.toBytes(c1));      // delete only the latest version of a column (Delete)
        table.delete(delete);
    }
}

get

// Query a single row (GET)
public static void getData(String tableName, String rowKey, String cf, String cn) throws IOException {

    //1. Load the configuration and set the connection parameters
    Configuration configuration = HBaseConfiguration.create();
    configuration.set("hbase.zookeeper.quorum", "hadoop102,hadoop103,hadoop104");

    //2. Open the connection and 3. get the table; both are closed automatically (step 7)
    try (Connection connection = ConnectionFactory.createConnection(configuration);
         Table table = connection.getTable(TableName.valueOf(tableName))) {

        //4. Create the Get object
        Get get = new Get(Bytes.toBytes(rowKey));
        // Restrict the query to a column family
        // get.addFamily(Bytes.toBytes(cf));
        // Restrict the query to a column family:column
        // get.addColumn(Bytes.toBytes(cf), Bytes.toBytes(cn));

        //5. Run the query
        Result result = table.get(get);

        //6. Parse the result
        for (Cell cell : result.rawCells()) {
            System.out.println("ROW:" + Bytes.toString(CellUtil.cloneRow(cell)) +
                    " CF:" + Bytes.toString(CellUtil.cloneFamily(cell)) +
                    " CL:" + Bytes.toString(CellUtil.cloneQualifier(cell)) +
                    " VALUE:" + Bytes.toString(CellUtil.cloneValue(cell)));
        }
    }
}

scan

public static void scanData(String tableName) throws IOException {
    //1. Load the configuration and set the connection parameters
    Configuration configuration = HBaseConfiguration.create();
    configuration.set("hbase.zookeeper.quorum", "hadoop102,hadoop103,hadoop104");

    //2. Open the connection and 3. get the table; both are closed automatically (step 8)
    try (Connection connection = ConnectionFactory.createConnection(configuration);
         Table table = connection.getTable(TableName.valueOf(tableName))) {

        //4. Create the Scan object
        Scan scan = new Scan();
        //5. Optionally restrict the row range
        //scan.withStartRow(Bytes.toBytes(startRow))
        //scan.withStopRow(Bytes.toBytes(stopRow))

        //6. Run the scan; the scanner is closed automatically as well
        try (ResultScanner results = table.getScanner(scan)) {
            //7. Parse the results
            for (Result result : results) {
                for (Cell cell : result.rawCells()) {
                    System.out.println(
                            Bytes.toString(CellUtil.cloneRow(cell)) + ":" +
                            Bytes.toString(CellUtil.cloneFamily(cell)) + ":" +
                            Bytes.toString(CellUtil.cloneQualifier(cell)) + ":" +
                            Bytes.toString(CellUtil.cloneValue(cell))
                    );
                }
            }
        }
    }
}

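HBase optimization

Pre-splitting

Every region maintains a start row key and an end row key. If incoming data falls within the rowkey range a region maintains, that region takes care of the data. Following this principle, we can roughly plan in advance which partitions the data will land in, which improves HBase performance.

1. Set the split points manually
hbase> create 'staff1','info',SPLITS => ['1000','2000','3000','4000']

2. Generate split points as a hexadecimal sequence
create 'staff2','info',{NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}

3. Pre-split according to rules defined in a file
Create a file splits.txt with the following content:
aaaa
bbbb
cccc
dddd
Then run:
create 'staff3','info',SPLITS_FILE => 'splits.txt'

4. Create a pre-split table with the Java API
// A custom algorithm produces a series of hash values stored in a two-dimensional byte array
// (computeSplitKeys is a placeholder for your own function)
byte[][] splitKeys = computeSplitKeys();
// Get an Admin instance (HBase 2.x API; the HBaseAdmin constructor of 1.x is no longer public)
Admin admin = connection.getAdmin();
// Build the table descriptor; the column family name info is only an example
TableDescriptor tableDesc = TableDescriptorBuilder.newBuilder(TableName.valueOf(tableName))
        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("info"))
        .build();
// Create the pre-split HBase table from the descriptor and the split keys
admin.createTable(tableDesc, splitKeys);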
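
The split-key function in item 4 is left open by the article. As one possible stand-in, the sketch below generates two-digit prefixes followed by '|' ("00|", "01|", ...), the boundary scheme used in the RowKey design section of this article. The class name is hypothetical, and the two-digit format is an assumption that limits it to at most 100 partitions.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical split-key generator: for n partitions it returns the n-1 boundaries "00|", "01|", ...
class SplitKeyGenerator {

    /** Builds split keys for the given number of partitions (2 to 100). */
    static byte[][] splitKeys(int partitions) {
        if (partitions < 2 || partitions > 100) {
            throw new IllegalArgumentException("partitions must be between 2 and 100");
        }
        byte[][] keys = new byte[partitions - 1][];
        for (int i = 0; i < partitions - 1; i++) {
            // The boundary "NN|" closes the range that holds every key starting with "NN_"
            keys[i] = String.format("%02d|", i).getBytes(StandardCharsets.UTF_8);
        }
        return keys;
    }

    public static void main(String[] args) {
        byte[][] keys = splitKeys(50);
        System.out.println(keys.length + " split keys, first = "
                + new String(keys[0], StandardCharsets.UTF_8)
                + ", last = " + new String(keys[keys.length - 1], StandardCharsets.UTF_8));
    }
}
```

An array produced this way has the shape Admin.createTable(tableDescriptor, splitKeys) expects: n - 1 boundaries yield n regions.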

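RowKey design

The rowkey is the unique identifier of a record, and the pre-split range the rowkey falls into determines which partition stores the record. The main goal of rowkey design is to spread data evenly across all regions and so limit data skew. Below we go through a commonly used rowkey design approach.

Design walkthrough

Scenario: a large volume of call records from a telecom operator; we need a sensible pre-split plan and rowkey design.

rowkey design + pre-splitting:
rowkey design principles: uniqueness, even distribution (hashing), length

#Record structure:
1388888888(caller) 13999999999(callee) 2021-05-14 12:12:12  360 ......

Business requirement: query one user's call records for a given day, month, or year

Principle: storing the data is not the goal; querying it is.
Pre-splitting: there is no definitive number; plan for 50 partitions.

-∞  ~  00|
00| ~  01|
01| ~  02|
.......

Analysis: if one user's data for a day is stored in one partition, querying that day only scans one partition.
          If one user's data for a month is stored in one partition, querying a day or a month only scans one partition.

# Hash user+month and take the result modulo the number of partitions, so that all of a user's data for one month lands in the same partition
# The prefix 01_ is the result of the modulo operation and decides which partition the row goes to
# ('|' sorts after '_' in ASCII, so every key starting with 01_ falls in the range 00| ~ 01|)
rowkey: 01_1388888888_2021-05-14 12:12:12  ->  hash(1388888888_2021-05) % partitions  = 01
        01_1388888888_2021-05-15 12:12:12  ->  hash(1388888888_2021-05) % partitions  = 01
        01_1388888888_2021-05-16 12:12:12
        01_1388888888_2021-05-17 12:12:12

        03_1377777777_2021-05-16 12:12:12  ->  hash(1377777777_2021-05) % partitions  = 03

Verification:
Query user 1388888888's call records for August 2020
  1) Compute the partition number first
     hash(1388888888_2020-08) % 50  = 04
  2) rowkey
     04_1388888888_2020-08-........
  3) scan
     scan "teldata" ,{STARTROW=> '04_1388888888_2020-08', STOPROW=> '04_1388888888_2020-08|'}

Query user 1388888888's call records for 8 August 2020
  1) Compute the partition number first
     hash(1388888888_2020-08) % 50  = 04
  2) rowkey
     04_1388888888_2020-08-08........
  3) scan
     scan "teldata" ,{STARTROW=> '04_1388888888_2020-08-08', STOPROW=> '04_1388888888_2020-08-08|'}

Query user 1388888888's call records for August and September 2020

  1) Compute the partition numbers first
     hash(1388888888_2020-08) % 50  = 04
     hash(1388888888_2020-09) % 50  = 06
  2) rowkey
     04_1388888888_2020-08-........
     06_1388888888_2020-09-........
  3) scan
     scan "teldata" ,{STARTROW=> '04_1388888888_2020-08', STOPROW=> '04_1388888888_2020-08|'}
     scan "teldata" ,{STARTROW=> '06_1388888888_2020-09', STOPROW=> '06_1388888888_2020-09|'}

Query user 1388888888's call records for 9 and 10 August 2020

  1) Compute the partition number first
     hash(1388888888_2020-08) % 50  = 04
  2) rowkey
     04_1388888888_2020-08-09........
     04_1388888888_2020-08-10........
  3) scan
     scan "teldata" ,{STARTROW=> '04_1388888888_2020-08-09', STOPROW=> '04_1388888888_2020-08-10|'}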
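
The prefix computation and the scan ranges above can be sketched in plain Java. The article does not say which hash function it uses, so String.hashCode is an assumption here, and the class and method names are invented; the partition numbers it produces will generally differ from the illustrative 01, 04 and 06 shown above.

```java
// Sketch of the salted rowkey scheme; the hash function (String.hashCode) is an assumption.
class CallRecordRowKey {
    static final int PARTITIONS = 50;

    /** Two-digit partition prefix derived from user + month, e.g. "1388888888_2021-05". */
    static String partitionPrefix(String phone, String yearMonth) {
        // floorMod keeps the bucket non-negative even when hashCode is negative
        int bucket = Math.floorMod((phone + "_" + yearMonth).hashCode(), PARTITIONS);
        return String.format("%02d", bucket);
    }

    /** Full rowkey: prefix_phone_timestamp, with a timestamp such as "2021-05-14 12:12:12". */
    static String rowKey(String phone, String timestamp) {
        String yearMonth = timestamp.substring(0, 7);
        return partitionPrefix(phone, yearMonth) + "_" + phone + "_" + timestamp;
    }

    /** Start row and stop row for scanning one user's records within one month. */
    static String[] monthScanRange(String phone, String yearMonth) {
        String start = partitionPrefix(phone, yearMonth) + "_" + phone + "_" + yearMonth;
        return new String[] { start, start + "|" };
    }

    public static void main(String[] args) {
        System.out.println(rowKey("1388888888", "2021-05-14 12:12:12"));
        System.out.println(rowKey("1388888888", "2021-05-15 12:12:12"));
        String[] range = monthScanRange("1388888888", "2020-08");
        System.out.println("STARTROW => '" + range[0] + "', STOPROW => '" + range[1] + "'");
    }
}
```

Because the prefix depends only on user and month, every record of that user in that month shares the prefix, so a day or month query is a single range scan within one partition.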

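Memory optimization

HBase needs a lot of memory while operating, since tables can be cached in memory. Even so, a very large heap is not recommended: long GC pauses leave the RegionServer unavailable for extended periods. A heap of 16 to 36 GB is usually enough. And if the framework takes so much memory that the operating system itself runs short, the framework will be dragged down along with the system services.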

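Basic tuning

1. ZooKeeper session timeout
hbase-site.xml
Property: zookeeper.session.timeout
Explanation: the default is 90000 ms (90 s). When a RegionServer goes down, the Master only notices after 90 s. The value can be reduced to speed up the Master's reaction, for example to 60000 ms.

2. Number of RPC handlers
hbase-site.xml
Property: hbase.regionserver.handler.count
Explanation: the default is 30. It sets the number of RPC handlers and can be adjusted to the number of client requests; increase it when there are many read and write requests.

3. Controlling Major Compaction manually
hbase-site.xml
Property: hbase.hregion.majorcompaction
Explanation: the default is 604800000 ms (7 days), the interval between Major Compactions. Set it to 0 to turn off automatic Major Compaction.

4. Tuning the HStore file size
hbase-site.xml
Property: hbase.hregion.max.filesize
Explanation: the default is 10737418240 (10 GB). If you run MapReduce jobs over HBase, consider reducing it: one region corresponds to one map task, so an oversized region makes the map task run too long. The value means that once the HFiles reach this size, the region is split into two regions.

5. Tuning the HBase client write buffer
hbase-site.xml
Property: hbase.client.write.buffer
Explanation: the default is 2097152 bytes (2 MB). It sets the size of the HBase client buffer. A larger value reduces the number of RPC calls but uses more memory, and vice versa. A reasonable buffer size is usually set in order to cut down on RPC calls.

6. Number of rows fetched per scan.next call
hbase-site.xml
Property: hbase.client.scanner.caching
Explanation: sets the default number of rows fetched by the scan.next method. The larger the value, the more memory is used.

7. Share of the RegionServer heap used by the BlockCache
hbase-site.xml
Property: hfile.block.cache.size
Explanation: the default is 0.4. It can be raised when the workload is read-heavy.

8. Share of the RegionServer heap used by MemStores
hbase-site.xml
Property: hbase.regionserver.global.memstore.size
Explanation: the default is 0.4. It can be raised when the workload is write-heavy.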

