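Cache (BlockCache)
To improve the read performance of an HBase cluster, HBase keeps recently read HFile blocks in memory; this is the Block Cache. Two implementations are provided: the on-heap LruBlockCache, and BucketCache, which usually uses off-heap memory. LruBlockCache is commonly called the L1 cache; it is enabled by default and should not be turned off. BucketCache is commonly called the L2 cache; enabling it requires setting parameters such as hbase.bucketcache.combinedcache.enabled, hbase.bucketcache.ioengine, and hbase.bucketcache.size.
LruBlockCache is the default cache and lives in the Java heap. BucketCache usually uses off-heap memory, but it can also be file-backed or on-heap. Compared with LruBlockCache, reads served from BucketCache have somewhat higher latency, but the latency is more stable because there is less GC activity: BucketCache manages its cached memory itself instead of leaving it to the garbage collector.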
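The idea of a cache that manages its own off-heap memory can be sketched in a few lines of plain Java. This is only an illustration under simplifying assumptions: the class OffHeapSlotDemo and its fixed-size slots are hypothetical, and the real BucketCache allocator is considerably more elaborate. The sketch shows that blocks stored in a direct buffer are not heap objects, and that in this simplified model a read copies the bytes back onto the heap.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this is not HBase's BucketCache implementation.
class OffHeapSlotDemo {
    private final ByteBuffer arena; // one direct (off-heap) buffer allocated up front
    private final int slotSize;     // every cached block gets a fixed-size slot

    OffHeapSlotDemo(int slots, int slotSize) {
        this.arena = ByteBuffer.allocateDirect(slots * slotSize);
        this.slotSize = slotSize;
    }

    boolean isOffHeap() {
        return arena.isDirect();
    }

    // Copy a block into its slot. A slot is reused by overwriting it,
    // so evicting a block creates no garbage for the collector.
    void write(int slot, byte[] block) {
        if (block.length > slotSize) {
            throw new IllegalArgumentException("block larger than slot");
        }
        ByteBuffer view = arena.duplicate(); // independent position, same memory
        view.position(slot * slotSize);
        view.put(block);
    }

    // In this sketch a read copies the bytes back onto the heap,
    // an extra step that a purely on-heap cache does not need.
    byte[] read(int slot, int length) {
        byte[] out = new byte[length];
        ByteBuffer view = arena.duplicate();
        view.position(slot * slotSize);
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        OffHeapSlotDemo cache = new OffHeapSlotDemo(4, 64);
        byte[] block = "row-1:value".getBytes(StandardCharsets.UTF_8);
        cache.write(2, block);
        System.out.println(cache.isOffHeap());
        System.out.println(new String(cache.read(2, block.length), StandardCharsets.UTF_8));
    }
}
```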
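Configuring LruBlockCache
LruBlockCache is enabled by default and sized by hfile.block.cache.size, a decimal between 0 and 1.0 that gives the fraction of the heap (heap-size) it may use. The MemStore and LruBlockCache fractions together must not exceed 0.8, that is, hbase.regionserver.global.memstore.size + hfile.block.cache.size <= 0.8. The two may use at most 80% of the heap; the remaining 20% is left for other purposes.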
<property>
<name>hbase.regionserver.global.memstore.size</name>
<value>0.4</value>
</property>
<!-- L1 read cache -->
<property>
<name>hfile.block.cache.size</name>
<value>0.4</value>
</property>
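The 80% rule can be checked before a configuration is rolled out. The following is a minimal sketch of that arithmetic, not HBase's own validation code; the class and method names are hypothetical, and the only inputs are the two fractions from hbase-site.xml.

```java
// Hypothetical helper: checks that MemStore + block cache leave 20% of the heap free.
class HeapBudget {
    static final int MIN_FREE_PERCENT = 20;

    // Percentage of the heap left after the MemStore and the block cache take their share.
    static int freePercent(double memstoreFraction, double blockCacheFraction) {
        int memstore = (int) Math.round(memstoreFraction * 100);
        int blockCache = (int) Math.round(blockCacheFraction * 100);
        return 100 - (memstore + blockCache);
    }

    static boolean isValid(double memstoreFraction, double blockCacheFraction) {
        return freePercent(memstoreFraction, blockCacheFraction) >= MIN_FREE_PERCENT;
    }

    public static void main(String[] args) {
        System.out.println(isValid(0.4, 0.4)); // the values from the XML above: true
        System.out.println(isValid(0.5, 0.4)); // only 10% left for everything else: false
    }
}
```

Working in whole percentages avoids surprises from floating-point rounding when the two fractions add up to exactly 0.8.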
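Configuring BucketCache
BucketCache itself stays off until hbase.bucketcache.ioengine is set, but the combined mode (hbase.bucketcache.combinedcache.enabled) defaults to true, so once BucketCache is enabled it works together with LruBlockCache. This is the L1+L2 mixed cache mode, and both caches are managed through CombinedBlockCache. Data blocks (the actual row data) are stored in L2, while meta blocks, index blocks, and Bloom blocks are stored in L1.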
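The routing rule of the combined mode can be summarized in a few lines. This is a simplified sketch of the decision described above, not the CombinedBlockCache source; the enum and method names are hypothetical, and the cacheDataInL1 flag stands for the per-column-family setting of the same name.

```java
// Hypothetical model of where the combined L1+L2 cache places a block.
class CombinedCacheRouting {
    enum BlockKind { DATA, INDEX, BLOOM, META }
    enum Tier { L1_LRU, L2_BUCKET }

    // Data blocks go to L2 unless the column family asks for L1;
    // index, Bloom, and meta blocks always stay in L1.
    static Tier tierFor(BlockKind kind, boolean cacheDataInL1) {
        if (kind == BlockKind.DATA && !cacheDataInL1) {
            return Tier.L2_BUCKET;
        }
        return Tier.L1_LRU;
    }

    public static void main(String[] args) {
        for (BlockKind kind : BlockKind.values()) {
            System.out.println(kind + " -> " + tierFor(kind, false));
        }
    }
}
```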
To keep the data blocks of a column family out of the L2 cache, set cacheDataInL1 to true, either with the shell command below or in code with HColumnDescriptor.setCacheDataInL1(true).
create 't', {NAME => 't', CONFIGURATION => {CACHE_DATA_IN_L1 => 'true'}}
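BucketCache supports three storage modes: heap memory (on-heap), off-heap memory (off-heap), and file (file). The off-heap configuration is described below:
1. Edit hbase-env.sh to set the off-heap memory size. It must be larger than hbase.bucketcache.size from step 2; the 16G shown here is smaller than the 34816 MB used in step 2, so adjust one of the two values to fit your machine.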
HBASE_OFFHEAPSIZE=16G
2. Edit hbase-site.xml
<!-- L2 read cache -->
<!-- Enable the L2 cache; this parameter was removed in 2.x and later -->
<property>
<name>hbase.bucketcache.combinedcache.enabled</name>
<value>true</value>
<description>
Whether or not the bucketcache is used in league with the LRU on-heap block cache. In this mode, indices and blooms are kept in the LRU blockcache and the data blocks are kept in the bucketcache
</description>
</property>
<property>
<name>hbase.bucketcache.ioengine</name>
<value>offheap</value>
</property>
<property>
<name>hbase.bucketcache.size</name>
<value>34816</value>
<description>
A float that EITHER represents a percentage of total heap memory size to give to the cache (if less than 1.0) OR, it is the total capacity in megabytes of BucketCache. Default: 0.0
</description>
</property>
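The description of hbase.bucketcache.size gives the value two meanings, which is easy to get wrong. The sketch below spells out that rule and adds a sanity check against the off-heap budget from hbase-env.sh. It is an illustration only: the class and method names are hypothetical, and the check assumes the bucket cache has to fit inside the memory granted by HBASE_OFFHEAPSIZE with some room left over.

```java
// Hypothetical helper for interpreting hbase.bucketcache.size.
class BucketCacheSizing {
    static final long MB = 1024L * 1024L;

    // Below 1.0 the value is a fraction of the maximum heap; otherwise it is a size in megabytes.
    static long resolveBytes(double configured, long maxHeapBytes) {
        if (configured < 1.0) {
            return (long) (maxHeapBytes * configured);
        }
        return (long) (configured * MB);
    }

    // The off-heap budget has to be strictly larger than the cache itself.
    static boolean fitsOffHeap(long bucketCacheBytes, long offHeapBytes) {
        return bucketCacheBytes < offHeapBytes;
    }

    public static void main(String[] args) {
        long cache = resolveBytes(34816, 0);            // 34816 MB, as configured above
        System.out.println(fitsOffHeap(cache, 16L * 1024 * MB)); // against HBASE_OFFHEAPSIZE=16G
        System.out.println(fitsOffHeap(cache, 40L * 1024 * MB)); // against a hypothetical 40G
    }
}
```

With the sample values from this section the first check fails, since 34816 MB is 34 GB and does not fit into a 16 GB off-heap budget.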
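Read/write performance tuning
Depending on the cluster environment and the actual workload, some parameters usually need adjusting so that the cluster runs as efficiently as possible.
Read-heavy, write-light workloads:
Decrease hbase.regionserver.global.memstore.size moderately to give the MemStore less memory
Increase hfile.block.cache.size moderately
Adjust hbase.hregion.memstore.flush.size as appropriate
Tune the other compaction-related parameters
Write-heavy, read-light workloads:
Increase hbase.regionserver.global.memstore.size moderately to give the MemStore more memory
Decrease hfile.block.cache.size moderately
Increase the client-side write buffer size
Disable the WAL if durability is not a concern (writes that have not been flushed are lost if a RegionServer crashes)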
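To get a feel for how hbase.regionserver.global.memstore.size and hbase.hregion.memstore.flush.size interact, it helps to do the arithmetic for a concrete heap. The sketch below is a rough estimate under stated assumptions: a hypothetical 16 GB RegionServer heap, a 128 MB flush size, and hypothetical class and method names. It ignores the lower water mark and other details of the real flush policy.

```java
// Hypothetical back-of-the-envelope calculator for MemStore headroom.
class MemstoreHeadroom {
    static final long MB = 1024L * 1024L;

    // Heap bytes available to all MemStores on one RegionServer.
    static long globalMemstoreBytes(long heapBytes, double memstoreFraction) {
        return (long) (heapBytes * memstoreFraction);
    }

    // How many MemStores can fill up to the flush size before the global limit is reached.
    static long fullMemstoresBeforeLimit(long heapBytes, double memstoreFraction, long flushSizeBytes) {
        return globalMemstoreBytes(heapBytes, memstoreFraction) / flushSizeBytes;
    }

    public static void main(String[] args) {
        long heap = 16L * 1024 * MB; // assumed 16 GB heap
        long flushSize = 128 * MB;   // assumed flush size
        System.out.println(fullMemstoresBeforeLimit(heap, 0.4, flushSize)); // write-oriented split
        System.out.println(fullMemstoresBeforeLimit(heap, 0.3, flushSize)); // hypothetical read-oriented split
    }
}
```

Under these assumptions, lowering the MemStore fraction from 0.4 to 0.3 reduces the number of MemStores that can fill completely from 51 to 38, which is the trade-off a read-heavy cluster accepts in exchange for a larger block cache.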
GC tuning
HBASE_MASTER_JAVA_OPTS="-XX:MaxPermSize=256m -XX:SurvivorRatio=2 -XX:+UseParNewGC -XX:ParallelGCThreads=12 -XX:+UseConcMarkSweepGC -XX:ParallelCMSThreads=16 -XX:+CMSParallelRemarkEnabled -XX:MaxTenuringThreshold=15 -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -XX:-DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps"
HBASE_REGIONSERVER_JAVA_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=8 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ParallelRefProcEnabled -XX:-ResizePLAB -XX:ConcGCThreads=4 -XX:ParallelGCThreads=16 -XX:MaxTenuringThreshold=1 -XX:G1HeapRegionSize=32m -XX:G1MixedGCCountTarget=64 -XX:G1OldCSetRegionThresholdPercent=5 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy"
Java batch writes
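import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;

// Assumes the caller provides connection (Connection), tablename (String), and puts (List<Put>).
// The listener is called for mutations that still fail after the client has retried them.
final BufferedMutator.ExceptionListener listener = (e, mutator) -> {
    for (int i = 0; i < e.getNumExceptions(); i++) {
        System.err.println("Failed to send put <<" + e.getRow(i) + ">> to HBase...");
    }
};
BufferedMutatorParams params = new BufferedMutatorParams(TableName.valueOf(tablename))
        .listener(listener)
        .writeBufferSize(10 * 1024 * 1024); // 10 MB client-side write buffer
// try-with-resources closes the mutator, which the original code never did.
try (BufferedMutator mutator = connection.getBufferedMutator(params)) {
    mutator.mutate(puts); // buffered on the client, sent in batches
    mutator.flush();      // push whatever is still buffered
} catch (IOException e) {
    e.printStackTrace();
}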