Table of Contents
Installation and Deployment
Starting and Stopping the Cluster
Shell Operations
Table Operations
Namespace Operations
Data Operations
API Programming
Environment Setup
Code Implementation
Execution Results
HBase and MapReduce Integration
Environment Configuration
Case 1: Counting rows in an HBase table
Case 2: Importing local data into an HBase table
Case 3: Loading data into an HBase table with a custom MapReduce job
Case 4: Querying data and inserting it into a new table
HBase Optimization
High Availability
Pre-splitting
Time Synchronization
HBase is a distributed, scalable NoSQL database designed for storing massive amounts of data. Logically, HBase's data model looks much like a relational database: data lives in tables with rows and columns. From the perspective of its underlying physical storage (key-value), however, HBase is closer to a multi-dimensional map.
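To make the "multi-dimensional map" intuition concrete, here is a minimal Java sketch (an illustration only, not how HBase is implemented): a cell is addressed by rowkey, column family, column qualifier, and timestamp, and the shell and API examples later in this post all follow this addressing scheme.
import java.util.NavigableMap;
import java.util.TreeMap;

public class MultiDimensionalMapSketch {
    public static void main(String[] args) {
        // rowkey -> (column family -> (column qualifier -> (timestamp -> value)))
        NavigableMap<String, NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> table =
                new TreeMap<>();
        table.computeIfAbsent("1001", r -> new TreeMap<>())
             .computeIfAbsent("info", f -> new TreeMap<>())
             .computeIfAbsent("name", q -> new TreeMap<>())
             .put(1700000000000L, "Alice");
        // Looking up a cell walks the same dimensions HBase uses to locate a value;
        // lastEntry() plays the role of "the newest version wins".
        System.out.println(table.get("1001").get("info").get("name").lastEntry().getValue());
    }
}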
HBase logical structure
HBase physical storage structure
HBase basic architecture
Installation and Deployment
Before installing HBase, make sure the Hadoop cluster and the ZooKeeper cluster are up and running.
Upload the downloaded HBase tarball to /opt/software on hadoop101.
// Extract the archive
[hadoop@hadoop101 software]$ tar -zxvf hbase-2.3.7-bin.tar.gz -C /opt/module/
// Configure hbase-env.sh
[hadoop@hadoop101 software]$ cd /opt/module/hbase-2.3.7
[hadoop@hadoop101 hbase-2.3.7]$ vim conf/hbase-env.sh
Find the following two lines and modify them. JAVA_HOME must match the path of your own JDK; HBASE_MANAGES_ZK=false tells HBase to use the external ZooKeeper cluster rather than managing its own.
export JAVA_HOME=/opt/module/jdk1.8.0_212
export HBASE_MANAGES_ZK=false
// Configure hbase-site.xml
[hadoop@hadoop101 hbase-2.3.7]$ vim conf/hbase-site.xml
<!-- Directory shared by the RegionServers, used to persist HBase data; the URL must be exactly correct -->
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop101:8020/HBase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<!-- Changed after 0.98: earlier versions did not have this .port property, and the default port was 60000 -->
<property>
<name>hbase.master.port</name>
<value>16000</value>
</property>
<!-- ZooKeeper quorum addresses; do not append a znode path -->
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop101,hadoop102,hadoop103</value>
</property>
<!-- ZooKeeper data directory -->
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/opt/module/zookeeper-3.5.7/zkData</value>
</property>
<!-- Directory for HBase's local temporary files -->
<property>
<name>hbase.tmp.dir</name>
<value>./tmp</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
<!-- HMaster-related configuration -->
<property>
<name>hbase.master.info.bindAddress</name>
<value>hadoop101</value>
</property>
// Configure regionservers
[hadoop@hadoop101 hbase-2.3.7]$ vim conf/regionservers
hadoop101
hadoop102
hadoop103
// Symlink the Hadoop configuration files into the HBase conf directory
[hadoop@hadoop101 hbase-2.3.7]$ ln -s /opt/module/hadoop-3.1.3/etc/hadoop/core-site.xml /opt/module/hbase-2.3.7/conf/core-site.xml
[hadoop@hadoop101 hbase-2.3.7]$ ln -s /opt/module/hadoop-3.1.3/etc/hadoop/hdfs-site.xml /opt/module/hbase-2.3.7/conf/hdfs-site.xml
// Distribute to the other nodes
[hadoop@hadoop101 hbase-2.3.7]$ cd /opt/module/
[hadoop@hadoop101 module]$ xsync hbase-2.3.7/
Starting and Stopping the Cluster
Run the following commands on hadoop101 only:
// Start
[hadoop@hadoop101 hbase-2.3.7]$ bin/start-hbase.sh
// Stop
[hadoop@hadoop101 hbase-2.3.7]$ bin/stop-hbase.sh
Once HBase starts successfully, open http://hadoop101:16010 in a browser.
If the browser refuses to connect:
- Check the configuration above; make sure the HDFS port in hbase-site.xml matches the one in /opt/module/hadoop-3.1.3/etc/hadoop/core-site.xml (I use 8020 throughout).
- Shut down HBase, ZooKeeper, and Hadoop, then start them again. Hadoop and ZooKeeper must be running before you start HBase.
- If that still does not solve it, check the HBase logs for the specific error and troubleshoot from there.
[hadoop@hadoop101 hbase-2.3.7]$ cd logs
[hadoop@hadoop101 logs]$ ll
// View the most recent log
[hadoop@hadoop101 logs]$ cat hbase-hadoop-master-hadoop101.log
// My error was the following: it indicates a ZooKeeper connection problem, so restarting ZooKeeper fixed it
zookeeper.ClientCnxn: Opening socket connection to server hadoop101/192.168.120.101:2181. Will not attempt to authenticate using SASL (unknown error)
Shell Operations
// Enter the HBase shell
[hadoop@hadoop101 hbase-2.3.7]$ bin/hbase shell
// Show the help
hbase(main):001:0> help
Table Operations
// List the tables in the current database
hbase(main):002:0> list
// Create a table
hbase(main):003:0> create "student", "sinfo"
// Describe the table
hbase(main):004:0> describe "student"
// Change the sinfo column family to keep 3 versions
hbase(main):005:0> alter "student",{NAME => 'sinfo', VERSIONS => '3'}
// Check the change
hbase(main):006:0> describe "student"
// Delete the table
hbase(main):007:0> disable "student"
hbase(main):008:0> drop "student"
// Dropping directly without disabling first fails with: Table student is enabled. Disable it first.
Namespace Operations
// List namespaces
hbase(main):009:0> list_namespace
// Create a namespace
hbase(main):010:0> create_namespace "bigdata"
// Create a table under the namespace
hbase(main):011:0> create "bigdata:student", "info"
In the web UI you can now see the student table we just created.
// Delete a namespace
// First disable the table
hbase(main):012:0> disable "bigdata:student"
// Then drop the table
hbase(main):013:0> drop "bigdata:student"
// Finally drop the namespace
hbase(main):014:0> drop_namespace "bigdata"
Data Operations
If you dropped them in the previous step, re-create the bigdata namespace and the bigdata:student table (with the info column family) before running the following commands.
hbase(main):003:0> put 'bigdata:student','1001','info:name','Alice'
hbase(main):004:0> put 'bigdata:student','1001','info:sex','F'
hbase(main):005:0> put 'bigdata:student','1001','info:age','23'
hbase(main):006:0> put 'bigdata:student','1002','info:name','Bob'
hbase(main):007:0> put 'bigdata:student','1002','info:sex','M'
hbase(main):008:0> put 'bigdata:student','1002','info:age','22'
hbase(main):009:0> put 'bigdata:student','1003','info:name','Caroline'
hbase(main):010:0> put 'bigdata:student','1003','info:sex','F'
hbase(main):011:0> put 'bigdata:student','1003','info:age','24'
// Scan the table
hbase(main):012:0> scan 'bigdata:student'
// Count the rows
hbase(main):013:0> count 'bigdata:student'
// Get specific data
hbase(main):014:0> get 'bigdata:student','1001'
hbase(main):015:0> get 'bigdata:student','1001','info'
hbase(main):016:0> get 'bigdata:student','1001','info:name'
// Delete data from the table
hbase(main):017:0> delete 'bigdata:student','1001','info:name'
// Check that the cell was deleted
hbase(main):018:0> scan 'bigdata:student'
// Truncate the table
hbase(main):019:0> truncate 'bigdata:student'
// Check that the table is empty
hbase(main):020:0> scan 'bigdata:student'
API Programming
All of the operations above can also be performed programmatically through the API.
Environment Setup
Create an hbase_demo project and add the dependencies to pom.xml.
<!-- Custom properties for the version numbers -->
<properties>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<hbase-version>2.3.7</hbase-version>
<hadoop-version>3.1.3</hadoop-version>
</properties>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>RELEASE</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>2.8.2</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>${hbase-version}</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>${hbase-version}</version>
</dependency>
</dependencies>
Create log4j.properties.
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
Code Implementation
public class HbaseAPI {
// Logger
private static Logger logger=Logger.getLogger(HbaseAPI.class);
private static Connection connection;
private static Admin admin;
// Static initializer: create the connection and the Admin once
static{
try {
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum","hadoop101,hadoop102,hadoop103");
// Get the connection
connection = ConnectionFactory.createConnection(conf);
// Get the Admin
admin = connection.getAdmin();
} catch (IOException e) {
e.printStackTrace();
}
// Note: admin and connection are deliberately NOT closed here; they are shared by all the methods below.
}
public static void main(String[] args) throws IOException {
createNamespace("bigdata");
listNamespace();
System.out.println("bigdata:student exists: "+ isTableExist("bigdata:student"));
createTable("bigdata:student", "sinfo");
putData("bigdata:student","1001","sinfo","name","Alice");
putData("bigdata:student","1001","sinfo","sex","F");
putData("bigdata:student","1001","sinfo","age","23");
putData("bigdata:student","1002","sinfo","name","Bob");
putData("bigdata:student","1002","sinfo","sex","M");
putData("bigdata:student","1002","sinfo","age","22");
putData("bigdata:student","1003","sinfo","name","Caroline");
putData("bigdata:student","1003","sinfo","sex","F");
putData("bigdata:student","1003","sinfo","age","24");
getData("bigdata:student","1001",null,null);
getDataScan("bigdata:student","1003","sinfo",null);
getCount("bigdata:student");
deleteData("bigdata:student","1002","sinfo","sex");
getData("bigdata:student","1002",null,null);
deleteNamespace("bigdata","student");
}
// Check whether a table exists
public static boolean isTableExist(String tableName) throws IOException {
return admin.tableExists(TableName.valueOf(tableName));
}
// Create a table
public static void createTable(String tableName,String... cfs) throws IOException {
// Make sure at least one column family was specified
if(cfs.length<=0){
logger.info("No column family specified");
return ;
}
// Check whether the table already exists
if(isTableExist(tableName)){
logger.info("Table already exists");
return;
}
// Create the table descriptor
HTableDescriptor hTableDescriptor=new HTableDescriptor(TableName.valueOf(tableName));
for (String cf : cfs) {
// Create the column family descriptor
HColumnDescriptor hColumnDescriptor=new HColumnDescriptor(cf);
// Add the column family descriptor to the table descriptor
hTableDescriptor.addFamily(hColumnDescriptor);
}
// Create the table
admin.createTable(hTableDescriptor);
}
// Delete a table
public static void deleteTable(String tableName) throws IOException {
// Check whether the table exists
if(!isTableExist(tableName)){
logger.info("Table does not exist");
return ;
}
// Disable the table first
admin.disableTable(TableName.valueOf(tableName));
// Then delete it
admin.deleteTable(TableName.valueOf(tableName));
}
// Create a namespace
public static void createNamespace(String namespace){
try {
// Build the namespace descriptor
NamespaceDescriptor descriptor=NamespaceDescriptor.create(namespace).build();
// Create the namespace
admin.createNamespace(descriptor);
} catch (IOException e) {
e.printStackTrace();
}
}
// List namespaces
public static void listNamespace() throws IOException {
String[] namespaces = admin.listNamespaces();
logger.info(Arrays.asList(namespaces));
}
// Delete a namespace (dropping its table first)
public static void deleteNamespace(String namespace,String tableName) throws IOException {
if(isTableExist(namespace+":"+tableName)){
// Delete the table
deleteTable(namespace+":"+tableName);
}
// Delete the namespace
admin.deleteNamespace(namespace);
}
// Put data into a table
public static void putData(String tableName,String rowkey,String cfs,String cn,String value) throws IOException {
// Get the table
Table table = connection.getTable(TableName.valueOf(tableName));
// Create the Put object
Put put=new Put(Bytes.toBytes(rowkey));
// Add the cell
put.addColumn(Bytes.toBytes(cfs),Bytes.toBytes(cn),Bytes.toBytes(value));
// Write the data
table.put(put);
// Release the table
table.close();
}
// Get data from a table
public static void getData(String tableName,String rowkey,String cfs,String cn) throws IOException {
// Get the table
Table table = connection.getTable(TableName.valueOf(tableName));
// Create the Get object
Get get=new Get(Bytes.toBytes(rowkey));
// Fetch the row
Result result = table.get(get);
for (Cell cell : result.rawCells()) {
logger.info(Bytes.toString(CellUtil.cloneRow(cell))+"\t"+
Bytes.toString(CellUtil.cloneFamily(cell))+"\t"+
Bytes.toString(CellUtil.cloneQualifier(cell))+"\t"+
Bytes.toString(CellUtil.cloneValue(cell)));
}
// Release resources
table.close();
}
// Scan data from a table
public static void getDataScan(String tableName,String rowkey,String cfs,String cn) throws IOException {
// Get the table
Table table = connection.getTable(TableName.valueOf(tableName));
// Create the Scan object (starting from the given rowkey)
Scan scan=new Scan(Bytes.toBytes(rowkey));
scan.addFamily(Bytes.toBytes(cfs));
// Scan the table
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
for (Cell cell : result.rawCells()) {
logger.info(Bytes.toString(CellUtil.cloneRow(cell))+"\t"+
Bytes.toString(CellUtil.cloneFamily(cell))+"\t"+
Bytes.toString(CellUtil.cloneQualifier(cell))+"\t"+
Bytes.toString(CellUtil.cloneValue(cell)));
}
}
// Release resources
table.close();
}
// Count the rows in a table
public static void getCount(String tableName) throws IOException {
// Get the table
Table table = connection.getTable(TableName.valueOf(tableName));
// Create the Scan object; FirstKeyOnlyFilter returns only one cell per row
Scan scan=new Scan();
scan.setFilter(new FirstKeyOnlyFilter());
// Scan the table
ResultScanner scanner = table.getScanner(scan);
// Count how many rows there are
long count=0;
for (Result result : scanner) {
count+=result.size();
}
logger.info(tableName+" contains "+count+" rows");
// Release resources
table.close();
}
// Delete data from a table
public static void deleteData(String tableName,String rowkey,String cfs,String cn) throws IOException {
// Get the table
Table table = connection.getTable(TableName.valueOf(tableName));
// Create the Delete object
Delete delete=new Delete(Bytes.toBytes(rowkey));
delete.addColumn(Bytes.toBytes(cfs),Bytes.toBytes(cn));
// Delete the cell
table.delete(delete);
// Release the table
table.close();
}
}
Execution Results
HBase and MapReduce Integration
Environment Configuration
// Configure the environment variables
[hadoop@hadoop101 hbase-2.3.7]$ sudo vim /etc/profile.d/my_env.sh
##HBASE_HOME
export HBASE_HOME=/opt/module/hbase-2.3.7
export PATH=$PATH:$HBASE_HOME/bin
// Reload the environment
[hadoop@hadoop101 hbase-2.3.7]$ source /etc/profile
// Distribute
[hadoop@hadoop101 hbase-2.3.7]$ sudo /home/hadoop/bin/xsync /etc/profile.d/my_env.sh
// Configure hadoop-env.sh
[hadoop@hadoop101 hbase-2.3.7]$ cd /opt/module/hadoop-3.1.3
[hadoop@hadoop101 hadoop-3.1.3]$ vim etc/hadoop/hadoop-env.sh
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/module/hbase-2.3.7/lib/*
// Distribute
[hadoop@hadoop101 hadoop-3.1.3]$ xsync etc/hadoop/
// Restart HBase and the Hadoop cluster
[hadoop@hadoop101 hbase-2.3.7]$ bin/stop-hbase.sh
// Stop and restart the Hadoop cluster with your usual scripts, then bring HBase back up:
[hadoop@hadoop101 hbase-2.3.7]$ bin/start-hbase.sh
Case 1: Counting rows in an HBase table
// Enter the HBase shell
[hadoop@hadoop101 hbase-2.3.7]$ bin/hbase shell
// Create a namespace
hbase(main):001:0> create_namespace "bigdata"
// Create a table
hbase(main):002:0> create "bigdata:student", "sinfo"
// Insert data
hbase(main):003:0> put 'bigdata:student','1001','sinfo:name','Alice'
hbase(main):004:0> put 'bigdata:student','1001','sinfo:sex','F'
hbase(main):005:0> put 'bigdata:student','1001','sinfo:age','23'
hbase(main):006:0> put 'bigdata:student','1002','sinfo:name','Bob'
hbase(main):007:0> put 'bigdata:student','1002','sinfo:sex','M'
hbase(main):008:0> put 'bigdata:student','1002','sinfo:age','22'
hbase(main):009:0> put 'bigdata:student','1003','sinfo:name','Caroline'
hbase(main):010:0> put 'bigdata:student','1003','sinfo:sex','F'
hbase(main):011:0> put 'bigdata:student','1003','sinfo:age','24'
// Check the data
hbase(main):012:0> scan "bigdata:student"
// Exit the shell
hbase(main):013:0> quit
// Count the number of rows in the student table
[hadoop@hadoop101 hbase-2.3.7]$ /opt/module/hadoop-3.1.3/bin/yarn jar lib/hbase-mapreduce-2.3.7.jar rowcounter bigdata:student
Case 2: Importing local data into an HBase table
// Create the local data file
[hadoop@hadoop101 hbase-2.3.7]$ vim fruit.tsv
Note: the fields in the file are separated by \t (tab).
1001 Apple Red
1002 Pear Yellow
1003 Pineapple Yellow
// Upload fruit.tsv to the /fruit directory on HDFS
[hadoop@hadoop101 hbase-2.3.7]$ hdfs dfs -mkdir -p /fruit
[hadoop@hadoop101 hbase-2.3.7]$ hdfs dfs -put fruit.tsv /fruit
You can check the upload at http://hadoop101:9870.
// Create the fruit table in HBase
hbase(main):001:0> create "bigdata:fruit", "info"
// Import the local data into the HBase table
[hadoop@hadoop101 hbase-2.3.7]$ /opt/module/hadoop-3.1.3/bin/yarn jar lib/hbase-mapreduce-2.3.7.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:color bigdata:fruit hdfs://hadoop101:8020/fruit/fruit.tsv
Go back to HBase and scan the table: the import succeeded.
Case 3: Loading data into an HBase table with a custom MapReduce job
Requirement: migrate the fruit data (the fruit.tsv file already on HDFS) into an HBase table with a custom MapReduce job.
Building on the API programming project, add the following dependency to pom.xml:
<!-- HBase/MapReduce integration dependency -->
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-mapreduce</artifactId>
<version>${hbase-version}</version>
</dependency>
Here are two implementations.
// First implementation
public class HdfsToHbseMr1 {
// Mapper stage
public static class FruitToMrMapper extends Mapper<LongWritable, Text,LongWritable,Text> {
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
// Pass the line straight through
context.write(key,value);
}
}
// Reducer stage
public static class FruitToMrReducer extends TableReducer<LongWritable,Text, NullWritable>{
@Override
protected void reduce(LongWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
for (Text value : values) {
String[] split = value.toString().split("\t");
// Create the Put object
Put put=new Put(Bytes.toBytes(split[0]));
// Populate the Put
put.addColumn(Bytes.toBytes("info"),Bytes.toBytes("name"),Bytes.toBytes(split[1]));
put.addColumn(Bytes.toBytes("info"),Bytes.toBytes("color"),Bytes.toBytes(split[2]));
// Write out
context.write(NullWritable.get(),put);
}
}
}
// Driver stage
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
// Create the configuration
Configuration conf=new Configuration();
// Get the Job object
Job job = Job.getInstance(conf);
// Set the jar
job.setJarByClass(HdfsToHbseMr1.class);
// Set the Mapper
job.setMapperClass(FruitToMrMapper.class);
job.setMapOutputKeyClass(LongWritable.class);
job.setMapOutputValueClass(Text.class);
// Set the Reducer (writes into the HBase table given by args[1])
TableMapReduceUtil.initTableReducerJob(
args[1],
FruitToMrReducer.class,
job
);
// Set the input path
FileInputFormat.setInputPaths(job,new Path(args[0]));
// Submit
boolean result = job.waitForCompletion(true);
System.exit(result?0:1);
}
}
// Second implementation
public class HdfsToHbseMr2 {
// Mapper stage
public static class FruitMr2HbaseMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
// Read one line from HDFS, e.g. 1001 Apple Red
String line=new String(value.getBytes(),0,value.getLength(),"UTF-8");
// Split on tab
String[] split = line.split("\t");
// Build the ImmutableBytesWritable key (the rowkey)
ImmutableBytesWritable k=new ImmutableBytesWritable(Bytes.toBytes(split[0]));
// Create the Put object
Put v=new Put(Bytes.toBytes(split[0]));
// Populate the Put
v.addColumn(Bytes.toBytes("info"),Bytes.toBytes("name"),Bytes.toBytes(split[1]));
v.addColumn(Bytes.toBytes("info"),Bytes.toBytes("color"),Bytes.toBytes(split[2]));
// Write out
context.write(k,v);
}
}
// Reducer stage
public static class FruitMr2HbaseReducer extends TableReducer<ImmutableBytesWritable, Put, NullWritable>{
@Override
protected void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context) throws IOException, InterruptedException {
for (Put put : values) {
context.write(NullWritable.get(),put);
}
}
}
// Driver stage
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
// Create the configuration
Configuration conf=new Configuration();
// Get the Job object
Job job = Job.getInstance(conf);
// Set the jar
job.setJarByClass(HdfsToHbseMr2.class);
// Set the Mapper
job.setMapperClass(FruitMr2HbaseMapper.class);
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(Put.class);
// Set the Reducer (writes into the HBase table given by args[1])
TableMapReduceUtil.initTableReducerJob(
args[1],
FruitMr2HbaseReducer.class,
job
);
// Set the input path
FileInputFormat.setInputPaths(job,new Path(args[0]));
// Submit
boolean result = job.waitForCompletion(true);
System.exit(result?0:1);
}
}
Package the project and upload the jar to the hbase-2.3.7 directory. Add the packaging plugins below to pom.xml, and be sure to change the main class to wherever your class actually lives!
<build>
<plugins>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.6.1</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<archive>
<manifest>
<mainClass>com.hbase.HdfsToHbseMr2</mainClass>
</manifest>
</archive>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
Run the job (the target table bigdata:fruit1 must already exist in HBase with an info column family):
[hadoop@hadoop101 hbase-2.3.7]$ yarn jar hdfstohbase1.jar com.hbase.HdfsToHbseMr1 /fruit/fruit.tsv bigdata:fruit1
Case 4: Querying data and inserting it into a new table
Requirement: query the name column from bigdata:fruit and store it in a new HBase table, bigdata:fruit3. The target table must exist (with the info column family) before the job runs; see the sketch after the code.
public class HbaseMrHbase {
// Mapper stage
public static class FruitHbaseMrMapper extends TableMapper<ImmutableBytesWritable, Put> {
@Override
protected void map(ImmutableBytesWritable key, Result value, Context context) throws IOException, InterruptedException {
// Create a Put keyed by the same rowkey
Put v=new Put(key.get());
for (Cell cell : value.rawCells()) {
// Keep only cells whose qualifier is name
if("name".equals(Bytes.toString(CellUtil.cloneQualifier(cell)))){
// Add the name cell to the Put
v.add(cell);
}
}
// Write out
context.write(key,v);
}
}
// Reducer stage
public static class FruitHbaseMrReducer extends TableReducer<ImmutableBytesWritable, Put, NullWritable>{
@Override
protected void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context) throws IOException, InterruptedException {
for (Put put : values) {
context.write(NullWritable.get(),put);
}
}
}
// Driver stage
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
// Create the configuration
Configuration conf=new Configuration();
// Get the Job object
Job job = Job.getInstance(conf);
// Set the jar
job.setJarByClass(HbaseMrHbase.class);
// Set the Mapper (reads from the HBase table given by args[0])
TableMapReduceUtil.initTableMapperJob(
args[0],
new Scan(),
FruitHbaseMrMapper.class,
ImmutableBytesWritable.class,
Put.class,
job
);
// Set the Reducer (writes into the HBase table given by args[1])
TableMapReduceUtil.initTableReducerJob(
args[1],
FruitHbaseMrReducer.class,
job
);
// Submit
boolean result = job.waitForCompletion(true);
System.exit(result?0:1);
}
}
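Before submitting the job, create the target table. The following is a minimal sketch, assuming bigdata:fruit3 does not exist yet; it uses the same client API as the HbaseAPI class above, and the class name CreateFruit3Table is chosen only for illustration. The job itself is then packaged and submitted the same way as in Case 3, with the source table as args[0] and bigdata:fruit3 as args[1].
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class CreateFruit3Table {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop101,hadoop102,hadoop103");
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            TableName table = TableName.valueOf("bigdata:fruit3");
            // TableOutputFormat writes into an existing table, so create it first
            if (!admin.tableExists(table)) {
                admin.createTable(TableDescriptorBuilder.newBuilder(table)
                        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("info"))
                        .build());
            }
        }
    }
}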
HBase Optimization
High Availability
In HBase, the HMaster monitors the lifecycle of the HRegionServers and balances the load across them. If the HMaster goes down, the whole HBase cluster falls into an unhealthy state, and it cannot keep working in that state for long. For this reason HBase supports a high-availability configuration for the HMaster.
// Stop HBase
[hadoop@hadoop101 hbase-2.3.7]$ bin/stop-hbase.sh
// Create a backup-masters file in the conf directory; the file name backup-masters must not be changed
[hadoop@hadoop101 hbase-2.3.7]$ vim conf/backup-masters
hadoop102
hadoop103
// Distribute
[hadoop@hadoop101 hbase-2.3.7]$ xsync conf/backup-masters
// Restart
[hadoop@hadoop101 hbase-2.3.7]$ bin/start-hbase.sh
Check the processes: the backup masters have started successfully.
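As an optional cross-check (not part of the original steps), the active and backup masters can also be read from the Java client. A minimal sketch reusing the same connection settings as the API section, with a class name chosen only for illustration:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterMetrics;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CheckMasters {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop101,hadoop102,hadoop103");
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            ClusterMetrics metrics = admin.getClusterMetrics();
            // The active master plus the standbys registered via conf/backup-masters
            System.out.println("Active master : " + metrics.getMasterName());
            System.out.println("Backup masters: " + metrics.getBackupMasterNames());
        }
    }
}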
Pre-splitting
Each region maintains a StartRow and an EndRow; data whose rowkey falls within a region's range is handled by that region. Following this principle, we can roughly plan in advance which regions the data will be written to, which improves HBase performance.
// Manually specify the split points
hbase(main):001:0> create "bigdata:jk","sinfo","partition1",SPLITS=>['1000','2000','3000','4000']
// Generate split points as a hexadecimal sequence
hbase(main):002:0> create "bigdata:jk2","sinfo",'partition2',{NUMREGIONS=>15,SPLITALGO=>'HexStringSplit'}
// Pre-split using a file of split points
[hadoop@hadoop101 hbase-2.3.7]$ vim splits.txt
aa
bb
cc
dd
hbase(main):001:0> create "bigdata:jk3","sinfo",'partition4',SPLITS_FILE => 'splits.txt'
You can check the resulting regions in the web UI:
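The same pre-splitting can be done from the Java API: Admin.createTable accepts explicit split keys, so the table starts out with multiple regions. A minimal sketch, assuming an Admin obtained as in the HbaseAPI class above (the table name bigdata:jk4 is just for illustration):
import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitExample {
    public static void createPreSplitTable(Admin admin) throws IOException {
        // Five regions: (-inf,1000), [1000,2000), [2000,3000), [3000,4000), [4000,+inf)
        byte[][] splitKeys = {
                Bytes.toBytes("1000"),
                Bytes.toBytes("2000"),
                Bytes.toBytes("3000"),
                Bytes.toBytes("4000")
        };
        admin.createTable(
                TableDescriptorBuilder.newBuilder(TableName.valueOf("bigdata:jk4"))
                        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("sinfo"))
                        .build(),
                splitKeys);
    }
}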
Time Synchronization
Run the following command on each of the three machines (replace <ntp-server> with the address of your NTP server):
[hadoop@hadoop101 hbase-2.3.7]$ sudo ntpdate <ntp-server>