Table of Contents
Installation and Deployment
Starting and Stopping the Cluster
Shell Operations
Table Operations
Namespace Operations
Data Operations
API Programming
Environment Setup
Code Implementation
Execution Results
HBase and MapReduce Integration
Environment Configuration
Case 1: Counting rows in an HBase table
Case 2: Importing local data into an HBase table
Case 3: Loading data into an HBase table with a custom MapReduce job
Case 4: Querying data and inserting it into a new table
HBase Optimization
High Availability
Pre-splitting
Time Synchronization
HBase is a distributed, scalable NoSQL database designed for storing massive amounts of data. Logically, HBase's data model looks much like a relational database: data lives in tables with rows and columns. From the perspective of its underlying physical storage (key-value), however, HBase is closer to a multi-dimensional map.
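To make the "multi-dimensional map" intuition concrete, here is a minimal Java sketch (an illustration only, not how HBase is implemented): a cell is addressed by rowkey, column family, column qualifier, and timestamp, and the shell and API examples later in this post all follow this addressing scheme.
import java.util.NavigableMap;
import java.util.TreeMap;

public class MultiDimensionalMapSketch {
    public static void main(String[] args) {
        // rowkey -> (column family -> (column qualifier -> (timestamp -> value)))
        NavigableMap<String, NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> table =
                new TreeMap<>();
        table.computeIfAbsent("1001", r -> new TreeMap<>())
             .computeIfAbsent("info", f -> new TreeMap<>())
             .computeIfAbsent("name", q -> new TreeMap<>())
             .put(1700000000000L, "Alice");
        // Looking up a cell walks the same dimensions HBase uses to locate a value;
        // lastEntry() plays the role of "the newest version wins".
        System.out.println(table.get("1001").get("info").get("name").lastEntry().getValue());
    }
}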
HBase logical structure
HBase physical storage structure
HBase basic architecture
Installation and Deployment
Before installing HBase, make sure the Hadoop cluster and the ZooKeeper cluster are up and running.
Upload the downloaded HBase tarball to /opt/software on hadoop101.
// Extract the archive
[hadoop@hadoop101 software]$ tar -zxvf hbase-2.3.7-bin.tar.gz -C /opt/module/
// Configure hbase-env.sh
[hadoop@hadoop101 software]$ cd /opt/module/hbase-2.3.7
[hadoop@hadoop101 hbase-2.3.7]$ vim conf/hbase-env.sh
Find the following two lines and modify them. JAVA_HOME must match the path of your own JDK; HBASE_MANAGES_ZK=false tells HBase to use the external ZooKeeper cluster rather than managing its own.
export JAVA_HOME=/opt/module/jdk1.8.0_212
export HBASE_MANAGES_ZK=false
// Configure hbase-site.xml
[hadoop@hadoop101 hbase-2.3.7]$ vim conf/hbase-site.xml
<!-- Directory shared by the RegionServers, used to persist HBase data; the URL must be exactly correct -->
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop101:8020/HBase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<!-- Changed after 0.98: earlier versions did not have this .port property, and the default port was 60000 -->
<property>
<name>hbase.master.port</name>
<value>16000</value>
</property>
<!-- ZooKeeper quorum addresses; do not append a znode path -->
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop101,hadoop102,hadoop103</value>
</property>
<!-- ZooKeeper data directory -->
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/opt/module/zookeeper-3.5.7/zkData</value>
</property>
<!-- Directory for HBase's local temporary files -->
<property>
<name>hbase.tmp.dir</name>
<value>./tmp</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
<!-- HMaster-related configuration -->
<property>
<name>hbase.master.info.bindAddress</name>
<value>hadoop101</value>
</property>
// Configure regionservers
[hadoop@hadoop101 hbase-2.3.7]$ vim conf/regionservers
hadoop101
hadoop102
hadoop103
// Symlink the Hadoop configuration files into the HBase conf directory
[hadoop@hadoop101 hbase-2.3.7]$ ln -s /opt/module/hadoop-3.1.3/etc/hadoop/core-site.xml /opt/module/hbase-2.3.7/conf/core-site.xml
[hadoop@hadoop101 hbase-2.3.7]$ ln -s /opt/module/hadoop-3.1.3/etc/hadoop/hdfs-site.xml /opt/module/hbase-2.3.7/conf/hdfs-site.xml
// Distribute to the other nodes
[hadoop@hadoop101 hbase-2.3.7]$ cd /opt/module/
[hadoop@hadoop101 module]$ xsync hbase-2.3.7/
Starting and Stopping the Cluster
Run the following commands on hadoop101 only:
// Start
[hadoop@hadoop101 hbase-2.3.7]$ bin/start-hbase.sh
// Stop
[hadoop@hadoop101 hbase-2.3.7]$ bin/stop-hbase.sh
Once HBase starts successfully, open http://hadoop101:16010 in a browser.
If the browser refuses to connect:
- Check the configuration above; make sure the HDFS port in hbase-site.xml matches the one in /opt/module/hadoop-3.1.3/etc/hadoop/core-site.xml (I use 8020 throughout).
- Shut down HBase, ZooKeeper, and Hadoop, then start them again. Hadoop and ZooKeeper must be running before you start HBase.
- If that still does not solve it, check the HBase logs for the specific error and troubleshoot from there.
[hadoop@hadoop101 hbase-2.3.7]$ cd logs
[hadoop@hadoop101 logs]$ ll
// View the most recent log
[hadoop@hadoop101 logs]$ cat hbase-hadoop-master-hadoop101.log
// My error was the following: it indicates a ZooKeeper connection problem, so restarting ZooKeeper fixed it
zookeeper.ClientCnxn: Opening socket connection to server hadoop101/192.168.120.101:2181. Will not attempt to authenticate using SASL (unknown error)
Shell Operations
// Enter the HBase shell
[hadoop@hadoop101 hbase-2.3.7]$ bin/hbase shell
// Show the help
hbase(main):001:0> help
Table Operations
// List the tables in the current database
hbase(main):002:0> list
// Create a table
hbase(main):003:0> create "student", "sinfo"
// Describe the table
hbase(main):004:0> describe "student"
// Change the sinfo column family to keep 3 versions
hbase(main):005:0> alter "student",{NAME => 'sinfo', VERSIONS => '3'}
// Check the change
hbase(main):006:0> describe "student"
// Delete the table
hbase(main):007:0> disable "student"
hbase(main):008:0> drop "student"
// Dropping directly without disabling first fails with: Table student is enabled. Disable it first.
Namespace Operations
// List namespaces
hbase(main):009:0> list_namespace
// Create a namespace
hbase(main):010:0> create_namespace "bigdata"
// Create a table under the namespace
hbase(main):011:0> create "bigdata:student", "info"
In the web UI you can now see the student table we just created.
// Delete a namespace
// First disable the table
hbase(main):012:0> disable "bigdata:student"
// Then drop the table
hbase(main):013:0> drop "bigdata:student"
// Finally drop the namespace
hbase(main):014:0> drop_namespace "bigdata"
Data Operations
If you dropped them in the previous step, re-create the bigdata namespace and the bigdata:student table (with the info column family) before running the following commands.
hbase(main):003:0> put 'bigdata:student','1001','info:name','Alice'
hbase(main):004:0> put 'bigdata:student','1001','info:sex','F'
hbase(main):005:0> put 'bigdata:student','1001','info:age','23'
hbase(main):006:0> put 'bigdata:student','1002','info:name','Bob'
hbase(main):007:0> put 'bigdata:student','1002','info:sex','M'
hbase(main):008:0> put 'bigdata:student','1002','info:age','22'
hbase(main):009:0> put 'bigdata:student','1003','info:name','Caroline'
hbase(main):010:0> put 'bigdata:student','1003','info:sex','F'
hbase(main):011:0> put 'bigdata:student','1003','info:age','24'
// Scan the table
hbase(main):012:0> scan 'bigdata:student'
// Count the rows
hbase(main):013:0> count 'bigdata:student'
// Get specific data
hbase(main):014:0> get 'bigdata:student','1001'
hbase(main):015:0> get 'bigdata:student','1001','info'
hbase(main):016:0> get 'bigdata:student','1001','info:name'
// Delete data from the table
hbase(main):017:0> delete 'bigdata:student','1001','info:name'
// Check that the cell was deleted
hbase(main):018:0> scan 'bigdata:student'
// Truncate the table
hbase(main):019:0> truncate 'bigdata:student'
// Check that the table is empty
hbase(main):020:0> scan 'bigdata:student'
API Programming
All of the operations above can also be performed programmatically through the API.
Environment Setup
Create an hbase_demo project and add the dependencies to pom.xml.
<!-- Custom properties for the version numbers -->
<properties>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<hbase-version>2.3.7</hbase-version>
<hadoop-version>3.1.3</hadoop-version>
</properties>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>RELEASE</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>2.8.2</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>${hbase-version}</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>${hbase-version}</version>
</dependency>
</dependencies>
Create log4j.properties.
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
Code Implementation
public class HbaseAPI {
// Logger
private static Logger logger=Logger.getLogger(HbaseAPI.class);
private static Connection connection;
private static Admin admin;
// Static initializer: create the connection and the Admin once
static{
try {
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum","hadoop101,hadoop102,hadoop103");
// Get the connection
connection = ConnectionFactory.createConnection(conf);
// Get the Admin
admin = connection.getAdmin();
} catch (IOException e) {
e.printStackTrace();
}
// Note: admin and connection are deliberately NOT closed here; they are shared by all the methods below.
}
public static void main(String[] args) throws IOException {
createNamespace("bigdata");
listNamespace();
System.out.println("bigdata:student exists: "+ isTableExist("bigdata:student"));
createTable("bigdata:student", "sinfo");
putData("bigdata:student","1001","sinfo","name","Alice");
putData("bigdata:student","1001","sinfo","sex","F");
putData("bigdata:student","1001","sinfo","age","23");
putData("bigdata:student","1002","sinfo","name","Bob");
putData("bigdata:student","1002","sinfo","sex","M");
putData("bigdata:student","1002","sinfo","age","22");
putData("bigdata:student","1003","sinfo","name","Caroline");
putData("bigdata:student","1003","sinfo","sex","F");
putData("bigdata:student","1003","sinfo","age","24");
getData("bigdata:student","1001",null,null);
getDataScan("bigdata:student","1003","sinfo",null);
getCount("bigdata:student");
deleteData("bigdata:student","1002","sinfo","sex");
getData("bigdata:student","1002",null,null);
deleteNamespace("bigdata","student");
}
// Check whether a table exists
public static boolean isTableExist(String tableName) throws IOException {
return admin.tableExists(TableName.valueOf(tableName));
}
// Create a table
public static void createTable(String tableName,String... cfs) throws IOException {
// Make sure at least one column family was specified
if(cfs.length<=0){
logger.info("No column family specified");
return ;
}
// Check whether the table already exists
if(isTableExist(tableName)){
logger.info("Table already exists");
return;
}
// Create the table descriptor
HTableDescriptor hTableDescriptor=new HTableDescriptor(TableName.valueOf(tableName));
for (String cf : cfs) {
// Create the column family descriptor
HColumnDescriptor hColumnDescriptor=new HColumnDescriptor(cf);
// Add the column family descriptor to the table descriptor
hTableDescriptor.addFamily(hColumnDescriptor);
}
// Create the table
admin.createTable(hTableDescriptor);
}
// Delete a table
public static void deleteTable(String tableName) throws IOException {
// Check whether the table exists
if(!isTableExist(tableName)){
logger.info("Table does not exist");
return ;
}
// Disable the table first
admin.disableTable(TableName.valueOf(tableName));
// Then delete it
admin.deleteTable(TableName.valueOf(tableName));
}
// Create a namespace
public static void createNamespace(String namespace){
try {
// Build the namespace descriptor
NamespaceDescriptor descriptor=NamespaceDescriptor.create(namespace).build();
// Create the namespace
admin.createNamespace(descriptor);
} catch (IOException e) {
e.printStackTrace();
}
}
// List namespaces
public static void listNamespace() throws IOException {
String[] namespaces = admin.listNamespaces();
logger.info(Arrays.asList(namespaces));
}
// Delete a namespace (dropping its table first)
public static void deleteNamespace(String namespace,String tableName) throws IOException {
if(isTableExist(namespace+":"+tableName)){
// Delete the table
deleteTable(namespace+":"+tableName);
}
// Delete the namespace
admin.deleteNamespace(namespace);
}
// Put data into a table
public static void putData(String tableName,String rowkey,String cfs,String cn,String value) throws IOException {
// Get the table
Table table = connection.getTable(TableName.valueOf(tableName));
// Create the Put object
Put put=new Put(Bytes.toBytes(rowkey));
// Add the cell
put.addColumn(Bytes.toBytes(cfs),Bytes.toBytes(cn),Bytes.toBytes(value));
// Write the data
table.put(put);
// Release the table
table.close();
}
// Get data from a table
public static void getData(String tableName,String rowkey,String cfs,String cn) throws IOException {
// Get the table
Table table = connection.getTable(TableName.valueOf(tableName));
// Create the Get object
Get get=new Get(Bytes.toBytes(rowkey));
// Fetch the row
Result result = table.get(get);
for (Cell cell : result.rawCells()) {
logger.info(Bytes.toString(CellUtil.cloneRow(cell))+"\t"+
Bytes.toString(CellUtil.cloneFamily(cell))+"\t"+
Bytes.toString(CellUtil.cloneQualifier(cell))+"\t"+
Bytes.toString(CellUtil.cloneValue(cell)));
}
// Release resources
table.close();
}
// Scan data from a table
public static void getDataScan(String tableName,String rowkey,String cfs,String cn) throws IOException {
// Get the table
Table table = connection.getTable(TableName.valueOf(tableName));
// Create the Scan object (starting from the given rowkey)
Scan scan=new Scan(Bytes.toBytes(rowkey));
scan.addFamily(Bytes.toBytes(cfs));
// Scan the table
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
for (Cell cell : result.rawCells()) {
logger.info(Bytes.toString(CellUtil.cloneRow(cell))+"\t"+
Bytes.toString(CellUtil.cloneFamily(cell))+"\t"+
Bytes.toString(CellUtil.cloneQualifier(cell))+"\t"+
Bytes.toString(CellUtil.cloneValue(cell)));
}
}
// Release resources
table.close();
}
// Count the rows in a table
public static void getCount(String tableName) throws IOException {
// Get the table
Table table = connection.getTable(TableName.valueOf(tableName));
// Create the Scan object; FirstKeyOnlyFilter returns only one cell per row
Scan scan=new Scan();
scan.setFilter(new FirstKeyOnlyFilter());
// Scan the table
ResultScanner scanner = table.getScanner(scan);
// Count how many rows there are
long count=0;
for (Result result : scanner) {
count+=result.size();
}
logger.info(tableName+" contains "+count+" rows");
// Release resources
table.close();
}
// Delete data from a table
public static void deleteData(String tableName,String rowkey,String cfs,String cn) throws IOException {
// Get the table
Table table = connection.getTable(TableName.valueOf(tableName));
// Create the Delete object
Delete delete=new Delete(Bytes.toBytes(rowkey));
delete.addColumn(Bytes.toBytes(cfs),Bytes.toBytes(cn));
// Delete the cell
table.delete(delete);
// Release the table
table.close();
}
}
Execution Results
HBase and MapReduce Integration
Environment Configuration
// Configure the environment variables
[hadoop@hadoop101 hbase-2.3.7]$ sudo vim /etc/profile.d/my_env.sh
##HBASE_HOME
export HBASE_HOME=/opt/module/hbase-2.3.7
export PATH=$PATH:$HBASE_HOME/bin
// Reload the environment
[hadoop@hadoop101 hbase-2.3.7]$ source /etc/profile
// Distribute
[hadoop@hadoop101 hbase-2.3.7]$ sudo /home/hadoop/bin/xsync /etc/profile.d/my_env.sh
// Configure hadoop-env.sh
[hadoop@hadoop101 hbase-2.3.7]$ cd /opt/module/hadoop-3.1.3
[hadoop@hadoop101 hadoop-3.1.3]$ vim etc/hadoop/hadoop-env.sh
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/module/hbase-2.3.7/lib/*
// Distribute
[hadoop@hadoop101 hadoop-3.1.3]$ xsync etc/hadoop/
// Restart HBase and the Hadoop cluster
[hadoop@hadoop101 hbase-2.3.7]$ bin/stop-hbase.sh
// Stop and restart the Hadoop cluster with your usual scripts, then bring HBase back up:
[hadoop@hadoop101 hbase-2.3.7]$ bin/start-hbase.sh
Case 1: Counting rows in an HBase table
// Enter the HBase shell
[hadoop@hadoop101 hbase-2.3.7]$ bin/hbase shell
// Create a namespace
hbase(main):001:0> create_namespace "bigdata"
// Create a table
hbase(main):002:0> create "bigdata:student", "sinfo"
// Insert data
hbase(main):003:0> put 'bigdata:student','1001','sinfo:name','Alice'
hbase(main):004:0> put 'bigdata:student','1001','sinfo:sex','F'
hbase(main):005:0> put 'bigdata:student','1001','sinfo:age','23'
hbase(main):006:0> put 'bigdata:student','1002','sinfo:name','Bob'
hbase(main):007:0> put 'bigdata:student','1002','sinfo:sex','M'
hbase(main):008:0> put 'bigdata:student','1002','sinfo:age','22'
hbase(main):009:0> put 'bigdata:student','1003','sinfo:name','Caroline'
hbase(main):010:0> put 'bigdata:student','1003','sinfo:sex','F'
hbase(main):011:0> put 'bigdata:student','1003','sinfo:age','24'
// Check the data
hbase(main):012:0> scan "bigdata:student"
// Exit the shell
hbase(main):013:0> quit
// Count the number of rows in the student table
[hadoop@hadoop101 hbase-2.3.7]$ /opt/module/hadoop-3.1.3/bin/yarn jar lib/hbase-mapreduce-2.3.7.jar rowcounter bigdata:student
Case 2: Importing local data into an HBase table
// Create the local data file
[hadoop@hadoop101 hbase-2.3.7]$ vim fruit.tsv
Note: the fields in the file are separated by \t (tab).
1001 Apple Red
1002 Pear Yellow
1003 Pineapple Yellow
// Upload fruit.tsv to the /fruit directory on HDFS
[hadoop@hadoop101 hbase-2.3.7]$ hdfs dfs -mkdir -p /fruit
[hadoop@hadoop101 hbase-2.3.7]$ hdfs dfs -put fruit.tsv /fruit
You can check the upload at http://hadoop101:9870.
// Create the fruit table in HBase
hbase(main):001:0> create "bigdata:fruit", "info"
// Import the local data into the HBase table
[hadoop@hadoop101 hbase-2.3.7]$ /opt/module/hadoop-3.1.3/bin/yarn jar lib/hbase-mapreduce-2.3.7.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:color bigdata:fruit hdfs://hadoop101:8020/fruit/fruit.tsv
Go back to HBase and scan the table: the import succeeded.
Case 3: Loading data into an HBase table with a custom MapReduce job
Requirement: migrate the fruit data (the fruit.tsv file already on HDFS) into an HBase table with a custom MapReduce job.
Building on the API programming project, add the following dependency to pom.xml:
<!-- HBase/MapReduce integration dependency -->
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-mapreduce</artifactId>
<version>${hbase-version}</version>
</dependency>
Here are two implementations.
// First implementation
public class HdfsToHbseMr1 {
// Mapper stage
public static class FruitToMrMapper extends Mapper<LongWritable, Text,LongWritable,Text> {
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
// Pass the line straight through
context.write(key,value);
}
}
// Reducer stage
public static class FruitToMrReducer extends TableReducer<LongWritable,Text, NullWritable>{
@Override
protected void reduce(LongWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
for (Text value : values) {
String[] split = value.toString().split("\t");
// Create the Put object
Put put=new Put(Bytes.toBytes(split[0]));
// Populate the Put
put.addColumn(Bytes.toBytes("info"),Bytes.toBytes("name"),Bytes.toBytes(split[1]));
put.addColumn(Bytes.toBytes("info"),Bytes.toBytes("color"),Bytes.toBytes(split[2]));
// Write out
context.write(NullWritable.get(),put);
}
}
}
// Driver stage
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
// Create the configuration
Configuration conf=new Configuration();
// Get the Job object
Job job = Job.getInstance(conf);
// Set the jar
job.setJarByClass(HdfsToHbseMr1.class);
// Set the Mapper
job.setMapperClass(FruitToMrMapper.class);
job.setMapOutputKeyClass(LongWritable.class);
job.setMapOutputValueClass(Text.class);
// Set the Reducer (writes into the HBase table given by args[1])
TableMapReduceUtil.initTableReducerJob(
args[1],
FruitToMrReducer.class,
job
);
// Set the input path
FileInputFormat.setInputPaths(job,new Path(args[0]));
// Submit
boolean result = job.waitForCompletion(true);
System.exit(result?0:1);
}
}
// Second implementation
public class HdfsToHbseMr2 {
// Mapper stage
public static class FruitMr2HbaseMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
// Read one line from HDFS, e.g. 1001 Apple Red
String line=new String(value.getBytes(),0,value.getLength(),"UTF-8");
// Split on tab
String[] split = line.split("\t");
// Build the ImmutableBytesWritable key (the rowkey)
ImmutableBytesWritable k=new ImmutableBytesWritable(Bytes.toBytes(split[0]));
// Create the Put object
Put v=new Put(Bytes.toBytes(split[0]));
// Populate the Put
v.addColumn(Bytes.toBytes("info"),Bytes.toBytes("name"),Bytes.toBytes(split[1]));
v.addColumn(Bytes.toBytes("info"),Bytes.toBytes("color"),Bytes.toBytes(split[2]));
// Write out
context.write(k,v);
}
}
// Reducer stage
public static class FruitMr2HbaseReducer extends TableReducer<ImmutableBytesWritable, Put, NullWritable>{
@Override
protected void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context) throws IOException, InterruptedException {
for (Put put : values) {
context.write(NullWritable.get(),put);
}
}
}
// Driver stage
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
// Create the configuration
Configuration conf=new Configuration();
// Get the Job object
Job job = Job.getInstance(conf);
// Set the jar
job.setJarByClass(HdfsToHbseMr2.class);
// Set the Mapper
job.setMapperClass(FruitMr2HbaseMapper.class);
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(Put.class);
// Set the Reducer (writes into the HBase table given by args[1])
TableMapReduceUtil.initTableReducerJob(
args[1],
FruitMr2HbaseReducer.class,
job
);
// Set the input path
FileInputFormat.setInputPaths(job,new Path(args[0]));
// Submit
boolean result = job.waitForCompletion(true);
System.exit(result?0:1);
}
}
Package the project and upload the jar to the hbase-2.3.7 directory. Add the packaging plugins below to pom.xml, and be sure to change the main class to wherever your class actually lives!
<build>
<plugins>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.6.1</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<archive>
<manifest>
<mainClass>com.hbase.HdfsToHbseMr2</mainClass>
</manifest>
</archive>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
Run the job (the target table bigdata:fruit1 must already exist in HBase with an info column family):
[hadoop@hadoop101 hbase-2.3.7]$ yarn jar hdfstohbase1.jar com.hbase.HdfsToHbseMr1 /fruit/fruit.tsv bigdata:fruit1
Case 4: Querying data and inserting it into a new table
Requirement: query the name column from bigdata:fruit and store it in a new HBase table, bigdata:fruit3. The target table must exist (with the info column family) before the job runs; see the sketch after the code.
public class HbaseMrHbase {
// Mapper stage
public static class FruitHbaseMrMapper extends TableMapper<ImmutableBytesWritable, Put> {
@Override
protected void map(ImmutableBytesWritable key, Result value, Context context) throws IOException, InterruptedException {
// Create a Put keyed by the same rowkey
Put v=new Put(key.get());
for (Cell cell : value.rawCells()) {
// Keep only cells whose qualifier is name
if("name".equals(Bytes.toString(CellUtil.cloneQualifier(cell)))){
// Add the name cell to the Put
v.add(cell);
}
}
// Write out
context.write(key,v);
}
}
// Reducer stage
public static class FruitHbaseMrReducer extends TableReducer<ImmutableBytesWritable, Put, NullWritable>{
@Override
protected void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context) throws IOException, InterruptedException {
for (Put put : values) {
context.write(NullWritable.get(),put);
}
}
}
// Driver stage
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
// Create the configuration
Configuration conf=new Configuration();
// Get the Job object
Job job = Job.getInstance(conf);
// Set the jar
job.setJarByClass(HbaseMrHbase.class);
// Set the Mapper (reads from the HBase table given by args[0])
TableMapReduceUtil.initTableMapperJob(
args[0],
new Scan(),
FruitHbaseMrMapper.class,
ImmutableBytesWritable.class,
Put.class,
job
);
// Set the Reducer (writes into the HBase table given by args[1])
TableMapReduceUtil.initTableReducerJob(
args[1],
FruitHbaseMrReducer.class,
job
);
// Submit
boolean result = job.waitForCompletion(true);
System.exit(result?0:1);
}
}
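Before submitting the job, create the target table. The following is a minimal sketch, assuming bigdata:fruit3 does not exist yet; it uses the same client API as the HbaseAPI class above, and the class name CreateFruit3Table is chosen only for illustration. The job itself is then packaged and submitted the same way as in Case 3, with the source table as args[0] and bigdata:fruit3 as args[1].
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class CreateFruit3Table {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop101,hadoop102,hadoop103");
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            TableName table = TableName.valueOf("bigdata:fruit3");
            // TableOutputFormat writes into an existing table, so create it first
            if (!admin.tableExists(table)) {
                admin.createTable(TableDescriptorBuilder.newBuilder(table)
                        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("info"))
                        .build());
            }
        }
    }
}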
HBase Optimization
High Availability
In HBase, the HMaster monitors the lifecycle of the HRegionServers and balances the load across them. If the HMaster goes down, the whole HBase cluster falls into an unhealthy state, and it cannot keep working in that state for long. For this reason HBase supports a high-availability configuration for the HMaster.
// Stop HBase
[hadoop@hadoop101 hbase-2.3.7]$ bin/stop-hbase.sh
// Create a backup-masters file in the conf directory; the file name backup-masters must not be changed
[hadoop@hadoop101 hbase-2.3.7]$ vim conf/backup-masters
hadoop102
hadoop103
// Distribute
[hadoop@hadoop101 hbase-2.3.7]$ xsync conf/backup-masters
// Restart
[hadoop@hadoop101 hbase-2.3.7]$ bin/start-hbase.sh
Check the processes: the backup masters have started successfully.
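As an optional cross-check (not part of the original steps), the active and backup masters can also be read from the Java client. A minimal sketch reusing the same connection settings as the API section, with a class name chosen only for illustration:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterMetrics;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CheckMasters {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop101,hadoop102,hadoop103");
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            ClusterMetrics metrics = admin.getClusterMetrics();
            // The active master plus the standbys registered via conf/backup-masters
            System.out.println("Active master : " + metrics.getMasterName());
            System.out.println("Backup masters: " + metrics.getBackupMasterNames());
        }
    }
}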
Pre-splitting
Each region maintains a StartRow and an EndRow; data whose rowkey falls within a region's range is handled by that region. Following this principle, we can roughly plan in advance which regions the data will be written to, which improves HBase performance.
// Manually specify the split points
hbase(main):001:0> create "bigdata:jk","sinfo","partition1",SPLITS=>['1000','2000','3000','4000']
// Generate split points as a hexadecimal sequence
hbase(main):002:0> create "bigdata:jk2","sinfo",'partition2',{NUMREGIONS=>15,SPLITALGO=>'HexStringSplit'}
// Pre-split using a file of split points
[hadoop@hadoop101 hbase-2.3.7]$ vim splits.txt
aa
bb
cc
dd
hbase(main):001:0> create "bigdata:jk3","sinfo",'partition4',SPLITS_FILE => 'splits.txt'
You can check the resulting regions in the web UI:
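The same pre-splitting can be done from the Java API: Admin.createTable accepts explicit split keys, so the table starts out with multiple regions. A minimal sketch, assuming an Admin obtained as in the HbaseAPI class above (the table name bigdata:jk4 is just for illustration):
import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitExample {
    public static void createPreSplitTable(Admin admin) throws IOException {
        // Five regions: (-inf,1000), [1000,2000), [2000,3000), [3000,4000), [4000,+inf)
        byte[][] splitKeys = {
                Bytes.toBytes("1000"),
                Bytes.toBytes("2000"),
                Bytes.toBytes("3000"),
                Bytes.toBytes("4000")
        };
        admin.createTable(
                TableDescriptorBuilder.newBuilder(TableName.valueOf("bigdata:jk4"))
                        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("sinfo"))
                        .build(),
                splitKeys);
    }
}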
Time Synchronization
Run the following command on each of the three machines (replace <ntp-server> with the address of your NTP server):
[hadoop@hadoop101 hbase-2.3.7]$ sudo ntpdate <ntp-server>