HBase version, JanusGraph HBase: NoNode for /hbase/master

Reposted

技术领航探索者 2023-09-01 11:09:56

HBase Programming Guide

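I. Overview

(I) Creating the project

1. pom.xml
2. Notes on running in Eclipse
3. Notes on addResource

II. Best Practices
III. Common APIs

(I) Creating the Configuration and Connection objects
(II) Table management

1. Creating a table
2. Checking whether a table exists
3. Deleting a table

(III) Inserting data

1. Inserting a single record
2. Using a write buffer

(IV) Reading data: single values and batches

1. Iterating over the returned data

(V) Appending data
(VI) Scanning a table
(VII) Changing the table schema

IV. Common exceptions

1. java.io.IOException: No FileSystem for scheme: hdfs
2. UnknownHostException
3. NoSuchMethodError: addFamily
4. SASL authentication failed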

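This article demonstrates how to create a table, delete a table, and change a table's schema, as well as the put, get, and scan operations.

The complete code is available at: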
https://github.com/lujinhong/hbasecommons

I. Overview

(I) Creating the project

1. pom.xml

Besides HBase itself, pom.xml also needs the Hadoop-related dependencies:

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.0.0</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.5.0</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.5.0</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.5.0</version>
</dependency>

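2. Notes on running in Eclipse

Add the Hadoop/HBase configuration files to the classpath.

3. Notes on addResource

(1) Typically, the HBase configuration file is loaded as follows:

Configuration configuration = HBaseConfiguration.create();
configuration.addResource(new Path(HbaseSiteXml));
Connection connection = ConnectionFactory.createConnection(configuration);

(2) If no resource is added explicitly, only the hbase-site.xml found on the classpath is loaded.
(3) If a String is passed as the argument, it is treated as a classpath resource name, so this hbase-site.xml must be on the classpath:

configuration.addResource("/home/hadoop/hbase/hbase-site.xml");

(4) To load an hbase-site.xml that is not on the classpath, pass a Path object:

configuration.addResource(new Path(HbaseSiteXml));

Note that Path here is Hadoop's org.apache.hadoop.fs.Path, not Java's java.nio.file.Path.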

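II. Best Practices

A Connection is very heavyweight but thread-safe. In general, one connection per application (strictly speaking, per JVM) is enough. If you really need several, consider a pool, but that is rarely necessary.
Table, Admin, and Scanner are lightweight but not thread-safe.
All of these interfaces must be closed, but the Connection should be well encapsulated so that it is not closed and re-created frequently. They are also all AutoCloseable, so the try-with-resources syntax can be used.
Use BufferedMutator for streaming / batch Puts. BufferedMutator replaces HTable.setAutoFlush(false) and is the supported high-performance streaming writes API.
When the cluster runs CDH, compile against the CDH jars rather than the Apache ones. CDH made some changes for compatibility; see the last section for details.
Specify a compression codec when creating a table. This costs a little CPU time, which is negligible compared with the disk-read time it saves. For example: create 'ljhtest2',{NAME => 'f1', COMPRESSION => 'SNAPPY'}
Always pre-split tables.
For workloads such as MapReduce that have loose latency requirements, or when the data volume is very large, call setCacheBlocks(false) on the Scan/Get.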
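
The point about AutoCloseable and try-with-resources can be illustrated without a cluster. The sketch below is not HBase code: TrackedResource is a hypothetical stand-in for objects such as Table and ResultScanner, and it only records when close() is called. It shows that resources declared in one try header are closed automatically, in the reverse of the order in which they were opened.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a closeable client object; it only records when it is closed. */
final class TrackedResource implements AutoCloseable {
    private final String name;
    private final List<String> closeLog;

    TrackedResource(String name, List<String> closeLog) {
        this.name = name;
        this.closeLog = closeLog;
    }

    @Override
    public void close() {
        closeLog.add(name);
    }
}

final class CloseOrderDemo {
    /** Opens two resources in one try header and returns the order in which they were closed. */
    static List<String> run() {
        List<String> closeLog = new ArrayList<>();
        try (TrackedResource table = new TrackedResource("table", closeLog);
             TrackedResource scanner = new TrackedResource("scanner", closeLog)) {
            // Work with both resources here; no explicit close() calls are needed.
        }
        return closeLog;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [scanner, table]
    }
}
```

With real HBase objects this means a scanner opened after its table is closed before the table, which is the order one would want.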

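III. Common APIs

(I) Creating the Configuration and Connection objects

To connect to HBase from a client, first create a Connection object. The connection can then be used to obtain Table, Admin, Scanner, and similar objects for the actual operations.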

Configuration config = HBaseConfiguration.create();
Connection connection = ConnectionFactory.createConnection(config);

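The basic steps shown above are:
(1) Create a Configuration object, which looks for the Hadoop/HBase configuration files on the classpath.
(2) Create the Connection object.

As noted above, creating a connection is a heavyweight operation and should be done sparingly. It is best to return the connection from a method such as getConnection() rather than creating it directly, and to consider the singleton pattern, for example: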

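private static HBaseHelper helper = null;

private final Configuration configuration;
private final Connection connection;
private final Admin admin;

private HBaseHelper(Configuration conf) throws IOException {
    this.configuration = conf;
    this.connection = ConnectionFactory.createConnection(configuration);
    this.admin = connection.getAdmin();
}

/*
 * Entry point for obtaining the HBaseHelper. The caller supplies a Configuration, which mainly
 * points at hbase-site.xml and core-site.xml.
 * A singleton ensures that only one helper is created, because every connection is expensive
 * to create. If several connections are needed, use a pool.
 * The method is synchronized so that concurrent first calls cannot create two helpers.
 */
public static synchronized HBaseHelper getHelper(Configuration configuration) throws IOException {
    if (helper == null) {
        helper = new HBaseHelper(configuration);
    }
    return helper;
}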

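The connection object is then obtained through getConnection(); getConfiguration() exposes the configuration in the same way:

public Connection getConnection() {
    return connection;
}

public Configuration getConfiguration() {
    return configuration;
}

Because HBaseHelper is a singleton, there is only one instance of each of its member variables.
A close() method is also provided for closing the Connection. Closing is centralized here because, if callers closed the connection whenever they liked, the Connection object would have to be re-created frequently: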

@Override
public void close() throws IOException {
    admin.close();
    connection.close();
}

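This method should only be called when the whole application shuts down, for example in a framework's cleanUp() method. As long as the application is still running, it should generally not be called.

HBaseHelper implements quite a lot of functionality, so HBaseHelper itself is made the singleton here. Making only the Connection a singleton also works, and the code is simpler:

private static Connection connection = null;

// synchronized so that concurrent first calls cannot create two connections
public static synchronized Connection getConnection(Configuration config) throws IOException {
    if (connection == null) {
        connection = ConnectionFactory.createConnection(config);
    }
    return connection;
}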
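
Since a Connection is meant to be shared across threads, the lazy creation itself should also be safe under concurrent access. The following is a minimal, library-independent sketch of one way to do that (double-checked locking on a volatile field). LazySingleton is a hypothetical helper, not part of HBase or of the original project, and the Supplier stands in for a call such as ConnectionFactory.createConnection(config), whose checked IOException would need to be wrapped in real use.

```java
import java.util.function.Supplier;

/** Lazily creates one shared instance, even when get() is called from several threads. */
final class LazySingleton<T> {
    private final Supplier<T> factory;
    private volatile T instance; // volatile so a fully built instance is visible to all threads

    LazySingleton(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T local = instance;
        if (local == null) {               // first check, without locking
            synchronized (this) {
                local = instance;
                if (local == null) {       // second check, under the lock
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }
}

final class LazySingletonDemo {
    public static void main(String[] args) {
        // The supplier is a placeholder for the expensive connection factory call.
        LazySingleton<Object> holder = new LazySingleton<>(Object::new);
        System.out.println(holder.get() == holder.get()); // prints true
    }
}
```

A plain synchronized method is simpler and usually fast enough; the double-checked variant only avoids taking the lock once the instance exists.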

(II) Table management

1. Creating a table

The complete code for creating a table is as follows:

public void createTable(TableName table, int maxVersions, byte[][] splitKeys, String... colfams)
        throws IOException {
    HTableDescriptor desc = new HTableDescriptor(table);
    for (String cf : colfams) {
        HColumnDescriptor coldef = new HColumnDescriptor(cf);
        coldef.setMaxVersions(maxVersions);
        desc.addFamily(coldef);
    }
    if (splitKeys != null) {
        admin.createTable(desc, splitKeys);
    } else {
        admin.createTable(desc);
    }
}

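The parameters are, in order: the table name, the maximum number of versions to keep, the keys used for pre-splitting, and the column family names.

To create a pre-split table, pass split keys in a form such as:
byte[][] splits = new byte[][]{Bytes.toBytes("row2000id"), Bytes.toBytes("row4000id"), Bytes.toBytes("row6000id"), Bytes.toBytes("row8000id")};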
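
Split keys like the ones above do not have to be written out by hand. The sketch below generates them with plain Java; SplitKeys and evenSplits are hypothetical names introduced for illustration, and the key pattern (prefix, number, suffix) simply mirrors the example. Strings are encoded as UTF-8, which is also what Bytes.toBytes(String) produces.

```java
import java.nio.charset.StandardCharsets;

final class SplitKeys {
    /**
     * Builds count split keys of the form prefix + (step * i) + suffix for i = 1..count,
     * each encoded as UTF-8 bytes.
     */
    static byte[][] evenSplits(String prefix, String suffix, int step, int count) {
        byte[][] keys = new byte[count][];
        for (int i = 0; i < count; i++) {
            String key = prefix + (step * (i + 1)) + suffix;
            keys[i] = key.getBytes(StandardCharsets.UTF_8);
        }
        return keys;
    }

    public static void main(String[] args) {
        // Prints row2000id, row4000id, row6000id and row8000id, one per line.
        for (byte[] key : evenSplits("row", "id", 2000, 4)) {
            System.out.println(new String(key, StandardCharsets.UTF_8));
        }
    }
}
```

Row keys are compared byte by byte, so this only yields ascending keys while the numeric part keeps the same number of digits; "row10000id" would sort before "row2000id". Zero-padding the number is one way around that.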

The common ways of calling it should also be wrapped as overloads:

public void createTable(String table, String... colfams) throws IOException {
    createTable(TableName.valueOf(table), 1, null, colfams);
}

public void createTable(TableName table, String... colfams) throws IOException {
    createTable(table, 1, null, colfams);
}

public void createTable(String table, int maxVersions, String... colfams) throws IOException {
    createTable(TableName.valueOf(table), maxVersions, null, colfams);
}

public void createTable(TableName table, int maxVersions, String... colfams) throws IOException {
    createTable(table, maxVersions, null, colfams);
}

public void createTable(String table, byte[][] splitKeys, String... colfams) throws IOException {
    createTable(TableName.valueOf(table), 1, splitKeys, colfams);
}

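The key steps are:
(1) Obtain an Admin object for managing tables. It was already created in HBaseHelper, so it is not created again here.
(2) Create an HTableDescriptor object, which describes a table that does not exist yet (compare this with the Table class used below). Many other properties can be set on this object, such as the compression format and file size.
(3) Check whether the table already exists. If it does, disable it first and then delete it.
(4) Create the table.

Admin and HTableDescriptor objects are both lightweight and can be created whenever they are needed.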

2. Checking whether a table exists

public boolean existsTable(String table) throws IOException {
    return existsTable(TableName.valueOf(table));
}

public boolean existsTable(TableName table) throws IOException {
    return admin.tableExists(table);
}

The code above simply calls the HBase API's tableExists() method, but it avoids re-creating the admin object on every call.

3. Deleting a table

public void disableTable(String table) throws IOException {
    disableTable(TableName.valueOf(table));
}

public void disableTable(TableName table) throws IOException {
    admin.disableTable(table);
}

public void dropTable(String table) throws IOException {
    dropTable(TableName.valueOf(table));
}

public void dropTable(TableName table) throws IOException {
    if (existsTable(table)) {
        if (admin.isTableEnabled(table))
            disableTable(table);
        admin.deleteTable(table);
    }
}

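(III) Inserting data

1. Inserting a single record

The following defines the common ways of calling put. The last one is actually not used very often.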

public void put(String table, String row, String fam, String qual, String val) throws IOException {
    put(TableName.valueOf(table), row, fam, qual, val);
}

public void put(TableName table, String row, String fam, String qual, String val) throws IOException {
    // try-with-resources closes the Table even if put() throws
    try (Table tbl = connection.getTable(table)) {
        Put put = new Put(Bytes.toBytes(row));
        put.addColumn(Bytes.toBytes(fam), Bytes.toBytes(qual), Bytes.toBytes(val));
        tbl.put(put);
    }
}

public void put(String table, String row, String fam, String qual, long ts, String val) throws IOException {
    put(TableName.valueOf(table), row, fam, qual, ts, val);
}

public void put(TableName table, String row, String fam, String qual, long ts, String val) throws IOException {
    // try-with-resources closes the Table even if put() throws
    try (Table tbl = connection.getTable(table)) {
        Put put = new Put(Bytes.toBytes(row));
        put.addColumn(Bytes.toBytes(fam), Bytes.toBytes(qual), ts, Bytes.toBytes(val));
        tbl.put(put);
    }
}

public void put(String table, String[] rows, String[] fams, String[] quals, long[] ts, String[] vals)
        throws IOException {
    put(TableName.valueOf(table), rows, fams, quals, ts, vals);
}

public void put(TableName table, String[] rows, String[] fams, String[] quals, long[] ts, String[] vals)
        throws IOException {
    // try-with-resources closes the Table even if put() throws
    try (Table tbl = connection.getTable(table)) {
        for (String row : rows) {
            Put put = new Put(Bytes.toBytes(row));
            for (String fam : fams) {
                int v = 0;
                for (String qual : quals) {
                    // Reuse the last value/timestamp when there are fewer of them than qualifiers
                    String val = vals[v < vals.length ? v : vals.length - 1];
                    long t = ts[v < ts.length ? v : ts.length - 1];
                    System.out.println("Adding: " + row + " " + fam + " " + qual + " " + t + " " + val);
                    put.addColumn(Bytes.toBytes(fam), Bytes.toBytes(qual), t, Bytes.toBytes(val));
                    v++;
                }
            }
            tbl.put(put);
        }
    }
}

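Here, every put creates a Table object and then closes it. Although the object is lightweight, repeatedly creating and destroying it inside a loop still adds up to a noticeable cost. In that case, consider reusing the Table object or using the write buffering described below.

2. Using a write buffer

From HBase 1.0.0 onward, BufferedMutator handles client-side write buffering. Mutations are first held on the client and are only sent to HBase through RPC requests when the buffer is full or flush() is called explicitly.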

/*
 * Puts a series of values into fam:qual of table. rows and vals define the data to write,
 * and their lengths must be equal.
 */
public void put(TableName table, String[] rows, String fam, String qual,
        String[] vals) throws IOException {
    if (rows.length != vals.length) {
        LOG.error("rows.length {} is not equal to vals.length {}", rows.length, vals.length);
        return; // nothing is written when the arrays do not line up
    }
    try (BufferedMutator mutator = connection.getBufferedMutator(table)) {
        for (int i = 0; i < rows.length; i++) {
            Put p = new Put(Bytes.toBytes(rows[i]));
            p.addColumn(Bytes.toBytes(fam), Bytes.toBytes(qual), Bytes.toBytes(vals[i]));
            mutator.mutate(p);
            //System.out.println(mutator.getWriteBufferSize());
        }
        mutator.flush();
    }
}

public void put(String table, String[] rows, String fam, String qual, String[] vals) throws IOException {
    put(TableName.valueOf(table), rows, fam, qual, vals);
}

The commented-out println above prints the buffer size. The default is 2 MB, set by the hbase.client.write.buffer parameter. It can be read with:

mutator.getWriteBufferSize()

How can the buffer size be set?
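
One possible answer, offered as a sketch rather than as the original author's method: because the size is read from hbase.client.write.buffer, it can be set in the client-side hbase-site.xml. The value below (4 MB, in bytes) is only an illustrative assumption, not a recommendation.

```xml
<!-- Client-side hbase-site.xml; 4194304 bytes = 4 MB, chosen only as an example -->
<property>
    <name>hbase.client.write.buffer</name>
    <value>4194304</value>
</property>
```

The 1.x client API also appears to allow a per-mutator setting by passing a BufferedMutatorParams with writeBufferSize(...) to connection.getBufferedMutator(...); check this against the API documentation of the HBase version in use.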

(IV) Reading data: single values and batches

/*
 * Gets the value of column fam:qual for the given row of table.
 */
public Result get(String table, String row, String fam, String qual) throws IOException {
    return get(TableName.valueOf(table), new String[]{row}, new String[]{fam}, new String[]{qual})[0];
}

public Result get(TableName table, String row, String fam, String qual) throws IOException {
    return get(table, new String[]{row}, new String[]{fam}, new String[]{qual})[0];
}

public Result[] get(TableName table, String[] rows, String fam, String qual) throws IOException {
    return get(table, rows, new String[]{fam}, new String[]{qual});
}

public Result[] get(String table, String[] rows, String fam, String qual) throws IOException {
    return get(TableName.valueOf(table), rows, new String[]{fam}, new String[]{qual});
}

public Result[] get(String table, String[] rows, String[] fams, String[] quals) throws IOException {
    return get(TableName.valueOf(table), rows, fams, quals);
}

/*
 * Gets all columns defined by fams and quals for every row in rows of table.
 */
public Result[] get(TableName table, String[] rows, String[] fams, String[] quals) throws IOException {
    List<Get> gets = new ArrayList<Get>();
    for (String row : rows) {
        Get get = new Get(Bytes.toBytes(row));
        get.setMaxVersions();
        if (fams != null) {
            for (String fam : fams) {
                for (String qual : quals) {
                    get.addColumn(Bytes.toBytes(fam), Bytes.toBytes(qual));
                }
            }
        }
        gets.add(get);
    }
    // try-with-resources closes the Table even if get() throws
    try (Table tbl = connection.getTable(table)) {
        return tbl.get(gets);
    }
}

1. Iterating over the returned data

for (Result result : results) {
    for (Cell cell : result.rawCells()) {
        System.out.println("Cell: " + cell + ", Value: "
                + Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
    }
}

Calling result.toString() directly returns only the first part, that is, the cell information, without the value.
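
The three-argument Bytes.toString(array, offset, length) call is needed because the array returned by getValueArray() may be a larger backing array in which the value occupies only a slice. The plain-Java sketch below shows the same idea. The packed layout (row key, qualifier, value in one array) is a simplified assumption for illustration and not HBase's actual cell format, and SliceDecodeDemo is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

final class SliceDecodeDemo {
    /** Decodes length bytes of buffer, starting at offset, as a UTF-8 string. */
    static String decode(byte[] buffer, int offset, int length) {
        return new String(buffer, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed layout for illustration: row key, qualifier and value packed into one array.
        byte[] buffer = "row1qual_hello".getBytes(StandardCharsets.UTF_8);
        int valueOffset = 9; // "row1" is 4 bytes, "qual_" is 5 bytes
        int valueLength = 5; // "hello"
        System.out.println(decode(buffer, valueOffset, valueLength)); // prints hello
    }
}
```

Decoding the whole array instead of the slice would mix the value with the bytes stored around it.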

(V) Appending data

append can either add a new column to a row or append new content to the existing content of a column.

// Assumes table (a Table) and lineCount (an int) are defined by the surrounding code.
List<Append> appends = new ArrayList<Append>();
for (int i = 0; i < lineCount; i++) {
    String rowid = "1";
    Append append = new Append(Bytes.toBytes(rowid));
    append.add(Bytes.toBytes("cf"), Bytes.toBytes("qual_"), Bytes.toBytes("test" + i));
    //table.append(append);
    appends.add(append);
    System.out.println("appending: " + i);
}
// batch(actions, results) throws IOException and InterruptedException
table.batch(appends, new Object[appends.size()]);
table.close();

(VI) Scanning a table

Print the contents of a table:

public void dump(String table) throws IOException {
    dump(TableName.valueOf(table));
}

public void dump(TableName table) throws IOException {
    try (Table t = connection.getTable(table); ResultScanner scanner = t.getScanner(new Scan())) {
        for (Result result : scanner) {
            dumpResult(result);
        }
    }
}

public void dumpResult(Result result) {
    for (Cell cell : result.rawCells()) {
        System.out.println("Cell: " + cell + ", Value: "
                + Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
    }
}

Another common way to use a scanner:

Scan scan = new Scan();
scan.addFamily(Bytes.toBytes(family));
Filter filter = new PrefixFilter(Bytes.toBytes(rowkeyPrefix));
scan.setFilter(filter);
ResultScanner scanner = table.getScanner(scan);

(VII) Changing the table schema

// This code has problems; in general, changing the table schema from code is not recommended.
public static void modifySchema(Connection connection) throws IOException {
    try (Admin admin = connection.getAdmin()) {

        TableName tableName = TableName.valueOf(TABLE_NAME);
        if (!admin.tableExists(tableName)) {
            System.out.println("Table does not exist.");
            System.exit(-1);
        }

        // Add a new column family
        HColumnDescriptor newColumn = new HColumnDescriptor("NEWCF");
        newColumn.setCompactionCompressionType(Algorithm.GZ);
        newColumn.setMaxVersions(HConstants.ALL_VERSIONS);
        admin.addColumn(tableName, newColumn);

        // Fetch the current descriptor; a freshly constructed HTableDescriptor
        // would not contain the table's existing column families
        HTableDescriptor table = admin.getTableDescriptor(tableName);

        // Update existing column family
        HColumnDescriptor existingColumn = new HColumnDescriptor(FAMILY);
        existingColumn.setCompactionCompressionType(Algorithm.GZ);
        existingColumn.setMaxVersions(HConstants.ALL_VERSIONS);
        table.modifyFamily(existingColumn);
        admin.modifyTable(tableName, table);

        // Disable an existing table
        admin.disableTable(tableName);

        // Delete an existing column family
        admin.deleteColumn(tableName, FAMILY.getBytes("UTF-8"));

        // Delete a table (Need to be disabled first)
        admin.deleteTable(tableName);
    }
}

IV. Common exceptions

1. java.io.IOException: No FileSystem for scheme: hdfs

Solution: add the Hadoop-related jars to the classpath.

2. UnknownHostException

Caused by: java.net.UnknownHostException: logstreaming

The logstreaming above is the HDFS cluster URL, so this error means that the Hadoop configuration was not loaded correctly. Solution:

export HADOOP_CONF_DIR=/home/hadoop/conf_loghbase

3. NoSuchMethodError: addFamily

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)Lorg/apache/hadoop/hbase/HTableDescriptor;
at co.cask.hbasetest.HBaseTest.createTable(HBaseTest.java:34)
at co.cask.hbasetest.HBaseTest.doMain(HBaseTest.java:49)
at co.cask.hbasetest.HBaseTest.main(HBaseTest.java:67)

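After 1.0.0, Apache HBase changed the return type of addFamily from void to HTableDescriptor, but CDH did not. If one of them is used for compilation and the other as the runtime environment, the error above occurs.
Solution:
Compile against the same distribution and version that runs the code.
If CDH is used, add the Cloudera repository: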

<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
</repositories>

Then specify the CDH version:

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.0.0-cdh5.4.5</version>
</dependency>
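
Why a changed return type is enough to cause this NoSuchMethodError: the JVM links a call by the method's full descriptor, and the descriptor includes the return type. The descriptor in the error message ends in `)Lorg/apache/hadoop/hbase/HTableDescriptor;`, so the calling code was compiled against a version that returns HTableDescriptor, while a version returning void would have a descriptor ending in `)V`. The sketch below uses only the JDK to print both forms; String and StringBuilder are placeholder types, and DescriptorDemo is a hypothetical name.

```java
import java.lang.invoke.MethodType;

final class DescriptorDemo {
    /** Returns the JVM method descriptor for the given return type and parameter types. */
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Same parameter list, different return type: two different descriptors.
        System.out.println(descriptor(void.class, String.class));
        // prints (Ljava/lang/String;)V
        System.out.println(descriptor(StringBuilder.class, String.class));
        // prints (Ljava/lang/String;)Ljava/lang/StringBuilder;
    }
}
```

Source code that ignores the return value compiles against either version, which is why the mismatch only shows up at run time.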

4. SASL authentication failed.

The error below suggests that kinit has not been run, even though it has. One cause is that the job must be run with

java -cp `hbase classpath`:yourjar.jar Main

and not with

java -cp `/home/hadoop/hbase/lib`:yourjar.jar Main 


Caused by: java.io.IOException: Could not set up IO Streams to gdc-dn152-formal.i.nease.net/10.160.254.123:60020
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:773)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:890)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:859)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1193)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:216)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:300)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:32627)
        at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRowOrBefore(ProtobufUtil.java:1583)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1293)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1125)
        at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:299)
        ... 9 more
Caused by: java.lang.RuntimeException: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$1.run(RpcClientImpl.java:673)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.handleSaslConnectionFailure(RpcClientImpl.java:631)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:739)
        ... 19 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
        at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:605)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:154)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:731)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:728)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:728)
        ... 19 more
