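HBase is, simply put, a database, and databases generally support create, delete, update, and query operations. Below we compare how the hbase shell and the Java API each perform these operations on HBase.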
[root@dev-02 bin]# ./start-hbase.sh
[zhangshk@fonova-hadoop1 ~]$ hbase shell
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
17/12/17 12:10:46 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.0.0-cdh5.5.2, rUnknown, Mon Jan 25 16:27:11 PST 2016
hbase(main):012:0> list
The Java API code for connecting to HBase is:
package com.zhangshk;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
public class TestConnectionToHbase {
    public static Configuration conf = HBaseConfiguration.create();
    public static void main(String[] args) throws Exception {
        HBaseAdmin hBaseAdmin = new HBaseAdmin(conf);
        // List every table name, the same information the shell's list command shows.
        TableName[] tableNames = hBaseAdmin.listTableNames();
        for (TableName tableName : tableNames) {
            System.out.println(tableName);
        }
        hBaseAdmin.close();
    }
}
Creating a table in the hbase shell:
Let's first look at the syntax of create by typing create at the hbase shell prompt:
hbase(main):030:0> create
Create a table with namespace=ns1 and table qualifier=t1
hbase> create 'ns1:t1', {NAME => 'f1', VERSIONS => 5}
Create a table with namespace=default and table qualifier=t1
hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
hbase> # The above in shorthand would be the following:
hbase> create 't1', 'f1', 'f2', 'f3'
hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
hbase> create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}
Table configuration options can be put at the end.
hbase> create 'ns1:t1', 'f1', SPLITS => ['10', '20', '30', '40']
hbase> create 't1', 'f1', SPLITS => ['10', '20', '30', '40']
hbase> create 't1', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'johndoe'
hbase> create 't1', {NAME => 'f1', VERSIONS => 5}, METADATA => { 'mykey' => 'myvalue' }
hbase> # Optionally pre-split the table into NUMREGIONS, using
hbase> # SPLITALGO ("HexStringSplit", "UniformSplit" or classname)
hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit', REGION_REPLICATION => 2, CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}}
You can also keep around a reference to the created table:
hbase> t1 = create 't1', 'f1'
Which gives you a reference to the table named 't1', on which you can then
call methods.
The syntax help is very detailed. Now let's create a table named zhangshk:tb9 with the column family info.
hbase(main):014:0* create 'zhangshk:tb9','info'
0 row(s) in 0.4140 seconds
=> Hbase::Table - zhangshk:tb9
hbase(main):019:0> exists 'zhangshk:tb9'
Table zhangshk:tb9 does exist
0 row(s) in 0.0200 seconds
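Note: the hbase shell requires at least one column family when creating a table, whereas the Java API call below does not specify one and creates an empty table. Later HBase releases may reject a table descriptor that has no column family.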
public static void main(String[] args) throws Exception {
    HBaseAdmin hBaseAdmin = new HBaseAdmin(conf);
}
/**
 * Create an HBase table and return its name.
 * @param tableName
 * @return
 */
public static String createTable(HBaseAdmin hBaseAdmin, String tableName) throws Exception {
    // Only create the table if it does not exist yet.
    if (!hBaseAdmin.tableExists(tableName)) {
        hBaseAdmin.createTable(new HTableDescriptor(tableName));
    }
    return tableName;
}
hbase(main):034:0> create 'zhangshk:tb11'
hbase(main):035:0> put
ERROR: wrong number of arguments (0 for 4)
Here is some help for this command:
Put a cell 'value' at specified table/row/column and optionally
timestamp coordinates. To put a cell value into table 'ns1:t1' or 't1'
at row 'r1' under column 'c1' marked with the time 'ts1', do:
hbase> put 'ns1:t1', 'r1', 'c1', 'value'
hbase> put 't1', 'r1', 'c1', 'value'
hbase> put 't1', 'r1', 'c1', 'value', ts1
hbase> put 't1', 'r1', 'c1', 'value', {ATTRIBUTES=>{'mykey'=>'myvalue'}}
hbase> put 't1', 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
hbase> put 't1', 'r1', 'c1', 'value', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
The same commands also can be run on a table reference. Suppose you had a reference
t to table 't1', the corresponding command would be:
hbase> t.put 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
hbase(main):038:0> put 'zhangshk:tb9','10002','info:age','22'
0 row(s) in 0.0170 seconds
hbase(main):039:0> put 'zhangshk:tb9','10003','info:name','zhangshk'
0 row(s) in 0.0130 seconds
hbase(main):040:0> put 'zhangshk:tb9','10002','info:name','zhangshk'
0 row(s) in 0.0120 seconds
hbase(main):041:0> put 'zhangshk:tb9','10002','info:sex','male'
0 row(s) in 0.0120 seconds
hbase(main):043:0> scan 'zhangshk:tb9'
10001 column=info:age, timestamp=1513486347992, value=10
10002 column=info:age, timestamp=1513487143038, value=22
10002 column=info:name, timestamp=1513487176361, value=zhangshk
10002 column=info:sex, timestamp=1513487191856, value=male
10003 column=info:name, timestamp=1513487165412, value=zhangshk
3 row(s) in 0.0130 seconds
With the Java API, data is put like this:
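public static void main(String[] args) throws Exception {
    HBaseAdmin hBaseAdmin = new HBaseAdmin(conf);
    HTable hTable = new HTable(conf, "zhangshk:tb9");
    putData(hTable);
}
/**
 * Insert data into an HBase table.
 * @param htable
 * @throws Exception
 */
public static void putData(HTable htable) throws Exception {
    Put put = new Put(Bytes.toBytes("10003"));
    // Write info:age = 18 for row 10003, the cell that appears in the scan that follows.
    put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("age"), Bytes.toBytes("18"));
    htable.put(put);
}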
hbase(main):047:0> scan 'zhangshk:tb9'
10001 column=info:age, timestamp=1513486347992, value=10
10002 column=info:age, timestamp=1513487143038, value=22
10002 column=info:name, timestamp=1513487176361, value=zhangshk
10002 column=info:sex, timestamp=1513487191856, value=male
10003 column=info:age, timestamp=1513491759194, value=18
10003 column=info:name, timestamp=1513491742964, value=zhangshk
3 row(s) in 0.0180 seconds
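The scan above lists six cells yet reports "3 row(s)", because cells are grouped by row key and the shell counts rows. A minimal pure-Java sketch of that layout follows, using nested sorted maps as a stand-in for a table. It is a toy model only, not how HBase stores data, and the class and method names are hypothetical.

```java
import java.util.TreeMap;

// Toy model of a table: row key -> "family:qualifier" -> value.
// Illustration only; real HBase also keeps timestamps and versions per cell.
class TableModelSketch {
    // TreeMap keeps row keys sorted, which matches the order a scan prints them in.
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    // Mirrors the shell call: put 'table', row, 'family:qualifier', value
    public void put(String row, String column, String value) {
        TreeMap<String, String> columns = rows.get(row);
        if (columns == null) {
            columns = new TreeMap<>();
            rows.put(row, columns);
        }
        columns.put(column, value);
    }

    // Number of distinct row keys (what "N row(s)" counts).
    public int rowCount() {
        return rows.size();
    }

    // Total number of cells across all rows.
    public int cellCount() {
        int n = 0;
        for (TreeMap<String, String> columns : rows.values()) {
            n += columns.size();
        }
        return n;
    }

    public static void main(String[] args) {
        TableModelSketch t = new TableModelSketch();
        t.put("10001", "info:age", "10");
        t.put("10002", "info:age", "22");
        t.put("10002", "info:name", "zhangshk");
        t.put("10002", "info:sex", "male");
        t.put("10003", "info:age", "18");
        t.put("10003", "info:name", "zhangshk");
        System.out.println(t.rowCount() + " row(s), " + t.cellCount() + " cell(s)");
    }
}
```

Running main prints "3 row(s), 6 cell(s)", the same row count the shell reports for these six cells.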
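Both scan and get can query data. scan reads the whole table or a range of row keys, so it is usually not the first choice for looking up specific data. get fetches a single row by its row key, optionally narrowed to specific columns, which is much more efficient.
Let's first look at the hbase shell usage help for scan: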
hbase(main):048:0> scan
Here is some help for this command:
Scan a table; pass table name and optionally a dictionary of scanner
specifications. Scanner specifications may include one or more of:
If no columns are specified, all columns will be scanned.
To scan all members of a column family, leave the qualifier empty as in
The filter can be specified in two ways:
1. Using a filterString - more information on this is available in the
Filter Language document attached to the HBASE-4176 JIRA
2. Using the entire package name of the filter.
Some examples:
hbase> scan 'hbase:meta'
hbase> scan 'hbase:meta', {COLUMNS => 'info:regioninfo'}
hbase> scan 'ns1:t1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
hbase> scan 't1', {REVERSED => true}
hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND
(QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))"}
hbase> scan 't1', {FILTER =>
org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
hbase> scan 't1', {CONSISTENCY => 'TIMELINE'}
For setting the Operation Attributes
hbase> scan 't1', { COLUMNS => ['c1', 'c2'], ATTRIBUTES => {'mykey' => 'myvalue'}}
hbase> scan 't1', { COLUMNS => ['c1', 'c2'], AUTHORIZATIONS => ['PRIVATE','SECRET']}
For experts, there is an additional option -- CACHE_BLOCKS -- which
switches block caching for the scanner on (true) or off (false). By
default it is enabled. Examples:
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}
Also for experts, there is an advanced option -- RAW -- which instructs the
scanner to return all cells (including delete markers and uncollected deleted
cells). This option cannot be combined with requesting specific COLUMNS.
Disabled by default. Example:
hbase> scan 't1', {RAW => true, VERSIONS => 10}
Besides the default 'toStringBinary' format, 'scan' supports custom formatting
by column. A user can define a FORMATTER by adding it to the column name in
the scan specification. The FORMATTER can be stipulated:
1. either as a org.apache.hadoop.hbase.util.Bytes method name (e.g, toInt, toString)
2. or as a custom class followed by method name: e.g. 'c(MyFormatterClass).format'.
Example formatting cf:qualifier1 and cf:qualifier2 both as Integers:
hbase> scan 't1', {COLUMNS => ['cf:qualifier1:toInt',
'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt'] }
Note that you can specify a FORMATTER by column only (cf:qualifier). You cannot
specify a FORMATTER for all columns of a column family.
Scan can also be used directly from a table, by first getting a reference to a
table, like such:
hbase> t = get_table 't'
hbase> t.scan
Note in the above situation, you can still provide all the filtering, columns,
options, etc as described above.
hbase(main):051:0> scan 'zhangshk:tb9'
10001 column=info:age, timestamp=1513486347992, value=10
10002 column=info:age, timestamp=1513487143038, value=22
10002 column=info:name, timestamp=1513487176361, value=zhangshk
10002 column=info:sex, timestamp=1513487191856, value=male
10003 column=info:age, timestamp=1513491759194, value=18
10003 column=info:name, timestamp=1513491742964, value=zhangshk
3 row(s) in 0.0170 seconds
hbase(main):054:0> scan 'zhangshk:tb9',{LIMIT=>2}
10001 column=info:age, timestamp=1513486347992, value=10
10002 column=info:age, timestamp=1513487143038, value=22
10002 column=info:name, timestamp=1513487176361, value=zhangshk
10002 column=info:sex, timestamp=1513487191856, value=male
2 row(s) in 0.0140 seconds
hbase(main):055:0> scan 'zhangshk:tb9',{LIMIT=>2,STARTROW=>'10002'}
10002 column=info:age, timestamp=1513487143038, value=22
10002 column=info:name, timestamp=1513487176361, value=zhangshk
10002 column=info:sex, timestamp=1513487191856, value=male
10003 column=info:age, timestamp=1513491759194, value=18
10003 column=info:name, timestamp=1513491742964, value=zhangshk
2 row(s) in 0.0250 seconds
hbase(main):057:0> scan 'zhangshk:tb9',{LIMIT=>2,STARTROW=>'10002',COLUMNS=>'info:age'}
10002 column=info:age, timestamp=1513487143038, value=22
10003 column=info:age, timestamp=1513491759194, value=18
2 row(s) in 0.0300 seconds
hbase(main):058:0> scan 'zhangshk:tb9',{LIMIT=>2,STARTROW=>'10002',STOPROW=>'10003',COLUMNS=>'info:age'}
10002 column=info:age, timestamp=1513487143038, value=22
1 row(s) in 0.0200 seconds
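The last two scans show the range semantics: STARTROW is inclusive, STOPROW is exclusive, and LIMIT caps the number of rows rather than cells. A minimal pure-Java sketch of that behaviour over a sorted map follows. It is a toy model under the assumption of ASCII row keys (HBase compares row keys as bytes), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of a range scan over sorted row keys; not HBase's implementation.
class RangeScanSketch {
    // Returns the row keys in [startRow, stopRow), at most `limit` of them.
    // A null stopRow means "scan to the end of the table".
    public static List<String> scan(TreeMap<String, String> table,
                                    String startRow, String stopRow, int limit) {
        List<String> out = new ArrayList<>();
        // tailMap(startRow, true) starts at startRow itself: the start is inclusive.
        for (String rowKey : table.tailMap(startRow, true).keySet()) {
            if (stopRow != null && rowKey.compareTo(stopRow) >= 0) {
                break; // the stop row is exclusive
            }
            if (out.size() >= limit) {
                break; // LIMIT counts rows
            }
            out.add(rowKey);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("10001", "row data");
        table.put("10002", "row data");
        table.put("10003", "row data");
        System.out.println(scan(table, "10002", null, 2));    // [10002, 10003]
        System.out.println(scan(table, "10002", "10003", 2)); // [10002]
    }
}
```

The second call returns only 10002, matching the shell scan with STOPROW=>'10003' above.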
The Java API version:
public static void main(String[] args) throws Exception {
    HBaseAdmin hBaseAdmin = new HBaseAdmin(conf);
    HTable hTable = new HTable(conf, "zhangshk:tb9");
    scanTable(hTable);
}
/**
 * Scan the whole table with Scan.
 * @param hTable
 * @throws Exception
 */
public static void scanTable(HTable hTable) throws Exception {
    Scan scan = new Scan();
    ResultScanner scanner = hTable.getScanner(scan);
    // Iterate over every row; a single scanner.next() would return only the first row.
    for (Result result : scanner) {
        for (Cell cell : result.rawCells()) {
            System.out.println(
                    Bytes.toString(CellUtil.cloneFamily(cell)) + "->" + Bytes.toString(CellUtil.cloneQualifier(cell))
                    + "->" + Bytes.toString(CellUtil.cloneValue(cell)) + "->" + cell.getTimestamp());
        }
    }
    scanner.close();
}
The result matches the hbase shell query.
hbase(main):059:0> get
Here is some help for this command:
Get row or cell contents; pass table name, row, and optionally
a dictionary of column(s), timestamp, timerange and versions. Examples:
hbase> get 'ns1:t1', 'r1'
hbase> get 't1', 'r1'
hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
hbase> get 't1', 'r1', {COLUMN => 'c1'}
hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
hbase> get 't1', 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
hbase> get 't1', 'r1', 'c1'
hbase> get 't1', 'r1', 'c1', 'c2'
hbase> get 't1', 'r1', ['c1', 'c2']
hbase> get 't1', 'r1', {COLUMN => 'c1', ATTRIBUTES => {'mykey'=>'myvalue'}}
hbase> get 't1', 'r1', {COLUMN => 'c1', AUTHORIZATIONS => ['PRIVATE','SECRET']}
hbase> get 't1', 'r1', {CONSISTENCY => 'TIMELINE'}
hbase> get 't1', 'r1', {CONSISTENCY => 'TIMELINE', REGION_REPLICA_ID => 1}
Besides the default 'toStringBinary' format, 'get' also supports custom formatting by
column. A user can define a FORMATTER by adding it to the column name in the get
specification. The FORMATTER can be stipulated:
1. either as a org.apache.hadoop.hbase.util.Bytes method name (e.g, toInt, toString)
2. or as a custom class followed by method name: e.g. 'c(MyFormatterClass).format'.
Example formatting cf:qualifier1 and cf:qualifier2 both as Integers:
hbase> get 't1', 'r1' {COLUMN => ['cf:qualifier1:toInt',
'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt'] }
Note that you can specify a FORMATTER by column only (cf:qualifier). You cannot specify
a FORMATTER for all columns of a column family.
The same commands also can be run on a reference to a table (obtained via get_table or
create_table). Suppose you had a reference t to table 't1', the corresponding commands
would be:
hbase> t.get 'r1'
hbase> t.get 'r1', {TIMERANGE => [ts1, ts2]}
hbase> t.get 'r1', {COLUMN => 'c1'}
hbase> t.get 'r1', {COLUMN => ['c1', 'c2', 'c3']}
hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
hbase> t.get 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
hbase> t.get 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
hbase> t.get 'r1', 'c1'
hbase> t.get 'r1', 'c1', 'c2'
hbase> t.get 'r1', ['c1', 'c2']
hbase> t.get 'r1', {CONSISTENCY => 'TIMELINE'}
hbase> t.get 'r1', {CONSISTENCY => 'TIMELINE', REGION_REPLICA_ID => 1}
hbase shell:
hbase(main):063:0> get 'zhangshk:tb9','10003',{COLUMN=>'info:age'}
info:age timestamp=1513491759194, value=18
1 row(s) in 0.0080 seconds
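Java API:
public static void main(String[] args) throws Exception {
    HBaseAdmin hBaseAdmin = new HBaseAdmin(conf);
    HTable hTable = new HTable(conf, "zhangshk:tb9");
    getData(hTable);
}
public static void getData(HTable hTable) throws Exception {
    // Fetch a single row by its row key.
    Get get = new Get(Bytes.toBytes("10003"));
    Result result = hTable.get(get);
    Cell[] cells = result.rawCells();
    for (Cell cell : cells) {
        // Output format assumed to mirror scanTable: family->qualifier->value->timestamp.
        System.out.println(Bytes.toString(CellUtil.cloneFamily(cell)) + "->" + Bytes.toString(CellUtil.cloneQualifier(cell))
                + "->" + Bytes.toString(CellUtil.cloneValue(cell)) + "->" + cell.getTimestamp());
    }
}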
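The result is the same as in the hbase shell.
UPDATE: in HBase an update is done with put. Putting a value to an existing row and column writes a newer version, which subsequent reads return.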
hbase(main):064:0> put 'zhangshk:tb9','10003','info:name','zhangshk_update'
0 row(s) in 0.0160 seconds
hbase(main):065:0> get 'zhangshk:tb9','10003'
info:age timestamp=1513491759194, value=18
info:name timestamp=1513493686439, value=zhangshk_update
2 row(s) in 0.0090 seconds
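The get above returns zhangshk_update with a newer timestamp than the original put, which is how put doubles as update: each write is a new timestamped version, and a plain read returns the newest one. A minimal pure-Java sketch of a single cell's versions follows. It is a toy model with hypothetical names; how many old versions HBase actually retains depends on the column family's VERSIONS setting.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of one cell (row + column) holding several timestamped versions.
class CellVersionsSketch {
    // timestamp -> value, sorted ascending, so the last entry is the newest version.
    private final TreeMap<Long, String> versions = new TreeMap<>();

    // A put never edits in place; it adds a version at the given timestamp.
    public void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // A plain get returns the value with the highest timestamp, or null if none.
    public String get() {
        Map.Entry<Long, String> newest = versions.lastEntry();
        return newest == null ? null : newest.getValue();
    }

    public static void main(String[] args) {
        CellVersionsSketch name = new CellVersionsSketch();
        name.put(1513491742964L, "zhangshk");        // original put
        name.put(1513493686439L, "zhangshk_update"); // the "update"
        System.out.println(name.get());              // zhangshk_update
    }
}
```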
DELETE: finally, the delete operation.
There are two delete commands, delete and deleteall. Let's look at delete first:
hbase(main):068:0> delete
Here is some help for this command:
Put a delete cell value at specified table/row/column and optionally
timestamp coordinates. Deletes must match the deleted cell's
coordinates exactly. When scanning, a delete cell suppresses older
versions. To delete a cell from 't1' at row 'r1' under column 'c1'
marked with the time 'ts1', do:
hbase> delete 'ns1:t1', 'r1', 'c1', ts1
hbase> delete 't1', 'r1', 'c1', ts1
hbase> delete 't1', 'r1', 'c1', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
The same command can also be run on a table reference. Suppose you had a reference
t to table 't1', the corresponding command would be:
hbase> t.delete 'r1', 'c1', ts1
hbase> t.delete 'r1', 'c1', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
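All of the help examples pass a timestamp, which we usually don't know, although the help text itself describes the timestamp as optional. To remove a whole row, deleteall is more convenient: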
hbase(main):071:0> deleteall
Here is some help for this command:
Delete all cells in a given row; pass a table name, row, and optionally
a column and timestamp. Examples:
hbase> deleteall 'ns1:t1', 'r1'
hbase> deleteall 't1', 'r1'
hbase> deleteall 't1', 'r1', 'c1'
hbase> deleteall 't1', 'r1', 'c1', ts1
hbase> deleteall 't1', 'r1', 'c1', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
The same commands also can be run on a table reference. Suppose you had a reference
t to table 't1', the corresponding command would be:
hbase> t.deleteall 'r1'
hbase> t.deleteall 'r1', 'c1'
hbase> t.deleteall 'r1', 'c1', ts1
hbase> t.deleteall 'r1', 'c1', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
hbase(main):073:0> deleteall 'zhangshk:tb9','10003'
0 row(s) in 0.0320 seconds
hbase(main):074:0> scan 'zhangshk:tb9'
10001 column=info:age, timestamp=1513486347992, value=10
10002 column=info:age, timestamp=1513487143038, value=22
10002 column=info:name, timestamp=1513487176361, value=zhangshk
10002 column=info:sex, timestamp=1513487191856, value=male
2 row(s) in 0.0130 seconds
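The Java API has no separate delete and deleteall; there is only a single Delete, which presumably unifies the two.
public static void deleteData(HTable hTable) throws Exception {
    // A Delete built from only a row key removes the whole row.
    Delete delete = new Delete(Bytes.toBytes("10002"));
    hTable.delete(delete);
}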
hbase(main):075:0> scan 'zhangshk:tb9'
10001 column=info:age, timestamp=1513486347992, value=10
1 row(s) in 0.0190 seconds
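The delete help text earlier says a delete is itself a cell that "suppresses older versions" when scanning. A minimal pure-Java sketch of that idea follows: a delete marker hides every version at or below its timestamp, and a later put becomes visible again. This is a simplified toy model with hypothetical names; it ignores the different marker types HBase has and the point at which compaction physically removes the data.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of one cell with versions plus a delete marker (tombstone).
class DeleteMarkerSketch {
    private final TreeMap<Long, String> versions = new TreeMap<>();
    // Versions with a timestamp <= tombstone are hidden from reads.
    private long tombstone = Long.MIN_VALUE;

    public void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // A delete writes a marker; nothing is physically removed here.
    public void delete(long timestamp) {
        tombstone = Math.max(tombstone, timestamp);
    }

    // Returns the newest version that is not masked by the marker, or null.
    public String get() {
        Map.Entry<Long, String> newest = versions.lastEntry();
        if (newest == null || newest.getKey() <= tombstone) {
            return null;
        }
        return newest.getValue();
    }

    public static void main(String[] args) {
        DeleteMarkerSketch age = new DeleteMarkerSketch();
        age.put(100L, "22");
        age.delete(200L);
        System.out.println(age.get()); // null: the version at 100 is masked
        age.put(300L, "23");
        System.out.println(age.get()); // 23: newer than the marker
    }
}
```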
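So far we have only deleted data. To leave no trace behind, how do we delete the table itself? In HBase a table must be disabled first and can then be removed with the drop command, which deletes the entire table.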
hbase(main):076:0> disable
Here is some help for this command:
Start disable of named table:
hbase> disable 't1'
hbase> disable 'ns1:t1'
hbase(main):077:0> disable 'zhangshk:tb9'
0 row(s) in 1.2640 seconds
hbase(main):080:0> drop 'zhangshk:tb9'
0 row(s) in 0.3400 seconds
hbase(main):081:0> describe 'zhangshk:tb9'
ERROR: Unknown table zhangshk:tb9!
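Deleting a table with the Java API:
/**
 * Delete a table.
 * @param hBaseAdmin
 * @throws Exception
 */
public static void deleteTable(HBaseAdmin hBaseAdmin) throws Exception {
    hBaseAdmin.disableTable("zhangshk:tb9"); // disable first; table name assumed from the shell session
    hBaseAdmin.deleteTable("zhangshk:tb9");
}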
hbase(main):084:0* exit
[zhangshk@fonova-hadoop1 hbaseTest-1.0-SNAPSHOT]$
Stop the HBase processes:
[zhangshk@fonova-hadoop1 ~]$ sh stop-hbase.sh
stopping hbase........
package com.zhangshk;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
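public class TestConnectionToHbase {
    public static Configuration conf = HBaseConfiguration.create();

    public static void main(String[] args) throws Exception {
        HBaseAdmin hBaseAdmin = new HBaseAdmin(conf);
        HTable hTable = new HTable(conf, "zhangshk:tb9");
        try {
            // Call whichever helper below you want to run; scanTable is only an example.
            scanTable(hTable);
        } finally {
            hTable.close();
            hBaseAdmin.close();
        }
    }

    /**
     * Create an HBase table and return its name.
     * @param tableName
     * @return
     */
    public static String createTable(HBaseAdmin hBaseAdmin, String tableName) throws Exception {
        if (!hBaseAdmin.tableExists(tableName)) {
            hBaseAdmin.createTable(new HTableDescriptor(tableName).addFamily(new HColumnDescriptor("info")));
        }
        return tableName;
    }

    /**
     * Delete a table.
     * @param hBaseAdmin
     * @throws Exception
     */
    public static void deleteTable(HBaseAdmin hBaseAdmin) throws Exception {
        // A table must be disabled before it can be deleted; the table name is assumed from the shell session.
        hBaseAdmin.disableTable("zhangshk:tb9");
        hBaseAdmin.deleteTable("zhangshk:tb9");
    }

    /**
     * Insert data into an HBase table.
     * @param htable
     * @throws Exception
     */
    public static void putData(HTable htable) throws Exception {
        Put put = new Put(Bytes.toBytes("10003"));
        // Write info:age = 18 for row 10003, the cell shown in the scan output after the put.
        put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("age"), Bytes.toBytes("18"));
        htable.put(put);
    }

    /**
     * Scan the whole table with Scan.
     * @param hTable
     * @throws Exception
     */
    public static void scanTable(HTable hTable) throws Exception {
        Scan scan = new Scan();
        ResultScanner scanner = hTable.getScanner(scan);
        // Iterate over every row; a single scanner.next() would return only the first row.
        for (Result result : scanner) {
            for (Cell cell : result.rawCells()) {
                System.out.println(
                        Bytes.toString(CellUtil.cloneFamily(cell)) + "->" + Bytes.toString(CellUtil.cloneQualifier(cell))
                        + "->" + Bytes.toString(CellUtil.cloneValue(cell)) + "->" + cell.getTimestamp());
            }
        }
        scanner.close();
    }

    /**
     * Fetch data with get.
     * @param hTable
     * @throws Exception
     */
    public static void getData(HTable hTable) throws Exception {
        Get get = new Get(Bytes.toBytes("10003"));
        Result result = hTable.get(get);
        Cell[] cells = result.rawCells();
        for (Cell cell : cells) {
            // Output format assumed to mirror scanTable.
            System.out.println(Bytes.toString(CellUtil.cloneFamily(cell)) + "->" + Bytes.toString(CellUtil.cloneQualifier(cell))
                    + "->" + Bytes.toString(CellUtil.cloneValue(cell)) + "->" + cell.getTimestamp());
        }
    }

    /**
     * Delete data.
     * @param hTable
     * @throws Exception
     */
    public static void deleteData(HTable hTable) throws Exception {
        // A Delete built from only a row key removes the whole row.
        Delete delete = new Delete(Bytes.toBytes("10002"));
        hTable.delete(delete);
    }
}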