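1. HBase namespaces
1.1 Introduction
1. A namespace in HBase is similar to a database in MySQL. HBase ships with two built-in namespaces: default and hbase.
2. The hbase namespace holds the system tables.
3. A table created without an explicit namespace is placed in default.
1.2 Operations
1. Create a namespace: create_namespace 'namespace_name'
2. View namespaces:
List all namespaces: list_namespace
Describe a single namespace: describe_namespace 'namespace_name'
3. Create a table in a given namespace:
create 'namespace_name:table_name', 'column_family_1', ...
4. Drop a namespace:
drop_namespace 'namespace_name'
A namespace that still contains tables cannot be dropped; drop its tables first.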
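To make the default-namespace rule concrete, here is a minimal Java sketch of how a qualified name of the form 'namespace:table' can be split, falling back to default when no namespace is given. The class QualifiedName and its parse method are hypothetical names used only for illustration; they are not part of the HBase API (the real client resolves names through TableName.valueOf).

```java
// Illustrative only: mimics the 'namespace:table' naming rule described above.
final class QualifiedName {
    final String namespace;
    final String table;

    private QualifiedName(String namespace, String table) {
        this.namespace = namespace;
        this.table = table;
    }

    // Splits "ns:table"; a name without ':' is placed in the "default" namespace.
    static QualifiedName parse(String name) {
        int idx = name.indexOf(':');
        if (idx < 0) {
            return new QualifiedName("default", name);
        }
        return new QualifiedName(name.substring(0, idx), name.substring(idx + 1));
    }

    public static void main(String[] args) {
        QualifiedName a = parse("MOMO_CHAT:MSG");
        QualifiedName b = parse("MSG");
        System.out.println(a.namespace + " / " + a.table);
        System.out.println(b.namespace + " / " + b.table);
    }
}
```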
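2. HBase table design
2.1 Column family design
Use as few column families as possible; if one is enough, do not create a second.
Official guidance: HBase technically supports many column families, but a table should generally have no more than 5.
This Momo case uses a single column family: C1
2.2 Version design
If the stored data changes over time and older values must be kept, configure multiple versions; otherwise keep the default of 1.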
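2.3 Compression design
1. This workload is write-heavy and read-light: more than 90% of operations are writes and the data volume is very large. To store as much data as possible in limited space, choose the codec with the highest compression ratio: GZIP (GZ).
2. If reads dominate and the data volume is still large, use LZO or Snappy instead.
How to set the compression codec:
1. Specify it when creating the table:
create 'table_name', {NAME=>'column_family', COMPRESSION=>'codec'}
2. Add it to an existing table:
alter 'table_name', {NAME=>'column_family', COMPRESSION=>'codec'}
Example:
create 'MOMO_CHAT:MSG',{NAME=>'C1',COMPRESSION=>'GZ'}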
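The space saving that motivates GZ can be observed with the JDK's own GZIP stream. The sketch below is only an illustration of the trade-off (smaller output in exchange for CPU work); HBase compresses HFile blocks with its own codec integration, and the sample record text is made up for the demo.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

final class GzipRatioDemo {
    // Compresses the input with GZIP and returns the compressed bytes.
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Hypothetical sample: repetitive chat-like records.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("sender_account=13800000000,receiver_account=13900000000,msg_type=TEXT;");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(raw);
        System.out.println("raw bytes:  " + raw.length);
        System.out.println("gzip bytes: " + packed.length);
    }
}
```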
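2.4 HBase pre-splitting
1. By default a new table has a single region, and a region is served by exactly one RegionServer. To avoid concentrating the load on one machine, the table can be pre-split at creation time so that it starts with multiple regions. This is HBase pre-splitting.
2. Regions are divided by rowkey range, i.e. startKey to endKey.
3. Ways to pre-split:
Option 1: specify the split points manually
create 'table_name', 'column_family_1', ..., SPLITS=>['1','2','3','4','5']
Option 2: read the split points from an external file
create 'table_name', 'column_family_1', ..., SPLITS_FILE => 'file_path'
Option 3: hexadecimal hash splitting
create 'table_name', 'column_family_1', ..., {NUMREGIONS=>N, SPLITALGO=>'HexStringSplit'}
4. This case uses hexadecimal hash splitting with 6 regions:
create 'MOMO_CHAT:MSG' ,{NAME=>'C1',COMPRESSION=>'GZ'},{NUMREGIONS=>6 , SPLITALGO=>'HexStringSplit'}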
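To show how split points turn into rowkey ranges, the following Java sketch divides the 8-digit hexadecimal key space evenly and then finds the range a rowkey falls into. It is an approximation written for these notes, assuming HexStringSplit spaces its boundaries evenly over 00000000 to ffffffff; the class and method names are hypothetical and the exact boundaries HBase produces may differ.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

final class HexSplitSketch {
    // Divides the key space 00000000..ffffffff into numRegions equal ranges
    // and returns the numRegions - 1 boundary keys between them.
    static List<String> splitPoints(int numRegions) {
        BigInteger range = BigInteger.ONE.shiftLeft(32); // 0x100000000 possible keys
        BigInteger step = range.divide(BigInteger.valueOf(numRegions));
        List<String> points = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            points.add(String.format("%08x", step.multiply(BigInteger.valueOf(i))));
        }
        return points;
    }

    // Returns the index of the range [startKey, endKey) that contains the rowkey.
    // String order equals byte order here because the keys are plain ASCII.
    static int regionIndex(List<String> points, String rowkey) {
        int index = 0;
        for (String point : points) {
            if (rowkey.compareTo(point) >= 0) {
                index++;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> points = splitPoints(6);
        System.out.println(points);
        System.out.println(regionIndex(points, "9f3a1c2b_13800000000_13900000000_1609459200000"));
    }
}
```

Each boundary is the startKey of one region and the endKey of the previous one, so N regions need N - 1 split points.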
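2.5 HBase rowkey design principles
1) Avoid monotonically increasing values or time-series data as the rowkey prefix
Reason: the leading digits of such keys rarely change, so all writes go to a single region and create a hotspot.
2) Avoid long rowkeys and long column names
Reason: the more data fits in memory, the faster reads are. Long rowkeys or column names mean fewer cells fit in the same amount of memory, so data is flushed to disk earlier and reads slow down.
Recommendation: keep rowkeys at roughly 10 to 100 bytes, and as short as practical.
3) A long takes less space than a String:
If the rowkey consists only of digits, prefer long or another numeric type.
4) Keep rowkeys unique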
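The space argument in principle 3 can be checked with plain JDK code: a long always occupies 8 bytes, while the same number written as decimal text needs one byte per digit. The sketch below uses hypothetical helper names; in HBase client code the equivalent conversions are Bytes.toBytes(long) and Bytes.toBytes(String).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class KeySizeDemo {
    // Encodes the value as 8 big-endian bytes.
    static byte[] asLongBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Encodes the value as its decimal text, one byte per digit.
    static byte[] asStringBytes(long value) {
        return Long.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        long timestamp = 1609459200000L; // a 13-digit millisecond timestamp
        System.out.println("as long:   " + asLongBytes(timestamp).length + " bytes");
        System.out.println("as string: " + asStringBytes(timestamp).length + " bytes");
    }
}
```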
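Ways to avoid hotspots:
1) Reversing: for values such as phone numbers or timestamps, whose leading digits are nearly constant while the trailing digits vary, reverse the value so that rowkey prefixes differ and rows land in different regions.
2) Salting: prepend a fixed-length random prefix to the rowkey so that rows land in different regions.
3) Hashing (hash modulo): derive the prefix from the data itself so that identical data always gets the same salt. Related rows stay together, while the data as a whole is still spread across regions.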
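The three strategies above can be sketched in a few lines of Java. This is a simplified illustration, not production code: the class and method names are hypothetical, the two-digit prefix assumes at most 100 buckets, and String.hashCode stands in for whatever hash function a real design would choose.

```java
import java.util.concurrent.ThreadLocalRandom;

final class HotspotStrategies {
    // Reversing: the fast-changing trailing digits become the prefix.
    static String reverse(String key) {
        return new StringBuilder(key).reverse().toString();
    }

    // Salting: a random bucket prefix; a reader must then look in every bucket.
    static String salt(String key, int buckets) {
        int bucket = ThreadLocalRandom.current().nextInt(buckets);
        return String.format("%02d_%s", bucket, key);
    }

    // Hashing: the bucket is derived from the key, so the same key always gets the same prefix.
    static String hashPrefix(String key, int buckets) {
        int bucket = Math.floorMod(key.hashCode(), buckets);
        return String.format("%02d_%s", bucket, key);
    }

    public static void main(String[] args) {
        System.out.println(reverse("13812345678"));
        System.out.println(salt("13812345678", 6));
        System.out.println(hashPrefix("13812345678", 6));
    }
}
```

Salting spreads writes best but makes point reads harder, because the prefix cannot be recomputed from the key; hashing keeps the prefix reproducible at the cost of a slightly less uniform spread.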
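2.6 Rowkey design for this case
The case must support queries by sender and receiver account, so the rowkey includes both accounts. Prefixing the key with a hash
spreads the data evenly across regions and, as long as the hash is derived from the account pair, keeps the rows of one sender and receiver in the same region, which speeds up reads.
Rowkey format: MD5Hash_senderAccount_receiverAccount_timestamp (MD5 is used here as a hash, not as encryption)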
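A minimal JDK-only sketch of this rowkey format follows. It assumes the 8-character prefix is hashed from the account pair alone, so that every message of one conversation shares the same prefix; hashing the timestamp as well would give each message its own prefix and scatter a conversation across regions. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class RowkeySketch {
    // Returns the first 8 hex characters of the MD5 digest of the input.
    static String md5Prefix(String input) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 4; i++) {
                hex.append(String.format("%02x", digest[i]));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Assumption: the prefix depends on the account pair only, not on the timestamp.
    static String rowkey(String sender, String receiver, long timestampMillis) {
        String pair = sender + "_" + receiver;
        return md5Prefix(pair) + "_" + pair + "_" + timestampMillis;
    }

    public static void main(String[] args) {
        System.out.println(rowkey("13800000000", "13900000000", 1609459200000L));
        System.out.println(rowkey("13800000000", "13900000000", 1609459260000L));
    }
}
```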
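2.7 Case preparation
2.7.1 Create the table
The MOMO_CHAT namespace must exist before the table can be created in it:
create_namespace 'MOMO_CHAT'
create 'MOMO_CHAT:MSG' ,{NAME=>'C1',COMPRESSION=>'GZ'},{NUMREGIONS=>6 , SPLITALGO=>'HexStringSplit'}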
2.7.2 Environment setup
<repositories>
    <repository>
        <id>aliyun</id>
        <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
        <releases>
            <enabled>true</enabled>
        </releases>
        <snapshots>
            <enabled>false</enabled>
            <updatePolicy>never</updatePolicy>
        </snapshots>
    </repository>
</repositories>
<dependencies>
    <!-- HBase client -->
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>2.1.0</version>
    </dependency>
    <!-- POI: used to read data from Excel files in Java -->
    <dependency>
        <groupId>com.github.cloudecho</groupId>
        <artifactId>xmlbean</artifactId>
        <version>1.5.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi</artifactId>
        <version>4.0.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>4.0.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml-schemas</artifactId>
        <version>4.0.1</version>
    </dependency>
    <!-- JSON library: JSON is essentially a string in a defined format -->
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.62</version>
    </dependency>
    <!-- Phoenix jars; can be left out (for example if they cause errors) -->
    <!--<dependency>
        <groupId>org.apache.phoenix</groupId>
        <artifactId>phoenix-core</artifactId>
        <version>5.0.0-HBase-2.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.phoenix</groupId>
        <artifactId>phoenix-queryserver-client</artifactId>
        <version>5.0.0-HBase-2.0</version>
    </dependency>-->
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.1</version>
            <configuration>
                <target>1.8</target>
                <source>1.8</source>
            </configuration>
        </plugin>
    </plugins>
</build>
2.7.3 Generate the data
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.MD5Hash;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;
import java.util.Map;

public class Gen {
    // Bytes.toBytes always encodes as UTF-8; String.getBytes() would use the platform default charset
    private static final byte[] CF = Bytes.toBytes("C1");

    public static void main(String[] args) throws Exception {
        // 1. Read the sample data from the Excel file
        String xlxsPath = "D:\\传智工作\\上课\\北京大数据48期\\实时阶段课程\\day16_实时阶段_HBase\\资料\\陌陌海量消息存储案例\\测试数据集.xlsx";
        Map<String, List<String>> resultMap = ExcelReader.readXlsx(xlxsPath, "陌陌数据");
        // 2. Connect to HBase
        // 2.1 Create the connection through the connection factory
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "node1:2181,node2:2181,node3:2181");
        Connection connection = ConnectionFactory.createConnection(conf);
        // 2.2 Get the Table object from the connection
        Table table = connection.getTable(TableName.valueOf("MOMO_CHAT:MSG"));
        // 3. Generate 100,000 rows
        for (int i = 0; i < 100000; i++) {
            // 3.1 Generate one random row
            Msg rowData = randomRow(resultMap);
            // 3.2 Build the Put and write it to the table
            Put put = new Put(getRowkey(rowData));
            put.addColumn(CF, Bytes.toBytes("msg_time"), Bytes.toBytes(rowData.getMsg_time()));
            put.addColumn(CF, Bytes.toBytes("sender_nickyname"), Bytes.toBytes(rowData.getSender_nickyname()));
            put.addColumn(CF, Bytes.toBytes("sender_account"), Bytes.toBytes(rowData.getSender_account()));
            put.addColumn(CF, Bytes.toBytes("sender_sex"), Bytes.toBytes(rowData.getSender_sex()));
            put.addColumn(CF, Bytes.toBytes("sender_ip"), Bytes.toBytes(rowData.getSender_ip()));
            put.addColumn(CF, Bytes.toBytes("sender_os"), Bytes.toBytes(rowData.getSender_os()));
            put.addColumn(CF, Bytes.toBytes("sender_phone_type"), Bytes.toBytes(rowData.getSender_phone_type()));
            put.addColumn(CF, Bytes.toBytes("sender_network"), Bytes.toBytes(rowData.getSender_network()));
            put.addColumn(CF, Bytes.toBytes("sender_gps"), Bytes.toBytes(rowData.getSender_gps()));
            put.addColumn(CF, Bytes.toBytes("receiver_nickyname"), Bytes.toBytes(rowData.getReceiver_nickyname()));
            put.addColumn(CF, Bytes.toBytes("receiver_ip"), Bytes.toBytes(rowData.getReceiver_ip()));
            put.addColumn(CF, Bytes.toBytes("receiver_account"), Bytes.toBytes(rowData.getReceiver_account()));
            put.addColumn(CF, Bytes.toBytes("receiver_os"), Bytes.toBytes(rowData.getReceiver_os()));
            put.addColumn(CF, Bytes.toBytes("receiver_phone_type"), Bytes.toBytes(rowData.getReceiver_phone_type()));
            put.addColumn(CF, Bytes.toBytes("receiver_network"), Bytes.toBytes(rowData.getReceiver_network()));
            put.addColumn(CF, Bytes.toBytes("receiver_gps"), Bytes.toBytes(rowData.getReceiver_gps()));
            put.addColumn(CF, Bytes.toBytes("receiver_sex"), Bytes.toBytes(rowData.getReceiver_sex()));
            put.addColumn(CF, Bytes.toBytes("msg_type"), Bytes.toBytes(rowData.getMsg_type()));
            put.addColumn(CF, Bytes.toBytes("distance"), Bytes.toBytes(rowData.getDistance()));
            put.addColumn(CF, Bytes.toBytes("message"), Bytes.toBytes(rowData.getMessage()));
            table.put(put);
            System.out.println("Generated row --> " + i);
        }
        // 4. Release resources
        table.close();
        connection.close();
    }

    // Randomly generates one row of data
    public static Msg randomRow(Map<String, List<String>> resultMap) {
        Msg msg = new Msg();
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Date date = new Date();
        msg.setMsg_time(format.format(date));
        msg.setSender_nickyname(ExcelReader.randomColumn(resultMap, "sender_nickyname"));
        msg.setSender_account(ExcelReader.randomColumn(resultMap, "sender_account"));
        msg.setSender_sex(ExcelReader.randomColumn(resultMap, "sender_sex"));
        msg.setSender_ip(ExcelReader.randomColumn(resultMap, "sender_ip"));
        msg.setSender_os(ExcelReader.randomColumn(resultMap, "sender_os"));
        msg.setSender_phone_type(ExcelReader.randomColumn(resultMap, "sender_phone_type"));
        msg.setSender_network(ExcelReader.randomColumn(resultMap, "sender_network"));
        msg.setSender_gps(ExcelReader.randomColumn(resultMap, "sender_gps"));
        msg.setReceiver_nickyname(ExcelReader.randomColumn(resultMap, "receiver_nickyname"));
        msg.setReceiver_ip(ExcelReader.randomColumn(resultMap, "receiver_ip"));
        msg.setReceiver_account(ExcelReader.randomColumn(resultMap, "receiver_account"));
        msg.setReceiver_os(ExcelReader.randomColumn(resultMap, "receiver_os"));
        msg.setReceiver_phone_type(ExcelReader.randomColumn(resultMap, "receiver_phone_type"));
        msg.setReceiver_network(ExcelReader.randomColumn(resultMap, "receiver_network"));
        msg.setReceiver_gps(ExcelReader.randomColumn(resultMap, "receiver_gps"));
        msg.setReceiver_sex(ExcelReader.randomColumn(resultMap, "receiver_sex"));
        msg.setMsg_type(ExcelReader.randomColumn(resultMap, "msg_type"));
        msg.setDistance(ExcelReader.randomColumn(resultMap, "distance"));
        msg.setMessage(ExcelReader.randomColumn(resultMap, "message"));
        return msg;
    }

    // Builds the rowkey: MD5 prefix + "_" + sender_receiver_timestamp
    private static byte[] getRowkey(Msg msg) throws ParseException {
        // 1. Concatenate the sender account, receiver account and timestamp
        StringBuilder stringBuilder = new StringBuilder(msg.getSender_account());
        stringBuilder.append("_");
        stringBuilder.append(msg.getReceiver_account());
        stringBuilder.append("_");
        // Convert the message time to an epoch timestamp in milliseconds
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        stringBuilder.append(sdf.parse(msg.getMsg_time()).getTime());
        byte[] orginkey = Bytes.toBytes(stringBuilder.toString());
        // 2. Hash the full key (timestamp included) and keep only the first 8 hex characters so the rowkey stays short
        String md5AsHex = MD5Hash.getMD5AsHex(orginkey).substring(0, 8);
        return Bytes.toBytes(md5AsHex + "_" + stringBuilder.toString());
    }
}
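2.7.4 Read the Excel data
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFCell;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.logging.Logger;

// Reads data from an Excel workbook and returns random values from a given column
public class ExcelReader {
    private static Logger log = Logger.getLogger("client");

    public static void main(String[] args) {
        String xlxsPath = "D:\\传智工作\\上课\\大数据进阶1期(线上)\\day16_HBase\\资料\\陌陌海量消息存储案例\\测试数据集.xlsx";
        Map<String, List<String>> mapData = readXlsx(xlxsPath, "陌陌数据");
        for (int i = 0; i < 10; ++i) {
            System.out.println(randomColumn(mapData, "sender_nickyname"));
        }
    }

    /**
     * Returns a random value from the given column.
     * @param resultMap column data as returned by readXlsx
     * @param columnName column name
     * @return a randomly chosen value
     */
    public static String randomColumn(Map<String, List<String>> resultMap, String columnName) {
        List<String> valList = resultMap.get(columnName);
        if (valList == null || valList.isEmpty()) {
            throw new RuntimeException("No data was read for column " + columnName + "!");
        }
        Random random = new Random();
        int randomIndex = random.nextInt(valList.size());
        return valList.get(randomIndex);
    }

    /**
     * Reads an Excel file into a map of the form <column_name, list of values>,
     * where column_name is taken from row 4 of the sheet.
     * @param path path to the Excel file (must be in Excel 2007 .xlsx format)
     * @param sheetName sheet name
     * @return the map of column data
     */
    public static Map<String, List<String>> readXlsx(String path, String sheetName) {
        // Number of columns
        int columnNum = 0;
        HashMap<String, List<String>> resultMap = new HashMap<String, List<String>>();
        ArrayList<String> columnList = new ArrayList<String>();
        try {
            OPCPackage pkg = OPCPackage.open(path);
            XSSFWorkbook excel = new XSSFWorkbook(pkg);
            // Get the sheet
            XSSFSheet sheet = excel.getSheet(sheetName);
            // Load the column names from row 4 (index 3)
            XSSFRow columnRow = sheet.getRow(3);
            if (columnRow == null) {
                throw new RuntimeException("Failed to read the data file! Make sure row 4 contains the English column names.");
            } else {
                Iterator<Cell> colIter = columnRow.iterator();
                // Iterate over all columns
                while (colIter.hasNext()) {
                    Cell cell = colIter.next();
                    String colName = cell.getStringCellValue();
                    columnList.add(colName);
                    columnNum++;
                }
            }
            System.out.println("Columns read: " + columnNum);
            System.out.println(Arrays.toString(columnList.toArray()));
            // Initialize resultMap with one empty list per column
            for (String colName : columnList) {
                resultMap.put(colName, new ArrayList<String>());
            }
            // Iterate over the rows of the sheet
            Iterator<Row> iter = sheet.iterator();
            int i = 0;
            int rownum = 1;
            while (iter.hasNext()) {
                Row row = iter.next();
                Iterator<Cell> cellIter = row.cellIterator();
                // Skip the first 4 rows (title rows and the column-name row)
                if (rownum <= 4) {
                    ++rownum;
                    continue;
                }
                // The i % size mapping assumes every data row has a value in every column
                while (cellIter.hasNext()) {
                    XSSFCell cell = (XSSFCell) cellIter.next();
                    // Read the value according to the cell type
                    if (cell.getCellType() == CellType.NUMERIC) {
                        resultMap.get(columnList.get(i % columnList.size())).add(Double.toString(cell.getNumericCellValue()));
                    } else {
                        resultMap.get(columnList.get(i % columnList.size())).add(cell.getStringCellValue());
                    }
                    ++i;
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return resultMap;
    }
}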
2.7.5 Entity class
import com.alibaba.fastjson.JSON;

public class Msg {
    private String msg_time;
    private String sender_nickyname;
    private String sender_account;
    private String sender_sex;
    private String sender_ip;
    private String sender_os;
    private String sender_phone_type;
    private String sender_network;
    private String sender_gps;
    private String receiver_nickyname;
    private String receiver_ip;
    private String receiver_account;
    private String receiver_os;
    private String receiver_phone_type;
    private String receiver_network;
    private String receiver_gps;
    private String receiver_sex;
    private String msg_type;
    private String distance;
    private String message;

    public String getMsg_time() {
        return msg_time;
    }

    public void setMsg_time(String msg_time) {
        this.msg_time = msg_time;
    }

    public String getSender_nickyname() {
        return sender_nickyname;
    }

    public void setSender_nickyname(String sender_nickyname) {
        this.sender_nickyname = sender_nickyname;
    }

    public String getSender_account() {
        return sender_account;
    }

    public void setSender_account(String sender_account) {
        this.sender_account = sender_account;
    }

    public String getSender_sex() {
        return sender_sex;
    }

    public void setSender_sex(String sender_sex) {
        this.sender_sex = sender_sex;
    }

    public String getSender_ip() {
        return sender_ip;
    }

    public void setSender_ip(String sender_ip) {
        this.sender_ip = sender_ip;
    }

    public String getSender_os() {
        return sender_os;
    }

    public void setSender_os(String sender_os) {
        this.sender_os = sender_os;
    }

    public String getSender_phone_type() {
        return sender_phone_type;
    }

    public void setSender_phone_type(String sender_phone_type) {
        this.sender_phone_type = sender_phone_type;
    }

    public String getSender_network() {
        return sender_network;
    }

    public void setSender_network(String sender_network) {
        this.sender_network = sender_network;
    }

    public String getSender_gps() {
        return sender_gps;
    }

    public void setSender_gps(String sender_gps) {
        this.sender_gps = sender_gps;
    }

    public String getReceiver_nickyname() {
        return receiver_nickyname;
    }

    public void setReceiver_nickyname(String receiver_nickyname) {
        this.receiver_nickyname = receiver_nickyname;
    }

    public String getReceiver_ip() {
        return receiver_ip;
    }

    public void setReceiver_ip(String receiver_ip) {
        this.receiver_ip = receiver_ip;
    }

    public String getReceiver_account() {
        return receiver_account;
    }

    public void setReceiver_account(String receiver_account) {
        this.receiver_account = receiver_account;
    }

    public String getReceiver_os() {
        return receiver_os;
    }

    public void setReceiver_os(String receiver_os) {
        this.receiver_os = receiver_os;
    }

    public String getReceiver_phone_type() {
        return receiver_phone_type;
    }

    public void setReceiver_phone_type(String receiver_phone_type) {
        this.receiver_phone_type = receiver_phone_type;
    }

    public String getReceiver_network() {
        return receiver_network;
    }

    public void setReceiver_network(String receiver_network) {
        this.receiver_network = receiver_network;
    }

    public String getReceiver_gps() {
        return receiver_gps;
    }

    public void setReceiver_gps(String receiver_gps) {
        this.receiver_gps = receiver_gps;
    }

    public String getReceiver_sex() {
        return receiver_sex;
    }

    public void setReceiver_sex(String receiver_sex) {
        this.receiver_sex = receiver_sex;
    }

    public String getMsg_type() {
        return msg_type;
    }

    public void setMsg_type(String msg_type) {
        this.msg_type = msg_type;
    }

    public String getDistance() {
        return distance;
    }

    public void setDistance(String distance) {
        this.distance = distance;
    }

    public String getMessage() {
        return message;
    }

    public void setMessage(String message) {
        this.message = message;
    }

    // Serializes the object as JSON: {key:value,key1:value1,key2:value2 ...}
    @Override
    public String toString() {
        return JSON.toJSONString(this);
    }
}
2.7.6 Query
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.CompareOperator;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

import java.util.ArrayList;
import java.util.List;

public class ChatMessageServiceImpl implements ChatMessageService {
    private static final byte[] CF = Bytes.toBytes("C1");
    private Connection connection;
    private Table table;

    @Override
    public List<Msg> getMessage(String date, String sender, String receiver) throws Exception {
        // 1. Create the connection through the connection factory
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "node1:2181,node2:2181,node3:2181");
        connection = ConnectionFactory.createConnection(conf);
        // 2. Get the Table object from the connection
        table = connection.getTable(TableName.valueOf("MOMO_CHAT:MSG"));
        // 3. Build the scan: at most 100 rows, filtered by day, sender and receiver
        Scan scan = new Scan();
        scan.setLimit(100);
        String startDate = date + " 00:00:00";
        String endDate = date + " 23:59:59";
        SingleColumnValueFilter startMsg_filter = new SingleColumnValueFilter(CF, Bytes.toBytes("msg_time"),
                CompareOperator.GREATER_OR_EQUAL, new BinaryComparator(Bytes.toBytes(startDate)));
        SingleColumnValueFilter endMsg_filter = new SingleColumnValueFilter(CF, Bytes.toBytes("msg_time"),
                CompareOperator.LESS_OR_EQUAL, new BinaryComparator(Bytes.toBytes(endDate)));
        SingleColumnValueFilter senderMsg_filter = new SingleColumnValueFilter(CF, Bytes.toBytes("sender_account"),
                CompareOperator.EQUAL, new BinaryComparator(Bytes.toBytes(sender)));
        SingleColumnValueFilter receiverMsg_filter = new SingleColumnValueFilter(CF, Bytes.toBytes("receiver_account"),
                CompareOperator.EQUAL, new BinaryComparator(Bytes.toBytes(receiver)));
        // A FilterList requires all of its filters to pass by default
        FilterList filterList = new FilterList();
        filterList.addFilter(startMsg_filter);
        filterList.addFilter(endMsg_filter);
        filterList.addFilter(senderMsg_filter);
        filterList.addFilter(receiverMsg_filter);
        scan.setFilter(filterList);
        // 4. Run the scan and process the result set
        List<Msg> msgList = new ArrayList<Msg>();
        try (ResultScanner results = table.getScanner(scan)) {
            for (Result result : results) {
                List<Cell> listCells = result.listCells();
                // Wrap each row in a Msg object
                Msg msg = resultMsg(listCells);
                msgList.add(msg);
            }
        }
        // 5. Release resources
        close();
        return msgList;
    }

    @Override
    public void close() throws Exception {
        table.close();
        connection.close();
    }

    // Copies the cells of one row into a Msg object, matching on the column qualifier
    private Msg resultMsg(List<Cell> listCells) {
        Msg msg = new Msg();
        for (Cell cell : listCells) {
            byte[] qualifierBytes = CellUtil.cloneQualifier(cell);
            String qualifier = Bytes.toString(qualifierBytes);
            byte[] valueBytes = CellUtil.cloneValue(cell);
            String value = Bytes.toString(valueBytes);
            if ("msg_time".equalsIgnoreCase(qualifier)) {
                msg.setMsg_time(value);
            }
            if ("sender_nickyname".equalsIgnoreCase(qualifier)) {
                msg.setSender_nickyname(value);
            }
            if ("sender_account".equalsIgnoreCase(qualifier)) {
                msg.setSender_account(value);
            }
            if ("sender_sex".equalsIgnoreCase(qualifier)) {
                msg.setSender_sex(value);
            }
            if ("sender_ip".equalsIgnoreCase(qualifier)) {
                msg.setSender_ip(value);
            }
            if ("sender_os".equalsIgnoreCase(qualifier)) {
                msg.setSender_os(value);
            }
            if ("sender_phone_type".equalsIgnoreCase(qualifier)) {
                msg.setSender_phone_type(value);
            }
            if ("sender_network".equalsIgnoreCase(qualifier)) {
                msg.setSender_network(value);
            }
            if ("sender_gps".equalsIgnoreCase(qualifier)) {
                msg.setSender_gps(value);
            }
            if ("receiver_nickyname".equalsIgnoreCase(qualifier)) {
                msg.setReceiver_nickyname(value);
            }
            if ("receiver_ip".equalsIgnoreCase(qualifier)) {
                msg.setReceiver_ip(value);
            }
            if ("receiver_account".equalsIgnoreCase(qualifier)) {
                msg.setReceiver_account(value);
            }
            if ("receiver_os".equalsIgnoreCase(qualifier)) {
                msg.setReceiver_os(value);
            }
            if ("receiver_phone_type".equalsIgnoreCase(qualifier)) {
                msg.setReceiver_phone_type(value);
            }
            if ("receiver_network".equalsIgnoreCase(qualifier)) {
                msg.setReceiver_network(value);
            }
            if ("receiver_gps".equalsIgnoreCase(qualifier)) {
                msg.setReceiver_gps(value);
            }
            if ("receiver_sex".equalsIgnoreCase(qualifier)) {
                msg.setReceiver_sex(value);
            }
            if ("msg_type".equalsIgnoreCase(qualifier)) {
                msg.setMsg_type(value);
            }
            if ("distance".equalsIgnoreCase(qualifier)) {
                msg.setDistance(value);
            }
            if ("message".equalsIgnoreCase(qualifier)) {
                msg.setMessage(value);
            }
        }
        return msg;
    }
}
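The two msg_time filters in the query compare the stored strings byte by byte. This works because the yyyy-MM-dd HH:mm:ss format is fixed-width and ordered from the most to the least significant field, so lexicographic order matches chronological order. A minimal sketch of the same range check with plain String comparison (the class DayRangeCheck and the method inDay are hypothetical names for illustration):

```java
final class DayRangeCheck {
    // True when msgTime ("yyyy-MM-dd HH:mm:ss") falls within the given day ("yyyy-MM-dd").
    static boolean inDay(String msgTime, String date) {
        String start = date + " 00:00:00";
        String end = date + " 23:59:59";
        return msgTime.compareTo(start) >= 0 && msgTime.compareTo(end) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(inDay("2021-03-05 12:30:00", "2021-03-05"));
        System.out.println(inDay("2021-03-06 00:00:00", "2021-03-05"));
    }
}
```

The same comparison would give wrong results for formats that are not zero-padded or that put the day before the year, so the stored time format matters for this style of filter.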