Hive 进阶 建表语句 分区 分桶 视图
- 一、建表出错不想drop跑路的万能语句
- 1.修改hive表结构 - alter table常用
- 2.insert overwrite
- 3.`注意`
- 二、Hive建表高阶语句
- CTAS - as select方式建表
- CTE(CATS with Common Table Expression )
- Like
- 创建临时表
- 三、Hive分区(Partitions)
- 静态分区实例
- 动态分区实例
- 四、加载本地数据文件到hive数据库表
- 五、数据分桶(Buckets)
- 分桶抽样
- 六、行转列 列转行
- 行转列
- 列转行
- 七、视图
- 视图格式:
- explode
- 侧视图(Lateral View)
一、建表出错不想drop跑路的万能语句
1.修改hive表结构 - alter table常用
作用 | 命令格式 |
修改表名 | alter table table_name rename to newName |
新增列 | alter table table_name add columns (column_namedatatype) |
删除(利用修改)/替换列 | alter table table_name replace columns (namedatatype,…) |
修改列名和属性 | alter table table_name change column_namenew_namenew_type |
2.insert overwrite
覆盖之前表所有添加内容添加新内容
insert overwrite table table_name values(...),(...),...
3.注意
hive一般不支持update和delete操作,需要修改配置,频繁update和delete操作违背hive的初衷,尽量使用增量添加方式。
二、Hive建表高阶语句
CTAS - as select方式建表
格式:
CREATE TABLE cats_employee as SELECT * FROM EMPLOYEE;
注意:CTAS不能创建partition,external,bucket table
用法实例:
create table myuser as select * from userinfos where age=40;
CTE(CATS with Common Table Expression )
CREATE TABLE cte_employee AS
WITH
r1 AS (SELECT name FROM r2 WHERE name = 'Michael'),
r2 AS (SELECT name FROM employee WHERE sex_age.sex= 'Male'),
r3 AS (SELECT name FROM employee WHERE sex_age.sex= 'Female')
SELECT * FROM r1 UNION ALL SELECT * FROM r3;
某复杂查询语句:
sql写法:
select r.username,r.className,r.score,(r.score/l.countScore*100) percent
from
(select className,sum(score) countScore from scores group by className) l
inner join (select u.*,s.className,s.score from userinfos u inner join scores s on u.userid=s.userid) r
on
l.className = r.className;
with as 写法:
with
a1 as (select className,sum(score) countScore from scores group by className),
a2 as (select u.*,s.className,s.score from userinfos u inner join scores s on u.userid=s.userid)
select a2.username,a2.className,a2.score,(a2.score/a1.countScore*100) from a1 inner join a2 on a1.className=a2.className;
Like
like 一般用于复制表的结构
CREATE TABLE employee_like LIKE employee;
创建临时表
临时表简单理解为不落盘的表
- 表只对当前session有效,session退出后自动删除
表 - 空间位于/tmp/hive-<user_name>(安全考虑)
- 如果创建的临时表表名已存在,实际用的是临时表
CREATE TEMPORARY TABLE tmp_table_name1 (c1 string);
CREATE TEMPORARY TABLE tmp_table_name2 AS..
CREATE TEMPORARY TABLE tmp_table_name3 LIKE..
三、Hive分区(Partitions)
分区主要用于提高性能
- 分区列的值将表划分为segments(文件夹)
- 查询时使用“分区”列和常规列类似
- 查询时Hive自动过滤掉不用于提高性能的分区
分为静态分区和动态分区
- 静态分区 自己过滤自己匹配数据,白话自定义分区
- 动态分区 设定属性–开启动态分区,白话自动分区
静态分区实例
//userinfos是已存在的表,可自建
//已存在表建立分区
insert overwrite table mypart partition(gender='male')
select userid,username from userinfos where gender='male';
//为表新增新的分区
alter table mypart add partition(gender='male') partition(gender='female');
//手动删除分区
alter table mypart drop partition(gender='female')
动态分区实例
//开启动态分区
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
//新表创建,建动态分区
create table mypart(mpid int,mpname string)
partitioned by(gender string)
row format delimited fields terminated by ',';
//insert方式添加
insert into table emplyee partition(year,month)
select name,year(start_date) year,month(start_date) month
from emplyee;
设置动态分区的个数上限
set hive.exec.max.created.files=600000;
四、加载本地数据文件到hive数据库表
load data local inpath '/opt/mydate.csv' overwrite into table mydemo.customs;
注意local是加载本地数据,不加local加载的是hdfs中的数据
五、数据分桶(Buckets)
//分桶只有动态分桶
set hive.enforce.bucketing=true
//定义分桶 分桶数最好是2的n次方
clustered by (userid) into 2 buckets
//必须使用insert方式加载数据
分桶抽样
随机抽样基于整行数据
select * from table_name tablesample(bucket 3 out of 32 on rand()) s;
随机抽样基于指定列
select* from table_name tablesample(bucket 3 out of 32 on id) s;
随机抽样基于block size
select *from table_name tablesample(10 percent) s;
select *from table_name tablesample(1M) s;
select *from table_name tablesample(10 rows) s;
六、行转列 列转行
行转列
对如下该表进行行转列操作
操作代码:
select userid,
sum(if(subject='chinese',score,0)) as chinese,
sum(if(subject='math'),score,0) as math
from tb_score
group by userid;
结果展示
列转行
对下表对应语文数学分数进行列转行操作
操作代码:
select userid,'chinese' as subject,chinese as score from view view_sc
union all
select useid,'math' as subject,math as score from view_sc;
结果展示:
七、视图
视图格式:
create view view_score as …
explode
explode作用,一段代码了解
下图以逗号分割字符
侧视图(Lateral View)
对下表创建外表
外表创建:
create external table mytest(name,likes string)
row format delimited fields terminated by ','
location '/mydata/0610';
查询结果:
侧视应用:
select name,myview from mytest lateral view explode(split(likes,'\\|')) a as myview;
结果展示: