Creating Hive Tables: How to Create a Database in Hive

Reposted

网络安全战士 2023-06-14 21:30:02


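1. Hive database operations

1: Create a database:
	create database if not exists myhive;
	Including "if not exists" is recommended: the statement can then be re-run from a shell script without failing when the database already exists.
Explanation:
	Each time a database is created, Hive automatically creates a directory for it on HDFS:
	/user/hive/warehouse/myhive.db    (named after the database)
	
	Note: the base location for databases and tables is set by a property in hive-site.xml. The default shown below applies even when the property does not appear in the file:
	<name>hive.metastore.warehouse.dir</name>
	<value>/user/hive/warehouse</value>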
	
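2: View database information
	desc database myhive;
	Explanation: the output is metadata. In this setup the metastore lives in a MySQL database named hive, and database entries are stored in its DBS table.

3: Drop a database:
	drop database mytest3 cascade;
	cascade forces the drop: the database is removed even if it still contains tables.

4: Create a database at a specified HDFS location:
	create database myhive2 location '/myhive2';  -- directly under the HDFS root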
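
The directory rule described in this section can be sketched in a few lines of Python. This is only a simplified model, not Hive code: the function database_location and its parameters are made up for illustration, and lower-casing the database name is an assumption based on Hive identifiers being case-insensitive.

```python
def database_location(name, warehouse_dir="/user/hive/warehouse", location=None):
    """Return the HDFS directory a database would use in this simplified model."""
    if location is not None:
        # create database ... location '/path' places the database exactly there
        return location
    # Default: <hive.metastore.warehouse.dir>/<database name>.db
    # Assumption: the name is lower-cased before it is used in the path.
    return f"{warehouse_dir.rstrip('/')}/{name.lower()}.db"


print(database_location("myhive"))                        # /user/hive/warehouse/myhive.db
print(database_location("myhive2", location="/myhive2"))  # /myhive2
```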

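2. Hive table operations

2.1 Table types

Managed (internal) table
	create table stu(...);  "private": Hive owns both the metadata and the data
External table
	create external table stu(...);  "shared": Hive owns only the metadata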

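2.2 Managed table operations

1. Create a table
	create table if not exists stu(id int, name string, address string);
	Explanation: for each new table, Hive automatically creates a directory inside the directory of the database the table belongs to.

2. Insert a row
	insert into stu values(1,'玩具','湖北');
	Explanation: an insert runs as a MapReduce job, and each insert produces one small file, so inserting rows this way is slow.
	Note: by default, Hive separates fields with '\001'.

3. Create a table with a specified field delimiter
	create table stu4(id int, name string) row format delimited
	fields terminated by '\t';

4. Create a table with a specified storage format and location
	create table stu5(id int, name string) row format delimited
	fields terminated by '\t' stored as textfile location '/stu5';

Explanation:
	stored as textfile  stores the table files as plain text. This is the default and can be omitted.
	location '/stu5'  sets the directory that holds the table files. Logically the table still belongs to its original database.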
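
To make the delimiter settings concrete, here is a minimal Python sketch of how one line of a text-format table file breaks into fields. It only illustrates the idea; split_fields is a hypothetical helper, not part of Hive, and the second sample row is made up.

```python
DEFAULT_FIELD_DELIM = "\x01"  # the default field delimiter '\001' (Ctrl-A)


def split_fields(line, delim=DEFAULT_FIELD_DELIM):
    """Split one line of a text-format table file into raw string fields."""
    return line.rstrip("\n").split(delim)


# A row as it would be stored for stu (default delimiter)
print(split_fields("1\x01玩具\x01湖北\n"))           # ['1', '玩具', '湖北']
# A row for stu4 or stu5 (fields terminated by '\t')
print(split_fields("1\tzhangsan\n", delim="\t"))   # ['1', 'zhangsan']
```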

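5: Create a table by copying both the structure and the data of another table
	create table stu1 as select * from stu;

6: Create a table by copying only the structure
	create table stu2 like stu;

7: View detailed table information
	desc formatted table_name;

8: Drop a table
	drop table table_name;

Summary:
Dropping a managed table deletes both its metadata and its data.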

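2.3 External table operations

Adding the external keyword when creating a table makes it an external table. Its files live in the HDFS directory given by location, and any new file added to that directory becomes readable through the table (provided the file format matches the table definition). Dropping an external table removes only the metadata; the data stays on HDFS.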
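
The difference between dropping a managed table and dropping an external table can be modelled with two dictionaries. This is a toy sketch under stated assumptions: drop_table, the metastore and hdfs dictionaries, the directory paths, and the file name 000000_0 are all illustrative stand-ins, not Hive internals.

```python
def drop_table(metastore, hdfs, name):
    """Drop a table: always remove metadata, remove data only for managed tables."""
    table = metastore.pop(name)            # metadata is deleted in both cases
    if not table["external"]:
        hdfs.pop(table["location"], None)  # managed table: the data directory goes too


# Toy state: table name -> metadata, directory -> list of file names
metastore = {
    "stu": {"external": False, "location": "/user/hive/warehouse/myhive.db/stu"},
    "teacher": {"external": True, "location": "/user/hive/warehouse/myhive.db/teacher"},
}
hdfs = {
    "/user/hive/warehouse/myhive.db/stu": ["000000_0"],
    "/user/hive/warehouse/myhive.db/teacher": ["teacher.csv"],
}

drop_table(metastore, hdfs, "stu")
drop_table(metastore, hdfs, "teacher")
print(metastore)   # {}
print(list(hdfs))  # only the external table's directory is left
```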

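1. Loading data with the load command:
load data [local] inpath '/export/data/datas/student.txt'
[overwrite] into table student [partition (partcol1=val1,…)];

Explanation:
	  load data - loads data into a table
	  with local - the file is read from the local Linux file system
	  without local - the file is read from HDFS (the more common case in practice)
		
	  inpath - the path of the data to load
	  
	  with overwrite - existing data in the table is replaced
	  without overwrite - the data is appended to the table
		
	  into table table_name - the table to load into
	  partition - the partition to load into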
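
The local and overwrite options can be imitated on an ordinary file system. The sketch below is a rough model only: load_data is a hypothetical function, a temporary directory plays the role of both the local disk and HDFS, the sample rows are made up, and file-name collisions in the table directory are ignored.

```python
import shutil
import tempfile
from pathlib import Path


def load_data(src, table_dir, local, overwrite=False):
    """Model of: load data [local] inpath ... [overwrite] into table ..."""
    table_dir.mkdir(parents=True, exist_ok=True)
    if overwrite:
        for old in table_dir.iterdir():   # overwrite: existing table files are replaced
            old.unlink()
    dest = table_dir / src.name
    if local:
        shutil.copy(src, dest)            # with local: the source is copied and stays in place
    else:
        shutil.move(str(src), str(dest))  # without local: the source is moved
    return dest


root = Path(tempfile.mkdtemp())
table_dir = root / "warehouse" / "student"

local_file = root / "student.txt"        # stands in for a file on local Linux
local_file.write_text("1\ta\n")
hdfs_file = root / "student_hdfs.txt"    # stands in for a file already on HDFS
hdfs_file.write_text("2\tb\n")

load_data(local_file, table_dir, local=True)    # copy
load_data(hdfs_file, table_dir, local=False)    # move
print(local_file.exists(), hdfs_file.exists())  # True False

new_file = root / "student_new.txt"
new_file.write_text("3\tc\n")
load_data(new_file, table_dir, local=True, overwrite=True)
print([p.name for p in table_dir.iterdir()])    # ['student_new.txt']
```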

2.3.1 External tables in practice

1. Create the teacher table
create external table teacher(t_id string,t_name string) row
format delimited fields terminated by '\t';

2. Create the student table
create external table student (s_id string,s_name string,s_birth
string , s_sex string) row format delimited fields terminated by '\t';

3. Load data into a table from the local file system
load data local inpath '/export/data/hivedatas/student.txt' into table student;  -- a copy: the local file remains

Load data into a table from HDFS:
	This is really just a file move.
	The data must first be uploaded to HDFS:
	hadoop fs -mkdir -p /hivedatas     # create the directory on HDFS
	cd /export/data/hivedatas
	hadoop fs -put teacher.csv /hivedatas/   # upload the local file from Linux to HDFS
Finally, load the data from within Hive:
load data inpath '/hivedatas/teacher.csv' into table teacher;  -- the path is on HDFS; this acts like a cut (move) and is the more common case in practice

4. Load data and overwrite existing data
load data local inpath '/export/data/hivedatas/student.txt' overwrite into table student;  -- the path is on the local file system

2.4 Complex types

1: array type. An array holds elements of a single type.

Source data:
Note: name and work_city are separated by a tab (\t); the elements of work_city are separated by commas.
zhangsan	beijing,shanghai,tianjin,hangzhou
wangwu	changchun,chengdu,wuhan,beijing

Create the table:
create external table hive_array(name string, work_city array<string>)
row format delimited fields terminated by '\t'
collection items terminated by ',';

Load the data (from HDFS):
load data inpath '/hivedatas/work_city.txt' overwrite into table hive_array;

Common queries:
-- select all rows
select * from hive_array;
-- first element of the work_city array
select name, work_city[0] as city from hive_array;
-- number of elements in the work_city array
select name, size(work_city) as city_size from hive_array;
-- rows whose work_city array contains tianjin
select * from hive_array where array_contains(work_city,'tianjin');

array_contains is a built-in function.
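
As a rough illustration of what the two delimiters do, the Python sketch below parses the same source lines and mirrors the three queries. It is not how Hive is implemented; parse_array_row is a hypothetical helper.

```python
def parse_array_row(line):
    """Parse 'name<TAB>city1,city2,...' as the hive_array table definition describes."""
    name, cities = line.rstrip("\n").split("\t")  # fields terminated by '\t'
    return {"name": name, "work_city": cities.split(",")}  # items terminated by ','


rows = [
    parse_array_row("zhangsan\tbeijing,shanghai,tianjin,hangzhou"),
    parse_array_row("wangwu\tchangchun,chengdu,wuhan,beijing"),
]

# work_city[0]
print([(r["name"], r["work_city"][0]) for r in rows])
# size(work_city)
print([(r["name"], len(r["work_city"])) for r in rows])
# array_contains(work_city, 'tianjin')
print([r["name"] for r in rows if "tianjin" in r["work_city"]])
```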

2: map type. A map describes key-value data.

Source data:
Note: fields are separated by ","; entries of the map field are separated by "#"; within an entry, key and value are separated by ":".
1,zhangsan,father:xiaoming#mother:xiaohuang#brother:xiaoxu,28
2,lisi,father:mayun#mother:huangyi#brother:guanyu,22
3,wangwu,father:wangjianlin#mother:ruhua#sister:jingtian,29
4,mayun,father:mayongzhen#mother:angelababy,26

Create the table:
create table hive_map(
id int, name string, members map<string,string>, age int
)
row format delimited
fields terminated by ','
collection items terminated by '#'
map keys terminated by ':';

Load the data from HDFS into Hive:
load data inpath '/hivedatas/hive_map.txt' overwrite into table hive_map;

Common queries:
select * from hive_map;
-- look up values by key
select id, name, members['father'] as father, members['mother'] as mother, age from hive_map;
-- get all keys
select id, name, map_keys(members) as relation from hive_map;
-- get all values
select id, name, map_values(members) as relation from hive_map;
-- get the number of key-value pairs
select id,name,size(members) as num from hive_map;
-- rows that have a given key
select * from hive_map where array_contains(map_keys(members), 'brother');
-- rows that have the key brother, together with the value stored under it
select id,name, members['brother'] as brother from hive_map where array_contains(map_keys(members), 'brother');
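
The three delimiter levels of the map table can be traced by hand with a short Python sketch. It is only an approximation of the parsing rules; parse_map_row is a hypothetical helper, and dict.get returning None stands in for the NULL that a missing key yields in a query.

```python
def parse_map_row(line):
    """Parse one hive_map source line: ',' between fields, '#' between entries, ':' inside an entry."""
    id_, name, members, age = line.rstrip("\n").split(",")
    pairs = (entry.split(":", 1) for entry in members.split("#"))
    return {"id": int(id_), "name": name, "members": dict(pairs), "age": int(age)}


rows = [
    parse_map_row("1,zhangsan,father:xiaoming#mother:xiaohuang#brother:xiaoxu,28"),
    parse_map_row("4,mayun,father:mayongzhen#mother:angelababy,26"),
]

for r in rows:
    m = r["members"]
    print(
        r["id"],
        m.get("father"),   # members['father']
        list(m.keys()),    # map_keys(members)
        list(m.values()),  # map_values(members)
        len(m),            # size(members)
        "brother" in m,    # array_contains(map_keys(members), 'brother')
    )
```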

3: struct type

Source data:
Note: fields are separated by "#"; the members of the second field (the struct) are separated by ":".
192.168.1.1#zhangsan:40
192.168.1.2#lisi:50
192.168.1.3#wangwu:60
192.168.1.4#zhaoliu:70

Create the table:
create table hive_struct(
ip string, info struct<name:string, age:int>
)
row format delimited
fields terminated by '#'
collection items terminated by ':';

Load the data from HDFS:
load data inpath '/hivedatas/hive_struct.txt' overwrite into table hive_struct;

Common queries:
select * from hive_struct;
-- access a member of the struct by name
select ip, info.name from hive_struct;
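
A struct behaves much like a small record with named members. The Python sketch below shows the same parsing by hand, using a NamedTuple as a stand-in for struct<name:string, age:int>; the names Info and parse_struct_row are hypothetical and chosen only for this illustration.

```python
from typing import NamedTuple


class Info(NamedTuple):
    """Stand-in for struct<name:string, age:int>."""
    name: str
    age: int


def parse_struct_row(line):
    """Parse 'ip#name:age': '#' between fields, ':' between struct members."""
    ip, info = line.rstrip("\n").split("#")
    name, age = info.split(":")
    return {"ip": ip, "info": Info(name, int(age))}


row = parse_struct_row("192.168.1.1#zhangsan:40")
print(row["ip"], row["info"].name)  # like: select ip, info.name from hive_struct
```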

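3 Converting between managed and external tables (tblproperties)

1. Check the table type
desc formatted student;  -- detailed view
	Table Type:            EXTERNAL_TABLE
desc table_name;  -- brief view

2. Change the external table student into a managed table
alter table student set tblproperties('EXTERNAL'='FALSE');

3. Change the managed table student into an external table
alter table student set tblproperties('EXTERNAL'='TRUE');
Note: write the property and its value entirely in upper case, as shown.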

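4 Partitioned tables (partitioned, partition)

Partitioning is not a table model of its own; it is combined with a managed or an external table:
Managed partitioned table
External partitioned table

In Hive, a partition is simply a separate directory.

1. Create a table with a single partition column
create table score(s_id string,c_id string, s_score int) partitioned by (month string) row format delimited fields terminated by '\t';

Explanation: partitioned by is fixed syntax that declares the partitioning.
	month string is the partition column. Strictly speaking it is not a regular column, because its value is not stored in the data files; the name month is arbitrary.

2. Create a table with multiple partition columns
create table score2 (s_id string,c_id string, s_score int)
partitioned by (year string,month string,day string) row format
delimited fields terminated by '\t';
Note: with multiple partition columns, HDFS shows one nested directory level per partition column.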
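
The mapping from partition values to directories can be written down as a small Python function. This is a sketch under assumptions: partition_path is a hypothetical helper, the table directories assume the tables sit in the default warehouse location, and the year, month, and day values are made up.

```python
def partition_path(table_dir, partition_spec):
    """Build the directory for one partition: one 'column=value' level per partition column."""
    levels = [f"{column}={value}" for column, value in partition_spec]
    return "/".join([table_dir.rstrip("/")] + levels)


# Single partition column (table score)
print(partition_path("/user/hive/warehouse/score", [("month", "202006")]))
# /user/hive/warehouse/score/month=202006

# Three partition columns (table score2) give three nested levels
print(partition_path(
    "/user/hive/warehouse/score2",
    [("year", "2020"), ("month", "06"), ("day", "01")],
))
# /user/hive/warehouse/score2/year=2020/month=06/day=01
```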


List partitions
show partitions score;
Add a partition
alter table score add partition(month='202008');

Add several partitions at once
alter table score add partition(month='202009') partition(month = '202010');

Drop a partition
alter table score drop partition(month = '202010');

Query several partitions together with union all
select * from score where month = '202006' union all select *
from score where month = '202007';

truncate works only on managed (internal) tables
truncate table score4;

5 Loading data into Hive tables

1. Insert directly into a partitioned table
Load data with insert into
create table score3 like score;
insert into table score3 partition(month ='202007') values ('001','002',100);

2. Load data from a query
create table score4 like score;
insert overwrite table score4 partition(month = '202006') select s_id,c_id,s_score from score;   -- the overwrite keyword cannot simply be dropped; it replaces any data already in the partition

3. Load data with load
create table score5 like score;
load data local inpath '/export/data/hivedatas/score.csv' overwrite into table score5 partition(month='202006');
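
The difference between appending to a partition and overwriting it can be shown with a toy in-memory model. This is only an illustration: the class PartitionedTable and its methods are hypothetical, and rows are kept in Python lists instead of files.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy model: partition value -> list of rows."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert_into(self, month, rows):
        self.partitions[month].extend(rows)  # insert into: append to the partition

    def insert_overwrite(self, month, rows):
        self.partitions[month] = list(rows)  # insert overwrite: replace that partition only


score3 = PartitionedTable()
score3.insert_into("202007", [("001", "002", 100)])
score3.insert_into("202007", [("001", "002", 100)])
score3.insert_into("202006", [("002", "001", 90)])
print(len(score3.partitions["202007"]))  # 2

score3.insert_overwrite("202007", [("003", "002", 80)])
print(score3.partitions["202007"])       # [('003', '002', 80)]
print(score3.partitions["202006"])       # [('002', '001', 90)]
```

In this model only the named partition is replaced; the other partition keeps its rows.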

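6 Bucketed tables (clustered by(c_id) into 3 buckets)

Bucketing splits the data into separate files; it is essentially MapReduce partitioning. Data cannot be loaded into a bucketed table directly; it has to go through a staging table.

1. Enable bucketing in Hive (if this command fails, this Hive version already enforces bucketing, so go straight to the next step)
set hive.enforce.bucketing=true;

2. Set the number of reducers
set mapreduce.job.reduces=3;

3. Create the bucketed table
create table course (c_id string,c_name string,t_id string)
clustered by(c_id) into 3 buckets row format delimited fields
terminated by '\t';

Loading data into a bucketed table: neither hadoop fs -put nor load data distributes rows into buckets, so the data has to be loaded with insert overwrite.

Create an ordinary table, then use insert overwrite with a query to load its data into the bucketed table.

Create the ordinary table:
create table course_common (c_id string,c_name string,t_id string) row format delimited fields terminated by '\t';

Load data into the ordinary table:
load data local inpath '/export/data/hivedatas/course.csv' into table course_common;

Load data into the bucketed table with insert overwrite:
insert overwrite table course
select * from course_common cluster by(c_id);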
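
The idea behind bucketing is hash(column) modulo the number of buckets. The Python sketch below illustrates it with a Java-style string hash as a stand-in; the hash function Hive actually uses depends on the version and the column type, so the bucket numbers here are illustrative only. The helper names and the sample rows are made up.

```python
def java_string_hash(s):
    """Java-style String.hashCode (h = 31*h + char), kept to 32 bits. Stand-in hash only."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def bucket_for(value, num_buckets=3):
    """Pick a bucket: non-negative hash modulo the number of buckets."""
    return (java_string_hash(value) & 0x7FFFFFFF) % num_buckets


# Made-up rows shaped like course(c_id, c_name, t_id)
rows = [("01", "chinese", "02"), ("02", "math", "01"), ("03", "english", "03")]

buckets = {}
for row in rows:
    buckets.setdefault(bucket_for(row[0]), []).append(row)  # clustered by(c_id)

for bucket_id in sorted(buckets):
    print(bucket_id, buckets[bucket_id])
```

Each key of buckets corresponds to one output file of the bucketed table, which is why the number of reducers is set to match the number of buckets.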
