Hive表的数据加载
加载本地文件到数据表
$ local data local inpath '/../../.' into table table_name;
加载hdfs文件到hive表
$ load data inpath '/load_students' into student_load_hdfs;
覆盖表中所有数据
overwrite 关键字
$ local data local inpath '/../../.'overwrite into table table_name ;
创建表时通过select加载
create table load_create_select as select stuid,stuname from student_load_hdfs;
创建表时通过insert加载
$ create table load_create_insert like student_load_hdfs;
insert into table load_create_insert select * from student_load_hdfs
创建表时,指定location加载
dfs -mkdir -p /user/hive/warehouse/db_hive.db/load_location_put
dfs -put /opt/modules/datafiles/load_students /user/hive/warehouse/db_hive.db/load_location_put/;
create table if not exists load_location_put(
stuid int ,
stuname string ,
stuage int
)
row format delimited fields terminated by '\t'
stored as textfile
$ location '/user/hive/warehouse/db_hive.db/load_location_put';
导出数据(对hive表数据分析后保存)
保存到本地目录文件中
insert overwrite local directory '/opt/modules/datafiles/' select * from load_location_put
保存到hdfs目录
insert overwrite directory '/hive_imp/load_localtion_put' select * from load_localtion_put;
重定向结果
在hive外部执行
bin/hive "select * from
db_hive.load_location_put;">/opt/modules/datafiles/load_location_put.txt
外部分区表
建表
create external table if not exists student_external_part(
stuid int ,
stuname string ,
stuage int
)
partitioned by (class string)
row format delimited fields terminated by '\t'
stored as textfile
location '/user/student_external_part'
;
加载数据
load data loacal inpath '/,./../.' into table table_name partition(class='ss');
分区表与常规表加载数据的区别
创建常规表,加载数据
创建常规表与分区表对比
加载数据前,外部数据表无数据
加载数据前外部分区表的分区
加载数据到分区
alter table student_part add partition(class='01');
查看分区表信息
通过UI查看信息
常规表上传数据到hdfs的表路径中,然后创建表,数据会加载到表中,而分区表则不会,还需要使用alter来加载分区
export,import 数据
export(从hive导出到hdfs) 数据
export table student_load_hdfs to '/hive_inp/export_tables'
通过UI查看
import (从hdfs导入到hive表)数据
import table student_load_hdfs2 from '/hive_inp/export_tables';
sort by,distribute by ,cluster by ,order by 的作用
order by
全局排序,只作用于一个reduce
sort by
对每一个Reduce内部的数据进行排序
>set mapreduce.job.reduces=3;
>select * from emp sort by empno desc;
>insert overwrite local directory '/opt/datafiles/sort-by/ ' select * from emp sort by empno desc;
结果截图
distribute by
作用类似于分区partitioner ,底层是mapreduce,通常与sort by进行使用,在sort by之前使用
set setmapre.job.reduces=3
insert overwrite local directory '/opt/datafiles/sort-by/ ' select * from emp distribute by deptno sort by empno desc;
cluster by
当sort by,distribute by 字段相同时,,用clustor代替
insert overwrite local directory '/opt/datafiles/sort-by/ ' select * from emp cluster by empno desc;
输出结果
我这里因为插入数据间的类型不为制表符的原因 导致输出的类型有问题