Hive: defining tables on HDFS and creating Hive tables over HDFS file paths

Reprinted

云中谁寄锦书来 2023-07-12 11:19:29

Article tags: hive, defining HDFS tables, hadoop, hive strict mode. Article category: Hive, big data

I. Concepts

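Hive is an open-source data warehouse tool built on Hadoop for processing massive volumes of structured data.
Hive maps structured data stored in HDFS to tables.
Hive parses and translates HiveQL statements into a series of MapReduce jobs on Hadoop and completes the data processing by running those jobs. (MapReduce is the classic execution engine; newer Hive versions can also run on Tez or Spark, selected with hive.execution.engine.)
In other words: without Hive, you would have to write MapReduce programs by hand to process the data, which costs far more time and effort.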

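Partitioning a table is equivalent to creating separate subdirectories: each partition is stored in its own directory under the table's directory.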
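To make the directory analogy concrete, the following Python sketch builds the path a partition maps to. It assumes Hive's default column=value naming for partition directories and the default warehouse path used later in this article; the helper name partition_path is hypothetical, and it skips the escaping Hive applies to special characters in partition values.

```python
def partition_path(table_dir: str, spec: list[tuple[str, str]]) -> str:
    """Return the directory holding one partition of a table.

    Each (column, value) pair becomes one nested "column=value" level,
    in the order the columns appear in PARTITIONED BY.
    """
    levels = [f"{column}={value}" for column, value in spec]
    return "/".join([table_dir.rstrip("/")] + levels)


table_dir = "/user/hive/warehouse/caicai.db/user_trade"
print(partition_path(table_dir, [("dt", "2017-01-12")]))
# /user/hive/warehouse/caicai.db/user_trade/dt=2017-01-12
```

A table partitioned by more than one column simply gets one nested directory level per column.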

II. Data preparation

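1. In the hadoop user's home directory, create a directory named datas:
mkdir datas
2. Upload the source data files into datas with Xftp.
3. Create the target directory in HDFS:
hadoop fs -mkdir /datas
4. Give the group write permission on it:
hadoop fs -chmod g+w /datas
5. Upload the source data from the local datas directory to HDFS:
hadoop fs -put /home/hadoop/datas/* /datas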

III. Create the database and tables, and load data into the tables

-- 1. Create the caicai database

create database if not exists caicai;

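After the database is created, verify that it exists in HDFS by running the following as the hadoop user:
hadoop fs -ls /user/hive/warehouse/
(If no location is specified when a database is created, it is placed under the warehouse directory, i.e. the path configured by hive.metastore.warehouse.dir, which was created when Hive was installed.)
You should see a caicai.db directory.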
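The default-location rule can be sketched in a few lines of Python. This is a simplified model, assuming hive.metastore.warehouse.dir is /user/hive/warehouse; the function names are hypothetical. Non-default databases get a <name>.db directory, the built-in default database uses the warehouse root itself, and Hive stores identifiers in lowercase.

```python
# Assumed value of hive.metastore.warehouse.dir (the common default)
WAREHOUSE_DIR = "/user/hive/warehouse"


def default_db_location(db_name: str, warehouse_dir: str = WAREHOUSE_DIR) -> str:
    """Directory used by CREATE DATABASE when no LOCATION is given."""
    root = warehouse_dir.rstrip("/")
    name = db_name.lower()  # Hive identifiers are case-insensitive, stored lowercase
    if name == "default":
        return root  # the built-in default database lives at the warehouse root
    return f"{root}/{name}.db"


def default_table_location(db_name: str, table_name: str,
                           warehouse_dir: str = WAREHOUSE_DIR) -> str:
    """Directory used by CREATE TABLE when no LOCATION is given."""
    return f"{default_db_location(db_name, warehouse_dir)}/{table_name.lower()}"


print(default_db_location("caicai"))                   # /user/hive/warehouse/caicai.db
print(default_table_location("caicai", "user_trade"))  # /user/hive/warehouse/caicai.db/user_trade
```

The second path is the directory that the user_trade data files are later copied into with hdfs dfs -put.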

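Tip: from inside the Hive CLI you can also list HDFS directories and local Linux directories:

hive> dfs -ls /;     # list the HDFS root directory
hive> !ls /;         # list the local Linux root directory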

-- 2. Switch to the caicai database

use caicai;

-- 3. Create the user_info table

create external table if not exists user_info (
 user_id string,
 user_name string,
 sex string,
 age int,
 city string,
 firstactivetime string,
 level int,
 extra1 string,
 extra2 map<string,string>)
 row format delimited fields terminated by '\t'
 collection items terminated by ','
 map keys terminated by ':'
 lines terminated by '\n'
 stored as textfile;
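The three delimiters work at different levels: '\t' splits a line into columns, ',' splits the extra2 map into entries, and ':' splits each entry into key and value. The Python sketch below parses one line the same way. It is a simplified model of Hive's default text SerDe, not its implementation: it assumes values that cannot be read as int become NULL (None here) and that missing trailing columns read as NULL, and it ignores escaping and the \N null marker. The sample line is made up for illustration.

```python
# Column names and simplified types, in user_info DDL order
COLUMNS = [
    ("user_id", "string"), ("user_name", "string"), ("sex", "string"),
    ("age", "int"), ("city", "string"), ("firstactivetime", "string"),
    ("level", "int"), ("extra1", "string"), ("extra2", "map"),
]
FIELD_SEP, ITEM_SEP, KEY_SEP = "\t", ",", ":"  # the delimiters declared in the DDL


def to_int(text):
    """Read an int column; unreadable values become None (NULL)."""
    try:
        return int(text)
    except ValueError:
        return None


def parse_map(text):
    """Split 'k1:v1,k2:v2' into a dict using the collection and map-key delimiters."""
    result = {}
    if not text:
        return result
    for item in text.split(ITEM_SEP):
        key, _, value = item.partition(KEY_SEP)
        result[key] = value
    return result


def parse_user_info_line(line):
    """Parse one text line of user_info into a dict of column -> value."""
    fields = line.rstrip("\n").split(FIELD_SEP)
    fields += [None] * (len(COLUMNS) - len(fields))  # missing trailing columns -> NULL
    row = {}
    for (name, kind), raw in zip(COLUMNS, fields):  # extra fields are ignored
        if raw is None:
            row[name] = None
        elif kind == "int":
            row[name] = to_int(raw)
        elif kind == "map":
            row[name] = parse_map(raw)
        else:
            row[name] = raw
    return row


sample = "10001\tAlice\tfemale\t25\tBeijing\t2017-01-01 10:00:00\t3\tabc\tsystemtype:ios,education:master\n"
print(parse_user_info_line(sample)["extra2"])
# {'systemtype': 'ios', 'education': 'master'}
```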

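Load the source data (without the LOCAL keyword, LOAD DATA INPATH moves the file from its HDFS location into the table's directory rather than copying it):
load data inpath '/datas/user_info/user_info.txt' overwrite into table user_info;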

-- 4. Create the user_trade table

create external table if not exists user_trade (
 user_name string,
 piece int,
 price double,
 pay_amount double,
 goods_category string,
 pay_time bigint)
 partitioned by (dt string)
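 row format delimited fields terminated by '\t';

Run the following commands to enable dynamic partitioning (these settings govern inserts that derive partition values from the data, such as INSERT ... PARTITION (dt) SELECT ...; the MSCK REPAIR step below does not depend on them):
 set hive.exec.dynamic.partition=true;
 set hive.exec.dynamic.partition.mode=nonstrict;
 set hive.exec.max.dynamic.partitions=10000;
 set hive.exec.max.dynamic.partitions.pernode=10000;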
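With dynamic partitioning, Hive takes the partition value from the last column(s) of the SELECT instead of from a literal in the PARTITION clause; nonstrict mode allows every partition column to be dynamic, whereas strict mode requires at least one static partition value. The Python sketch below models only the routing step and the total-partition limit behind hive.exec.max.dynamic.partitions; it ignores the per-node limit, and the function name and sample rows are hypothetical.

```python
from collections import defaultdict


def route_dynamic_partitions(rows, max_partitions=10000):
    """Group rows by their last column, as INSERT ... PARTITION (dt) SELECT ... does.

    Each row is a tuple whose last element is the dt value. Raises ValueError
    once more than max_partitions distinct partitions would be created,
    mirroring the total-count check behind hive.exec.max.dynamic.partitions.
    """
    partitions = defaultdict(list)
    for row in rows:
        *values, dt = row  # the dynamic partition column comes last
        partitions[f"dt={dt}"].append(tuple(values))
        if len(partitions) > max_partitions:
            raise ValueError(f"too many dynamic partitions (limit {max_partitions})")
    return dict(partitions)


# Made-up rows in user_trade column order, followed by dt
rows = [
    ("Alice", 1, 9.9, 9.9, "book", 1484200000, "2017-01-12"),
    ("Bob", 2, 5.0, 10.0, "food", 1484300000, "2017-01-13"),
    ("Carol", 1, 3.0, 3.0, "book", 1484210000, "2017-01-12"),
]
for name, part_rows in route_dynamic_partitions(rows).items():
    print(name, len(part_rows))
```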
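Upload the source data files to the table's directory in HDFS:
hdfs dfs -put /home/hadoop/datas/user_trade/* /user/hive/warehouse/caicai.db/user_trade
The next step, repairing the partitions, is essential: without it, queries return no data, because the metastore does not yet know about the partition directories that were just copied in.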

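Repair the partitioned table (MSCK REPAIR only recognizes subdirectories named in the column=value form, e.g. dt=2017-01-12):

msck repair table user_trade;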
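MSCK REPAIR TABLE compares the partition directories under the table's location with the partitions registered in the metastore and adds the missing ones; the same effect can be achieved one partition at a time with ALTER TABLE user_trade ADD PARTITION (dt='2017-01-12'). The Python sketch below models only that comparison, for a table partitioned by a single column. The function name and directory listing are hypothetical, and where this sketch silently skips directories not named column=value, Hive may instead report an error depending on its configuration.

```python
import re

# A partition directory is named <column>=<value>, e.g. dt=2017-01-12
PARTITION_DIR = re.compile(r"^(\w+)=(.+)$")


def partitions_to_add(subdirs, partition_column, registered):
    """Return partition values present on disk but missing from the metastore.

    subdirs: directory names directly under the table's location
    partition_column: the single partition column, e.g. "dt"
    registered: partition values the metastore already knows about
    """
    found = set()
    for name in subdirs:
        match = PARTITION_DIR.match(name)
        # This sketch skips names that are not <column>=<value>
        if match and match.group(1) == partition_column:
            found.add(match.group(2))
    return sorted(found - set(registered))


listing = ["dt=2017-01-12", "dt=2017-01-13", "_tmp"]
print(partitions_to_add(listing, "dt", registered=set()))
# ['2017-01-12', '2017-01-13']
```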

At this point, before strict mode is enabled, the whole table can be queried without specifying a partition, for example:

select * from user_trade limit 3;


Enable strict mode:

set hive.mapred.mode=strict;

After strict mode is enabled, a query without a partition filter fails:

select * from user_trade limit 6;


The query must specify a partition instead:

select * from user_trade where dt='2017-01-12';


Disable strict mode:

set hive.mapred.mode=nonstrict;

List the partitions:
show partitions user_trade;

IV. Strict mode is off by default: how to enable it permanently

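Create a .hiverc file in the hadoop user's home directory (/home/hadoop/); the Hive CLI runs the commands in this file every time it starts:
vim /home/hadoop/.hiverc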

-- show the current database name in the CLI prompt
set hive.cli.print.current.db=true;
-- show column names in query results
set hive.cli.print.header=true;
-- run Hive in strict mode
set hive.mapred.mode=strict;

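Strict mode restricts three kinds of queries:
1. A query on a partitioned table must include a partition filter.
2. Cartesian products are not allowed (i.e. joining tables without a join condition).
3. A query that sorts with ORDER BY must include a LIMIT clause.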
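The three rules amount to simple checks on a query's shape. The Python sketch below expresses them over a hypothetical QueryInfo summary; Hive performs these checks on the parsed query plan, not on a structure like this, so treat it only as a reading aid for the list above.

```python
from dataclasses import dataclass


@dataclass
class QueryInfo:
    """Hypothetical summary of a query's shape (Hive inspects the real query plan)."""
    scans_partitioned_table: bool = False
    has_partition_filter: bool = False
    has_join_without_condition: bool = False
    has_order_by: bool = False
    has_limit: bool = False


def strict_mode_violations(q: QueryInfo) -> list[str]:
    """Return the strict-mode rules the query breaks (empty list = allowed)."""
    problems = []
    if q.scans_partitioned_table and not q.has_partition_filter:
        problems.append("partitioned table scanned without a partition filter")
    if q.has_join_without_condition:
        problems.append("cartesian product: join without a join condition")
    if q.has_order_by and not q.has_limit:
        problems.append("ORDER BY without LIMIT")
    return problems


# select * from user_trade limit 6;
print(strict_mode_violations(QueryInfo(scans_partitioned_table=True, has_limit=True)))
# ['partitioned table scanned without a partition filter']
# select * from user_trade where dt='2017-01-12';
print(strict_mode_violations(QueryInfo(scans_partitioned_table=True, has_partition_filter=True)))
# []
```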
