hive里是否能设data type hive add date

转载

mob64ca140e4022 2023-11-17 14:02:20

文章标签 hive里是否能设data type hadoop hive hdfs 字段 文章分类 Hive 大数据

hive语句操作

一：建表语句

1.建表

create table t_order(id string,create_time string,amount float,uid string)
 row format delimited
 fields terminated by ‘,’; //读取文件时以，作为分隔符

2.删除表

drop table t_order;
 hive会从元数据库中清除关于这个表的信息；
hive还会从hdfs中删除这个表的表目录；

3.创建带分区的表

create table t_access(ip string,url string,access_time string)
 partitioned by(dt string)
 row format delimited
 fields terminated by ‘,’;
 //注意：分区字段不能是表定义中的已存在字段，否组会冲突；以为分区字段也会被当成数组字段值被返回，其实这是一个伪字段；
 向指定分区导入数据：load data local inpath ‘/root/access.log.2017-08-04.log’ into table t_access partition(dt=‘20170804’);
 针对指定分区进行查询：select count(*) from t_access where dt=‘20170804’;
 //一个表可以带有多个分区！！！
 查看表的分区：show partitions t_access;

4.导入文件到数据库

load data local inpath ‘/root/access.log.2017-08-04.log’ into table t_access ;

5.通过已存在表来建表

create table t_user_2 like t_user;

//新建的t_user_2表结构定义与源表t_user一致，但是没有数据

6.建表的同时插入数据

create table t_access_user as select ip,url from t_access;
 //创建的表的字段与查询语句的字段是一样的

二：数据文件的导入和导出

1.在hive的交互式shell中用hive命令来导入本地数据到表目录

hive>load data local inpath ‘/root/order.data.2’ into table t_order;

2.用hive命令导入hdfs中的数据文件到表目录

hive>load data inpath ‘/access.log.2017-08-06.log’ into table t_access partition(dt=‘20170806’);

3.手动用hdfs命令，将文件放入表目录

直接手动操作

4.将hive表中的数据导入HDFS的文件

insert overwrite directory ‘/root/access-data’
 row format delimited fields terminated by ‘,’
 select * from t_access;

5.将hive表中的数据导入本地磁盘文件

insert overwrite local directory ‘/root/access-data’
 row format delimited fields terminated by ‘,’
 select * from t_access limit 100000;

6:hive的文件格式

HIVE支持很多种文件格式： SEQUENCE FILE | TEXT FILE | PARQUET FILE | RC FILE
 create table t_pq(movie string,rate int) stored as textfile;
 create table t_pq(movie string,rate int) stored as sequencefile;
 create table t_pq(movie string,rate int) stored as parquetfile;

三：字段类型

数字类型

tinyint (1-byte signed integer, from -128 to 127)
 smallint (2-byte signed integer, from -32,768 to 32,767)
 int/integer (4-byte signed integer, from -2,147,483,648 to 2,147,483,647)
 bigint (8-byte signed integer, from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)
 float (4-byte single precision floating point number)
 double (8-byte double precision floating point number)
 示例：
 create table t_test(a string ,b int,c bigint,d float,e double,f tinyint,g smallint)

日期类型

TIMESTAMP (Note: Only available starting with Hive 0.8.0) 【本质是长整型】
 DATE (Note: Only available starting with Hive 0.12.0)

字符串类型

STRING
 VARCHAR (Note: Only available starting with Hive 0.12.0)
 CHAR (Note: Only available starting with Hive 0.13.0)

其他类型

BOOLEAN
 BINARY (Note: Only available starting with Hive 0.8.0)

复合类型

arrays: ARRAY<data_type> 例：字段名 array
 collection items terminated by ‘:’ //内容中以：分割的都放入array
 maps: MAP<primitive_type, data_type> 例：字段名 map<string,string> //类似于字典操作
 collection items terminated by ‘#’ //内容中以#分割的都放入map
 map keys terminated by ‘:’; //map中每个字段：为分割符，前面是k，后面是v
 select may类型字段名[‘key值’] as key中的值 from 表名; //取map指定字段的key值
 select map_keys(may类型字段名) as 字段名 表名; //取map字段中的所有key（返回一个数组）
 select map_values(may类型字段名) from 表名; //获取所有value（返回一个数组，在查询后加[]课返回指定位置数据）
 structs: STRUCT<col_name : data_type, …> //可以存放多种数据类型

四：基本语句

1.基本查询示例

select * from t_access;
 select count(*) from t_access;
 select max(ip) from t_access;
 sql中的单行函数与聚合函数
 select substr(name,0,2), age from t_user; //截取0-2
 select name, sex, max(age) from t_user group by sex; //取出最大age，以sex排序

2.条件查询

select * from t_access where access_time<‘2017-08-06 15:30:20’
 select * from t_access where access_time<‘2017-08-06 16:30:20’ and ip>‘192.168.33.3’;

3.内连接

// 指定join条件
 select a.* ,b.* from t_a a join t_b b on a.name=b.name; 以a，b的name作为条件，对a，b两个表进行内连接查询（下面相同）

4.左外链接

select a.* ,b.* from t_a a left outer join t_b b on a.name=b.name;

5.右外连接

select a.* ,b.* from t_a a right outer join t_b b on a.name=b.name;

6.全外链接

select a.* ,b.* from t_a a full outer join t_b b on a.name=b.name;

7.左半连接

select a.* from t_a a left semi join t_b b on a.name=b.name;

8.分组聚合
 select dt,count(*),max(ip) as cnt from t_access group by dt; //所有分组
 select dt,count(*),max(ip) as cnt from t_access group by dt having dt>‘20170804’; //指定分组
 注意： 一旦有group by子句，那么，在select子句中就不能有 （分组字段，聚合函数） 以外的字段
 select 要求查询的字段，只能在一组数据中产生一个结果。一组产生一个结果。
 或者说：select中出现的字段要么是分组字段，要么是聚合函数（对多条数据产生一个值）
 如：ip不唯一，那么在使用group by url 时不能出现select ip！！！！

9：子查询

子查询如同sql语句使用，这里不做讲解

五：内置函数

查看全部函数：show functions;

1.类型转换函数

cast与java中的类型强转类似
 select cast(“5” as int) from dual;
 select cast(“2017-08-03” as date) ;

2.数学函数

 select round(5.4); – 5
  select round(5.1345,3); --5.135
  select ceil(5.4); // select ceiling(5.4); – 6
  select floor(5.4); – 5
  select abs(-5.4); – 5.4
  select greatest(3,5); – 5
  select greatest(3,5,6); – 6
  select least(3,5,6);

3.字符串函数

select substr(“abcdefg”,2,3); //截取2.3两个字符串
 select concat(“ab”,“xy”) from dual; //拼接字符串
 select concat_ws(".",“192”,“168”,“33”,“44”) from dual; //拼接字符串，中间以.作为连接
 select length(“192.168.33.44”) from dual; //计算字符串长度
 select split(“192.168.33.44”,"\ \ ."); //把一个字符串按照他指定的符号进行分割，返回到一个数组
 select upper(name) //把name字段中的值全部转化为大写
 select current_date; //常量，获取当前系统时间
 select from_unixtime(unix_timestamp(),“yyyy/MM/dd HH:mm:ss”); //unix时间戳转字符串
 select unix_timestamp(“2017/08/10 17:50:30”,“yyyy/MM/dd HH:mm:ss”); //字符串转unix时间戳
 select explode(subjects) as sub from t_stu_subject; //通过explode 对arrary类型字段进行炸裂，形成一张单独表
 select id,name,tmp.sub from t_stu_subject lateral view explode(subjects) tmp as sub; //lateral生成横向连接，类似左连接
 select moive_name,array_contains(actors,‘吴刚’) from t_movie; //contains函数在array类字段actors中搜索，如果存在返回true
 select sort_array(array(‘c’,‘b’,‘a’)); //对array中的值进行排序输出
 select size(actors) as actor_number from t_movie; //对array中的值进行计数，返回一个int类型
 //size(Map<K.V>) 返回一个int值
 //map_keys(Map<K.V>) 返回一个数组
 //map_values(Map<K.V>) 返回一个数组

4.条件函数

1：case：
 select id,name,
 case
 when age<28 then ‘youngth’
 when age>27 and age<40 then ‘zhongnian’
 else ‘old’
 end
 from t_user;
 2：if：
 select id,if(age>25,‘working’,‘worked’) from t_user;
 select moive_name,if(array_contains(actors,‘吴刚’),‘好电影’,'烂电影‘）from t_move;

5.json解析函数

create table t_rate2 as
 select json_tuple(json,‘movie’,‘rate’,‘timeStamp’,‘uid’) as (movie, rate, ts, uid)
 from t_tatingjson;
 //注意： json_tuple不会对嵌套的json进行解析，会将嵌套的json直接当作某一个字段的value返回，可以通过用自定以函数的方式对嵌套的json进行解析 例：minute(from_unixtime(cast(ts as bigint))) as minute,from_unixtime(cast(ts as bigint)) as ts

6.分析函数

select id,age,name,sex,
 row_number() over(partition by sex order by age desc) as rank
 from t_rownumber
 使用row_number函数，对表中的数据按照性别分组，按照年龄倒序排序并进行标记

六：常用操作

1.删除一个已经存在的表
 drop table employee;
 2.修改一个表明，重命名
 alter table old_name rename to new_name;
 3.修改一个字段的数据类型或者名字，并修改顺序
 ALTER TABLE 表名 CHANGE COLUMN 旧字段名 新的字段名(如果不变就保持原字段) 字段类型(若不变就采用原来的字段) COMMENT ‘新的字段备注’ AFTER 列名;
 4.给某一个表增加某一列的信息
 alter table tab_name add columns(value1 string);
 5.重命名表
 ALTER TABLE table_name RENAME TO new_table_name;
 6.修改表中某一列的属性
 alter table tab_name set tblproperties(‘value’=‘hello’);
 7.修改某一个表的某一列的信息
 alter table tab_name change column key key_1 int comment ‘h’ after value;

本文章为转载内容，我们尊重原作者对文章享有的著作权。如有内容错误或侵权问题，欢迎原作者联系我们进行内容更正或删除文章。

上一篇：linux yarn 阿里阿里云 linux

下一篇：android 集成chrome Android 集成html

提问和评论都可以，用心的回复会被更多人看到评论

发布评论

相关文章

官方博客	全部文章	热门标签	班级博客
了解我们	网站地图	意见反馈

鸿蒙开发者社区	51CTO学堂
51CTO	软考资讯

hive里是否能设data type hive add date

hive里是否能设data type hive add date

hive语句操作

一：建表语句

1.建表

2.删除表

3.创建带分区的表

4.导入文件到数据库

5.通过已存在表来建表

6.建表的同时插入数据

二：数据文件的导入和导出

1.在hive的交互式shell中用hive命令来导入本地数据到表目录

2.用hive命令导入hdfs中的数据文件到表目录

3.手动用hdfs命令，将文件放入表目录

4.将hive表中的数据导入HDFS的文件

5.将hive表中的数据导入本地磁盘文件

6:hive的文件格式

三：字段类型

数字类型

日期类型

字符串类型

其他类型

复合类型

四：基本语句

1.基本查询示例

2.条件查询

3.内连接

4.左外链接

5.右外连接

6.全外链接

7.左半连接

9：子查询

五：内置函数

1.类型转换函数

2.数学函数

3.字符串函数

4.条件函数

5.json解析函数

6.分析函数

六：常用操作

51CTO博客