HQL CREATE TABLE syntax:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY(col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY(col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
A simple Hive CREATE TABLE statement:
create table student(id string,name string ,sex string ,age int,address string)
row format delimited fields terminated by ','
stored as textfile;
students.txt:
95001,李勇,男,20,CS
95002,刘晨,女,19,IS
95003,王敏,女,22,MA
95004,张立,男,19,IS
95005,刘刚,男,18,MA
95006,孙庆,男,23,CS
95007,易思玲,女,19,MA
95008,李娜,女,18,CS
95009,梦圆圆,女,18,MA
95010,孔小涛,男,19,CS
95011,包小柏,男,18,MA
95012,孙花,女,20,CS
95013,冯伟,男,21,CS
95014,王小丽,女,19,CS
95015,王君,男,18,MA
95016,钱国,男,21,MA
95017,王风娟,女,18,IS
95018,王一,女,19,IS
95019,邢小丽,女,19,IS
95020,赵钱,男,21,IS
95021,周二,男,17,MA
95022,郑明,男,20,MA
Upload students.txt to /user/hive/warehouse/hive.db/student:
When Hive uses MySQL as its metastore, creating a database or table in Hive produces data both in MySQL and on HDFS. A database named hive is created in MySQL, where you can inspect DBS (database information), TBLS (table information), SDS, CDS, and COLUMNS_V2 (column information for each table).
It is recommended not to create primary keys in Hive!!!
Hive keeps its metadata (databases, tables, storage information, columns/fields) in a relational database (MySQL), while the actual data lives on HDFS.
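For reference, these metastore tables can be queried directly in MySQL. A minimal sketch, assuming the metastore database is named hive (column names can vary slightly between Hive versions):
use hive;
select DB_ID, NAME, DB_LOCATION_URI from DBS;
select TBL_ID, DB_ID, TBL_NAME, TBL_TYPE from TBLS;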
Hive characteristics
Advantages:
1. Scalability: Hive clusters can be scaled out freely, normally without restarting any service. Horizontal scaling means growing the cluster by adding nodes to share the load; vertical scaling means upgrading a single server, e.g. CPU from an i7-6700K (4 cores / 8 threads) to 8 cores / 16 threads, memory from 64 GB to 128 GB.
2. Extensibility: Hive supports user-defined functions, so users can implement whatever functions their business needs.
3. Good fault tolerance: a SQL statement can still complete even if some nodes fail.
Disadvantages:
1. Hive does not support record-level insert, update, or delete, although users can generate new tables from queries or export query results to files (the Hive 3.x version chosen here does support record-level inserts).
2. Query latency is high because starting a MapReduce job takes a long time, so Hive is not suitable for interactive query systems.
3. Hive does not support transactions (with no record-level updates or deletes, it is mainly used for OLAP (online analytical processing) rather than OLTP (online transaction processing), the two broad categories of data processing).
1.1 Basic data types
Hive supports most of the basic data types found in relational databases.
Type | Description | Example |
boolean | true/false | TRUE |
tinyint | 1-byte signed integer, -128~127 | 1Y |
smallint | 2-byte signed integer, -32768~32767 | 1S |
int | 4-byte signed integer | 1 |
bigint | 8-byte signed integer | 1L |
float | 4-byte single-precision floating point | 1.0 |
double | 8-byte double-precision floating point | 1.0 |
decimal | arbitrary-precision signed decimal | 1.0 |
string | variable-length string | 'a', "b" |
varchar | variable-length string (with a maximum length) | 'a', "b" |
char | fixed-length string | 'a', "b" |
binary | byte array | no literal form |
timestamp | timestamp, nanosecond precision | 122327493795 |
date | date | '2018-04-07' |
As in other SQL dialects, these are all reserved words. Note that all of these data types are implemented on top of the corresponding Java types, so their behavior matches the Java types exactly: string is backed by Java's String, float by Java's float, and so on.
1.2 Complex types
Type | Description | Example |
array | ordered collection of elements of the same type | array(1,2) |
map | key-value pairs; keys must be primitive types, values can be of any type | map('a',1,'b',2) |
struct | collection of fields whose types may differ | struct('1',1,1.0), named_struct('col1','1','col2',1,'col3',1.0) |
array:
create table movie(movie_name string ,actors array<string>,first_show string)
row format delimited
fields terminated by ','
collection items terminated by ':';
movie.dat:
战狼2,吴京:吴刚:龙母,2017-08-16
三生三世十里桃花,刘亦菲:痒痒,2017-08-20
普罗米修斯,苍老师:小泽老师:波多老师,2020-10-19
load data local inpath '/mine/movie.dat' into table movie;
select * from movie;
Access the elements of the array<string> column by index:
select movie_name,actors[0],actors[1],first_show from movie;
The array_contains function:
select * from movie where array_contains(actors,'吴刚');
The size function:
select * , size(actors) from movie;
map:
create table family (id string , name string, family_members map<string,string> , age int)
row format delimited
fields terminated by ','
collection items terminated by '#'
map keys terminated by ':';
family.dat:
1,zhangsan,father:xiaoming#mother:xiaohuang#brother:xiaoxu,28
2,lisi,father:mayun#mother:huangyi#brother:guanyu,22
3,wangwu,father:wangjianlin#mother:ruhua#sister:jingtian,29
4,mayun,father:mayongzhen#mother:angelababy,26
load data local inpath '/mine/family.dat' into table family;
select * from family;
select * , family_members['father'] as father,family_members['mother'] as mother from family;
map_keys:
select * , map_keys(family_members) as keys from family;
map_values:
select * , map_values(family_members) as keys from family;
size(map_keys(family_members)), size(map_values(family_members)):
select * , size(map_keys(family_members)) as keys from family;
select * , size(map_values(family_members)) as keys from family;
select * , family_members['brother'] from family where array_contains(map_keys(family_members),'brother');
struct:
create table t_user(id int , name string,info struct<age:int,sex:string,address:string>)
row format delimited fields terminated by ','
collection items terminated by ':';
user.dat:
1,zhangsan,18:male:深圳
2,lisi,28:female:北京
3,wangwu,38:male:广州
4,赵六,26:female:上海
5,钱七,35:male:杭州
load data local inpath '/mine/user.dat' into table t_user;
select * from t_user;
select * , info.address as address from t_user;
Important: Hive's default delimited row format cannot parse multi-character delimiters; it only handles a single character such as ':', not '::', so creating such a table the ordinary way will not work!!!
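If the raw data really does use a multi-character delimiter, one common workaround is to switch to a different SerDe. A minimal sketch, assuming the contrib MultiDelimitSerDe is available in your Hive build (the table t_multi is just an illustration):
create table t_multi(id int, name string)
row format serde 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
with serdeproperties ("field.delim"="::")
stored as textfile;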
# Run Hive SQL in local mode, which is a bit faster!!!
set hive.exec.mode.local.auto=true;
# Load data from HDFS into the directory backing the student table
load data inpath '/students.txt' into table student;
# Load data from the local filesystem into the directory backing the student table
load data local inpath '/mine/students.txt' into table student;
2.1 Creating an external table with the external keyword
create external table student(id string,name string ,sex string ,age int,address string)
row format delimited fields terminated by ','
stored as textfile;
2.2 Managed tables & external tables:
A table not declared with external is a managed (internal) table; a table declared with external is an external table.
2.3 Differences:
(1) Data of a managed table is managed by Hive itself; data of an external table is managed by HDFS;
(2) Managed-table data is stored under hive.metastore.warehouse.dir (default: /user/hive/warehouse); the storage location of external-table data is specified by the user;
(3) Dropping a managed table removes both the metadata and the stored data; dropping an external table removes only the metadata, and the files on HDFS are left untouched;
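To check which kind an existing table is, desc formatted prints a Table Type field (MANAGED_TABLE or EXTERNAL_TABLE):
desc formatted student;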
2.4 Specifying the data storage location at table creation
create table student(id string,name string ,sex string ,age int,address string)
row format delimited fields terminated by ','
stored as textfile
location '/student_data';
create external table external_student(id string,name string ,sex string ,age int,address string)
row format delimited fields terminated by ','
stored as textfile
location '/external_student_data';
At this point the specified location directories appear (still empty) on HDFS. With the location set, load the data:
load data local inpath '/mine/students.txt' into table external_student;
The data is placed inside the location directory. Whether the table is managed or external, if a location is specified at creation time that directory is created and any loaded data is stored there.
2.5 Behavior of drop table
Dropping a managed table deletes both the data and the directory at its location; dropping an external table does not!!!
2.6 Appending vs. overwriting data
Appending (into):
load data local inpath '/mine/students.txt' into table external_student;
load data local inpath '/mine/students.txt' into table student;
Overwriting (overwrite into):
load data local inpath '/mine/students2.txt' overwrite into table external_student;
load data local inpath '/mine/students2.txt' overwrite into table student;
3. Partitions
Dynamic partition setting:
set hive.exec.dynamic.partition.mode=nonstrict;
create external table partition_student(id string,name string ,sex string ,age int)
partitioned by (address string)
row format delimited fields terminated by ','
stored as textfile;
3.1 Inserting data into partitions
insert into table partition_student partition(address) select id,name,sex,age,address from student;
insert overwrite table partition_student partition(address) select id,name,sex,age,address from student;
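A partition value can also be fixed explicitly (static partitioning), and existing partitions can be listed. A short sketch, reusing the student and partition_student tables above:
insert into table partition_student partition(address='CS') select id,name,sex,age from student where address='CS';
show partitions partition_student;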
3.2 Characteristics of Hive partitions:
1. Avoid Chinese characters in partition columns.
2. Prefer columns with a moderate partition density, e.g. region, date, or age; unsuitable choices are gender, name, or an ID column.
3. Avoid composite partitions; try to partition on a single column.
4. Avoid overly scattered partition columns, i.e. the range of partition values should not be too wide.
3.3 Advantages of Hive partitions:
1. Faster queries: in Hive a partition acts as a kind of index and helps speed up data retrieval.
2. More even data distribution: many dense data files are spread across a matrix of partition directories, which avoids hotspot contention caused by data concentrating on a single datanode.
3. Partitions make directory-level data management easier.
4. Static partitions allow bulk modification of rows that share the same partition value.
3.4 Disadvantages of Hive partitions:
1. The partition key can be hard to choose; when several candidate columns exist it is easy to pick a poor one and hurt performance.
2. Because partitioning redistributes the data into many directories and files, it increases the load on the NameNode.
4. Buckets
Hive can also organize tables or partitions into buckets, for the following reasons:
(1) The first is to make sampling more efficient: with very large datasets it is often impractical to process everything during development and testing, so sampling becomes essential.
(2) The second is to obtain better query performance.
The point of bucketing in Hive: it makes joins easier and avoids Cartesian products, improving join efficiency.
Buckets give a table extra structure that Hive can exploit for certain queries to improve efficiency. Bucketing hashes the specified column and uses the hash value to split that column's data into a set of buckets, each stored in its own file.
Before using buckets, run set hive.enforce.bucketing=true; so that Hive enforces bucketing.
4.1 Table definition
create external table bucket_student(id string,name string ,sex string ,age int)
clustered by (age) sorted by (age) into 4 buckets
row format delimited fields terminated by ','
stored as textfile;
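A bucketed table is normally populated with insert ... select (a plain load data does not redistribute rows into buckets); once filled, the buckets can be sampled. A minimal sketch, assuming the bucket_student table above and the existing student table:
insert overwrite table bucket_student select id,name,sex,age from student;
select * from bucket_student tablesample(bucket 1 out of 4 on age);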
5. Hive sorting clauses
- order by: sorts the final reduce output, forcing a global sort (single reducer)
- sort by: sorts the data before the reduce output is written, i.e. within each reducer (no global order)
- distribute by: partitions the data in the map phase (decides which reducer each row goes to)
- cluster by: shorthand for distribute by + sort by on the same column
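A short illustration on the student table, assuming three reducers; order by produces one globally sorted result, while distribute by + sort by (or cluster by) only orders rows within each reducer:
set mapreduce.job.reduces=3;
select * from student order by age;
select * from student distribute by sex sort by age;
select * from student cluster by age;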
6. Executing Hive statements
hive -e executes a single SQL statement
hive -f executes a SQL file
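For example (the path /mine/query.sql is only a placeholder):
hive -e "select * from student limit 5;"
hive -f /mine/query.sql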
7. Differences among ROW_NUMBER, RANK, and DENSE_RANK
row_number(): numbers rows consecutively in order, with no ties and no gaps
rank(): equal values receive the same rank, and gaps are left after the ties
dense_rank(): equal values receive the same rank, and no gaps are left
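A small sketch on the student table showing the three side by side (ties on age within the same address make the difference visible):
select id, name, address, age,
row_number() over(partition by address order by age) as rn,
rank() over(partition by address order by age) as rk,
dense_rank() over(partition by address order by age) as drk
from student;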
8. Overview of Hive execution
(1) Hive converts HQL into a set of operators (Operator), e.g. GroupByOperator, JoinOperator
(2) An Operator is Hive's smallest processing unit
(3) Each operator represents an HDFS operation or a MapReduce job
(4) Hive runs the MapReduce program through ExecMapper and ExecReducer; execution can be in local mode or distributed mode
9. Table joins and subqueries
9.1 Table joins
(1)inner join(join)
(2)left outer join(left join)
(3)right outer join(right join)
(4)full outer join (full join)
(5) left semi join (keeps only the left side of the join)
9.2 Subqueries
Subqueries are supported, but exists/in subqueries are not!!! (Newer Hive versions do add limited support for in/exists in the WHERE clause.)
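A query that would use in can usually be rewritten as a left semi join. A minimal sketch, using the a and b tables defined in section 10 below:
-- intent: select * from a where id in (select id from b);
select a.* from a left semi join b on a.id = b.id;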
9.3 Ways to save the result of a select query
(1) Store the query result in a new Hive table
create table tmp_table
as
select * from student;
(2) Store the query result in an existing Hive table
insert into table tmp_table select * from student;
insert overwrite table tmp_table select * from student;
(3) Save the query result to a specified directory (either local or on HDFS)
insert overwrite directory '/tmp_table' select * from student;
insert overwrite local directory '/tmp_table' select * from student;
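When exporting to a directory, a row format can also be given so the output files are readable delimited text:
insert overwrite local directory '/tmp_table'
row format delimited fields terminated by ','
select * from student;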
10. join
10.1 Data:
a.txt:
1,a
2,b
3,c
4,d
7,y
8,u
b.txt:
2,bb
3,cc
7,yy
9,pp
10.2 Create the tables and load the data
create table a(id string,name string)
row format delimited fields terminated by ','
stored as textfile;
create table b(id string,name string)
row format delimited fields terminated by ','
stored as textfile;
load data local inpath '/mine/a.txt' into table a;
load data local inpath '/mine/b.txt' into table b;
(1)inner join(join)
select * from a inner join b on a.id = b.id;
(2)left outer join(left join)
select * from a left outer join b on a.id = b.id;
(3)right outer join(right join)
select * from a right outer join b on a.id = b.id;
(4)full outer join (full join)
select * from a full outer join b on a.id = b.id;
(5) left semi join (keeps only the left side of the join)
select * from a left semi join b on a.id = b.id;
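With the data above, only ids 2, 3, and 7 exist in both tables, so the inner join is expected to return:
2,b,2,bb
3,c,3,cc
7,y,7,yy
and the left semi join returns just the left-side columns for those ids: 2,b / 3,c / 7,y.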
11. Hive built-in functions, user-defined functions, and Transform
11.1 Built-in functions
1. substr(), concat(), split():
select substr('123456',1,3);
select concat ('a','b','c');
select concat ('a','b');
select split('a,b,c',',');
# split splits a string into an array by the given delimiter
select split('a,b,c',',')[1];
The second argument of split is a regular expression.
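Because the delimiter is treated as a regular expression, metacharacters such as '.' or '|' have to be escaped:
select split('a.b.c','\\.');
select split('a|b|c','\\|');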
2. Type conversion with cast():
select cast('5' as int);
create table t_tmp
as
select cast('5' as int);
select current_timestamp;
select cast(current_timestamp as date);
3. Math functions
round(NUMBER)、round(NUMBER,3)、ceil(NUMBER)、floor(NUMBER)、abs(NUMBER)
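For example (expected results roughly: 3.0, 3.142, 4, 3, 5):
select round(3.14159), round(3.14159,3), ceil(3.2), floor(3.8), abs(-5);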
4. The explode() function, a table-generating function that turns the elements of an array (or map) column into rows
create table student_subject(id int ,name string , subjects array<string>)
row format delimited fields terminated by ','
collection items terminated by ':';
subjects.dat:
1,zhangsan,化学:物理:数学:语文
2,lisi,化学:数学:生物:生理:卫生
3,wangwu,化学:语文:英语:体育:生物
load data local inpath '/mine/subjects.dat' into table student_subject;
select * from student_subject;
select explode(subjects) from student_subject;
The explode() function produces a table: one output row per element.
select distinct tmp.sub from (select explode(subjects) as sub from student_subject) tmp;
lateral view:
select id,name,tmp.sub from
student_subject
lateral view
explode(subjects) tmp as sub;
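lateral view combines well with aggregation, e.g. counting how many students take each subject:
select tmp.sub, count(*) as cnt
from student_subject
lateral view explode(subjects) tmp as sub
group by tmp.sub;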
5. Writing WordCount in HQL:
create table words(sequence string);
wordcount.txt:
hello world
hello kitty
hello jerry
tom
wtl
lover is love
like like liker
load data local inpath '/mine/wordcount.txt' into table words;
select * from words;
select tmp.word,count(tmp.word) from (select explode(split(sequence,' ')) as word from words) tmp group by tmp.word;
6. sort_array
select sort_array(array(2,4,1,5,3,8,0,9));
array() is a constructor function.
7. Conditional functions
case ... when ... then ... else ... end
select id,name,
case
when info.age <= 18 then '未成年人'
when info.age > 18 and info.age < 35 then '青年人'
else '中年人'
end
from t_user;
The if() function:
select movie_name,actors,first_show,
if(array_contains(actors,'吴刚'),'好电影','不好的电影')
from movie;
8. Window functions with over
create table rn(id int , age int,name string,sex string)
row format delimited fields terminated by ',';
rn.dat:
1,18,a,male
2,19,b,male
3,22,c,female
4,16,d,female
5,30,e,male
6,26,f,female
load data local inpath '/mine/rn.dat' into table rn;
select * from rn;
select * from
(select id,age,name,
row_number() over(partition by sex order by age) as num
from rn) tmp
where num < 3;
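over() also works with ordinary aggregates, not only the ranking functions; for example, attaching the average age of each sex to every row of rn:
select id, name, sex, age,
avg(age) over(partition by sex) as avg_age_of_sex
from rn;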
11.2 Categories of user-defined functions
When Hive's built-in functions cannot satisfy your business logic, you can turn to user-defined functions.
When writing a Hive UDF there are two choices: extend the UDF class, or extend the abstract class GenericUDF. The difference: GenericUDF can handle complex-typed arguments and is more efficient, because the plain UDF class requires Hive to use reflection.
Implementing a GenericUDF is more involved: you extend GenericUDF, work with ObjectInspectors, and must check the types and number of the arguments you receive. A GenericUDF has to implement the following three methods:
11.3 GenericUDF development example
// Called only once, before evaluate(). It receives an array of ObjectInspectors and checks that the argument types and count are correct.
abstract ObjectInspector initialize(ObjectInspector[] arguments);
// Similar to UDF's evaluate(): processes the actual arguments and returns the final result.
abstract Object evaluate(GenericUDF.DeferredObject[] arguments);
// Used to produce the message shown when the GenericUDF fails; the message is the string this method returns.
abstract String getDisplayString(String[] children);
package application;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;
import java.util.ArrayList;
import java.util.List;
/**
* @author: wtl
* @License: (C) Copyright 2020, wtl Corporation Limited.
* @Date: 2020-10-18 11:33
* @Version: 1.0
* @Description:
*/
public class ToLower extends GenericUDF {

    private ListObjectInspector listObjectInspector;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        // Expect two arguments: an array plus a scalar of the same type as the array elements.
        if (arguments.length != 2) {
            throw new UDFArgumentException("expected exactly two arguments");
        }
        listObjectInspector = (ListObjectInspector) arguments[0];
        ObjectInspector listElementObjectInspector = listObjectInspector.getListElementObjectInspector();
        if (!ObjectInspectorUtils.compareTypes(listElementObjectInspector, arguments[1])) {
            throw new UDFArgumentException("argument types do not match!!!");
        }
        // The function returns array<string>.
        return ObjectInspectorFactory.getStandardListObjectInspector(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        List<Text> list = (List<Text>) arguments[0].get();
        Text value = (Text) arguments[1].get();
        // Append the second argument to every element of the array.
        List<String> resultList = new ArrayList<>();
        for (Text v : list) {
            resultList.add(v.toString() + ":" + value.toString());
        }
        return resultList;
    }

    @Override
    public String getDisplayString(String[] children) {
        return "test";
    }
}
1. add jar /mine/udf-jar-with-dependencies.jar;
2. CREATE TEMPORARY FUNCTION hello AS "application.ToLower";
3. select hello(array('test','hello'),'hel');
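A temporary function disappears when the session ends; it can be dropped explicitly, or registered permanently from a jar on HDFS (the HDFS path below is only a placeholder):
drop temporary function hello;
create function hello as 'application.ToLower' using jar 'hdfs:///mine/udf-jar-with-dependencies.jar';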
The explode function (from the Hive source):
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.hive.ql.udf.generic;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.TaskExecutionException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.MapObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
/**
* GenericUDTFExplode.
*
*/
@Description(name = "explode",
value = "_FUNC_(a) - separates the elements of array a into multiple rows,"
+ " or the elements of a map into multiple rows and columns ")
public class GenericUDTFExplode extends GenericUDTF {

  private transient ObjectInspector inputOI = null;

  @Override
  public void close() throws HiveException {
  }

  @Override
  public StructObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
    if (args.length != 1) {
      throw new UDFArgumentException("explode() takes only one argument");
    }

    ArrayList<String> fieldNames = new ArrayList<String>();
    ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();

    switch (args[0].getCategory()) {
    case LIST:
      inputOI = args[0];
      fieldNames.add("col");
      fieldOIs.add(((ListObjectInspector)inputOI).getListElementObjectInspector());
      break;
    case MAP:
      inputOI = args[0];
      fieldNames.add("key");
      fieldNames.add("value");
      fieldOIs.add(((MapObjectInspector)inputOI).getMapKeyObjectInspector());
      fieldOIs.add(((MapObjectInspector)inputOI).getMapValueObjectInspector());
      break;
    default:
      throw new UDFArgumentException("explode() takes an array or a map as a parameter");
    }

    return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames,
        fieldOIs);
  }

  private transient final Object[] forwardListObj = new Object[1];
  private transient final Object[] forwardMapObj = new Object[2];

  @Override
  public void process(Object[] o) throws HiveException {
    switch (inputOI.getCategory()) {
    case LIST:
      ListObjectInspector listOI = (ListObjectInspector)inputOI;
      List<?> list = listOI.getList(o[0]);
      if (list == null) {
        return;
      }
      for (Object r : list) {
        forwardListObj[0] = r;
        forward(forwardListObj);
      }
      break;
    case MAP:
      MapObjectInspector mapOI = (MapObjectInspector)inputOI;
      Map<?,?> map = mapOI.getMap(o[0]);
      if (map == null) {
        return;
      }
      for (Entry<?,?> r : map.entrySet()) {
        forwardMapObj[0] = r.getKey();
        forwardMapObj[1] = r.getValue();
        forward(forwardMapObj);
      }
      break;
    default:
      throw new TaskExecutionException("explode() can only operate on an array or a map");
    }
  }

  @Override
  public String toString() {
    return "explode";
  }
}