hivetask运行流程 hivestack

转载

技术博客领航者 2023-12-04 21:05:43

文章标签 hivetask运行流程 hive 字符串 bc unix 文章分类 Hive 大数据

hive常用的内置函数实例解析

一、日期函数

1.1 日期>>>>时间戳：unix_timestamp()
1.2 时间戳>>>>日期：from_unixtime(bigint unixtime,string format)
1.3 获取当前日期：current_date()
1.4 获取当前日期 + 分钟：current_timestamp()
1.5 日期格式字符串转化为日期：date(string date)
1.6 获取日期格式字符串中具体的年、月、日、时、分、秒：year()、month()、day()、hour()、minute()、second()
1.7 日期比较函数，返回两个日期的差值：datediff(string date1,string date2)
1.8 日期减少函数，返回减少 x 天后的日期：date_sub(string date,int x)
1.9 日期增加函数,返回增加 x 天后的日期：date_add(string date, int x)
1.10 日期转周函数，获取一年当中的第几周：weekofyear(string date)
1.11 日期转天函数,获取当天日期是一周当中的第几天：dayofweek(string date)

二、字符串函数

2.1 获取字符串长度：length(string A)
2.2 字符串反转：reverse(string A)
2.3 字符串连接：concat(string A, string B…)
2.4 带分隔符的字符串连接：concat_ws((string SEP, string A, string B…)
2.5 字符串截取：substr,substring
2.6 字符串转大写： upper(string A) 、ucase(string A)
2.7 字符串转小写：lower(string A) 、lcase(string A)
2.8 去除字符串两边的空格：trim(string A)
2.9 去除字符串左边的空格：ltrim(string A)
2.10 去除字符串右边的空格：rtrim(string A)
2.11 使用正则表达式替换：regexp_replace(string A, string B, string C)
2.12 使用正则表达式匹配：regexp_extract(string subject, string pattern, int index)
2.13 URL 解析函数：parse_url(string urlString, string partToExtract [, string keyToExtract])
2.14 json解析函数：get_json_object(string json_string, string path)
2.15 空格字符串函数，返回长度为n的字符串：space(int n)
2.16 重复字符串函数，返回重复n次后的str字符串：repeat(string str, int n)
2.17 获取首字符ASCII值函数，返回字符串str第一个字符的ascii码：ascii(string str)
2.18 左补足函数，将str用pad进行左补足到len位：lpad(string str, int len, string pad)
2.19 右补足函数，将str用pad进行左补足到len位：rpad(string str, int len, string pad)
2.20 分割字符串函数：split(string str, string pat)
2.21 集合查找函数，返回str在strlist第一次出现的位置：find_in_set(string str, string strList)

三、集合统计函数

3.1 个数统计函数：count()
3.2 总和统计函数：sum(col), sum(DISTINCT col)
3.3 平均值统计函数：avg(col), avg(DISTINCT col)
3.4 最小值统计函数： min(col)
3.5 最大值统计函数： max(col)
3.6 非空集合总体变量函数：var_pop(col)
3.7 非空集合样本变量函数：var_samp (col)
3.8 总体标准偏离函数：stddev_pop(col)
3.9 样本标准偏离函数：stddev_samp(col)
3.10 中位数函数：percentile()
3.12 近似中位数函数：percentile_approx()
3.13 直方图：histogram_numeric(col, b)

四、数值计算

4.1 取整函数(四舍五入)：round()
4.2 向上取整：ceil(double a), ceiling(double a)
4.3 向下取整：floor(double a)
4.4 取随机函数：rand()
4.5 自然指数函数，返回自然对数e的a次方：exp(double a)
4.6 以10为底对数函数，返回以10为底的a的对数：log10(double a)
4.7 以2为底对数函数，返回以2为底的a的对数：log2(double a)
4.8 对数函数,返回以base为底的a的对数： log(double base, double a)
4.9 幂运算函数，返回a的p次幂：pow(double a, double p)， power(double a, double p)
4.10 开平方函数，返回a的平方根：sqrt(double a)
4.11 二进制函数，返回a的二进制码：bin(BIGINT a)
4.12 十六进制函数：hex(BIGINT a)
4.13 反转十六进制函数，返回该十六进制字符串所代码的字符串： unhex(string a)
4.14 进制转换函数，将数值num从from_base进制转化到to_base进制：conv(BIGINT num, int from_base, int to_base)
4.15 绝对值函数，返回数值a的绝对值： abs(double a) ， abs(int a)
4.16 取余函数，返回a除以b的余数：pmod(int a, int b),pmod(double a, double b)
4.17 正弦函数，返回a的正弦值： sin(double a)
4.18 反正弦函数，返回a的反正弦值：asin(double a)
4.19 余弦函数，返回a的余弦值：cos(double a)
4.20 反余弦函数，返回a的反余弦值：acos(double a)

五、数学运算

5.1 加法：+
5.2 减法：-
5.3 乘法：*
5.4 除法：/
5.5 取余：%
5.6 位与：&
5.7 位或：|
5.8 位异或：^
5.9 位取反：~

六、关系运算

6.1 等值比较： =
6.2 不等值比较： <>, !=
6.3 小于比较：<
6.4 小于等于比较：<=
6.5 大于比较：>
6.6 大于等于比较：>=
6.7 空值判断：is null
6.8 非空判断：is not null
6.9 like比较,匹配字符串A中符合B的部分：string A like string B
6.10 rlike比较，匹配字符串A中符合正则表达式B的部分：string A like string B
6.11 正则regexp匹配：string A regexp string B

七、逻辑运算

7.1 逻辑与：and
7.2 逻辑或：or
7.3 逻辑非：not

八、条件函数

8.1 if 函数
8.2 条件判断函数 case
8.3 非空查找函数：COALESCE(T v1, T v2, …)

九、表生成函数

9.1 map类型构建：map (key1, value1, key2, value2, …)
9.2 struct类型构建：struct(val1, val2, val3, …)
9.3 array类型构建：array(val1, val2, …)
9.4 将array 或 map 扩展成多行多列：explode(array())，explode(map())
9.5 将array转单列多行输出，在每行结果前增加序号：posexplode(array())
9.6 按指定需求，将array转多列多行，灵活输出：stack(int num,array())
9.7 根据元祖中josn字符串构建：json_tuple()
9.8 将分组中的某列转为一个数组返回：collect_list、collect_set

十、复杂类型访问操作

10.1 map类型访问：M[key]
10.2 struct类型访问: S.x
10.3 array类型访问： A[n]
10.4 Map类型长度函数，返回map类型的长度：size(map)
10.5 array类型长度函数，返回array类型的长度：size(Array)

十一、类型转换函数

11.1 类型转换函数，返回转换后的数据类型的结果：cast(expr as )

一、日期函数

1.1 日期>>>>时间戳：unix_timestamp()

返回值：bigint

①获取当前时间戳： unix_timestamp()

select unix_timestamp();      
--1565858389

②将日期格式字符串转为时间戳： unix_timestamp(string timestame)

输入的时间戳格式必须为 ‘yyyy-MM-dd HH:mm:ss’ ,如不符合则返回 null

select unix_timestamp('2019-08-15 16:40:00');   
--1565858400

select unix_timestamp('2019-08-15');  
--null

③将指定时间字符串格式字符串转化成unix时间戳： unix_timestamp(string date,string pattern) 如日期不符合对应格式则返回 null

select unix_timestamp('2019-08-15','yyyy-MM-dd');   
--1565798400

select unix_timestamp('2019-08-15','yyyy-MM-dd HH:mm:ss');
--null

select unix_timestamp('2019-08-15 16:40:00','yyyy-MM-dd HH:mm:ss');
--1565858400

1.2 时间戳>>>>日期：from_unixtime(bigint unixtime,string format)

返回值：string

①将时间戳秒数转化为UTC时间，并用字符串表示，可通过format规定的时间格式，指定输出的时间

select from_unixtime(1565858389,'yyyy-MM-dd HH:mm:ss');
--2019-08-15 16:39:49

select from_unixtime(1565858389,'yyyy-MM-dd');
-- 2019-08-15

②如果 unixtime 是13位的，需要先转成10位

select from_unixtime(cast(1553184000488/1000 as int),'yyyy-MM-dd HH:mm:ss'); 
-- 2019-03-22 00:00:00

--或者使用字符串函数 substr 从第一位截取10个数字，截取时间戳前10位 
select from_unixtime(cast(substr(1553184000488,1,10) as int),'yyyy-MM-dd HH:mm:ss');
-- 2019-03-22 00:00:00

1.3 获取当前日期：current_date()

返回值：string

select current_date();
-- 2020-09-28

1.4 获取当前日期 + 分钟：current_timestamp()

返回值：string

select current_timestamp();  
-- 2020-09-19 20:46:46.232

1.5 日期格式字符串转化为日期：date(string date)

返回值：string

select date('2020-07-25');  
-- 2020-07-25

select date('2020-07-25 00:00:00');  
-- 2020-07-25

1.6 获取日期格式字符串中具体的年、月、日、时、分、秒：year()、month()、day()、hour()、minute()、second()

返回值：int

select year('2020-09-22 15:16:02');
-- 2020

select month('2020-09-22 15:16:02');
-- 9

select day('2020-09-22 15:16:02');
-- 22

select hour('2020-09-22 15:16:02');
-- 15

select minute('2020-09-22 15:16:02');
-- 16

select second('2020-09-22 15:16:02');
-- 2

-- 注：字符串不带分钟，查询 时、分、秒 返回 null
select hour('2020-09-22');
-- null

1.7 日期比较函数，返回两个日期的差值：datediff(string date1,string date2)

返回值：int

函数输入参数的日期格式为：yyyy-MM-dd HH:mm:ss 、 yyyy-MM-dd

select datediff('2020-01-20','2020-01-10');
-- 10 

select datediff('2020-01-10','2020-01-20');
-- 返回-10，所以日期相减，应该是新的日期在前

1.8 日期减少函数，返回减少 x 天后的日期：date_sub(string date,int x)

返回值：string

select date_sub('2020-09-04',3);
-- 2020-09-01
select date_sub('2020-09-04 09:12:41',2); 
-- 2020-09-02

1.9 日期增加函数,返回增加 x 天后的日期：date_add(string date, int x)

返回值：string

select date_add('2020-09-04',3);
-- 2020-09-07
select date_add('2020-09-04 09:12:41',2); 
-- 2020-09-06

1.10 日期转周函数，获取一年当中的第几周：weekofyear(string date)

返回值：int

select weekofyear('2020-01-02 13:49:02');
-- 1
select weekofyear('2020-01-02'); 
-- 1

1.11 日期转天函数,获取当天日期是一周当中的第几天：dayofweek(string date)

返回值：int

select dayofweek('2020-09-22'); 
-- 返回3 (当前是星期二，但是一周是从星期天开始，星期天是1，所以星期二就是返回3)
select dayofweek('2020-09-22 13:49:02'); 
-- 3

二、字符串函数

2.1 获取字符串长度：length(string A)

返回值： int

select length("abcd");
-- 4

2.2 字符串反转：reverse(string A)

返回值： string

select reverse("abcd");
-- dcba

2.3 字符串连接：concat(string A, string B…)

返回值: string

select concat('abc','def','g');
-- abcdefg

2.4 带分隔符的字符串连接：concat_ws((string SEP, string A, string B…)

返回值: string
返回输入字符串连接后的结果，SEP可以指定各个字符串间的分隔符

select concat_ws(',','abc','def','gh');
-- abc,def,gh

注： concat函数和concat_ws函数的区别：

concat函数在连接字符串的时候，只要其中一个是NULL，那么将返回NULL

select concat('a','b');
-- ab
select concat('a','b',null);
NULL

concat_ws函数在连接字符串的时候，只要有一个字符串不是NULL，就不会返回NULL

select concat_ws('-','a','b');
-- ab
select concat_ws('-','a','b',null);
-- a-b
select concat_ws('','a','b',null);
-- ab

2.5 字符串截取：substr,substring

二者用法一致，返回值为 string

① 从指定位置 start 开始截取，截取到末尾，起始下标为 1： substr(string A, int start),substring(string A, int start)

select substr('abcdef',3);
-- cdef

select substring('abcdef',3);
-- cdef

-- 输入负值 -x，末尾往头部从 "倒数第x位" 反转截取，截取到末尾
select substr('abcdef',-3);
-- def

select substring('abcdef',-3);
-- def

②从指定位置 start 开始截取，截取长度为len的字符串，起始下标为 1：substr(string A, int start, int len),substring(string A, int start, int len)

select substr('abcdef',3,2);
-- cd

select substring('abcdef',3,2);
-- cd

-- 重倒数第3位开始截取，保留两个字符
select substr('abcdef',-3,2);
-- de

select substring('abcdef',-3,2);
-- de

2.6 字符串转大写： upper(string A) 、ucase(string A)

返回值: string

select upper('abcde');
-- ABCDE

select upper('aBcDe');
-- ABCDE

select ucase('abcde');
-- ABCDE

select ucase('aBcDe');
-- ABCDE

2.7 字符串转小写：lower(string A) 、lcase(string A)

返回值: string

select lower('ABCDE');
-- abcde

select lower('ABCde');
-- abcde

select lcase('ABCDE');
-- abcde

select lcase('ABCde');
-- abcde

2.8 去除字符串两边的空格：trim(string A)

返回值: string

select trim(' abc ');
-- abc

select trim(' a b c ');
-- 不会取出字符串中间的空格a b c

2.9 去除字符串左边的空格：ltrim(string A)

返回值: string

select ltrim(' abc ');
-- abc(空格) 

select length(ltrim(' abc '));
-- 4

2.10 去除字符串右边的空格：rtrim(string A)

返回值: string

select rtrim(' abc ');
-- (空格)abc

select length(rtrim(' abc '));
-- 4

2.11 使用正则表达式替换：regexp_replace(string A, string B, string C)

将 字符串A 中的符合 java正则表达式B 的部分替换为 字符串C
返回值: string

select regexp_replace('oneDay-2019-03-15','\\d{4}-\\d{2}-\\d{2}','2020-09-22');
-- oneDay-2020-09-22

2.12 使用正则表达式匹配：regexp_extract(string subject, string pattern, int index)

将字符串subject按照pattern正则表达式的规则拆分，返回index指定的字符
subject 是需要匹配的字符串
pattern 代表正则表达式，表达式中需包含一个或两个括号 字符串选择器 ()
index 包含0、1、2 三个数字，0代表匹配整个 pattern 中的内容，1代表匹配 第一个() 中的内容，2代表匹配 第二 () 中的内容
无论 subject 中有有几个符合匹配条件的元素，默认取第一个
返回值: string

例①：

select regexp_extract('x=ab123&x=111abc&x=123&x=3aaa','x=([0-9]+)([a-z]+)',0);
-- x=111abc

select regexp_extract('x=ab123&x=111abc&x=123&x=3aaa','x=([0-9]+)([a-z]+)',1);
-- 111

select regexp_extract('x=ab123&x=111abc&x=123&x=3aaa','x=([0-9]+)([a-z]+)',2);
-- abc

例②：

select regexp_extract('x=ab123&x=111abc&x=123&x=3aaa','x=([a-z]+)([0-9]+)',0);
-- x=ab123

select regexp_extract('x=ab123&x=111abc&x=123&x=3aaa','x=([a-z]+)([0-9]+)',1);
-- ab

select regexp_extract('x=ab123&x=111abc&x=123&x=3aaa','x=([a-z]+)([0-9]+)',2);
-- 123

例③：

select regexp_extract('https://detail.tmall.com/item.htm?spm=608.7065813.ne.1.Ni3rsN&id=522228774076&tracelog=fromnonactive','id=([0-9]+)',0);   
-- id=522228774076

select regexp_extract('https://detail.tmall.com/item.htm?spm=608.7065813.ne.1.Ni3rsN&id=522228774076&tracelog=fromnonactive','id=([0-9]+)',1);    
-- 522228774076

例④：

select regexp_extract('http://a.m.taobao.com/i41915173660.htm','i([0-9]+)',0);   
-- i41915173660

select regexp_extract('http://a.m.taobao.com/i41915173660.htm','i([0-9]+)',1);
-- 41915173660

例⑤：

select regexp_extract('123-basketball-#@','bas(.*?)(ball)',0);
-- basketball

select regexp_extract('123-basketball-#@','bas(.*?)(ball)',1);
-- ket

select regexp_extract('123-basketball-#@','bas(.*?)(ball)',2);
-- ball

2.13 URL 解析函数：parse_url(string urlString, string partToExtract [, string keyToExtract])

返回值：string
返回URL中指定的部分，partToExtract 的选项包含：[ host， path，query，ref，protocol，file， authority，userinfo ]

select parse_url('http://facebook.com/path/p1.php?query=1', 'PROTOCOL');   
-- http

select parse_url('http://facebook.com/path/p1.php?query=1', 'HOST');
-- facebook.com

select parse_url('http://facebook.com/path/p1.php?query=1', 'REF');
-- NULL

select parse_url('http://facebook.com/path/p1.php?query=1', 'PATH');
-- /path/p1.php

select parse_url('http://facebook.com/path/p1.php?query=1', 'QUERY');
-- query=1

select parse_url('http://facebook.com/path/p1.php?query=1', 'FILE');
-- /path/p1.php?query=1

select parse_url('http://facebook.com/path/p1.php?query=1', 'AUTHORITY');
-- facebook.com

select parse_url('http://facebook.com/path/p1.php?query=1', 'USERINFO');
-- NULL

2.14 json解析函数：get_json_object(string json_string, string path)

解析json的字符串json_string,返回path指定的内容
如果输入的json字符串无效，那么返回null
返回值：string

select  get_json_object(
'{
"store":{"fruit":\[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.95,"color":"red"}},
"email":"amy@only_for_json_udf_test.net",
"owner":"amy"
}'
,'$.owner');
-- amy

2.15 空格字符串函数，返回长度为n的字符串：space(int n)

返回值：string

select space(10);
-- 返回10个空格，不可见

select length(space(10));
-- 10

2.16 重复字符串函数，返回重复n次后的str字符串：repeat(string str, int n)

返回值：string

select repeat('abc',5);
-- abcabcabcabcabc

2.17 获取首字符ASCII值函数，返回字符串str第一个字符的ascii码：ascii(string str)

返回值：int

select ascii('abcde');
-- 97

select ascii('123ABC');
-- 49

2.18 左补足函数，将str用pad进行左补足到len位：lpad(string str, int len, string pad)

返回值：string

select lpad('abc',10,'123');
 -- 1231231abc

2.19 右补足函数，将str用pad进行左补足到len位：rpad(string str, int len, string pad)

返回值：string

select rpad('abc',10,'123');
 -- abc1231231

2.20 分割字符串函数：split(string str, string pat)

将字符串 str 按照 pat 进行分割，返回分割后的字符串数组
返回值：array

select split('a1b1c1d1',"1");
-- ["a","b","c","d",""]

select split("a1 b1 c1 d1"," ");
-- ["a1","b1","c1","d1"]

select split("a,b,c,d",",");
-- ["a","b","c","d"]

2.21 集合查找函数，返回str在strlist第一次出现的位置：find_in_set(string str, string strList)

strlist是用逗号分割的字符串
起始下标从1开始，如果没有找该str字符，返回0
返回值：int

select find_in_set('ab','ab,ac,ad');
-- 1 

select find_in_set('ab','ab,ac,ad,ab');
-- 1 有多个重复的元素，默认返回第一个

select find_in_set('ad','ab,ac,ad');
-- 3

select find_in_set('abc','ab,ac,ad,ab');
-- 0  元素不存在，返回0

select find_in_set('ab','ab ac ad');
-- 0 用空格分割的字符串不识别，默认'ab ab ac ad'为一个整体

三、集合统计函数

建表：

create table if not exists student(
id int,
name string,
score int
)
;

插入数据：

insert into student values
(1,'a',100),
(2,'b',90),
(3,'c',80),
(4,'c',80),
(5,null,null)
;

查看表数据：

select * from student;

hivetask运行流程 hivestack_unix

3.1 个数统计函数：count()

参数：count(*), count(expr), count(DISTINCT expr[, expr_.])
count(*) 统计检索出的行的个数，包括NULL值的行；
count(expr) 返回指定字段的非空值的个数；
count(DISTINCT expr[, expr_.]) 返回指定字段的不同的非空值的个数(去重)
返回值: int

select count(*) from student;
-- 5

select count(1) from student;
-- 5  注：count(1)效率更高一些

select count(name) from student;
-- 4

select count(distinct name ) from student;
-- 3

3.2 总和统计函数：sum(col), sum(DISTINCT col)

sum(col) 统计结果集中col的相加的结果
sum(DISTINCT col) 统计结果中col不同值相加的结果
返回值：double

select sum(score) from student;
-- 350

select sum(distinct score) from student;
-- 270

3.3 平均值统计函数：avg(col), avg(DISTINCT col)

avg(col) 统计结果集中col的平均值
avg(DISTINCT col) 统计结果中col不同值相加的平均值
返回值：double

select avg(score) from student;
-- 87.5

select avg(distinct score) from student;
-- 90

3.4 最小值统计函数： min(col)

返回统计结果col字段的最小值
返回值：double

select min(score) from student;
-- 80

3.5 最大值统计函数： max(col)

返回统计结果col字段的最大值
返回值：double

select max(score) from student;
-- 100

3.6 非空集合总体变量函数：var_pop(col)

统计结果集中col非空集合的总体变量（忽略null）
返回值：double

select var_pop(score) from student;
-- 68.75

3.7 非空集合样本变量函数：var_samp (col)

统计结果集中col非空集合的样本变量（忽略null）
返回值：double

select var_samp(score) from student;
-- 91.66666666666667

3.8 总体标准偏离函数：stddev_pop(col)

计算总体标准偏离，并返回总体变量的平方根，其返回值与VAR_POP函数的平方根相同
返回值：double

select stddev_pop(score) from student;
-- 8.2915619758885

3.9 样本标准偏离函数：stddev_samp(col)

计算样本标准偏离
返回值：double

select stddev_samp(score) from student;
-- 9.574271077563381

3.10 中位数函数：percentile()

percentile(BIGINT col, p)
求准确的第pth个百分位数，p必须介于0和1之间，但是col字段目前只支持整数，不支持浮点数类型
返回值：double
percentile(BIGINT col, array(p1 [, p2]…))
返回值：array<double>

select percentile(score,0) from student;
-- 80.0

select percentile(score,0.5) from student;
-- 80.0

select percentile(score,1) from student;
-- 100.0

3.12 近似中位数函数：percentile_approx()

percentile_approx(DOUBLE col, p [, B])
求近似的第pth个百分位数，p必须介于0和1之间，返回类型为double，但是col字段支持浮点类型。参数B控制内存消耗的近似精度，B越大，结果的准确度越高。默认为10,000。当col字段中的distinct值的个数小于B时，结果为准确的百分位数
返回值：double
percentile_approx(DOUBLE col, array(p1 [, p2]…) [, B])
功能和上述类似，之后后面可以输入多个百分位数，返回类型也为array，其中为对应的百分位数
返回值：array<double>

3.13 直方图：histogram_numeric(col, b)

以b为基准计算col的直方图信息
返回值：array<struct {‘x’,‘y’}>

select histogram_numeric(score,5) from student;
-- [{"x":80.0,"y":2.0},{"x":90.0,"y":1.0},{"x":100.0,"y":1.0}]

四、数值计算

4.1 取整函数(四舍五入)：round()

round(double a) 去除小数部分，返回四舍五后的整数
返回值：bigint
round(double a, int d) ,返回指定精度d的double类型
返回值：double

select round(3.1415);
-- 3

select round(3.5);
-- 4

select round(3.1415,2);
-- 3.14

4.2 向上取整：ceil(double a), ceiling(double a)

返回大于或等于该double变量 a 的最小整数
返回值：bigint

select ceil(3.14);
-- 4

select ceil(3);
-- 3

select ceiling(3.14);
-- 4

select ceiling(3);
-- 3

4.3 向下取整：floor(double a)

返回小于或等于该double a 变量的最大整数
返回值：

select floor(3.14);
-- 3

select floor(3.99);
-- 3

select floor(3);
-- 3

4.4 取随机函数：rand()

rand() 返回一个0到1范围内的随机数
rand(int seed) 指定一个种子 seed，返回一个稳定的0到1范围内的随机数
返回值：double

select rand();
-- 0.06562041568797305

select rand();
-- 0.15920967950757114

select rand(1);
-- 0.7308781907032909
 
select rand(1);
-- 0.7308781907032909

4.5 自然指数函数，返回自然对数e的a次方：exp(double a)

返回值：double

select exp(1);
-- 2.718281828459045

select exp(2);
-- 7.38905609893065

4.6 以10为底对数函数，返回以10为底的a的对数：log10(double a)

返回值：double

select log10(10);
-- 1.0

select log10(100);
-- 2.0

4.7 以2为底对数函数，返回以2为底的a的对数：log2(double a)

返回值：double

select log2(2);
-- 1.0

select log2(4);
-- 2.0

select log2(8);
-- 3.0

4.8 对数函数,返回以base为底的a的对数： log(double base, double a)

返回值：double

select log(2,32);
-- 5.0

select log(3,10);
-- 2.095903274289385

4.9 幂运算函数，返回a的p次幂：pow(double a, double p)， power(double a, double p)

返回值：double

select pow(2,4);
-- 16.0

select pow(3,3);
-- 27.0

select power(2,4);
-- 16.0

select power(3,3);
-- 27.0

4.10 开平方函数，返回a的平方根：sqrt(double a)

返回值：double

select sqrt(9);
-- 3.0

4.11 二进制函数，返回a的二进制码：bin(BIGINT a)

返回值：string

select bin(10);
-- 1010

4.12 十六进制函数：hex(BIGINT a)

如果变量 a 是int类型，那么返回a的十六进制码
如果变量 a 是string类型，则返回该字符串的十六进制码
返回值：string

select hex(10);
-- A

select hex(16);
-- 10

select hex('abc');
-- 616263

4.13 反转十六进制函数，返回该十六进制字符串所代码的字符串： unhex(string a)

返回值：string

select unhex('A');
-- 不显示

select unhex('10');
-- 不显示

select unhex('616263');
-- abc

4.14 进制转换函数，将数值num从from_base进制转化到to_base进制：conv(BIGINT num, int from_base, int to_base)

返回值：string

select conv(18,10,16);
-- 12

select conv(18,10,2);
-- 10010

4.15 绝对值函数，返回数值a的绝对值： abs(double a) ， abs(int a)

返回值：double， int

select abs(-2.124);
-- 2.124

select abs(-2);
-- 2

4.16 取余函数，返回a除以b的余数：pmod(int a, int b),pmod(double a, double b)

返回值：int，double

①当 被除数a 和 除数b 都为正时，余数为正：

select pmod(5,3);
-- 2 

select pmod(9,4);
-- 1

②当 被除数a 为负， 除数b 为正时，余数为正，此时余数的计算公式为：
余数 = （(a÷b的商)取绝对值 + 1）x b + a

select pmod(-5,3);
-- 1   余数=((5/3)+1)*3-5=1 

select pmod(-9,4);
-- 3   余数=((9/4)+1)*4-9=3

③当 被除数a 为正， 除数b 为负时，余数为负，计算公式同②
余数 = （(a÷b的商)取绝对值 + 1）x b + a

select pmod(5,-3);
--  -1  余数=((5/3)+1)*(-3)+5=-1 

select pmod(9,-4);
--  -3  余数=((9/4)+1)*(-4)+9=-3

④当 被除数a 和 除数b 都为负时，余数为负，计算公式同①中常规计算，结果为负

select pmod(-5,-3);
--  -2 

select pmod(-9,-4);
--  -1

⑤当 a 可以整除 b 时，无论a、b正负，余数都为0

select pmod(6,2);
-- 0
select pmod(-6,2);
-- 0
select pmod(6,-2);
-- 0
select pmod(-6,-2);
-- 0

4.17 正弦函数，返回a的正弦值： sin(double a)

返回值：double

select sin(1);
-- 0.8414709848078965

4.18 反正弦函数，返回a的反正弦值：asin(double a)

返回值：double

select asin(0.8414709848078965);
-- 1.0

4.19 余弦函数，返回a的余弦值：cos(double a)

返回值：double

select cos(1);
-- 0.5403023058681398

4.20 反余弦函数，返回a的反余弦值：acos(double a)

返回值：double

select acos(0.5403023058681398);
-- 1.0

五、数学运算

5.1 加法：+

A + B
返回值：
int, int => int
int,double => double
double, double =>dooble

select 1 + 2;
-- 3

select 1 + 2.5;
-- 3.5

select 1.2 + 2.2;
-- 3.4000000000000004

5.2 减法：-

A - B
返回值：
int, int => int
int,double => double
double, double =>dooble

select 2 - 1;
-- 1

5.3 乘法：*

A * B
返回值：
int, int => int
int,double => double
double, double =>dooble
注：如果A乘以B的结果超过默认结果类型的数值范围，则需要通过cast将结果转换成范围更大的数值类型

select 4 * 5;
-- 20

select 2 * 2.2;
-- 4.4

5.4 除法：/

A / B
返回值：double

select 4 / 2;
-- 2.0

5.5 取余：%

A % B
返回值：
int, int => int
int,double => double
double, double =>dooble

select 5 % 3;
-- 2

select 5.2 % 3;
--  2.2

5.6 位与：&

A & B
A,B是int类型
返回值：int

select 1 & 2;
-- 0

5.7 位或：|

A | B
A,B是int类型
返回值：int

select 1 | 2;
-- 3

select 2 | 3;
-- 3

5.8 位异或：^

A + B
A,B是int类型
返回值：int

select 4 ^ 8;
-- 12

select 6 ^ 4;
-- 2

5.9 位取反：~

~A
A只能是正整数
返回值：int类型，返回 -(A+1)

select ~ 3;
-- -4

select ~ 4;
-- -5

select ~ 5;
-- -6

六、关系运算

6.1 等值比较： =

返回值：boolean

6.2 不等值比较： <>, !=

返回值：boolean

6.3 小于比较：<

返回值：boolean

6.4 小于等于比较：<=

返回值：boolean

6.5 大于比较：>

返回值：boolean

6.6 大于等于比较：>=

返回值：boolean

6.7 空值判断：is null

返回值：boolean

6.8 非空判断：is not null

返回值：boolean

6.9 like比较,匹配字符串A中符合B的部分：string A like string B

返回值：string

6.10 rlike比较，匹配字符串A中符合正则表达式B的部分：string A like string B

rlike 可以使用 java 正则表达式匹配
返回值：string

6.11 正则regexp匹配：string A regexp string B

功能与 rlike 相同
返回值：boolean

七、逻辑运算

7.1 逻辑与：and

A and B
返回值：boolean

7.2 逻辑或：or

A or B
返回值：boolean

7.3 逻辑非：not

not A
返回值：boolean

八、条件函数

8.1 if 函数

if(boolean testCondition, T valueTrue, T valueFalseOrNull)
当条件testCondition为TRUE时，返回valueTrue；否则返回valueFalseOrNull
返回值：T

select if(1=1,10,20);
-- 10

select if(1=2,10,20);
-- 20

8.2 条件判断函数 case

返回值：T
用法① ：case 字段a when 满足c1 then 输出r1 [ when 满足c2 then 输出r2 ] [ else 输出 r3 ] end
如果字段a满足条件c1,则输出r1；
如果满足c2则输出r2；
都不满足则输出r3
用法②：case when 满足c1 then 输出r1 [when 满足c2 then 输出r2] [else 输出r3] end
如果满足条件c1，则输出r1；
满足条件c2，则输出r2；
都不满足则输出r3；

-- 举例两条sql语句：
select *,case sex when 'male' then '男' when 'female' then '女' else '未知' from student;

select *, case when score >=90 then 'good' when score >=60 and score <90 then 'general' else 'bad' end as grade from student;

8.3 非空查找函数：COALESCE(T v1, T v2, …)

返回参数中的第一个非空值；如果所有值都为NULL，那么返回NULL
返回值：T

select COALESCE(null,'10','20');
-- 10

九、表生成函数

9.1 map类型构建：map (key1, value1, key2, value2, …)

根据输入的key和value对构建map类型

create table t1 as select map('1','a','2','b','3','c');

select * from t1;
-- {"1":"a","2":"b","3":"c"}

describe t1;
-- _c0                     map<string,string>

9.2 struct类型构建：struct(val1, val2, val3, …)

根据输入的参数构建结构体struct类型

create table t2 as select struct('a1','b1','c1');

select * from t2;
-- {"col1":"a1","col2":"b1","col3":"c1"}

describe t2;
-- _c0                     struct<col1:string ,col2:string,col3:string>

9.3 array类型构建：array(val1, val2, …)

根据输入的参数构建数组array类型

create table t3 as select array('a1','b1','c1');

select * from t3;
-- ["a1","b1","c1"]

describe t3;
-- _c0       array<string>

9.4 将array 或 map 扩展成多行多列：explode(array())，explode(map())

explode() 函数可以将一个 array 或者 map 展开：

explode(array)：array 转单列多行，使得结果中将array列表里的每个元素生成一行
explode(map)：map 转多行两列， 使得结果中将map里的每一对元素作为一行，key为一列，value为一列
explode() 函数一般结合侧视图 lateral view 一起使用

select explode(array(1,2,3,4));
+------+--+
| col  |
+------+--+
| 1    |
| 2    |
| 3    |
| 4    |
+------+--+

select explode(map('A','apple','O','orage'));
+------+--------+--+
| key  | value  |
+------+--------+--+
| A    | apple  |
| O    | orage  |
+------+--------+--+

-- 一般需要在 explode() 中嵌套 split() 函数分割某个字段，将其转成数组，然后在explode实现 “行转列”
select explode(split("a,b,c,d,e,f",","));
+------+--+
| col  |
+------+--+
| a    |
| b    |
| c    |
| d    |
| e    |
| f    |
+------+--+

-- 结合侧视图使用，不展示结果：
select id,name,sub
from student
later view explode(subject) subView as sub;

注：对于explode的限制，以下sql语句不支持：

不能和其他字段一起使用
错误： select name,explode(subject) from student;
不能和group by、sort by 和cluster by一起使用
错误： select explode(subject) from employees group by explode(subject);

9.5 将array转单列多行输出，在每行结果前增加序号：posexplode(array())

在扩展的行前增加序号，从0开始递增
参数只支持数组类型 array

select posexplode(array('a','b','c'));
+------+------+--+
| pos  | val  |
+------+------+--+
| 0    | a    |
| 1    | b    |
| 2    | c    |
+------+------+--+

9.6 按指定需求，将array转多列多行，灵活输出：stack(int num,array())

num 可以指定输出几列，但是 num 必须能被 array() 的长度 length 整除

select stack(1,'a','b','c','d');
+-------+-------+-------+-------+--+
| col0  | col1  | col2  | col3  |
+-------+-------+-------+-------+--+
| a     | b     | c     | d     |
+-------+-------+-------+-------+--+


select stack(2,'a','b','c','d');
+-------+-------+--+
| col0  | col1  |
+-------+-------+--+
| a     | b     |
| c     | d     |
+-------+-------+--+

select stack(4,'a','b','c','d');
+-------+--+
| col0  |
+-------+--+
| a     |
| b     |
| c     |
| d     |
+-------+--+

9.7 根据元祖中josn字符串构建：json_tuple()

select json_tuple('{"name":"zhangsan","age":"18"}','name','age');
+-----------+-----+--+
|    c0     | c1  |
+-----------+-----+--+
| zhangsan  | 18  |
+-----------+-----+--+

9.8 将分组中的某列转为一个数组返回：collect_list、collect_set

collect_list、collect_set 都是将分组中的某列转为一个数组返回，不同的是collect_list不去重，而collect_set去重

创建一张实验用表，存放用户每天点播视频的记录：

create table t_visit_video (
username string,
video_name string
) 
partitioned by (day string)
row format delimited fields terminated by ',';

在本地文件系统创建测试数据文件：

张三,大唐双龙传
李四,天下无贼
张三,神探狄仁杰
李四,霸王别姬
李四,霸王别姬
王五,机器人总动员
王五,放牛班的春天
王五,盗梦空间

将数据加载到Hive表：

load data local inpath '/root/hive/visit.data' into table t_visit_video partition (day='20180516');

hivetask运行流程 hivestack_bc_02

按用户分组，取出每个用户每天看过的所有视频的名字：

select username, collect_list(video_name) from t_visit_video group by username ;

hivetask运行流程 hivestack_unix_03

但是上面的查询结果有点问题，因为霸王别姬实在太好看了，所以李四这家伙看了两遍，这直接就导致得到的观看过视频列表有重复的，所以应该增加去重，使用collect_set，其与collect_list的区别就是会去重：

select username, collect_set(video_name) from t_visit_video group by username;

hivetask运行流程 hivestack_hivetask运行流程_04

李四的观看记录中霸王别姬只出现了一次，实现了去重效果

突破group by限制
还可以利用collect来突破group by的限制，Hive中在group by查询的时候要求出现在select后面的列都必须是出现在group by后面的，即select列必须是作为分组依据的列，但是有的时候我们想根据A进行分组然后随便取出每个分组中的一个B，代入到这个实验中就是按照用户进行分组，然后随便拿出一个他看过的视频名称即可：

select username, collect_list(video_name)[0] from t_visit_video group by username;

hivetask运行流程 hivestack_字符串_05

video_name不是分组列，依然能够取出这列中的数据

如果查询 collect_set 之前有多个字段 student_id , score，…，这些字段必须要group by

select student_id,score,concat_ws(",",collect_set(question_id)) 
from ex_exam_record 
where score = 1 
group by student_id,score;

详细用法参考：hive笔记之collect_list/collect_set(列转行)

十、复杂类型访问操作

10.1 map类型访问：M[key]

M为map类型，key为map中的key值
返回map类型M中，key值为指定值的value值

create table t1 as select map('1','a','2','b','3','c') as m;

select * from t1;
-- {"1":"a","2":"b","3":"c"}

select m['1'],m['2'],m['3'] from t1;
-- a	b	c

10.2 struct类型访问: S.x

S为struct类型
返回结构体S中的x字段

create table t2 as select struct('a1','b1','c1') as s;

select * from t2;
-- {"col1":"a1","col2":"b1","col3":"c1"}

select s.col1,s.col2,s.col3 from t2;
-- a1	b1	c1

10.3 array类型访问： A[n]

A为array类型，n为int类型
返回数组A中的第n个变量值
数组的起始下标为0

create table t3 as select array('a1','b1','c1') as arr;

select * from t3;
-- ["a1","b1","c1"]

select arr[0],arr[1],arr[2] from t3;
-- a1	b1	c1

10.4 Map类型长度函数，返回map类型的长度：size(map)

size(Map<k .V>)
返回值：int

select size(map('1','a','2','b','3','c'));
-- 3

10.5 array类型长度函数，返回array类型的长度：size(Array)

返回值：int

select size(array('a1','b1','c1'));
-- 3

十一、类型转换函数

11.1 类型转换函数，返回转换后的数据类型的结果：cast(expr as )

返回值：Expected “=” to follow “type”

select cast(1 as bigint);
-- 1

select cast (2.1 as int);
-- 2

select cast(2.99 as int);
-- 2

select cast('1.35' as int);
-- 1

select cast('1.35' as double);
-- 1.35

本文章为转载内容，我们尊重原作者对文章享有的著作权。如有内容错误或侵权问题，欢迎原作者联系我们进行内容更正或删除文章。

上一篇：java项目之后 java项目在哪

下一篇：python中间件目录结构 python中间件有哪些

提问和评论都可以，用心的回复会被更多人看到评论

发布评论

相关文章

官方博客	全部文章	热门标签	班级博客
了解我们	网站地图	意见反馈

鸿蒙开发者社区	51CTO学堂
51CTO	软考资讯