1.Hive内嵌函数说明
官方地址:hive函数大全官方地址
Hive字符处理函数系列1:hive内嵌字符串函数1
String Functions
Return Type |
Name(Signature) |
Description |
---|---|---|
regexp_extract(string subject, string pattern, int index) |
Returns the string extracted using the pattern. For example, regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar.' Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc. The 'index' parameter is the Java regex Matcher group() method index. See docs/api/java/util/regex/Matcher.html for more information on the 'index' or Java regex group() method.. 抽取字符串subject中符合正则表达式pattern的第index个部分的子字符串,注意些预定义字符的使用,如第二个参数如果使用'\s'将被匹配到s,'\\s'才是匹配空格 hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 2) ---注意2是匹配分组的 'bar.' |
|
regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT) | Returns the string resulting from replacing all substrings in INITIAL_STRING that match the java regular expression syntax defined in PATTERN with instances of REPLACEMENT. For example, regexp_replace("foobar", "oo|ar", "") returns 'fb.' Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc.. 按照Java正则表达式PATTERN将字符串INTIAL_STRING中符合条件的部分成REPLACEMENT所指定的字符串,如里REPLACEMENT这空的话,抽符合正则的部分将被去掉 如:regexp_replace("foobar", "oo|ar", "") = 'fb.' 注意些预定义字符的使用,如第二个参数如果使用'\s'将被匹配到s,'\\s'才是匹配空格 |
|
string | replace(string A, string OLD, string NEW) | Returns the string A with all non-overlapping occurrences of OLD replaced with NEW (as of Hive 1.3.0 and 2.1.0). Example: select replace("ababab", "abab", "Z"); returns "Zab". |
split(string str, string pat) | Splits str around pat (pat is a regular expression).. 按照正则表达式pat来分割字符串str,并将分割后的数组字符串的形式返回 |
|
string |
translate(string|char|varchar input, string|char|varchar from, string|char|varchar to) |
Translates the input string by replacing the characters present in the Char/varchar support added as of Hive 0.14.0. 解释: hive>select translate('abcdefga','abc','wo') wodefgw 注意,结果不是wodefga, hive>select replace('abcdefga','abc','wo') wodefga 注意,两个结果,这就是replace和translate的区别。
|
map<string,string> |
str_to_map(text[, delimiter1, delimiter2]) |
Splits text into key-value pairs using two delimiters. Delimiter1 separates text into K-V pairs, and Delimiter2 splits each K-V pair. Default delimiters are ',' for delimiter1 and ':' for delimiter2. 解释: 使用两个分隔符将文本拆分为键值对。 分隔符1将文本分成K-V对,分隔符2分割每个K-V对。对于分隔符1默认分隔符是 ',',对于分隔符2默认分隔符是 '='。 hive> select
hive> |
2.split,regexp_replace,regexp_extract的使用
2.1 split函数,支持使用正则表达式对字符串进行切割,返回值为数组
SELECT
SPLIT('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1','\\?') [0] AS A,
--对url进行使用?进行切割,返回值是数组,这里取?前面的值。
SPLIT('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1','\\?') [1] AS A1,
SPLIT('http://facebook.com/index.html','\\?') [0] AS B ,
SPLIT('http://facebook.com/index.html','\\?') [1] AS B1,
SPLIT('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1','\\\w\\/') [0] AS C1
FROM FDM_SOR.T_PLPLFIS_TB_LOAN_APPLY_CEBANK_ED
注意:所有正则表达式中的预定义字符比如?,},|等需要在这里用\\进行反转义才能表达本意。比如正则表达式中\w表示匹配字母,所以也属于预定义字符,单独的\w表示匹配的是字母w,而\\\w才表示匹配字母。
2.2. regexp_replace函数,比较简单,难的是里面参数正则表达式的书写。
select
case when regexp_replace(uniscid,'[0-9A-HJ-NPQRTUWXY]{2}\\d{6}[0-9A-HJ-NPQRTUWXY]{10}','~~fbietl~~') = '~~fbietl~~' then uniscid
else null end uniscid,
from fdm_sor.aaaaaaaaaaaaaaa;
2.3 replace和translate的区别
- replace:字符串级别的代替
- translate:字符级别的代替
hive>select translate('abcdefga','abc','wo') wodefgw 注意,结果不是wodefga, hive>select replace('abcdefga','abc','wo') wodefga 注意,两个结果,这就是replace和translate的区别。