The service Hive depends on: Hive service

Reposted

数据大侠客 2023-07-12 22:24:57


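An experiment required it, so I set up Hive over the past two days. Here are some brief notes.

Platform:

OS: Ubuntu Kylin 14.04

JAVA: Java 1.8.0_25

HADOOP: Hadoop 2.4.0

HIVE: Hive 0.14.0

I will not cover installing Hive here. Once Hive is configured, start hive directly from the installation directory (remember to start Hadoop first, otherwise you will get an error) and try it out by following the tutorials found online: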

create table test(key int, name string) row format delimited fields terminated by ',' lines terminated by '\n';
load data local inpath '/home/liang/test.txt' overwrite into table test;
select * from test;
show tables;
show databases;
desc test;
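
The load statement above expects /home/liang/test.txt to contain one row per line, with the two fields separated by a comma, to match the delimiters declared in create table. The original file contents are not shown, so the following is only a minimal sketch with made-up rows (the class name TestDataFile and the values "alice" and "bob" are assumptions for illustration):

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TestDataFile {

    // Formats rows as "key,name\n", matching
    // "fields terminated by ','" and "lines terminated by '\n'".
    static String toDelimitedText(Map<Integer, String> rows) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, String> row : rows.entrySet()) {
            sb.append(row.getKey()).append(',').append(row.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample rows; the real test.txt is not shown in the article.
        Map<Integer, String> rows = new LinkedHashMap<>();
        rows.put(1, "alice");
        rows.put(2, "bob");
        System.out.print(toDelimitedText(rows));
    }
}
```

Writing the returned string to the path used in the load statement (for example with java.nio.file.Files.write) would give a file that the table definition can parse.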

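Following the official language manual, I tried a bunch of statements more or less at random, and they feel very much like ordinary database commands. The manual describes in detail which statements Hive supports and how to write them, and it appears that ACID is supported starting with 0.14. If you want to learn Hive properly, working through the English manual is the better route; I only skimmed it.

I had to shut the machine down for a while. After rebooting, I started hive from my home directory and found that the table I had created before the shutdown was gone. At first I suspected my configuration, but I rechecked it and nothing was wrong. Fortunately I found a post online about why tables you created cannot be found when Hive uses Derby as its metastore database. It turned out that I was using the default metastore database, Derby, and that database is stored in the directory from which hive is started. If you look closely, you will notice that a directory named metastore_db and a log file named derby.log have appeared in that directory. When you start hive from a different directory, Derby cannot find the earlier metastore database in the current directory and has to create a new one. That is why the files created earlier are still visible on HDFS while the tables cannot be seen in hive. So if you plan to use hive regularly, it is better to store the metadata in MySQL, which also lets multiple users work with Hive.

Now for the main topic: connecting to the Hive service with JDBC.

Open a terminal and first see what the help text says: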
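
The effect can be illustrated without Hive at all. Hive's default Derby connection URL (jdbc:derby:;databaseName=metastore_db;create=true) names the database by a relative path, so it is resolved against whatever directory hive was started from. The sketch below only mimics that path resolution; the class MetastoreLocation and its method are hypothetical helpers, not part of Hive:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class MetastoreLocation {

    // Relative database name used by Hive's default embedded-Derby URL.
    static final String DERBY_DB_NAME = "metastore_db";

    // Returns the directory where embedded Derby would look for (or create)
    // the metastore when hive is started from startupDir.
    static Path resolveMetastore(String startupDir) {
        return Paths.get(startupDir).resolve(DERBY_DB_NAME).normalize();
    }

    public static void main(String[] args) {
        // Two different startup directories give two different metastores.
        System.out.println(resolveMetastore("/opt/apache-hive-0.14.0-bin"));
        System.out.println(resolveMetastore("/home/liang"));
    }
}
```

Starting hive from the installation directory and then from the home directory therefore points Derby at two unrelated metastore_db directories, which matches the missing-table symptom described above.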

1 liang@liang-pc:/opt/apache-hive-0.14.0-bin$ hive --help
 2 Usage ./hive <parameters> --service serviceName <service parameters>
 3 Service List: beeline cli help hiveburninclient hiveserver2 hiveserver hwi jar lineage metastore metatool orcfiledump rcfilecat schemaTool version 
 4 Parameters parsed:
 5 --auxpath : Auxillary jars 
 6 --config : Hive configuration directory
 7 --service : Starts specific service/component. cli is default
 8 Parameters used:
 9 HADOOP_HOME or HADOOP_PREFIX : Hadoop install directory
10 HIVE_OPT : Hive options
11 For help on a particular service:
12 ./hive --service serviceName --help
13 Debug help: ./hive --debug --help

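The second line of the output above shows how hive is invoked: ./hive <parameters> --service serviceName <service parameters>

The third line shows that many services are available; the ones we need this time are hiveserver and hiveserver2.

Line 12 shows that the command for detailed help on a particular service is: ./hive --service serviceName --help

In the terminal, run hive --service hiveserver --help and hive --service hiveserver2 --help in turn: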

liang@liang-pc:/opt/apache-hive-0.14.0-bin$ hive --service hiveserver --help
Starting Hive Thrift Server
usage: hiveserver
 -h,--help                        Print help information
    --hiveconf <property=value>   Use value for given property
    --maxWorkerThreads <arg>      maximum number of worker threads,
                                  default:2147483647
    --minWorkerThreads <arg>      minimum number of worker threads,
                                  default:100
 -p <port>                        Hive Server port number, default:10000
 -v,--verbose                     Verbose mode

liang@liang-pc:/opt/apache-hive-0.14.0-bin$ hive --service hiveserver2 --help
usage: hiveserver2
    --deregister <versionNumber>   Deregister all instances of given
                                   version from dynamic service discovery
 -H,--help                         Print help information
    --hiveconf <property=value>    Use value for given property

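The output above only shows the options of the two services and some default values. For the differences between them and a fuller description, I had to look up HiveServer and HiveServer2 on the official site. The explanation there is indeed fairly detailed: HiveServer2 is an upgrade of HiveServer, and HiveServer is scheduled to be removed starting with Hive 0.15. The original wording is: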

WARNING!

HiveServer cannot handle concurrent requests from more than one client. This is actually a limitation imposed by the Thrift interface that HiveServer exports, and can't be resolved by modifying the HiveServer code.

HiveServer2 is a rewrite of HiveServer that addresses these problems, starting with Hive 0.11.0. Use of HiveServer2 is recommended.

HiveServer is scheduled to be removed from Hive releases starting Hive 0.15. See HIVE-6977. Please switch over to HiveServer2.
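
Putting the usage line and the --hiveconf option from the help output together, the command that starts HiveServer2 can be assembled as in the sketch below. The helper class HiveServiceCommand is hypothetical; hive.server2.thrift.port is the HiveServer2 port property, set here to its default of 10000 purely as an example:

```java
import java.util.ArrayList;
import java.util.List;

final class HiveServiceCommand {

    // Builds: hive --service <serviceName> [--hiveconf property=value]...
    static List<String> build(String serviceName, String... hiveconf) {
        List<String> cmd = new ArrayList<>();
        cmd.add("hive");
        cmd.add("--service");
        cmd.add(serviceName);
        for (String propertyValue : hiveconf) {
            cmd.add("--hiveconf");
            cmd.add(propertyValue);
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("hiveserver2", "hive.server2.thrift.port=10000");
        // Only prints the command. To launch it from Java instead, one could
        // pass the list to new ProcessBuilder(cmd).inheritIO().start().
        System.out.println(String.join(" ", cmd));
    }
}
```

Typing the printed command in a terminal is equivalent; the server then keeps running in the foreground until it is stopped.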

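Once the hive service is running, you can write programs against it with JDBC, which feels even more like working with an ordinary database. Here is the code: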

package hive;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveService {

    public static void main(String[] args) throws Exception {
        /*
         * Settings for hiveserver.
         * The URL has the form jdbc:hive://ip:port/db, where ip is the address of the
         * machine running hiveserver, port is the hiveserver service port, and db is
         * the database name. The user name and password follow, just as with JDBC
         * for any other database.
         */
//        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
//        String url = "jdbc:hive://localhost:10000/default";

        /*
         * Settings for hiveserver2.
         * Compared with hiveserver, only the driver class and the URL differ slightly;
         * nothing else needs to change.
         */
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://localhost:10000/default";

        // try-with-resources closes the result set, statement and connection when done.
        // Other statements to try: "show tables", "show databases", "desc test".
        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stat = conn.createStatement();
             ResultSet rs = stat.executeQuery("select * from test")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getString(2));
            }
        }
    }
}
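
The two points that differ between hiveserver and hiveserver2, the driver class and the URL scheme, can be collected in one small helper. This is only a sketch: the class HiveJdbcSettings and its method names are made up for illustration, while the driver class names and URL forms are the ones used in the program above.

```java
final class HiveJdbcSettings {

    // Driver class to load for hiveserver2 (true) or the old hiveserver (false).
    static String driverClass(boolean hiveServer2) {
        return hiveServer2
                ? "org.apache.hive.jdbc.HiveDriver"
                : "org.apache.hadoop.hive.jdbc.HiveDriver";
    }

    // Builds jdbc:hive2://host:port/db for hiveserver2, jdbc:hive://host:port/db otherwise.
    static String url(boolean hiveServer2, String host, int port, String db) {
        String scheme = hiveServer2 ? "jdbc:hive2" : "jdbc:hive";
        return scheme + "://" + host + ":" + port + "/" + db;
    }

    public static void main(String[] args) {
        System.out.println(driverClass(true) + " -> " + url(true, "localhost", 10000, "default"));
        System.out.println(driverClass(false) + " -> " + url(false, "localhost", 10000, "default"));
    }
}
```

With such a helper, switching between the two services comes down to flipping a single boolean rather than commenting lines in and out.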

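The required jars are ${HIVE_HOME}/lib/hive-jdbc-0.14.0-standalone.jar and ${HADOOP_HOME}/share/hadoop/common/hadoop-common-2.4.0.jar.

With hiveserver, possibly because my Hive and Hadoop versions do not match, the following error is reported, whereas hiveserver2 works without it.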

Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.initSerdeParams(LazySimpleSerDe.java:318)
......
at hive.HiveService.main(HiveService.java:16)
Caused by: java.lang.RuntimeException: Could not load shims in class org.apache.hadoop.hive.shims.Hadoop23Shims
at org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:138)
......
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.shims.Hadoop23Shims
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)

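Since hiveserver2 runs fine and hiveserver is about to be removed, I did not look into the exact cause of the error. If you know it, please let me know.

One more thing to note: in tests on my machine, the SQL statement must not end with ";". Otherwise an error is reported and ResultSet.getString(String columnLabel) fails.

For more details on using JDBC, see: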
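
The last "Caused by" in the trace is a ClassNotFoundException, so one way to narrow the problem down would be to check which classes the client program can actually see. The sketch below is a generic classpath probe, not a confirmed diagnosis of this error; the class DriverCheck is a hypothetical helper, and the class names it checks are the ones mentioned in this article.

```java
final class DriverCheck {

    // Returns true if the named class can be found by this program's class loader.
    // The class is looked up without being initialized.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, DriverCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.hive.jdbc.HiveDriver",
            "org.apache.hadoop.hive.jdbc.HiveDriver",
            "org.apache.hadoop.hive.shims.Hadoop23Shims"
        };
        for (String name : names) {
            System.out.println(name + " : " + (isOnClasspath(name) ? "found" : "missing"));
        }
    }
}
```

Run with the same classpath as the JDBC program, it reports which of the three classes are visible; a "missing" shims class would at least be consistent with the trace above.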
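
If statements are copied from the hive command line, where they do end with ";", the trailing semicolon can be removed before the text is handed to JDBC. A minimal sketch, assuming a hypothetical helper class SqlText:

```java
final class SqlText {

    // Removes trailing semicolons and surrounding whitespace from a single statement.
    static String stripTrailingSemicolon(String sql) {
        String trimmed = sql.trim();
        while (trimmed.endsWith(";")) {
            trimmed = trimmed.substring(0, trimmed.length() - 1).trim();
        }
        return trimmed;
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingSemicolon("select * from test;"));
        System.out.println(stripTrailingSemicolon("show tables"));
    }
}
```

The result can then be passed to stat.executeQuery(...) as in the program above. The helper only handles a single statement and does not split text that contains several statements.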

HiveServer1: https://cwiki.apache.org/confluence/display/Hive/HiveClient

HiveServer2: https://cwiki.apache.org/confluence/display/Hive/HiveServer2%20Clients#HiveServer2Clients-UsingJDBC
