文章目录

  • 1、下载Hive
  • 2、安装mysql数据库
  • 3、修改配置文件
  • 3.1 hive-site.xml配置文件
  • 3.2 hive-env.sh配置文件
  • 4、新增mysql驱动到hive中
  • 5、将hive命令添加到环境变量中
  • 6、初始化hive操作
  • 7、启动Metastore服务
  • 8、开始测试
  • 8.1、hdsf目录创建
  • 8.2、将hive拷贝到集群的其他几台机器
  • 8.3、启动测试
  • 8.4、远程访问hive测试


1、下载Hive

本实例使用的是apache-hive-2.1.1,请根据需要将hive下载到本地并解压。下载地址:http://archive.apache.org/dist/hive/
解压后的路径:

[root@hadoop-master apache-hive-2.1.1]# pwd
/usr/local/hive/apache-hive-2.1.1

2、安装mysql数据库

Hive搭建分三种方式:
内嵌Derby方式:使用derby存储方式时,运行hive会在当前目录生成一个derby文件和一个metastore_db目录。这种存储方式的弊端是在同一个目录下同时只能有一个hive客户端能使用数据库。
本地模式:这种存储方式需要在本地运行一个mysql服务器,并作如下配置(下面两种使用mysql的方式,需要将mysql的jar包拷贝到$HIVE_HOME/lib目录下)。
多用户模式:这种存储方式需要在远端服务器运行一个mysql服务器,并且需要在Hive服务器启动meta服务。
三种方式归根到底就是元数据的存储位置不一样,本文采用的是多用户模式。

3、修改配置文件

3.1 hive-site.xml配置文件

首先进入下面这个目录,编辑hive-site.xml文件,没有就新加一个

[root@hadoop-master conf]# vi /usr/local/hive/apache-hive-2.1.1/conf/hive-site.xml

其中hive-site.xml内容:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <!--数据连接地址  -->
  <property> 
    <name>javax.jdo.option.ConnectionURL</name>  
    <value>jdbc:mysql://hadoop-master:3306/hive?createDatabaseIfNotExist=true</value>  
    <description>JDBC connect string for a JDBC metastore</description> 
  </property>
  <!-- 数据库驱动,这里使用mysql数据库驱动 -->
  <property> 
    <name>javax.jdo.option.ConnectionDriverName</name>  
    <value>com.mysql.jdbc.Driver</value>  
    <description>Driver class name for a JDBC metastore</description> 
  </property>  
  <!-- 数据库用户名,根据自己的数据库用户名填写 -->  
  <property> 
    <name>javax.jdo.option.ConnectionUserName</name>  
    <value>root</value>  
    <description>username to use against metastore database</description> 
  </property>  
  <!-- 数据库密码,根据自己的数据库密码填写 -->  
  <property> 
    <name>javax.jdo.option.ConnectionPassword</name>  
    <value>root</value>  
    <description>password to use against metastore database</description> 
  </property>
  <property>
      <name>datanucleus.autoCreateSchema</name>
      <value>true</value>
  </property>
  <property>
      <name>datanucleus.autoCreateTables</name>
      <value>true</value>
  </property>
  <property>
      <name>datanucleus.autoCreateColumns</name>
      <value>true</value>
  </property>
  
  
   <!-- 设置hive仓库在hdfs上的位置  -->
  <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>/user/hive/warehouse</value>
  </property>
  
  <!--资源临时文件存放位置 -->
  <property>
      <name>hive.downloaded.resources.dir</name>
      <value>/usr/local/hive/tmp/resources</value>
      <description>Temporary local directory for added resources in the remote file system.</description>
   </property>
  
   <!-- Hive在0.9版本之前需要设置hive.exec.dynamic.partition为true, Hive在0.9版本之后默认为true -->
  <property>
      <name>hive.exec.dynamic.partition</name>
      <value>true</value>
  </property>
  <property>
      <name>hive.exec.dynamic.partition.mode</name>
      <value>nonstrict</value>
  </property>
  
  <!-- 修改日志位置 -->
  <property>
      <name>hive.exec.local.scratchdir</name>
      <value>/usr/local/hive/tmp/hiveJobsLog</value>
      <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
      <name>hive.downloaded.resources.dir</name>
      <value>/usr/local/hive/tmp/resourcesLog</value>
      <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
  <property>
      <name>hive.querylog.location</name>
      <value>/usr/local/hive/tmp/hiveRunLog</value>
      <description>Location of Hive run time structured log file</description>
  </property>
  <property>
      <name>hive.server2.logging.operation.log.location</name>
      <value>/usr/local/hive/tmp/opertitionLog</value>
      <description>Top level directory where operation tmp are stored if logging functionality is enabled</description>
  </property>  
  
  <!-- 配置HWI接口 -->
  <property>  
      <name>hive.hwi.war.file</name>  
      <value>/usr/local/hive/apache-hive-2.1.1/lib/hive-hwi-2.1.1.jar</value>  
      <description>This sets the path to the HWI war file, relative to ${HIVE_HOME}. </description>  
  </property>  
  <property>  
      <name>hive.hwi.listen.host</name>  
      <value>hadoop-master</value>  
      <description>This is the host address the Hive Web Interface will listen on</description>  
  </property>  
  <property>  
      <name>hive.hwi.listen.port</name>  
      <value>9999</value>  
      <description>This is the port the Hive Web Interface will listen on</description>  
  </property>
  <!-- Hiveserver2已经不再需要hive.metastore.local这个配置项了(hive.metastore.uris为空,则表示是metastore在本地,否则就是远程)远程的话直接配置hive.metastore.uris即可 -->
  <property>
      <name>hive.metastore.uris</name>
      <value>thrift://hadoop-master:9083</value>
      <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>
  <property>
      <name>hive.server2.thrift.bind.host</name>
      <value>hadoop-master</value>
  </property>
  <property>
      <name>hive.server2.thrift.port</name>
      <value>10000</value>
  </property>
  <property>
      <name>hive.server2.thrift.http.port</name>
      <value>10001</value>
  </property>
  <property>
      <name>hive.server2.thrift.http.path</name>
      <value>cliservice</value>
  </property>
  <!-- HiveServer2的WEB UI -->
  <property>
      <name>hive.server2.webui.host</name>
      <value>hadoop-master</value>
  </property>
  <property>
      <name>hive.server2.webui.port</name>
      <value>10002</value>
  </property>
  <property>
      <name>hive.scratch.dir.permission</name>
      <value>755</value>
  </property>
  <!-- 使用Hive on spark时,若不设置下列该配置会出现内存溢出异常 -->
  <property>
      <name>spark.driver.extraJavaOptions</name>
      <value>-XX:PermSize=128M -XX:MaxPermSize=512M</value>
  </property>

  <!-- 是否验证配置,此处设置未NONE,暂时不需要验证,测试用 -->
  <property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
  </property>


  <property>
      <name>hive.metastore.schema.verification</name>
      <value>false</value>
      <description>
        Enforce metastore schema version consistency.
        True: Verify that version information stored in is compatible with one from Hive jars.  Also disable automatic
              schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
              proper metastore schema migration. (Default)
        False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
      </description>
   </property>
</configuration>

配置中的文件目录记得要创建:

[root@hadoop-master tmp]# pwd
/usr/local/hive/tmp
[root@hadoop-master tmp]# ll
总用量 40
drwxr-xr-x. 2 root root     6 2月  21 22:47 hiveJobsLog
drwxr-xr-x. 2 root root     6 2月  21 21:20 hiveRunLog
drwxr-xr-x. 2 root root     6 2月  21 21:20 opertitionLog
drwxr-xr-x. 2 root root     6 2月  21 21:19 resources
drwxr-xr-x. 2 root root     6 2月  21 21:20 resourcesLog

3.2 hive-env.sh配置文件

首先,进入hive的配置目录

cd /usr/local/hive/apache-hive-2.1.1/conf

然后,将需要的配置文件拷贝,即去掉后面的template

cp hive-env.sh.template hive-env.sh
cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties
cp hive-log4j2.properties.template hive-log4j2.properties

之后的文件如图所示:

hive集群部署原理 hive集群安装_hadoop


在hive-env.sh文件结尾增加如下内容:

JAVA_HOME=/usr/local/jdk/jdk1.8.0_261
HADOOP_HOME=/usr/local/hadoop/apps/hadoop-2.7.3
HIVE_HOME=/usr/local/hive/apache-hive-2.1.1
export HIVE_CONF_DIR=$HIVE_HOME/conf
export HIVE_AUX_JARS_PATH=$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$HADOOP_HOME/lib:$HIVE_HOME/lib
export HADOOP_OPTS="-Dorg.xerial.snappy.tempdir=/tmp -Dorg.xerial.snappy.lib.name=libsnappyjava.jnilib $HADOOP_OPTS"

4、新增mysql驱动到hive中

在以下目录添加mysql-connector-java-5.1.30.jar

[root@hadoop-master lib]# pwd
/usr/local/hive/apache-hive-2.1.1/lib

并且chomd 777 mysql-connector-java-5.1.30.jar 赋权限

hive集群部署原理 hive集群安装_mysql_02

5、将hive命令添加到环境变量中

[root@hadoop-master bin]# vi /etc/profile

添加HIVE_HOME到环境变量中

#java环境变量
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_261
export CLASSPATH=.:${JAVA_HOME}/jre/lib/rt.jar:${JAVA_HOME}/lib/dt.jar:${JAVA_HOME}/lib/tools.jar
export PATH=$PATH:${JAVA_HOME}/bin

#配置Hadoop环境变量
export HADOOP_HOME=/usr/local/hadoop/apps/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

#配置hive环境变量
export HIVE_HOME=/usr/local/hive/apache-hive-2.1.1
export PATH=$PATH:$HIVE_HOME/bin:$HIVE_HOME/sbin

添加完成之后执行刷新:

[root@hadoop-master bin]# source /etc/profile

6、初始化hive操作

选用MySQLysqlDerby二者之一为元数据库
注意:先查看MySQL中是否有残留的Hive元数据,若有,需先删除

schematool -dbType mysql -initSchema    ## MySQL作为元数据库

其中mysql表示用mysql做为存储hive元数据的数据库, 若不用mysql做为元数据库, 则执行

schematool -dbType derby -initSchema    ## Derby作为元数据库

hive集群部署原理 hive集群安装_hive_03

7、启动Metastore服务

执行Hive前, 须先启动metastore服务, 否则会报错。客户端连接metastore服务,metastore再去连接MySQL数据库来存取元数据。有了metastore服务,就可以有多个客户端同时连接,而且这些客户端不需要知道MySQL数据库的用户名和密码,只需要连接metastore 服务即可。

hive --service metastore &

hive集群部署原理 hive集群安装_mysql_04

启动hiveserver2服务,这样远程才能访问到

hive --service hiveserver2 &

远程访问地址:

jdbc:hive2://hadoop-master:10000

同时可以从web页面上查看

http://hadoop-master:10002/

hive集群部署原理 hive集群安装_hadoop_05

8、开始测试

8.1、hdsf目录创建

首先如果hadoop还未启动的先保证hadoop正常启动。然后执行hive的目录创建:

hdfs dfs -mkdir -p /user/hive/warehouse

8.2、将hive拷贝到集群的其他几台机器

将hadoop-master上的hive拷贝到其他几台机器上

scp -r /usr/local/hive/* hadoop-slave3:/usr/local/hive/
scp -r /usr/local/hive/* hadoop-slave2:/usr/local/hive/
scp -r /usr/local/hive/* hadoop-slave1:/usr/local/hive/

并且注意修改其他几台机器的配置文件:/etc/profile

#java环境变量
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_261
export CLASSPATH=.:${JAVA_HOME}/jre/lib/rt.jar:${JAVA_HOME}/lib/dt.jar:${JAVA_HOME}/lib/tools.jar
export PATH=$PATH:${JAVA_HOME}/bin

#配置Hadoop环境变量
export HADOOP_HOME=/usr/local/hadoop/apps/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

#配置hive环境变量
export HIVE_HOME=/usr/local/hive/apache-hive-2.1.1
export PATH=$PATH:$HIVE_HOME/bin:$HIVE_HOME/sbin

然后,刷新环境变量配置文件:

source /etc/profile

8.3、启动测试

在hadoop-slave1上直接输入hive,然后进行如下测试

[root@hadoop-slave1 hive]# hive
which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/jdk/jdk1.8.0_261/bin:/usr/local/hadoop/apps/hadoop-2.7.3/bin:/usr/local/hadoop/apps/hadoop-2.7.3/sbin::/usr/local/hive/apache-hive-2.1.1/bin:/usr/local/hive/apache-hive-2.1.1/sbin::/root/bin:/usr/local/jdk/jdk1.8.0_261/bin:/usr/local/hadoop/apps/hadoop-2.7.3/bin:/usr/local/hadoop/apps/hadoop-2.7.3/sbin:/usr/local/hive/apache-hive-2.1.1/bin:/usr/local/hive/apache-hive-2.1.1/sbin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/apache-hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/apps/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in file:/usr/local/hive/apache-hive-2.1.1/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> create database hive_test;
OK
Time taken: 0.29 seconds
hive> show databases;
OK
default
hive_test
Time taken: 0.026 seconds, Fetched: 2 row(s)
hive> use hive_test;
OK
Time taken: 0.042 seconds
hive> create table book (id bigint, name string) row format delimited fields terminated by '\t';
OK
Time taken: 0.408 seconds
hive> show tables;
OK
book
Time taken: 0.027 seconds, Fetched: 1 row(s)
hive> select * from book;
OK
Time taken: 1.111 seconds
hive>

8.4、远程访问hive测试

使用beeline方式连接

[root@hadoop-master hive]# beeline
which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/jdk/jdk1.8.0_261/bin:/usr/local/hadoop/apps/hadoop-2.7.3/bin:/usr/local/hadoop/apps/hadoop-2.7.3/sbin:/usr/local/hive/apache-hive-2.1.1/bin:/usr/local/hive/apache-hive-2.1.1/sbin:/usr/local/git/git-2.9.5/bin:/usr/local/go/bin:/root/bin)
Beeline version 2.1.1 by Apache Hive
beeline> !connect jdbc:hive2://hadoop-master:10000
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/apache-hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/apps/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://hadoop-master:10000
Enter username for jdbc:hive2://hadoop-master:10000: root
Enter password for jdbc:hive2://hadoop-master:10000: ****
Connected to: Apache Hive (version 2.1.1)
Driver: Hive JDBC (version 2.1.1)
21/06/27 11:26:43 [main]: WARN jdbc.HiveConnection: Request to set autoCommit to false; Hive does not support autoCommit=false.
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hadoop-master:10000> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
| hive_test      |
+----------------+--+
2 rows selected (0.385 seconds)
0: jdbc:hive2://hadoop-master:10000>