Apache Sqoop系列文章

1、Apache Sqoop介绍及部署

2、sqoop导入(RMDB-mysql、sybase到HDFS-hive)

3、Sqoop导出(HDFS到RMDB-mysql)

4、Sqoop job作业


文章目录

  • Apache Sqoop系列文章
  • 一、Apache Sqoop
  • 二、sqoop安装
  • 1、下载安装文件
  • 2、解压
  • 3、修改配置
  • 三、验证


本文分为三个部分,即sqoop介绍、部署及验证。
本文使用的前提是已经有可用的mysql数据库。

一、Apache Sqoop

官网地址:https://sqoop.apache.org/

Apache Sqoop是在Hadoop生态体系和RDBMS体系之间传送数据的一种工具。

Sqoop工作机制是将导入或导出命令翻译成mapreduce程序来实现。在翻译出的mapreduce中主要是对inputformat和outputformat进行定制。

Hadoop生态系统包括:HDFS、Hive、Hbase等

RDBMS体系包括:Mysql、Oracle、DB2等

Sqoop可以理解为:“SQL 到 Hadoop 和 Hadoop 到SQL”。

1、Apache Sqoop介绍及部署_big data


数据的导入导出:

Import:数据导入。RDBMS----->Hadoop

Export:数据导出。Hadoop---->RDBMS

二、sqoop安装

安装sqoop的前提是已经具备java和hadoop的环境。
安装的版本是1.4.6

1、下载安装文件

安装文件下载地址:http://archive.apache.org/dist/sqoop/1.4.6/

1、Apache Sqoop介绍及部署_hadoop_02

2、解压

解压目录视自己的环境而定。

cd /usr/local/bigdata
tar -xvzf sqoop-1.4.6.bin__hadoop-0.23.tar.gz -C /usr/local/bigdata

[alanchan@server3 sqoop-1.4.6.bin__hadoop-0.23]$ ll
总用量 1820
drwxr-xr-x 2 alanchan root    4096 5月   9 00:43 bin
-rw-r--r-- 1 alanchan root   59010 4月  27 2015 build.xml
-rw-r--r-- 1 alanchan root   35614 4月  27 2015 CHANGELOG.txt
-rw-r--r-- 1 alanchan root    9880 4月  27 2015 COMPILING.txt
drwxr-xr-x 2 alanchan root    4096 5月   9 00:43 conf
drwxr-xr-x 5 alanchan root    4096 5月   9 00:43 docs
drwxr-xr-x 2 alanchan root    4096 5月   9 00:43 ivy
-rw-r--r-- 1 alanchan root   17031 4月  27 2015 ivy.xml
drwxr-xr-x 2 alanchan root    4096 5月   9 00:43 lib
-rw-r--r-- 1 alanchan root   15319 4月  27 2015 LICENSE.txt
-rw-r--r-- 1 alanchan root     505 4月  27 2015 NOTICE.txt
-rw-r--r-- 1 alanchan root   18772 4月  27 2015 pom-old.xml
-rw-r--r-- 1 alanchan root    1096 4月  27 2015 README.txt
-rw-r--r-- 1 alanchan root 1014150 4月  27 2015 sqoop-1.4.6.jar
-rw-r--r-- 1 alanchan root    6554 4月  27 2015 sqoop-patch-review.py
-rw-r--r-- 1 alanchan root  632600 4月  27 2015 sqoop-test-1.4.6.jar
drwxr-xr-x 7 alanchan root    4096 4月  27 2015 src
drwxr-xr-x 4 alanchan root    4096 5月   9 00:43 testdata

3、修改配置

配置文件修改:

# 到安装目录下,找到conf目录
cd $SQOOP_HOME/conf

[alanchan@server3 conf]$ cat sqoop-env-template.sh 
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# included in all the hadoop scripts with source command
# should not be executable directly
# also should not be passed any arguments, since we need original $*

# Set Hadoop-specific environment variables here.

#Set path to where bin/hadoop is available
#export HADOOP_COMMON_HOME=

#Set path to where hadoop-*-core.jar is available
#export HADOOP_MAPRED_HOME=

#set the path to where bin/hbase is available
#export HBASE_HOME=

#Set the path to where bin/hive is available
#export HIVE_HOME=

#Set the path for where zookeper config dir is
#export ZOOCFGDIR=

# 修改其中的配置
# 重命名sqoop-env-template.sh为sqoop-env.sh
mv sqoop-env-template.sh sqoop-env.sh
vi sqoop-env.sh
export HADOOP_COMMON_HOME= /usr/local/bigdata/hadoop-3.1.4 
export HADOOP_MAPRED_HOME= /usr/local/bigdata/hadoop-3.1.4
export HIVE_HOME= /usr/local/bigdata/apache-hive-3.1.2-bin
# 保存退出

以mysql为示例,加入mysql的jdbc驱动包

# 或者直接上传mysql-connector-java-5.1.40.jar到/usr/local/bigdata/sqoop-1.4.6.bin__hadoop-0.23/lib

cp /apache-hive-3.1.2-bin/lib/mysql-connector-java-5.1.40.jar $SQOOP_HOME/lib/

三、验证

查询数据库的列表

sqoop list-databases \
--connect jdbc:mysql://192.168.51.65:3306/ \
--username bigdata --password 888888

[opsadm@bd-node-04 conf]$ sqoop list-databases --connect jdbc:mysql://192.168.51.65:3306/ --username bigdata --password 888888

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
23/05/09 08:52:47 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7-cdh6.2.1
23/05/09 08:52:47 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
23/05/09 08:52:47 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Tue May 09 08:52:48 CST 2023 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
information_schema
cdh6
hive
hue
mysql
oozie
performance_schema
sys

本命令会列出所有mysql的数据库。
到这里,整个Sqoop安装工作完成。