Table of Contents

  • 1. Sqoop Service Architecture
  • 2. Sqoop import Explained
  • (1) Preparing the Data
  • (2) import Parameters in Detail
  • 3. Sqoop export Explained
  • 4. Sqoop import to Hive and Sqoop export from Hive
  • 5. The options-file Explained


1. Sqoop Service Architecture

Apache Sqoop™ is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.

Simply put, Sqoop is a bridge between relational databases and Hadoop. Under the hood it runs MapReduce jobs, and it covers two directions (import and export):

A. Importing data from a relational database into Hadoop and related systems such as Hive and HBase

B. Extracting data from Hadoop and exporting it into a relational database

[Figure: Sqoop architecture]
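
Both directions are subcommands sharing the same connection options. A minimal sketch of the two shapes (host, database, table, and path are placeholders, not values from this walkthrough):

sqoop import \
--connect jdbc:mysql://<host>:3306/<db> \
--username <user> \
--password <passwd> \
--table <table>

sqoop export \
--connect jdbc:mysql://<host>:3306/<db> \
--username <user> \
--password <passwd> \
--table <table> \
--export-dir <hdfs-path>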

2. Sqoop import Explained

When running an import, you may hit this error:

[Screenshot: import error message]


This happens because Sqoop is missing the java-json.jar package. Download it from http://www.java2s.com/Code/Jar/j/Downloadjavajsonjar.htm and place it in the sqoop/lib directory.
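
For example, assuming the jar was downloaded to the current directory and Sqoop lives at $SQOOP_HOME (both assumptions, adjust to your layout):

cp java-json.jar $SQOOP_HOME/lib/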

(1) Preparing the Data

1. Prepare the data: create a MySQL table and load some rows

CREATE TABLE  user_info (
  id int(11) NOT NULL AUTO_INCREMENT,
  username varchar(20) DEFAULT NULL,
  password varchar(20) DEFAULT NULL,
  PRIMARY KEY (id)
);
insert into user_info values(1,'admin','admin');
insert into user_info values(2,'wang','111111');
insert into user_info values(3,'zhang','000000');
insert into user_info values(4,'lili','000000');
insert into user_info values(5,'henry','000000');
insert into user_info values(6,'cherry','000000');
insert into user_info values(7,'ben','111111');
insert into user_info values(8,'leo','000000');
insert into user_info values(9,'test','test');
insert into user_info values(10,'system','000000');
insert into user_info values(11,'xiao','111111');
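
A quick sanity check before importing (the count should come back as 11, matching the inserts above):

mysql> select count(*) from user_info;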

(2) import Parameters in Detail

You can list the import-related parameters with:

sqoop import --help

Import the MySQL table into HDFS. The default number of map tasks is 4, and the output lands under the default path hdfs:/user/caizhengjie:

bin/sqoop import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--table user_info
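
To see what landed you can list the default location; Sqoop names it after the table under the user's HDFS home, so here the path should be (one part file per map task):

bin/hdfs dfs -ls /user/caizhengjie/user_info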

Import the MySQL table into HDFS again, this time deleting the target directory if it already exists, using 1 map task, and writing to hdfs:/user/kfk/sqoop/import/user_info:

bin/sqoop import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--table user_info \
--delete-target-dir \
--num-mappers 1  \
--target-dir /user/kfk/sqoop/import/user_info
[caizhengjie@bigdata-pro-m01 hadoop]$ bin/hdfs dfs -text /user/kfk/sqoop/import/user_info/par*
1,admin,admin
2,wang,111111
3,zhang,000000
4,lili,000000
5,henry,000000
6,cherry,000000
7,ben,111111
8,leo,000000
9,test,test
10,system,000000
11,xiao,111111

Adding --direct speeds up the transfer; for MySQL it bypasses JDBC in favor of the native mysqldump utility, which must be installed on the nodes running the map tasks:

bin/sqoop import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--table user_info \
--delete-target-dir \
--num-mappers 1  \
--target-dir /user/kfk/sqoop/import/user_info \
--direct

Files written to HDFS are comma-delimited by default; we can pick our own field delimiter:

bin/sqoop import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--table user_info \
--delete-target-dir \
--num-mappers 1  \
--target-dir /user/kfk/sqoop/import/user_info  \
--fields-terminated-by '\t'
[caizhengjie@bigdata-pro-m01 hadoop]$ bin/hdfs dfs -text /user/kfk/sqoop/import/user_info/par*
1       admin   admin
2       wang    111111
3       zhang   000000
4       lili    000000
5       henry   000000
6       cherry  000000
7       ben     111111
8       leo     000000
9       test    test
10      system  000000
11      xiao    111111

Store the data on HDFS as Parquet files (Parquet is a binary columnar format, so the --fields-terminated-by setting carried over here does not affect the file contents):

bin/sqoop import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--table user_info \
--delete-target-dir \
--num-mappers 1  \
--target-dir /user/kfk/sqoop/import/user_info  \
--fields-terminated-by '\t' \
--as-parquetfile
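
Because Parquet is binary, hdfs dfs -text will not print clean rows the way it does for the delimited imports above; a directory listing should instead show .parquet files (a quick check, output omitted):

bin/hdfs dfs -ls /user/kfk/sqoop/import/user_info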

Enable Snappy compression on the output:

bin/sqoop import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--table user_info \
--delete-target-dir \
--num-mappers 1  \
--target-dir /user/kfk/sqoop/import/user_info  \
--fields-terminated-by '\t' \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec
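
With compression enabled the part files get a .snappy extension. hdfs dfs -text should usually still print the rows, since it picks a decompression codec from the file extension (native Snappy libraries are required); a listing confirms the extension:

bin/hdfs dfs -ls /user/kfk/sqoop/import/user_info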

Import only the 'username,password' columns of the rows with id > 5 into HDFS:

bin/sqoop import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--columns 'username,password' \
--table user_info \
--where 'id > 5' \
--delete-target-dir \
--num-mappers 1  \
--target-dir /user/kfk/sqoop/import/user_info  \
--fields-terminated-by '\t'
[caizhengjie@bigdata-pro-m01 hadoop]$ bin/hdfs dfs -text /user/kfk/sqoop/import/user_info/par*
cherry  000000
ben     111111
leo     000000
test    test
system  000000
xiao    111111

Append to the data already on HDFS; incremental import requires naming the key column with --check-column 'id'. Without --last-value the append starts from the beginning, so the same six rows get imported a second time, as the output shows:

bin/sqoop import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--columns 'username,password' \
--table user_info \
--where 'id > 5' \
--num-mappers 1  \
--target-dir /user/kfk/sqoop/import/user_info  \
--fields-terminated-by '\t' \
--check-column 'id' \
--incremental append
[caizhengjie@bigdata-pro-m01 hadoop]$ bin/hdfs dfs -text /user/kfk/sqoop/import/user_info/par*
cherry  000000
ben     111111
leo     000000
test    test
system  000000
xiao    111111
cherry  000000
ben     111111
leo     000000
test    test
system  000000
xiao    111111

Append on top of the existing data, starting after id 10 (exclusive, so row 10 itself is not re-imported). Sqoop prints the new last value at the end of each incremental run, which is what you feed to --last-value next time:

bin/sqoop import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--columns 'username,password' \
--table user_info \
--num-mappers 1  \
--target-dir /user/kfk/sqoop/import/user_info  \
--fields-terminated-by '\t' \
--check-column 'id' \
--incremental append \
--last-value 10
[caizhengjie@bigdata-pro-m01 hadoop]$ bin/hdfs dfs -text /user/kfk/sqoop/import/user_info/par*
cherry  000000
ben     111111
leo     000000
test    test
system  000000
xiao    111111
cherry  000000
ben     111111
leo     000000
test    test
system  000000
xiao    111111
xiao    111111
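
Tracking --last-value by hand gets tedious. Sqoop can also store it for you in a saved job and update it after every run; a minimal sketch (the job name user_info_incr is made up, and --exec will prompt for the password since saved jobs do not store it by default):

bin/sqoop job --create user_info_incr -- import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--table user_info \
--num-mappers 1 \
--target-dir /user/kfk/sqoop/import/user_info \
--check-column 'id' \
--incremental append

bin/sqoop job --exec user_info_incr
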
Instead of --table you can use a free-form query via --query. The query must contain the literal token $CONDITIONS in its WHERE clause (Sqoop substitutes each mapper's split predicate for it; with more than one mapper you must also supply --split-by, and inside double quotes it must be escaped as \$CONDITIONS):

bin/sqoop import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--columns 'username,password' \
--num-mappers 1  \
--target-dir /user/kfk/sqoop/import/user_info  \
--fields-terminated-by '\t' \
--check-column 'id' \
--incremental append \
--query 'select * from user_info where id > 5 and $CONDITIONS'
[caizhengjie@bigdata-pro-m01 hadoop]$ bin/hdfs dfs -text /user/kfk/sqoop/import/user_info/par*
cherry  000000
ben     111111
leo     000000
test    test
system  000000
xiao    111111
cherry  000000
ben     111111
leo     000000
test    test
system  000000
xiao    111111
xiao    111111
6       cherry
7       ben
8       leo
9       test
10      system
11      xiao

3. Sqoop export Explained

First prepare a target table in MySQL:

CREATE TABLE  user_info_export (
  id int(11) NOT NULL AUTO_INCREMENT,
  username varchar(20) DEFAULT NULL,
  password varchar(20) DEFAULT NULL,
  PRIMARY KEY (id)
);

Prepare the data on HDFS ahead of time; mine sits at /user/kfk/sqoop/export/user_info. Export assumes comma-delimited input by default; if the HDFS files are tab-separated, set --input-fields-terminated-by '\t'.

Export the HDFS data into MySQL:

bin/sqoop export \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--num-mappers 1  \
--table user_info_export \
--export-dir /user/kfk/sqoop/export/user_info
mysql> select * from user_info_export;
+----+----------+----------+
| id | username | password |
+----+----------+----------+
|  1 | admin    | admin    |
|  2 | wang     | 111111   |
|  3 | zhang    | 000000   |
|  4 | lili     | 000000   |
|  5 | henry    | 000000   |
|  6 | cherry   | 000000   |
|  7 | ben      | 111111   |
|  8 | leo      | 000000   |
|  9 | test     | test     |
| 10 | system   | 000000   |
| 11 | xiao     | 111111   |
+----+----------+----------+

Export only specific columns:

bin/sqoop export \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--num-mappers 1  \
--table user_info_export \
--export-dir /user/kfk/sqoop/export/user_info \
--columns 'username,password'
mysql> select * from user_info_export;
+----+----------+----------+
| id | username | password |
+----+----------+----------+
|  1 | admin    | admin    |
|  2 | wang     | 111111   |
|  3 | zhang    | 000000   |
|  4 | lili     | 000000   |
|  5 | henry    | 000000   |
|  6 | cherry   | 000000   |
|  7 | ben      | 111111   |
|  8 | leo      | 000000   |
|  9 | test     | test     |
| 10 | system   | 000000   |
| 11 | xiao     | 111111   |
| 12 | 1        | admin    |
| 13 | 2        | wang     |
| 14 | 3        | zhang    |
| 15 | 4        | lili     |
| 16 | 5        | henry    |
| 17 | 6        | cherry   |
| 18 | 7        | ben      |
| 19 | 8        | leo      |
| 20 | 9        | test     |
| 21 | 10       | system   |
| 22 | 11       | xiao     |
+----+----------+----------+
22 rows in set (0.00 sec)

Note why rows 12-22 look shifted: the HDFS files still carry three fields (id,username,password), but --columns maps only the first two input fields, so each row's id value landed in username and its username in password, while the table's AUTO_INCREMENT generated fresh ids.
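
If you need to re-run an export without piling up rows like this, Sqoop also offers an update/upsert mode keyed on a column; a sketch (not part of the original walkthrough; allowinsert updates matching rows and inserts the rest):

bin/sqoop export \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--num-mappers 1  \
--table user_info_export \
--export-dir /user/kfk/sqoop/export/user_info \
--update-key id \
--update-mode allowinsert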

4. Sqoop import to Hive and Sqoop export from Hive

First create the table in Hive:

CREATE TABLE user_info (
  id int,
  username string,
  password string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

Import the MySQL data into the Hive table by writing straight into its warehouse directory:

bin/sqoop import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--num-mappers 1  \
--table user_info \
--fields-terminated-by ',' \
--delete-target-dir \
--target-dir /user/hive/warehouse/db_hive.db/user_info
hive (db_hive)> select * from user_info;
OK
user_info.id    user_info.username      user_info.password
1       admin   admin
2       wang    111111
3       zhang   000000
4       lili    000000
5       henry   000000
6       cherry  000000
7       ben     111111
8       leo     000000
9       test    test
10      system  000000
11      xiao    111111
Time taken: 0.059 seconds, Fetched: 11 row(s)
In other words, the data flow is: mysql-table  ->  /user/hive/warehouse/db_hive.db/user_info
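
Writing into the warehouse path works because the table already exists with a matching delimiter. Alternatively, Sqoop can create and load the Hive table itself with --hive-import; a sketch under that assumption:

bin/sqoop import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--num-mappers 1  \
--table user_info \
--fields-terminated-by ',' \
--hive-import \
--hive-database db_hive \
--hive-table user_info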

Export the Hive table's data out to MySQL:

bin/sqoop export \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--num-mappers 1  \
--table user_info_export \
--input-fields-terminated-by ',' \
--export-dir /user/hive/warehouse/db_hive.db/user_info
mysql> select * from user_info_export;
+----+----------+----------+
| id | username | password |
+----+----------+----------+
|  1 | admin    | admin    |
|  2 | wang     | 111111   |
|  3 | zhang    | 000000   |
|  4 | lili     | 000000   |
|  5 | henry    | 000000   |
|  6 | cherry   | 000000   |
|  7 | ben      | 111111   |
|  8 | leo      | 000000   |
|  9 | test     | test     |
| 10 | system   | 000000   |
| 11 | xiao     | 111111   |
+----+----------+----------+
11 rows in set (0.00 sec)

5. The options-file Explained

Besides passing everything on the command line, we can also drive Sqoop from an options file (one option or value per line; blank lines and lines starting with # are ignored):

vim import.txt
import
--connect
jdbc:mysql://bigdata-pro-m01:3306/db_sqoop
--username
root
--password
199911
--table
user_info
--delete-target-dir
--num-mappers 
1
--target-dir
/user/kfk/sqoop/import/user_info
--fields-terminated-by
'\t'
sqoop --options-file import.txt
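
The same mechanism works for any subcommand. For example, an options file mirroring the export from section 3 (a sketch reusing that section's table and path):

vim export.txt
export
--connect
jdbc:mysql://bigdata-pro-m01:3306/db_sqoop
--username
root
--password
199911
--table
user_info_export
--num-mappers
1
--export-dir
/user/kfk/sqoop/export/user_info
sqoop --options-file export.txt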
