Contents
- 1. Sqoop Architecture
- 2. Sqoop import Explained
- (1) Preparing the data
- (2) import parameters in detail
- 3. Sqoop export Explained
- 4. Sqoop import to Hive and Sqoop export from Hive
- 5. options-file Explained
1. Sqoop Architecture
Apache Sqoop™ is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
Put simply, Sqoop is a bridge between relational databases and Hadoop. Under the hood it runs MapReduce jobs, and it works in two directions (import and export):
A. Import data from a relational database into Hadoop and related systems such as Hive and HBase
B. Extract data from Hadoop and export it to a relational database
2. Sqoop import Explained
When running an import you may hit an error (typically java.lang.NoClassDefFoundError: org/json/JSONObject). This happens because Sqoop is missing the java-json.jar package; download it from http://www.java2s.com/Code/Jar/j/Downloadjavajsonjar.htm and put it in the sqoop/lib directory.
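A minimal sketch of the fix from the shell, assuming Sqoop is installed under /opt/modules/sqoop (substitute your own installation path):
# copy the downloaded jar into Sqoop's lib directory
cp java-json.jar /opt/modules/sqoop/lib/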
(1) Preparing the data
Create the MySQL table and load some sample data:
CREATE TABLE user_info (
id int(11) NOT NULL AUTO_INCREMENT,
username varchar(20) DEFAULT NULL,
password varchar(20) DEFAULT NULL,
PRIMARY KEY (id)
);
insert into user_info values(1,'admin','admin');
insert into user_info values(2,'wang','111111');
insert into user_info values(3,'zhang','000000');
insert into user_info values(4,'lili','000000');
insert into user_info values(5,'henry','000000');
insert into user_info values(6,'cherry','000000');
insert into user_info values(7,'ben','111111');
insert into user_info values(8,'leo','000000');
insert into user_info values(9,'test','test');
insert into user_info values(10,'system','000000');
insert into user_info values(11,'xiao','111111');
(2) import parameters in detail
You can list all import-related options with:
sqoop import --help
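Before importing, it is worth confirming that the JDBC connection works at all. A quick sanity check with list-tables, reusing the same connection settings as the examples below:
bin/sqoop list-tables \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911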
Import the MySQL table into HDFS. By default Sqoop uses 4 map tasks and writes under hdfs:/user/caizhengjie (the HDFS home directory of the user running the job):
bin/sqoop import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--table user_info
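With no explicit --target-dir, the data ends up in a directory named after the table under the default path above; a quick way to check (the exact path depends on your HDFS user):
bin/hdfs dfs -ls /user/caizhengjie/user_info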
Import the MySQL table into HDFS again, this time deleting the target directory first if it already exists, with 1 map task and the explicit target path hdfs:/user/kfk/sqoop/import/user_info:
bin/sqoop import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--table user_info \
--delete-target-dir \
--num-mappers 1 \
--target-dir /user/kfk/sqoop/import/user_info
[caizhengjie@bigdata-pro-m01 hadoop]$ bin/hdfs dfs -text /user/kfk/sqoop/import/user_info/par*
1,admin,admin
2,wang,111111
3,zhang,000000
4,lili,000000
5,henry,000000
6,cherry,000000
7,ben,111111
8,leo,000000
9,test,test
10,system,000000
11,xiao,111111
Adding --direct can make the transfer faster; for MySQL, direct mode streams data through the mysqldump utility instead of going row by row over JDBC:
bin/sqoop import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--table user_info \
--delete-target-dir \
--num-mappers 1 \
--target-dir /user/kfk/sqoop/import/user_info \
--direct
The files written to HDFS are comma-delimited by default; we can define our own field delimiter:
bin/sqoop import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--table user_info \
--delete-target-dir \
--num-mappers 1 \
--target-dir /user/kfk/sqoop/import/user_info \
--fields-terminated-by '\t'
[caizhengjie@bigdata-pro-m01 hadoop]$ bin/hdfs dfs -text /user/kfk/sqoop/import/user_info/par*
1 admin admin
2 wang 111111
3 zhang 000000
4 lili 000000
5 henry 000000
6 cherry 000000
7 ben 111111
8 leo 000000
9 test test
10 system 000000
11 xiao 111111
Store the imported data on HDFS in the Parquet file format:
bin/sqoop import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--table user_info \
--delete-target-dir \
--num-mappers 1 \
--target-dir /user/kfk/sqoop/import/user_info \
--fields-terminated-by '\t' \
--as-parquetfile
Enable Snappy compression on the output:
bin/sqoop import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--table user_info \
--delete-target-dir \
--num-mappers 1 \
--target-dir /user/kfk/sqoop/import/user_info \
--fields-terminated-by '\t' \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec
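To confirm the codec took effect, list the target directory; with SnappyCodec the part files should carry a .snappy suffix:
bin/hdfs dfs -ls /user/kfk/sqoop/import/user_info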
Import only the username and password columns of the rows with id > 5 into HDFS:
bin/sqoop import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--columns 'username,password' \
--table user_info \
--where 'id > 5' \
--delete-target-dir \
--num-mappers 1 \
--target-dir /user/kfk/sqoop/import/user_info \
--fields-terminated-by '\t'
[caizhengjie@bigdata-pro-m01 hadoop]$ bin/hdfs dfs -text /user/kfk/sqoop/import/user_info/par*
cherry 000000
ben 111111
leo 000000
test test
system 000000
xiao 111111
Append to the data already in the target directory (incremental import); the column to check must be given with --check-column 'id':
bin/sqoop import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--columns 'username,password' \
--table user_info \
--where 'id > 5' \
--num-mappers 1 \
--target-dir /user/kfk/sqoop/import/user_info \
--fields-terminated-by '\t' \
--check-column 'id' \
--incremental append
[caizhengjie@bigdata-pro-m01 hadoop]$ bin/hdfs dfs -text /user/kfk/sqoop/import/user_info/par*
cherry 000000
ben 111111
leo 000000
test test
system 000000
xiao 111111
cherry 000000
ben 111111
leo 000000
test test
system 000000
xiao 111111
Append again, this time starting after id 10; --last-value 10 is exclusive, so the row with id 10 itself is not imported:
bin/sqoop import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--columns 'username,password' \
--table user_info \
--num-mappers 1 \
--target-dir /user/kfk/sqoop/import/user_info \
--fields-terminated-by '\t' \
--check-column 'id' \
--incremental append \
--last-value 10
[caizhengjie@bigdata-pro-m01 hadoop]$ bin/hdfs dfs -text /user/kfk/sqoop/import/user_info/par*
cherry 000000
ben 111111
leo 000000
test test
system 000000
xiao 111111
cherry 000000
ben 111111
leo 000000
test test
system 000000
xiao 111111
xiao 111111
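In practice, incremental imports like this are often wrapped in a saved Sqoop job so the metastore remembers the last imported value between runs. A minimal sketch, where the job name incremental_user_info is just an example:
bin/sqoop job --create incremental_user_info \
-- import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--table user_info \
--num-mappers 1 \
--target-dir /user/kfk/sqoop/import/user_info \
--fields-terminated-by '\t' \
--check-column 'id' \
--incremental append \
--last-value 0
# each execution appends the new rows and updates the stored last-value
bin/sqoop job --exec incremental_user_info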
A free-form query can be used instead of --table via --query; the SQL must include the $CONDITIONS placeholder so Sqoop can inject its split conditions:
bin/sqoop import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--columns 'username,password' \
--num-mappers 1 \
--target-dir /user/kfk/sqoop/import/user_info \
--fields-terminated-by '\t' \
--check-column 'id' \
--incremental append \
--query 'select * from user_info where id > 5 and $CONDITIONS'
[caizhengjie@bigdata-pro-m01 hadoop]$ bin/hdfs dfs -text /user/kfk/sqoop/import/user_info/par*
cherry 000000
ben 111111
leo 000000
test test
system 000000
xiao 111111
cherry 000000
ben 111111
leo 000000
test test
system 000000
xiao 111111
xiao 111111
6 cherry
7 ben
8 leo
9 test
10 system
11 xiao
3. Sqoop export Explained
First, prepare the target table in MySQL:
CREATE TABLE user_info_export (
id int(11) NOT NULL AUTO_INCREMENT,
username varchar(20) DEFAULT NULL,
password varchar(20) DEFAULT NULL,
PRIMARY KEY (id)
);
Stage the data on HDFS ahead of time; mine sits under /user/kfk/sqoop/export/user_info. Export assumes comma-delimited fields by default; if the HDFS files are tab-delimited, add --input-fields-terminated-by '\t'.
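A minimal way to stage that data, assuming a local comma-delimited file user_info.csv (the file name is only an example):
bin/hdfs dfs -mkdir -p /user/kfk/sqoop/export/user_info
bin/hdfs dfs -put user_info.csv /user/kfk/sqoop/export/user_info/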
Export the HDFS data into MySQL:
bin/sqoop export \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--num-mappers 1 \
--table user_info_export \
--export-dir /user/kfk/sqoop/export/user_info
mysql> select * from user_info_export;
+----+----------+----------+
| id | username | password |
+----+----------+----------+
| 1 | admin | admin |
| 2 | wang | 111111 |
| 3 | zhang | 000000 |
| 4 | lili | 000000 |
| 5 | henry | 000000 |
| 6 | cherry | 000000 |
| 7 | ben | 111111 |
| 8 | leo | 000000 |
| 9 | test | test |
| 10 | system | 000000 |
| 11 | xiao | 111111 |
+----+----------+----------+
Export only specific columns with --columns. Sqoop maps the fields of each HDFS record to the listed columns in order, so here the first field (id) lands in username and the second (username) lands in password, while MySQL auto-increments id; that is why the newly appended rows 12-22 below look shifted:
bin/sqoop export \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--num-mappers 1 \
--table user_info_export \
--export-dir /user/kfk/sqoop/export/user_info \
--columns 'username,password'
mysql> select * from user_info_export;
+----+----------+----------+
| id | username | password |
+----+----------+----------+
| 1 | admin | admin |
| 2 | wang | 111111 |
| 3 | zhang | 000000 |
| 4 | lili | 000000 |
| 5 | henry | 000000 |
| 6 | cherry | 000000 |
| 7 | ben | 111111 |
| 8 | leo | 000000 |
| 9 | test | test |
| 10 | system | 000000 |
| 11 | xiao | 111111 |
| 12 | 1 | admin |
| 13 | 2 | wang |
| 14 | 3 | zhang |
| 15 | 4 | lili |
| 16 | 5 | henry |
| 17 | 6 | cherry |
| 18 | 7 | ben |
| 19 | 8 | leo |
| 20 | 9 | test |
| 21 | 10 | system |
| 22 | 11 | xiao |
+----+----------+----------+
22 rows in set (0.00 sec)
4. Sqoop import to Hive and Sqoop export from Hive
First, create the table in Hive:
CREATE TABLE user_info (
id int,
username string,
password string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
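This assumes the db_hive database already exists (see the hive (db_hive)> prompt below); if not, a quick setup in the Hive shell:
CREATE DATABASE IF NOT EXISTS db_hive;
USE db_hive;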
Import the MySQL data into the Hive table by writing directly into the table's warehouse directory:
bin/sqoop import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--num-mappers 1 \
--table user_info \
--fields-terminated-by ',' \
--delete-target-dir \
--target-dir /user/hive/warehouse/db_hive.db/user_info
hive (db_hive)> select * from user_info;
OK
user_info.id user_info.username user_info.password
1 admin admin
2 wang 111111
3 zhang 000000
4 lili 000000
5 henry 000000
6 cherry 000000
7 ben 111111
8 leo 000000
9 test test
10 system 000000
11 xiao 111111
Time taken: 0.059 seconds, Fetched: 11 row(s)
mysql-table -> /user/hive/warehouse/db_hive.db/user_info
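Alternatively, Sqoop can create and load the Hive table itself with --hive-import instead of writing into the warehouse path by hand. A sketch, assuming the db_hive database from above:
bin/sqoop import \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--num-mappers 1 \
--table user_info \
--fields-terminated-by ',' \
--delete-target-dir \
--hive-import \
--hive-database db_hive \
--hive-table user_info \
--hive-overwrite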
Export the Hive table's data into MySQL:
bin/sqoop export \
--connect jdbc:mysql://bigdata-pro-m01:3306/db_sqoop \
--username root \
--password 199911 \
--num-mappers 1 \
--table user_info_export \
--input-fields-terminated-by ',' \
--export-dir /user/hive/warehouse/db_hive.db/user_info
mysql> select * from user_info_export;
+----+----------+----------+
| id | username | password |
+----+----------+----------+
| 1 | admin | admin |
| 2 | wang | 111111 |
| 3 | zhang | 000000 |
| 4 | lili | 000000 |
| 5 | henry | 000000 |
| 6 | cherry | 000000 |
| 7 | ben | 111111 |
| 8 | leo | 000000 |
| 9 | test | test |
| 10 | system | 000000 |
| 11 | xiao | 111111 |
+----+----------+----------+
11 rows in set (0.00 sec)
5. options-file Explained
Besides typing everything on the command line, we can also put the arguments into an options file:
vim import.txt
import
--connect
jdbc:mysql://bigdata-pro-m01:3306/db_sqoop
--username
root
--password
199911
--table
user_info
--delete-target-dir
--num-mappers
1
--target-dir
/user/kfk/sqoop/import/user_info
--fields-terminated-by
'\t'
sqoop --options-file import.txt
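The same mechanism works for any Sqoop tool, and lines starting with # inside an options file are treated as comments. A sketch of an export counterpart (export.txt is just an example file name), mirroring the export from section 3:
vim export.txt
export
--connect
jdbc:mysql://bigdata-pro-m01:3306/db_sqoop
--username
root
--password
199911
--num-mappers
1
--table
user_info_export
--export-dir
/user/kfk/sqoop/export/user_info
sqoop --options-file export.txt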