For migrating data between S3 and CDH HDFS (with distcp), see:

http://bdlabs.edureka.co/static/help/topics/cdh_admin_distcp_data_cluster_migrate.html
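The page above moves data with `hadoop distcp`. Below is a minimal sketch of that copy. It assumes the cluster already has the S3A connector configured with credentials (for example `fs.s3a.access.key` / `fs.s3a.secret.key`). The namenode address, bucket and paths are placeholders. `run_distcp` and the `DRY_RUN` switch are hypothetical helpers added here so the command can be printed before it is run. `-update` makes reruns skip files that already look identical at the destination.

```shell
#!/bin/sh
# Sketch: copy a directory between HDFS and S3 with hadoop distcp.
# ASSUMPTION: the S3A connector and its credentials are configured on the cluster;
# run_distcp and DRY_RUN are hypothetical helpers, paths are placeholders.

run_distcp() {
    src="$1"    # e.g. hdfs://namenode:8020/user/logs  or  s3a://xxxxx/logs
    dst="$2"    # e.g. s3a://xxxxx/logs                or  hdfs://namenode:8020/user/logs
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print the command that would be executed.
        echo "hadoop distcp -update $src $dst"
    else
        # -update: skip files that already match at the destination.
        hadoop distcp -update "$src" "$dst"
    fi
}

# Usage (HDFS -> S3):
#   DRY_RUN=1 run_distcp hdfs://namenode:8020/user/logs s3a://xxxxx/logs
#   run_distcp hdfs://namenode:8020/user/logs s3a://xxxxx/logs
```

The same function works in the other direction by swapping the arguments.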

For EMR-related guidance, see the Amazon EMR migration guide:

https://d1.awsstatic.com/whitepapers/amazon_emr_migration_guide.pdf

For AWS CLI commands (including the S3 commands), see:

https://docs.aws.amazon.com/zh_cn/cli/latest/userguide/welcome-versions.html

1. Install awscli

pip install awscli

Check the version:

aws --version
aws-cli/1.18.143 Python/3.6.2 Linux/4.4.0-165-generic botocore/1.18.2

On some CDH hosts, the automatically installed awscli may be too old to support some commands, for example:

aws --version
aws-cli/1.4.2 Python/3.4.2 Linux/4.9.0-0.bpo.6-amd64
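One way to catch this before relying on newer commands is to compare the reported version against a minimum. Below is a minimal sketch. The threshold `1.18.0` is only an illustrative value, not an official requirement. `sort -V` assumes GNU coreutils. stderr is merged into the captured output because some older installations may print the version there.

```shell
#!/bin/sh
# Sketch: check whether the installed awscli meets a minimum version.
# ASSUMPTION: MIN_AWSCLI=1.18.0 is an illustrative threshold; sort -V needs GNU coreutils.

MIN_AWSCLI="${MIN_AWSCLI:-1.18.0}"

# Extract "1.4.2" from e.g. "aws-cli/1.4.2 Python/3.4.2 Linux/4.9.0-0.bpo.6-amd64".
parse_awscli_version() {
    echo "$1" | sed -n 's|^aws-cli/\([0-9][0-9.]*\).*|\1|p'
}

# Succeeds if version $1 >= version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

check_awscli() {
    # 2>&1: some older installs may write the version string to stderr.
    ver=$(parse_awscli_version "$(aws --version 2>&1)")
    if [ -n "$ver" ] && version_ge "$ver" "$MIN_AWSCLI"; then
        echo "awscli $ver is OK"
    else
        echo "awscli ${ver:-unknown} is older than $MIN_AWSCLI; upgrade with: pip install --upgrade awscli"
        return 1
    fi
}
```

If the check fails, `pip install --upgrade awscli` upgrades the pip-installed CLI. A copy installed by the OS package manager may still come first on `PATH`.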

Configure the region and the access keys (access key ID and secret access key) in the ~/.aws directory:

~/.aws$ ls
config  credentials

config

[default]
region = ap-northeast-1

credentials

[default]
aws_access_key_id = XXXX
aws_secret_access_key = XXXX
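When several CDH hosts need the same setup, the two files can also be written from a script. Below is a minimal sketch. It assumes the keys are supplied in the `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` environment variables. `write_aws_profile` and `AWS_DIR` (target directory, default `~/.aws`) are hypothetical names. For a one-off interactive setup, `aws configure` prompts for the same values.

```shell
#!/bin/sh
# Sketch: write the [default] profile into config and credentials.
# ASSUMPTION: keys come from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY;
# write_aws_profile and AWS_DIR are hypothetical names.

write_aws_profile() {
    dir="${AWS_DIR:-$HOME/.aws}"
    region="${1:-ap-northeast-1}"
    if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set" >&2
        return 1
    fi

    mkdir -p "$dir" || return 1

    # config: region only
    cat > "$dir/config" <<EOF
[default]
region = $region
EOF

    # credentials: created under a restrictive umask, then forced to mode 600
    (
        umask 077
        cat > "$dir/credentials" <<EOF
[default]
aws_access_key_id = $AWS_ACCESS_KEY_ID
aws_secret_access_key = $AWS_SECRET_ACCESS_KEY
EOF
    )
    chmod 600 "$dir/credentials"
}
```

The CLI also reads `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` directly when they are exported. Writing the files makes the setting persist across sessions. If the files are written outside `~/.aws`, the `AWS_CONFIG_FILE` and `AWS_SHARED_CREDENTIALS_FILE` environment variables can point the CLI at them.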

2. List a folder (prefix)

aws s3 ls s3://xxxxx/logs/
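Before a migration, it is often useful to know how much data sits under a prefix. `aws s3 ls` with `--recursive --human-readable --summarize` lists every object and ends with `Total Objects:` and `Total Size:` lines. Below is a minimal sketch. `s3_du` is a hypothetical helper name. Very old releases such as the 1.4.2 shown above may not support these flags.

```shell
#!/bin/sh
# Sketch: list everything under an S3 prefix with a size/count summary.
# s3_du is a hypothetical helper; the flags are standard `aws s3 ls` options,
# but very old awscli releases may lack --human-readable / --summarize.

s3_du() {
    aws s3 ls "$1" --recursive --human-readable --summarize
}

# Usage:
#   s3_du s3://xxxxx/logs/
```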

3. Recursively delete an S3 folder

aws s3 rm --recursive s3://xxxxx/logs/test
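A recursive delete cannot be undone, so it can help to preview it first. `aws s3 rm` accepts `--dryrun`, which prints the operations without performing them. Below is a minimal sketch that runs the dry run and then deletes only after the user types `yes`. `s3_rm_confirm` is a hypothetical helper name. The prompt goes to stderr so that stdout carries only the CLI output.

```shell
#!/bin/sh
# Sketch: preview a recursive S3 delete with --dryrun, then confirm.
# s3_rm_confirm is a hypothetical helper; --dryrun is a standard `aws s3 rm` flag.

s3_rm_confirm() {
    target="$1"    # e.g. s3://xxxxx/logs/test
    # Show what would be deleted, without deleting anything.
    aws s3 rm --recursive --dryrun "$target" || return 1
    # Prompt on stderr; read the answer from stdin.
    printf 'Delete everything listed above under %s? Type "yes" to continue: ' "$target" >&2
    read -r answer
    if [ "$answer" = "yes" ]; then
        aws s3 rm --recursive "$target"
    else
        echo "Aborted."
        return 1
    fi
}

# Usage:
#   s3_rm_confirm s3://xxxxx/logs/test
```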