For migrating data between S3 and CDH HDFS, see:
http://bdlabs.edureka.co/static/help/topics/cdh_admin_distcp_data_cluster_migrate.html
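The migration covered by the reference above is normally done with DistCp. A minimal sketch, assuming the S3 credentials are already available to Hadoop (for example through fs.s3a.access.key and fs.s3a.secret.key in the cluster configuration); the function name copy_hdfs_to_s3 and the example paths are hypothetical:

```shell
#!/usr/bin/env bash
# copy_hdfs_to_s3 SRC DST: copy an HDFS path to S3 with DistCp.
# Assumption: S3 credentials are already configured for the s3a connector.
copy_hdfs_to_s3() {
  local src="$1" dst="$2"
  # Refuse anything that is not an s3a:// destination.
  case "$dst" in
    s3a://*) ;;
    *) echo "destination must be an s3a:// URI, got '$dst'" >&2; return 1 ;;
  esac
  hadoop distcp "$src" "$dst"
}
```

For example: copy_hdfs_to_s3 hdfs://namenode:8020/user/logs s3a://xxxxx/logs (the namenode host, port, and source path are placeholders).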
EMR migration guide:
https://d1.awsstatic.com/whitepapers/amazon_emr_migration_guide.pdf
For the S3 commands (AWS CLI), see:
https://docs.aws.amazon.com/zh_cn/cli/latest/userguide/welcome-versions.html
1. Install awscli
pip install awscli
Check the version:
aws --version
aws-cli/1.18.143 Python/3.6.2 Linux/4.4.0-165-generic botocore/1.18.2
The awscli that is installed automatically with the CDH S3 integration may be too old, so some commands are not supported. For example:
aws --version
aws-cli/1.4.2 Python/3.4.2 Linux/4.9.0-0.bpo.6-amd64
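To catch an outdated awscli before running other commands, the version string can be parsed and compared against a minimum. A rough sketch, assuming GNU sort (for sort -V); the function names are hypothetical, and the minimum version to require depends on which commands you need:

```shell
#!/usr/bin/env bash
# Extract "1.18.143" from a line such as
# "aws-cli/1.18.143 Python/3.6.2 Linux/4.4.0-165-generic botocore/1.18.2".
awscli_version_from() {
  printf '%s\n' "$1" | sed -n 's|^aws-cli/\([0-9][0-9.]*\).*|\1|p'
}

# version_at_least CURRENT MINIMUM: succeeds when CURRENT >= MINIMUM.
# Relies on GNU `sort -V` (version sort) to order the two strings.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

Usage: current="$(awscli_version_from "$(aws --version 2>&1)")", then version_at_least "$current" 1.18.0 (1.18.0 is only an illustrative threshold). The 2>&1 is there because some older releases print the version to stderr. If the check fails, pip install --upgrade awscli brings in a newer release.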
Configure the region and the access keys under ~/.aws:
~/.aws$ ls
config credentials
config
[default]
region = ap-northeast-1
credentials
[default]
aws_access_key_id = XXXX
aws_secret_access_key = XXXX
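The two files can also be written from a script instead of by hand (running aws configure does the same thing interactively). A minimal sketch; the function name write_aws_config is hypothetical, and the key values are passed in as arguments so nothing is hard-coded:

```shell
#!/usr/bin/env bash
# write_aws_config DIR REGION ACCESS_KEY_ID SECRET_ACCESS_KEY
# Writes DIR/config and DIR/credentials with a [default] profile.
write_aws_config() {
  local dir="$1" region="$2" key_id="$3" secret="$4"
  mkdir -p "$dir"
  printf '[default]\nregion = %s\n' "$region" > "$dir/config"
  printf '[default]\naws_access_key_id = %s\naws_secret_access_key = %s\n' \
    "$key_id" "$secret" > "$dir/credentials"
  # Keep the secret readable by the owner only.
  chmod 600 "$dir/credentials"
}
```

For example: write_aws_config "$HOME/.aws" ap-northeast-1 XXXX XXXX. Note that this overwrites any existing files in that directory.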
2. List a folder
aws s3 ls s3://xxxxx/logs/
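The trailing slash in the command above matters: without it, aws s3 ls shows the matching prefix entry (PRE logs/) rather than what is inside the folder. A small wrapper, with the hypothetical name s3_ls_dir, that adds the slash when it is missing:

```shell
#!/usr/bin/env bash
# s3_ls_dir URI: list the contents of an S3 "folder" (key prefix).
s3_ls_dir() {
  local uri="$1"
  # Append the trailing slash if the caller left it off.
  case "$uri" in
    */) ;;
    *) uri="$uri/" ;;
  esac
  aws s3 ls "$uri"
}
```

For example, s3_ls_dir s3://xxxxx/logs runs aws s3 ls s3://xxxxx/logs/.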
3. Recursively delete an S3 folder
aws s3 rm --recursive s3://xxxxx/logs/test
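A recursive delete cannot be undone (unless bucket versioning is enabled), so it may be worth previewing it first with the --dryrun flag of aws s3 rm, which prints the operations without performing them. A sketch with a hypothetical helper, preview_s3_rm, that also refuses to run against anything that is not a bucket plus a non-empty prefix:

```shell
#!/usr/bin/env bash
# preview_s3_rm URI: show what a recursive delete would remove, without deleting.
preview_s3_rm() {
  case "$1" in
    s3://*/?*) ;;   # require s3://bucket/ followed by a non-empty prefix
    *) echo "refusing: expected s3://bucket/prefix, got '$1'" >&2; return 1 ;;
  esac
  aws s3 rm --recursive --dryrun "$1"
}
```

For example, preview_s3_rm s3://xxxxx/logs/test lists the objects that would be deleted; once the list looks right, run the same aws s3 rm --recursive command without --dryrun.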