Honestly, this is the sort of thing a lazy person ends up researching: if you are happy to spend the winter in the lab or the machine room rather than in your dorm or at home, remote access is a non-issue. But there are always times when that is not an option, and then the remote command line is all you have for seeing and operating everything. The first step of remote operation is configuring SSH access to the cluster:
Accessing the cluster remotely via SSH
There are two prerequisites:
- At least one machine in the cluster must have port forwarding configured on the public-facing router
- That machine must be running the SSH service; the SSH daemon listens on port 22 by default
For example, I usually set up port forwarding to master (the NameNode/JobTracker). The rule on the router looks like: external port: aa, IP: masterIP, internal port: 22. From anywhere, we can then reach the cluster with something like: ssh -p aa hadoop@<master's public IP>, where hadoop is the username on master. In other words, a request arriving at external port aa is forwarded to port 22 on masterIP, and we log in there as hadoop. (Note that ssh takes the port with lowercase -p; uppercase -P is scp's port flag.)
To skip the password as well, append your local public key to ~/.ssh/authorized_keys on master; after that, a single command logs you into the cluster with no password prompt. And once you are on master, logins from master to the other cluster machines are already passwordless.
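A minimal sketch of that key setup, assuming a hypothetical forwarded port 2222 in place of the aa placeholder above, with <public-IP> standing in for master's public address:
# Generate a key pair locally if you do not have one yet:
ssh-keygen -t rsa
# Append the public key to hadoop's ~/.ssh/authorized_keys on master,
# going through the router's forwarded port:
ssh-copy-id -p 2222 hadoop@<public-IP>
# From now on this logs in without a password prompt:
ssh -p 2222 hadoop@<public-IP>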
Some handy tools for cluster management
- pdsh
This one is a gem: from a single machine it lets you run shell commands in a distributed fashion across every machine in the cluster.
$ pdsh -h
Usage: pdsh [-options] command ...
-S return largest of remote command return values
-h output usage menu and quit
-V output version information and quit
-q list the option settings and quit
-b disable ^C status feature (batch mode)
-d enable extra debug information from ^C status
-l user execute remote commands as user
-t seconds set connect timeout (default is 10 sec)
-u seconds set command timeout (no default)
-f n use fanout of n nodes
-w host,host,... set target node list on command line
-x host,host,... set node exclusion list on command line
-R name set rcmd module to name
-M name,... select one or more misc modules to initialize first
-N disable hostname: labels on output lines
-L list info on all loaded modules and exit
-g query,... target nodes using genders query
-X query,... exclude nodes using genders query
-F file use alternate genders file `file'
-i request alternate or canonical hostnames if applicable
-a target all nodes except those with "pdsh_all_skip" attribute
-A target all nodes listed in genders database
available rcmd modules: ssh,rsh,exec (default: rsh)
pdsh -w ssh:brix-[00-09],lbt,gbt uptime
The command above runs uptime on brix-00 through brix-09 as well as lbt and gbt, and prints the results on the current machine. One caveat: I have aliased pdsh here. Run plainly, it would normally fail, because the default rcmd module is rsh (see the help output above); define the alias below and it works. A sketch after the sample output shows how to make this permanent.
alias pdsh='PDSH_RCMD_TYPE=ssh pdsh'
The output then looks like this:
gbt: 17:33:21 up 2:31, 1 user, load average: 0.00, 0.01, 0.05
lbt: 17:33:18 up 2:27, 2 users, load average: 0.00, 0.02, 0.05
brix-02: 17:33:21 up 2:31, 0 users, load average: 0.00, 0.01, 0.05
brix-01: 17:33:21 up 2:31, 0 users, load average: 0.03, 0.02, 0.05
brix-00: 17:33:21 up 2:33, 4 users, load average: 0.08, 0.05, 0.09
brix-03: 17:33:20 up 2:31, 0 users, load average: 0.00, 0.01, 0.05
brix-04: 17:33:21 up 2:31, 0 users, load average: 0.01, 0.04, 0.05
brix-08: 17:33:21 up 2:31, 0 users, load average: 0.04, 0.06, 0.05
brix-09: 17:33:20 up 2:31, 0 users, load average: 0.10, 0.06, 0.06
brix-07: 17:33:21 up 2:31, 0 users, load average: 0.03, 0.06, 0.05
brix-05: 17:33:21 up 2:31, 0 users, load average: 0.08, 0.04, 0.05
brix-06: 17:33:21 up 2:31, 0 users, load average: 0.05, 0.04, 0.05
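To make the ssh module the default and avoid retyping the host list every time, here is a minimal sketch for ~/.bashrc (the host-list path is a hypothetical example; pdsh reads its default targets from the file named by WCOLL when no -w is given):
# Always use the ssh rcmd module instead of the rsh default:
export PDSH_RCMD_TYPE=ssh
# Default target list, one hostname per line (hypothetical path):
export WCOLL=~/cluster-hosts
# With both set, this runs uptime on every host in the list:
pdsh uptime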
pdsh -w ssh:brix-[00-09],lbt,gbt scp brix-00:~/HadoopInstall/test.txt ~/HadoopInstall/
The example above copies test.txt from brix-00 to all of the target machines (brix-00 through brix-09, plus lbt and gbt); each target node runs scp itself and pulls the file from brix-00.
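pdsh also ships with a companion tool, pdcp, which pushes a local file to every target node in one shot. A minimal sketch, assuming pdcp is installed on all the nodes:
# Push a local file to the same target list from the current machine:
pdcp -R ssh -w brix-[00-09],lbt,gbt ~/HadoopInstall/test.txt ~/HadoopInstall/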
- scp
The command above already showed scp working hand in hand with pdsh, so I won't belabor it; scp's usage summary is pasted below.
usage: scp [-12346BCpqrv] [-c cipher] [-F ssh_config] [-i identity_file]
[-l limit] [-o ssh_option] [-P port] [-S program]
[[user@]host1:]file1 ... [[user@]host2:]file2
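Two quick examples tying this back to the forwarded port from earlier (2222 and <public-IP> are hypothetical placeholders again); note that scp takes the port as uppercase -P, unlike ssh's lowercase -p:
# Copy a local file up to master through the router's forwarded port:
scp -P 2222 ./test.txt hadoop@<public-IP>:~/HadoopInstall/
# Recursively pull a directory back down from the cluster:
scp -r -P 2222 hadoop@<public-IP>:~/HadoopInstall/logs ./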
How to check the health and block distribution of an HDFS file from the command line
$ hadoop fsck /ftTest/totalWiki -files -blocks -locations
Warning: $HADOOP_HOME is deprecated.
FSCK started by hadoop from /192.168.1.230 for path /ftTest/totalWiki at Wed Nov 18 17:42:27 CST 2015
/ftTest/totalWiki 3259108351 bytes, 25 block(s): OK
0. blk_-3539743872639772968_1003 len=134217728 repl=3 [192.168.1.66:50010, 192.168.1.63:50010, 192.168.1.235:50010]
1. blk_-7700661535252568451_1003 len=134217728 repl=3 [192.168.1.231:50010, 192.168.1.232:50010, 192.168.1.238:50010]
2. blk_-3214646852454192434_1003 len=134217728 repl=3 [192.168.1.237:50010, 192.168.1.236:50010, 192.168.1.238:50010]
3. blk_-8860437510624268282_1003 len=134217728 repl=3 [192.168.1.63:50010, 192.168.1.239:50010, 192.168.1.235:50010]
4. blk_-1765246693355320434_1003 len=134217728 repl=3 [192.168.1.239:50010, 192.168.1.66:50010, 192.168.1.232:50010]
5. blk_9063781070378080202_1003 len=134217728 repl=3 [192.168.1.238:50010, 192.168.1.66:50010, 192.168.1.234:50010]
6. blk_8687961040692226467_1003 len=134217728 repl=3 [192.168.1.234:50010, 192.168.1.237:50010, 192.168.1.239:50010]
7. blk_-5717347662754027031_1003 len=134217728 repl=3 [192.168.1.236:50010, 192.168.1.232:50010, 192.168.1.63:50010]
8. blk_-5624359065285533759_1003 len=134217728 repl=3 [192.168.1.238:50010, 192.168.1.66:50010, 192.168.1.231:50010]
9. blk_622948206607478459_1003 len=134217728 repl=3 [192.168.1.66:50010, 192.168.1.63:50010, 192.168.1.236:50010]
10. blk_-4154428280295153090_1003 len=134217728 repl=3 [192.168.1.232:50010, 192.168.1.235:50010, 192.168.1.63:50010]
11. blk_6638201995439663469_1003 len=134217728 repl=3 [192.168.1.238:50010, 192.168.1.63:50010, 192.168.1.237:50010]
12. blk_-3282418422086241856_1003 len=134217728 repl=3 [192.168.1.238:50010, 192.168.1.66:50010, 192.168.1.233:50010]
13. blk_2802846523093904336_1003 len=134217728 repl=3 [192.168.1.66:50010, 192.168.1.239:50010, 192.168.1.237:50010]
14. blk_-7425405918846384842_1003 len=134217728 repl=3 [192.168.1.239:50010, 192.168.1.66:50010, 192.168.1.234:50010]
15. blk_-8997936298966969491_1003 len=134217728 repl=3 [192.168.1.237:50010, 192.168.1.235:50010, 192.168.1.238:50010]
16. blk_-827035362476515573_1003 len=134217728 repl=3 [192.168.1.239:50010, 192.168.1.63:50010, 192.168.1.235:50010]
17. blk_-5734389503841877028_1003 len=134217728 repl=3 [192.168.1.231:50010, 192.168.1.235:50010, 192.168.1.66:50010]
18. blk_1446125973144404377_1003 len=134217728 repl=3 [192.168.1.66:50010, 192.168.1.238:50010, 192.168.1.235:50010]
19. blk_-7161959344923757995_1003 len=134217728 repl=3 [192.168.1.66:50010, 192.168.1.238:50010, 192.168.1.234:50010]
20. blk_-2171786920309180709_1003 len=134217728 repl=3 [192.168.1.63:50010, 192.168.1.66:50010, 192.168.1.237:50010]
21. blk_7184760167274632839_1003 len=134217728 repl=3 [192.168.1.238:50010, 192.168.1.66:50010, 192.168.1.233:50010]
22. blk_1315507788295151463_1003 len=134217728 repl=3 [192.168.1.63:50010, 192.168.1.239:50010, 192.168.1.233:50010]
23. blk_5923416026032542888_1003 len=134217728 repl=3 [192.168.1.238:50010, 192.168.1.239:50010, 192.168.1.236:50010]
24. blk_-8960096699099874150_1003 len=37882879 repl=3 [192.168.1.234:50010, 192.168.1.233:50010, 192.168.1.63:50010]
Status: HEALTHY
Total size: 3259108351 B
Total dirs: 0
Total files: 1
Total blocks (validated): 25 (avg. block size 130364334 B)
Minimally replicated blocks: 25 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 11
Number of racks: 2
FSCK ended at Wed Nov 18 17:42:27 CST 2015 in 3 milliseconds
The filesystem under path '/ftTest/totalWiki' is HEALTHY
The output above lists how the file's blocks are distributed along with a summary of the file's health. To also see which rack each block lives on, append -racks like this:
$ hadoop fsck /ftTest/totalWiki -files -blocks -locations -racks
Warning: $HADOOP_HOME is deprecated.
FSCK started by hadoop from /192.168.1.230 for path /ftTest/totalWiki at Wed Nov 18 17:43:08 CST 2015
/ftTest/totalWiki 3259108351 bytes, 25 block(s): OK
0. blk_-3539743872639772968_1003 len=134217728 repl=3 [/rack2/192.168.1.66:50010, /rack2/192.168.1.63:50010, /rack1/192.168.1.235:50010]
1. blk_-7700661535252568451_1003 len=134217728 repl=3 [/rack1/192.168.1.231:50010, /rack1/192.168.1.232:50010, /rack2/192.168.1.238:50010]
2. blk_-3214646852454192434_1003 len=134217728 repl=3 [/rack1/192.168.1.237:50010, /rack1/192.168.1.236:50010, /rack2/192.168.1.238:50010]
3. blk_-8860437510624268282_1003 len=134217728 repl=3 [/rack2/192.168.1.63:50010, /rack2/192.168.1.239:50010, /rack1/192.168.1.235:50010]
4. blk_-1765246693355320434_1003 len=134217728 repl=3 [/rack2/192.168.1.239:50010, /rack2/192.168.1.66:50010, /rack1/192.168.1.232:50010]
5. blk_9063781070378080202_1003 len=134217728 repl=3 [/rack2/192.168.1.238:50010, /rack2/192.168.1.66:50010, /rack1/192.168.1.234:50010]
6. blk_8687961040692226467_1003 len=134217728 repl=3 [/rack1/192.168.1.234:50010, /rack1/192.168.1.237:50010, /rack2/192.168.1.239:50010]
7. blk_-5717347662754027031_1003 len=134217728 repl=3 [/rack1/192.168.1.236:50010, /rack1/192.168.1.232:50010, /rack2/192.168.1.63:50010]
8. blk_-5624359065285533759_1003 len=134217728 repl=3 [/rack2/192.168.1.238:50010, /rack2/192.168.1.66:50010, /rack1/192.168.1.231:50010]
9. blk_622948206607478459_1003 len=134217728 repl=3 [/rack2/192.168.1.66:50010, /rack2/192.168.1.63:50010, /rack1/192.168.1.236:50010]
10. blk_-4154428280295153090_1003 len=134217728 repl=3 [/rack1/192.168.1.232:50010, /rack1/192.168.1.235:50010, /rack2/192.168.1.63:50010]
11. blk_6638201995439663469_1003 len=134217728 repl=3 [/rack2/192.168.1.238:50010, /rack2/192.168.1.63:50010, /rack1/192.168.1.237:50010]
12. blk_-3282418422086241856_1003 len=134217728 repl=3 [/rack2/192.168.1.238:50010, /rack2/192.168.1.66:50010, /rack1/192.168.1.233:50010]
13. blk_2802846523093904336_1003 len=134217728 repl=3 [/rack2/192.168.1.66:50010, /rack2/192.168.1.239:50010, /rack1/192.168.1.237:50010]
14. blk_-7425405918846384842_1003 len=134217728 repl=3 [/rack2/192.168.1.239:50010, /rack2/192.168.1.66:50010, /rack1/192.168.1.234:50010]
15. blk_-8997936298966969491_1003 len=134217728 repl=3 [/rack1/192.168.1.237:50010, /rack1/192.168.1.235:50010, /rack2/192.168.1.238:50010]
16. blk_-827035362476515573_1003 len=134217728 repl=3 [/rack2/192.168.1.239:50010, /rack2/192.168.1.63:50010, /rack1/192.168.1.235:50010]
17. blk_-5734389503841877028_1003 len=134217728 repl=3 [/rack1/192.168.1.231:50010, /rack1/192.168.1.235:50010, /rack2/192.168.1.66:50010]
18. blk_1446125973144404377_1003 len=134217728 repl=3 [/rack2/192.168.1.66:50010, /rack2/192.168.1.238:50010, /rack1/192.168.1.235:50010]
19. blk_-7161959344923757995_1003 len=134217728 repl=3 [/rack2/192.168.1.66:50010, /rack2/192.168.1.238:50010, /rack1/192.168.1.234:50010]
20. blk_-2171786920309180709_1003 len=134217728 repl=3 [/rack2/192.168.1.63:50010, /rack2/192.168.1.66:50010, /rack1/192.168.1.237:50010]
21. blk_7184760167274632839_1003 len=134217728 repl=3 [/rack2/192.168.1.238:50010, /rack2/192.168.1.66:50010, /rack1/192.168.1.233:50010]
22. blk_1315507788295151463_1003 len=134217728 repl=3 [/rack2/192.168.1.63:50010, /rack2/192.168.1.239:50010, /rack1/192.168.1.233:50010]
23. blk_5923416026032542888_1003 len=134217728 repl=3 [/rack2/192.168.1.238:50010, /rack2/192.168.1.239:50010, /rack1/192.168.1.236:50010]
24. blk_-8960096699099874150_1003 len=37882879 repl=3 [/rack1/192.168.1.234:50010, /rack1/192.168.1.233:50010, /rack2/192.168.1.63:50010]
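For a cluster-wide view rather than a single file, the dfsadmin report on this generation of Hadoop summarizes every datanode; a quick sketch:
# Print configured/used capacity and state for each datanode,
# plus live and dead node counts for the whole cluster:
hadoop dfsadmin -report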