1. Change the NIC's IP address.
2. Update the IP that master maps to in /etc/hosts (this must be correct).
3. Prepare before running hdfs namenode -format (delete some files):
   (1) Delete the hdpdata directory (it contains the dfs directory that must be removed).
   (2) Delete the logs directory.
4. Run hdfs namenode -format.
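The cleanup and re-format steps above can be sketched as a small helper. This is only an illustration: the `hdpdata` and `logs` names come from the notes, but the assumption that both live directly under one base directory is mine.

```python
import shutil
import subprocess
from pathlib import Path

def clean_before_format(base_dir: str) -> list:
    """Steps 3(1) and 3(2): remove the directories that must be deleted
    before `hdfs namenode -format`.

    Returns the list of directories actually removed. hdpdata (which
    contains the old dfs/ data) and logs are the names from the notes;
    their location under base_dir is an assumption.
    """
    removed = []
    for name in ("hdpdata", "logs"):
        target = Path(base_dir) / name
        if target.exists():
            shutil.rmtree(target)      # delete the stale data/log tree
            removed.append(str(target))
    return removed

def reformat_namenode():
    """Step 4: re-create the namespace (DESTRUCTIVE - wipes HDFS metadata)."""
    subprocess.run(["hdfs", "namenode", "-format"], check=True)
```

Run `clean_before_format` first, then `reformat_namenode()` only once you are sure the old data is expendable.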
------------------------------ The following is reposted material (on the IP-address change) ------------------------------
This post tests what happens in Apache Hadoop 2.0 when a DataNode's hostname or IP address is changed.
1 Test environment
OS: CentOS 6.2
Hadoop version: Apache Hadoop 2.0.2
Block replication factor: 2
Node layout:
NodeType  | HostName   | IP            |
NameNode  | sdc1       | 10.28.169.121 |
NameNode  | sdc2       | 10.28.169.122 |
DataNode2 | Tdatanode0 | 10.28.169.126 |
DataNode1 | sdc2       | 10.28.169.122 |
DataNode0 | datanode0  | 10.28.169.225 |
2 Test cases
Because of hardware limits, the environment has only three DataNodes; for convenience, DataNode0 and DataNode2 run only the DataNode daemon, and all hostname/IP changes are made on DataNode0. With the Hadoop cluster running, two scenarios are tested: first, change DataNode0's hostname and observe how the HDFS cluster's state changes; second, change DataNode0's IP address and observe how the HDFS cluster's state changes.
2.1 Changing the DataNode's hostname
Before changing DataNode0's hostname, record the version IDs in the VERSION files on the NN side and the DN side (DataNode0) for later comparison.
The DN side has two VERSION files:
(1) DN VERSION file 1:
Path:
/data/hadoop2.0_dn/current/BP-2147169311-10.28.169.122-1355378443940/current
Contents:
namespaceID=1728141100
cTime=0
blockpoolID=BP-2147169311-10.28.169.122-1355378443940
layoutVersion=-40
(2) DN VERSION file 2:
Path:
/data/hadoop2.0_dn/current
Contents:
storageID=DS-382431371-10.28.169.225-50010-1355324528657
clusterID=hadoop2.0
cTime=0
storageType=DATA_NODE
layoutVersion=-40
The NN side also has two VERSION files:
(1) NN VERSION file 1:
Path:
/data/hadoop2.0_nn_edits/current
Contents:
namespaceID=1728141100
clusterID=hadoop2.0
cTime=0
storageType=NAME_NODE
blockpoolID=BP-2147169311-10.28.169.122-1355378443940
layoutVersion=-40
(2) NN VERSION file 2:
Path:
/data/hadoop2.0_nn_fsimage/current
Contents:
namespaceID=1728141100
clusterID=hadoop2.0
cTime=0
storageType=NAME_NODE
blockpoolID=BP-2147169311-10.28.169.122-1355378443940
layoutVersion=-40
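The VERSION files above use the Java-properties format (one key=value per line). A minimal sketch for parsing them and diffing the IDs before and after a change might look like this (illustration only, not Hadoop's own code):

```python
def parse_version_file(text: str) -> dict:
    """Parse a Hadoop VERSION file (Java properties style: key=value lines)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        key, _, value = line.partition("=")
        props[key] = value
    return props

# IDs that the analysis below shows stay stable across hostname/IP changes:
STABLE_KEYS = ("storageID", "clusterID", "blockpoolID", "namespaceID")

def changed_ids(before: dict, after: dict) -> list:
    """Return the stable IDs that differ between two VERSION snapshots."""
    return [k for k in STABLE_KEYS
            if k in before and before.get(k) != after.get(k)]
```

Snapshotting the four files before the change and calling `changed_ids` afterwards is a quick way to confirm nothing drifted.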
Before changing DataNode0's hostname, look at the DataNodes registered with the NameNode in the HDFS web UI (screenshot omitted):
All three DataNodes are serving normally.
Now, on DataNode0, change the node's hostname with the hostname command and update the hostname entries in /etc/hosts; the new hostname is ddd. After the change, neither the web UI nor DataNode0's local logs show anything unusual, and the node still appears in the web UI under its old name.
Stop the DataNode service with hadoop-daemon.sh stop datanode; then:
(1) If you wait about 630 seconds, the NN decides DataNode0 is dead and the Live node count in the web UI drops to 2.
Restart the service with hadoop-daemon.sh start datanode. Nothing unusual appears on DataNode0, and the HDFS web UI changes:
The datanode0 node's name is now ddd, and the NN prints the following logs: it removes the node when it declares the DN dead, and adds it back when the DN service restarts:
2012-12-17 17:21:37,697 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/10.28.169.231:50010
2012-12-17 17:21:37,697 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/10.28.169.231:50010
2012-12-17 17:21:37,698 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/10.28.169.231:50010
2012-12-17 17:21:57,756 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* processReport: from DatanodeRegistration(10.28.169.231, storageID=DS-382431371-10.28.169.225-50010-1355324528657, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=hadoop2.0;nsid=1728141100;c=0), blocks: 1480, processing time: 15 msecs
So after the DataNode service restarts, the name the NN has registered for the DataNode changes, but the block count and similar statistics do not.
(2) If instead you restart the DN's DataNode service immediately, no block movement is triggered in HDFS, and the node's name in the web UI still changes to ddd.
In both cases, inspecting the four VERSION files on the NN and DN sides shows that none of them changed: StorageID, BlockPoolID, and ClusterID are all unchanged.
2.2 Changing the DataNode's IP address
Before changing DataNode0's IP address, look at the DN nodes in the web UI (screenshot omitted):
Change the IP address in /etc/hosts from 10.28.169.225 to 10.28.169.231, change the IP address in /etc/sysconfig/network-scripts/ifcfg-eth1 to 10.28.169.231 as well, then restart the network service: service network restart.
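The /etc/hosts part of that edit can be illustrated as a simple rewrite (a sketch only; in the test the author edited /etc/hosts and ifcfg-eth1 by hand and then ran `service network restart`):

```python
def replace_ip_in_hosts(hosts_text: str, old_ip: str, new_ip: str) -> str:
    """Rewrite an /etc/hosts-style text, swapping old_ip for new_ip.

    Only the address field (the first column) is compared, so a hostname
    that happens to contain the old address as a substring is untouched.
    """
    out_lines = []
    for line in hosts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == old_ip:
            fields[0] = new_ip
            line = "\t".join(fields)
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"
```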
From the moment the network service restarts, the NN loses heartbeat contact with DataNode0, and the DN's Last Contact value keeps climbing:
Once Last Contact reaches 630, the NN declares the DN dead (the NN's timeout for declaring a DN dead is 630 seconds):
In fact the DataNode process is still running at this point; only the DN's IP address was changed, not its service. Read operations against HDFS from a client still work, but uploading a file makes the client throw an exception.
If the DN's DataNode service is restarted immediately after the network restart, no block movement occurs.
Once the NN declares the DN dead, it transfers the dead DN's blocks to other DataNodes to restore the configured replication factor, and logs like the following appear on the NN:
2012-12-17 15:53:58,336 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/10.28.169.225:50010
2012-12-17 15:54:01,219 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.28.169.122:50010 to replicate blk_970296822908430116_4695 to datanode(s) 10.28.169.126:50010
2012-12-17 15:54:01,220 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.28.169.122:50010 to replicate blk_-9223121983761508339_4691 to datanode(s) 10.28.169.126:50010
2012-12-17 15:54:01,220 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.28.169.122:50010 to replicate blk_-7669541053393372310_4689 to datanode(s) 10.28.169.126:50010
2012-12-17 15:54:01,220 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.28.169.122:50010 to replicate blk_7265027769022057795_4687 to datanode(s) 10.28.169.126:50010
The DN (node 126) prints logs like:
2012-12-17 15:54:02,344 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block BP-2147169311-10.28.169.122-1355378443940:blk_970296822908430116_4695 src: /10.28.169.122:54664 dest: /10.28.169.126:50010
2012-12-17 15:54:02,349 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block BP-2147169311-10.28.169.122-1355378443940:blk_-9223121983761508339_4691 src: /10.28.169.122:54663 dest: /10.28.169.126:50010
2012-12-17 15:54:02,354 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block BP-2147169311-10.28.169.122-1355378443940:blk_970296822908430116_4695 src: /10.28.169.122:54664 dest: /10.28.169.126:50010 of size 6323
2012-12-17 15:54:02,356 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block BP-2147169311-10.28.169.122-1355378443940:blk_-9223121983761508339_4691 src: /10.28.169.122:54663 dest: /10.28.169.126:50010 of size 8810
2012-12-17 15:54:05,302 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block BP-2147169311-10.28.169.122-1355378443940:blk_7265027769022057795_4687 src: /10.28.169.122:54665 dest: /10.28.169.126:50010
2012-12-17 15:54:05,303 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block BP-2147169311-10.28.169.122-1355378443940:blk_-7669541053393372310_4689 src: /10.28.169.122:54666 dest: /10.28.169.126:50010
On the DN whose IP was changed (datanode0), the logs go silent for roughly twenty minutes after the network restart (cause to be determined; nothing is printed), after which logging resumes with the following error:
2012-12-16 02:53:54,636 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-2147169311-10.28.169.122-1355378443940:blk_970296822908430116_4695, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
--------- gap with no log output
2012-12-16 03:17:04,863 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService
java.io.IOException: Failed on local exception: java.io.IOException: Connection timed out; Host Details : local host is: "datanode0/10.28.169.225"; destination host is: "sdc2":9000;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:760)
at org.apache.hadoop.ipc.Client.call(Client.java:1168)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy10.sendHeartbeat(Unknown Source)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy10.sendHeartbeat(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:441)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:521)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:673)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Connection timed out
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:171)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:159)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.FilterInputStream.read(FilterInputStream.java:116)
at java.io.FilterInputStream.read(FilterInputStream.java:116)
at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:388)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.FilterInputStream.read(FilterInputStream.java:66)
at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:276)
at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:760)
at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:288)
at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:752)
at org.apache.hadoop.ipc.protobuf.RpcPayloadHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcPayloadHeaderProtos.java:985)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:886)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:817)
2012-12-16 03:17:04,876 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeCommand action: DNA_REGISTER
2012-12-16 03:17:04,881 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-2147169311-10.28.169.122-1355378443940 (storage id DS-382431371-10.28.169.225-50010-1355324528657) service to sdc2/10.28.169.122:9000 beginning handshake with NN
2012-12-16 03:17:05,877 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService
java.io.IOException: Failed on local exception: java.io.IOException: Connection timed out; Host Details : local host is: "datanode0/10.28.169.225"; destination host is: "sdc1":9000;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:760)
at org.apache.hadoop.ipc.Client.call(Client.java:1168)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy10.sendHeartbeat(Unknown Source)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy10.sendHeartbeat(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:441)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:521)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:673)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Connection timed out
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:171)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:159)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.FilterInputStream.read(FilterInputStream.java:116)
at java.io.FilterInputStream.read(FilterInputStream.java:116)
at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:388)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.FilterInputStream.read(FilterInputStream.java:66)
at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:276)
at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:760)
at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:288)
at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:752)
at org.apache.hadoop.ipc.protobuf.RpcPayloadHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcPayloadHeaderProtos.java:985)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:886)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:817)
2012-12-16 03:17:14,956 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool Block pool BP-2147169311-10.28.169.122-1355378443940 (storage id DS-382431371-10.28.169.225-50010-1355324528657) service to sdc2/10.28.169.122:9000 successfully registered with NN
2012-12-16 03:17:14,957 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Took 10081ms to process 1 commands from NN
2012-12-16 03:17:14,958 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeCommand action from standby: DNA_REGISTER
2012-12-16 03:17:14,969 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-2147169311-10.28.169.122-1355378443940 (storage id DS-382431371-10.28.169.225-50010-1355324528657) service to sdc1/10.28.169.121:9000 beginning handshake with NN
2012-12-16 03:17:15,005 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 1480 blocks took 6 msec to generate and 42 msecs for RPC and NN processing
2012-12-16 03:17:15,006 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: sent block report, processed command:org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@189a2557
2012-12-16 03:17:24,983 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool Block pool BP-2147169311-10.28.169.122-1355378443940 (storage id DS-382431371-10.28.169.225-50010-1355324528657) service to sdc1/10.28.169.121:9000 successfully registered with NN
2012-12-16 03:17:24,984 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Took 10026ms to process 1 commands from NN
2012-12-16 03:17:25,073 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 1480 blocks took 8 msec to generate and 81 msecs for RPC and NN processing
2012-12-16 03:17:25,073 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: sent block report, processed command:null
The DN initiates connections to both NNs. Because the IP address changed, the first attempt fails with a wrapped IOException; the RPC connection then succeeds, the DN registers and handshakes with the NN, sends a block report, and receives the NN's commands in return.
Comparing per-DataNode block counts with those before the IP change, only datanode0's is unchanged (the others gained re-replicated blocks).
Restart the DN service on datanode0 with hadoop-daemon.sh stop datanode followed by hadoop-daemon.sh start datanode. After the restart, datanode0 logs:
2012-12-16 03:28:20,845 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: sent block report, processed command:org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@1cecd92c
2012-12-16 03:28:20,847 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Periodic Block Verification Scanner initialized with interval 504 hours for block pool BP-2147169311-10.28.169.122-1355378443940.
2012-12-16 03:28:20,879 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Added bpid=BP-2147169311-10.28.169.122-1355378443940 to blockPoolScannerMap, new size=1
After the restart, the DN sends a block report and starts the block scanner, which verifies every block added to the block pool since the previous start.
2147169311-10.28.169.122-1355378443940:blk_-2110417446587413716_4193
2012-12-16 03:28:32,628 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification succeeded for BP-2147169311-10.28.169.122-1355378443940:blk_-4904216641349090176_3919
2012-12-16 03:28:32,629 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification succeeded for BP-2147169311-10.28.169.122-1355378443940:blk_-2305285786919610835_4205
2012-12-16 03:28:32,630 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification succeeded for BP-2147169311-10.28.169.122-1355378443940:blk_7458965476558187997_3659
2012-12-16 03:28:32,631 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification succeeded for BP-2147169311-10.28.169.122-1355378443940:blk_8899992160584874259_3671
2012-12-16 03:28:32,642 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification succeeded for BP-2147169311-10.28.169.122-1355378443940:blk_-7669541053393372310_4689
Throughout the window from changing DataNode0's IP address until the block counts converge again, none of the VERSION files on the DN or NN side change: StorageID, ClusterID, and BlockPoolID all stay the same.
3 Results and analysis
The tests show the following:
(1) Changing the hostname or the IP does not change the DN's StorageID, ClusterID, or BlockPoolID.
ClusterID and BlockPoolID are created when the cluster is formatted and never change unless the cluster is formatted again. The StorageID is created when the DataNode starts, implemented by the createStorageID method of the DataStorage class.
Before creating a StorageID, the DataNode first reads the local VERSION file (if it exists); if the stored ID is empty it creates a new StorageID, otherwise it keeps the previous one. This step is implemented by the setFieldsFromProperties method of DataStorage.
Because the DN whose hostname or IP is being changed was running normally beforehand, its VERSION file exists, so after the change the DN keeps its previous ID.
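The reuse rule can be paraphrased like this (a sketch of the behavior described above, not DataStorage's actual code; the DS-&lt;rand&gt;-&lt;ip&gt;-&lt;port&gt;-&lt;time&gt; layout simply imitates the value seen in the VERSION file):

```python
import random
import time

def get_or_create_storage_id(version_props: dict, ip: str,
                             port: int = 50010) -> str:
    """Reuse the storageID from a parsed VERSION file if present,
    otherwise mint a new one.

    Because a previously running DN has a VERSION file with a non-empty
    storageID, a hostname or IP change never produces a new ID.
    """
    existing = version_props.get("storageID", "")
    if existing:
        return existing                    # keep the old identity
    rand = random.randint(0, 2**31 - 1)
    return "DS-%d-%s-%d-%d" % (rand, ip, port, int(time.time() * 1000))
```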
(2) A hostname change only takes effect, and only shows up in the HDFS web UI, after the DN's DataNode service is restarted.
A DN reads its own hostname only at service start; it then sends the hostname to the NN during the registration handshake, which creates a new DatanodeID. This is implemented in the createBPRegistration method of the DataNode class.
(3) After a DN's IP is changed, a wrapped IOException appears first; the DN then re-handshakes and re-registers with the NN and accepts the commands the NN returns.
Before the change, the RPC connections to the NameNodes were established from ip1; after the change the DN's address is ip2, so the established RPC connections can no longer return results, and a wrapped IOException (via NetUtils.wrapException) is thrown. The DN then re-establishes RPC connections to both NNs and, once connected, processes the commands the NNs return.
(4)
1) If the IP address is changed and the NN declares the DN dead after the 630-second timeout, blocks are moved onto the surviving DataNodes, regardless of whether the DN's DataNode service is restarted afterwards;
2) If the hostname or the IP address is changed and the DN's DataNode service is restarted within the 630-second timeout, no block movement occurs on the surviving DataNodes.
In Hadoop 2.0, the interval after which the NN treats a DN's heartbeat as expired is:
heartbeatExpireInterval = 2 * heartbeatRecheckInterval + 10 * 1000 * heartbeatIntervalSeconds
where heartbeatRecheckInterval defaults to 5 * 60 * 1000 ms and heartbeatIntervalSeconds defaults to 3, so the timeout is:
2 * 5 minutes + 10 * 3 seconds = 10 minutes + 30 seconds = 630 seconds
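The 630-second figure follows directly from the defaults:

```python
def heartbeat_expire_interval_ms(recheck_interval_ms: int = 5 * 60 * 1000,
                                 heartbeat_interval_s: int = 3) -> int:
    """NN heartbeat-expiry timeout: 2 recheck intervals + 10 heartbeats."""
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s
```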
If no heartbeat is re-established within the 630-second timeout, the NN declares the DN dead. Every block has a configured number of replicas, and a DN's death drops the replica count of some blocks; when the NN detects this, it copies those blocks from the Live DataNodes so that no block's replica count is permanently reduced by the death of a single DN.
If the DN service is restarted within the timeout instead, the DN and NN reconnect, the DN stays Live, block replica counts never change, and no block is copied from one DN node to another.
(5) After the IP address change, about 17 minutes pass before the DN prints any logs.
On Linux, data sent on an established TCP connection that gets no acknowledgment is retransmitted with exponential backoff; the maximum number of retransmissions is set in /proc/sys/net/ipv4/tcp_retries2 (default 15), and only after all retries fail does the kernel tear down the dead connection. Meanwhile, the RPC client blocks waiting for a response, so nothing is logged; adding network and operational delays, a "stall" of around 20 minutes on the DN with no heartbeat logs is normal. Afterwards, a new heartbeat connection is established and everything returns to normal.
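The roughly-20-minute stall is consistent with TCP's retransmission backoff. A back-of-the-envelope estimate, under the simplifying assumptions of an initial RTO of 1 s, doubling per retry, capped at the Linux maximum of 120 s (real RTOs depend on the measured RTT):

```python
def total_retransmit_time_s(retries: int = 15, initial_rto_s: float = 1.0,
                            rto_max_s: float = 120.0) -> float:
    """Approximate time spent retransmitting before the kernel gives up.

    retries defaults to the Linux tcp_retries2 default of 15; the initial
    RTO and the exponential doubling capped at 120 s are assumptions.
    """
    total, rto = 0.0, initial_rto_s
    for _ in range(retries):
        total += rto                 # wait one RTO, then retransmit
        rto = min(rto * 2, rto_max_s)
    return total
```

With these assumptions the kernel aborts the connection after about 18 minutes, in the same range as the observed 17-20 minutes of silence.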
TCP connections before the IP change:
[root@datanode0 ~]# netstat -anptl | grep 9000
tcp 0 382 10.28.169.231:42116 10.28.169.121:9000 ESTABLISHED 7546/java
tcp 0 382 10.28.169.231:60975 10.28.169.122:9000 ESTABLISHED 7546/java
TCP connections after the IP change:
[root@datanode0 ~]# netstat -anptl | grep 9000
tcp 0 0 10.28.169.225:55198 10.28.169.121:9000 ESTABLISHED 7546/java
tcp 0 0 10.28.169.225:33945 10.28.169.122:9000 ESTABLISHED 7546/java