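On Monday I found that our Hadoop cluster had gone down.
The cause turned out to be a disk that was 100% full.
After deleting unneeded files I restarted the cluster, but it still would not come up. The NameNode log showed the following error: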
- /************************************************************
- STARTUP_MSG: Starting NameNode
- STARTUP_MSG: host = SFserver141.localdomain/192.168.15.141
- STARTUP_MSG: args = []
- STARTUP_MSG: version = 0.20.3-SNAPSHOT
- STARTUP_MSG: build = -r ; compiled by 'root' on Wed Jun 8 12:43:33 CST 2011
- ************************************************************/
- 2012-10-22 08:50:42,096 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
- 2012-10-22 08:50:42,104 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: SFserver141.localdomain/192.168.15.141:9000
- 2012-10-22 08:50:42,112 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
- 2012-10-22 08:50:42,113 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
- 2012-10-22 08:50:42,169 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=root,root,bin,daemon,sys,adm,disk,wheel
- 2012-10-22 08:50:42,169 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
- 2012-10-22 08:50:42,169 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=false
- 2012-10-22 08:50:42,187 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
- 2012-10-22 08:50:42,188 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
- 2012-10-22 08:50:42,248 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 799968
- 2012-10-22 08:50:47,535 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 13
- 2012-10-22 08:50:47,540 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 102734547 loaded in 5 seconds.
- 2012-10-22 08:50:48,131 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /data/java/hadoop020/data/dfs.name.dir/current/edits of size 2749136 edits # 17772 loaded in 0 seconds.
- 2012-10-22 08:50:48,801 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NumberFormatException: For input string: ""
- at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
- at java.lang.Integer.parseInt(Integer.java:470)
- at java.lang.Short.parseShort(Short.java:120)
- at java.lang.Short.parseShort(Short.java:78)
- at org.apache.hadoop.hdfs.server.namenode.FSEditLog.readShort(FSEditLog.java:1311)
- at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:541)
- at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1011)
- at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:826)
- at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
- at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
- at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:315)
- at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:296)
- at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:205)
- at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:283)
- at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:986)
- at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:995)
- 2012-10-22 08:50:48,802 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
- /************************************************************
- SHUTDOWN_MSG: Shutting down NameNode at SFserver141.localdomain/192.168.15.141
- ************************************************************/
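The stack trace points at a problem with the edits file: the NameNode fails while replaying it (FSEditLog.loadFSEdits).
I went through a lot of documentation online, but because no SecondaryNameNode had been configured, there was no checkpointed copy of the edits file to fall back on.
Then I found an article that suggested the following: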
printf "\xff\xff\xff\xee\xff" > edits
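This writes the byte sequence above into the edits file, replacing its previous contents.
After that, the cluster restarted normally.
Note: an edits.new file had also appeared under dfs.name.dir/current. I deleted it, and I do not know whether that has any side effects.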
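Since overwriting edits discards whatever the old file contained, it seems safer to keep copies of edits and edits.new before touching them. The following is only a minimal sketch of the same fix with a backup step added. The function name reset_edits and the edits-backup.* directory name are made up for illustration, and the directory you pass in is assumed to be your own dfs.name.dir/current. The five bytes are the same ones as in the printf command above, written in octal so that the command also works in shells whose printf does not understand \x escapes.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): back up the edit logs,
# then overwrite edits with the 5-byte sequence from the post.
# Usage (with the NameNode stopped): reset_edits /path/to/dfs.name.dir/current
reset_edits() {
    name_dir="$1"
    # Timestamped backup directory; the naming scheme is an assumption.
    backup_dir="$name_dir/edits-backup.$(date +%Y%m%d%H%M%S)"
    mkdir -p "$backup_dir" || return 1

    # Copy edits and, if it exists, edits.new before changing anything.
    for f in edits edits.new; do
        if [ -f "$name_dir/$f" ]; then
            cp -p "$name_dir/$f" "$backup_dir/" || return 1
        fi
    done

    # \377\377\377\356\377 in octal is \xff\xff\xff\xee\xff in hex.
    printf '\377\377\377\356\377' > "$name_dir/edits"
}
```

You can check the result with od -An -tx1 edits, which should print ff ff ff ee ff. With the backup in place, edits.new can be moved aside rather than deleted, so it is still available if it later turns out to matter.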