
hadoop - could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation

I don't know how to fix this error:

Vertex failed, vertexName=initialmap, vertexId=vertex_1449805139484_0001_1_00, diagnostics=[Task failed, taskId=task_1449805139484_0001_1_00_000003, diagnostics=[AttemptID:attempt_1449805139484_0001_1_00_000003_0 Info:Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hadoop/gridmix-kon/input/_temporary/1/_temporary/attempt_14498051394840_0001_m_000003_0/part-m-00003/segment-121 could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1441)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2014)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2010)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1561)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2008)
at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy17.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
at com.sun.proxy.$Proxy17.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)

Any idea what the cause might be?



1 Answer


This error is thrown from BlockManager::chooseTarget4NewBlock() (referring to the latest Hadoop source). The specific piece of code that causes it is:

final DatanodeStorageInfo[] targets = blockplacement.chooseTarget(src,
    numOfReplicas, client, excludedNodes, blocksize, 
    favoredDatanodeDescriptors, storagePolicy);

if (targets.length < minReplication) {
  throw new IOException("File " + src + " could only be replicated to "
      + targets.length + " nodes instead of minReplication (="
      + minReplication + ").  There are "
      + getDatanodeManager().getNetworkTopology().getNumOfLeaves()
      + " datanode(s) running and "
      + (excludedNodes == null? "no": excludedNodes.size())
      + " node(s) are excluded in this operation.");
}

This occurs when the BlockManager tries to choose a target host for storing a new block of data and cannot find even a single host (targets.length < minReplication). minReplication is set to 1 by default, via the configuration parameter dfs.namenode.replication.min in hdfs-site.xml.
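
For reference, a minimal hdfs-site.xml entry for this parameter (1 shown here is the default; this snippet is purely illustrative, not a change you need to make to fix the error):

<!-- Minimum number of replicas the NameNode must be able to place
     before it allows a new block to be allocated (default: 1) -->
<property>
  <name>dfs.namenode.replication.min</name>
  <value>1</value>
</property>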

This could occur due to one of the following reasons (quick command-line checks are sketched after the list):

  • Data Node instances are not running
  • Data Node instances are unable to contact the Name Node
  • Data Nodes have run out of space, hence no new block of data can be allocated to them
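
For the first two possibilities, a minimal sketch of the checks (namenode-host and the NameNode RPC port 8020 below are assumptions; substitute the values from your own cluster):

# On each Data Node host: is the DataNode JVM actually running?
jps | grep DataNode

# From a Data Node host: can it reach the NameNode's RPC port?
# (namenode-host and 8020 are placeholders; -z support varies by netcat flavor)
nc -z namenode-host 8020 && echo "NameNode reachable"

The third possibility (disk space) is covered by the dfsadmin report described below.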

But in your case, the error message also contains the following information:

There are 4 datanode(s) running and no node(s) are excluded in this operation.

This means there were 4 Data Nodes running, and all 4 of them were considered for placement of the data in this operation (none were excluded).

So the likely suspect is disk space on the Data Nodes. You can check the disk space on your Data Nodes using the following command:

hdfs dfsadmin -report

It prints a report for each of your live Data Nodes. For example, in my case I got the following:

Live datanodes (1):

Name: 192.168.56.1:50010 (192.168.56.1)
Hostname: 192.168.56.1
Decommission Status : Normal
Configured Capacity: 648690003968 (604.14 GB)
DFS Used: 193849055737 (180.54 GB)
Non DFS Used: 186164975111 (173.38 GB)
DFS Remaining: 268675973120 (250.22 GB)
DFS Used%: 29.88%
DFS Remaining%: 41.42%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Dec 13 17:17:34 IST 2015

Check the "DFS-Remaining" and "DFS-Remaining%". That should give you an idea about the remaining space on your Data Nodes.

You can also refer to the Hadoop wiki page https://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo, which describes the reasons for this error and ways to mitigate them.

