I have set up a Dataproc cluster on Google Cloud.
It is up and running, and I can access HDFS and copy files from the SSH 'in browser' console, so the problem is not on the Dataproc side.
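For reference, this kind of command works fine from the SSH console on the master node (the file name is just an example):

hdfs dfs -put some-test-file.txt /user/jmonteilx/
hdfs dfs -ls /user/jmonteilx/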
I am now using Pentaho (ETL software) to copy files.
Pentaho needs to access the Master and the Data Nodes.
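As far as I understand, that means the Pentaho machine needs network access to the NameNode RPC port (8020 by default) and to the DataNode data transfer port (9866, the port shown in the log below). On Google Cloud I assume that requires a firewall rule along these lines (the rule name, network and source range are placeholders):

gcloud compute firewall-rules create allow-hdfs-from-pentaho \
    --network=default \
    --allow=tcp:8020,tcp:9866 \
    --source-ranges=<pentaho-client-public-ip>/32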
I get the following error message:
456829 [Thread-143] WARN org.apache.hadoop.hdfs.DataStreamer - Abandoning BP-1097611520-10.132.0.7-1611589405814:blk_1073741911_1087
456857 [Thread-143] WARN org.apache.hadoop.hdfs.DataStreamer - Excluding datanode DatanodeInfoWithStorage[10.132.0.9:9866,DS-6586e84b-cdfd-4afb-836a-25348a5080cb,DISK]
456870 [Thread-143] WARN org.apache.hadoop.hdfs.DataStreamer - DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/jmonteilx/pentaho-shim-test-file.test could only be replicated to 0 nodes instead of minReplication (=1). There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1819)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:265)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2569)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:846)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:510)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
The IP address used in the log is the internal IP of my first datanode in Dataproc.
I need the client to use the external IP instead.
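My understanding (which may be wrong) is that the NameNode hands the datanodes' internal addresses back to the client, and that with dfs.client.use.datanode.hostname set to true the client should receive hostnames instead and resolve them itself, so that on the Pentaho machine they could be mapped to the external IPs, for example in the hosts file (hostnames and IPs below are made up for illustration):

# /etc/hosts on the Pentaho client machine
34.78.0.10   mycluster-w-0.europe-west1-b.c.my-project.internal   mycluster-w-0
34.78.0.11   mycluster-w-1.europe-west1-b.c.my-project.internal   mycluster-w-1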
My question is the following: is there anything to change in the config files on the client side to make this work?
I have tried adding:
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
without success.
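To rule out Pentaho itself, I suppose the same write could be tested with the plain Hadoop client from the same machine, passing the property on the command line (a sketch; the NameNode address is a placeholder):

hadoop fs -D dfs.client.use.datanode.hostname=true \
    -put pentaho-shim-test-file.test hdfs://<namenode-external-ip>:8020/user/jmonteilx/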
Many thanks,