I am getting the following exception in my reducers:
EMFILE: Too many open files
    at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
    at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:161)
    at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:296)
    at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:369)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:257)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Around 10,000 files are created per reducer. Is there a way I can set the ulimit on each box?
I tried using the following command as a bootstrap script:
ulimit -n 1000000
But this did not help at all.
I also tried the following bootstrap action to replace the ulimit command in /usr/lib/hadoop/hadoop-daemon.sh:
#!/bin/bash
set -e -x
sudo sed -i -e "/^ulimit /s|.*|ulimit -n 134217728|" /usr/lib/hadoop/hadoop-daemon.sh
But even then, when I log into the master node, ulimit -n still returns 32768.
I also confirmed that the desired change was made in /usr/lib/hadoop/hadoop-daemon.sh, and it did contain: ulimit -n 134217728.
Is there any Hadoop configuration for this, or some other workaround?
My main aim is to split the records into files according to each record's id; there are 1.5 billion records right now, and that number will certainly grow. (A simplified sketch of the reducer is at the end of this post.)
Is there any way to edit this file before the daemon is started on each slave node?
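
For reference, here is a simplified sketch of the kind of reducer I am running. It uses the new-API MultipleOutputs just to illustrate the one-file-per-id pattern; the real job is on an older EMR/Hadoop setup and differs in details, but the file-creation behavior is the same:

import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Simplified reducer: every distinct record id gets its own output file.
// With roughly 10,000 ids per reducer that means roughly 10,000 open
// writers, which is presumably what exhausts the per-process open-file limit.
public class SplitByIdReducer extends Reducer<Text, Text, NullWritable, Text> {

    private MultipleOutputs<NullWritable, Text> out;

    @Override
    protected void setup(Context context) {
        out = new MultipleOutputs<NullWritable, Text>(context);
    }

    @Override
    protected void reduce(Text id, Iterable<Text> records, Context context)
            throws IOException, InterruptedException {
        for (Text record : records) {
            // The third argument is the base output path, so records for
            // each id end up in a separate file named after that id.
            out.write(NullWritable.get(), record, id.toString());
        }
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        out.close();
    }
}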