When performing a shuffle, my Spark job fails with "no space left on device", but when I run df -h it says I have free space left! Why does this happen, and how can I fix it?
By default, Spark uses the /tmp directory to store intermediate data, so a shuffle can fill the device that holds /tmp even when df -h shows free space on other devices. If you do have space left on some other device, you can change this by creating the file SPARK_HOME/conf/spark-defaults.conf, where SPARK_HOME is the root directory of your Spark install, and adding the following line:
spark.local.dir SOME/DIR/WHERE/YOU/HAVE/SPACE
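To decide which directory to put there, it can help to compare free space per directory rather than reading the whole df -h table. The following is a minimal sketch using only the Python standard library; the helper names (`free_gib`, `pick_local_dir`, `spark_defaults_line`) and the candidate directories are hypothetical, not part of Spark, and the script only prints the config line rather than writing any file.

```python
import shutil
import tempfile


def free_gib(path):
    """Return the free space, in GiB, on the file system that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def pick_local_dir(candidates):
    """Return the candidate directory whose file system has the most free space."""
    return max(candidates, key=free_gib)


def spark_defaults_line(path):
    """Format the spark-defaults.conf entry that points Spark at `path`."""
    return f"spark.local.dir {path}"


if __name__ == "__main__":
    # Hypothetical candidates: the system temp dir (usually /tmp) and the
    # current directory. Replace these with directories on your own devices.
    candidates = [tempfile.gettempdir(), "."]
    for d in candidates:
        print(f"{d}: {free_gib(d):.1f} GiB free")
    # Print the line to paste into SPARK_HOME/conf/spark-defaults.conf.
    print(spark_defaults_line(pick_local_dir(candidates)))
```

Run it on the machine where the job fails, since the free space it reports is local to that machine.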