
pyspark - Spark-submit can't locate local file

I've written a very simple python script for testing my spark streaming idea, and plan to run it on my local machine to mess around a little bit. Here is the command line:

spark-submit spark_streaming.py localhost 9999

But the terminal threw me an error:

Error executing Jupyter command '<the/spark_streaming.py/file/path>': [Errno 2] No such file or directory

I have no idea why this would happen, and I'm sure the .py file does exist.

EDIT: the script runs fine when launched directly with python instead of spark-submit.
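For reference, a minimal sketch of the kind of script being tested might look like the following (this assumes a socket word count that reads host and port from the command-line arguments; the actual spark_streaming.py is not shown in the question):

import sys
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

if __name__ == "__main__":
    # Assumed layout: host and port come from the command line, e.g. localhost 9999
    host, port = sys.argv[1], int(sys.argv[2])
    sc = SparkContext(appName="StreamingTest")
    ssc = StreamingContext(sc, 1)  # 1-second micro-batches
    lines = ssc.socketTextStream(host, port)
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()  # print each batch's word counts to the console
    ssc.start()
    ssc.awaitTermination()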

These are the lines I added to my .bashrc file:

export PATH="/usr/local/spark/bin:$PATH"
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
export SPARK_LOCAL_IP=localhost

1 Answer


Suppose you want to spark-submit a Python script located at /home/user/scripts/spark_streaming.py to YARN; the correct syntax is as follows:

spark-submit --master yarn --deploy-mode client /home/user/scripts/spark_streaming.py

You can interchange the order of the various flags, but the script itself must come last; if your script accepts arguments, they should follow the script name (for example, Spark's bundled pi.py example takes the number of partitions as a trailing argument).
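For instance, submitting that bundled pi example with a trailing argument of 10 partitions looks like this:

spark-submit $SPARK_HOME/examples/src/main/python/pi.py 10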

For executing locally with, say, 2 cores, use --master local[2]; use --master local[*] for all available local cores (no --deploy-mode flag is needed in either case).
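Applied to your case, a local two-core submission with the script's own arguments at the end might look like this (the script path here is illustrative):

spark-submit --master local[2] /home/user/scripts/spark_streaming.py localhost 9999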

Check the Submitting Applications page in the Spark docs for more info (although admittedly it is rather light on PySpark demonstrations).

PS: The mention of Jupyter, as well as the path shown in your error message, is extremely puzzling...

UPDATE: It seems that PYSPARK_DRIVER_PYTHON=jupyter messes everything up, funneling execution through Jupyter (which is undesirable here, and may explain the weird error message). Try modifying the environment variables in your .bashrc as follows:

export SPARK_HOME="/usr/local/spark"  # do not include /bin
export PYSPARK_PYTHON=python          # Python used by the executors
export PYSPARK_DRIVER_PYTHON=python   # run the driver with plain python, not Jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=""  # clear the leftover 'notebook' option

and then source your .bashrc.
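Concretely (reusing the illustrative path from earlier), reload the configuration and re-submit:

source ~/.bashrc
spark-submit --master local[2] /home/user/scripts/spark_streaming.py localhost 9999

The job should now run through spark-submit's own launcher rather than being funneled through Jupyter.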

