
apache spark - How do I run pyspark with jupyter notebook?

I am trying to fire up the jupyter notebook when I run the command pyspark in the console. When I type it now, it only starts an interactive shell in the console. However, this is not convenient for typing long lines of code. Is there a way to connect the jupyter notebook to the pyspark shell? Thanks.


1 Answer


I'm assuming you already have Spark and Jupyter Notebook installed and that they work flawlessly independently of each other.

If that is the case, then follow the steps below and you should be able to fire up a jupyter notebook with a (py)spark backend.

  1. Go to your Spark installation folder; there should be a bin directory in it: /path/to/spark/bin

  2. Create a file, let's call it start_pyspark.sh

  3. Open start_pyspark.sh and write something like:

    #!/bin/bash

    # Python interpreter that the Spark workers should use
    export PYSPARK_PYTHON=/path/to/anaconda3/bin/python
    # Launch the driver through Jupyter instead of the plain interactive shell
    export PYSPARK_DRIVER_PYTHON=/path/to/anaconda3/bin/jupyter
    # Jupyter options: don't open a browser, listen on all interfaces, use port 8880
    export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8880"

    # Forward any extra arguments (e.g. --packages) to pyspark
    pyspark "$@"


Replace the /path/to ... placeholders with the paths where your python and jupyter binaries are actually installed.
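
If you plan to run the script directly (as in the last step), it also needs to be executable. A minimal sketch, assuming the file was created under the bin directory from step 1:

    # make the launcher executable (the path is the same placeholder used above)
    chmod +x /path/to/spark/bin/start_pyspark.sh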

  4. Most probably this step is already done, but just in case,
    modify your ~/.bashrc file by adding the following lines:

        # Spark
        export PATH="/path/to/spark/bin:/path/to/spark/sbin:$PATH"
        export SPARK_HOME="/path/to/spark"
        export SPARK_CONF_DIR="/path/to/spark/conf"
    

Run source ~/.bashrc and you are set.
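
To double-check that the new variables are picked up, a purely illustrative sanity check is:

    # reload the shell configuration and confirm Spark is on the PATH
    source ~/.bashrc
    echo "$SPARK_HOME"   # should print /path/to/spark
    which pyspark        # should resolve to /path/to/spark/bin/pyspark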

Go ahead and try start_pyspark.sh.
You can also pass arguments to the script, something like start_pyspark.sh --packages dibbhatt:kafka-spark-consumer:1.0.14.
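
Because the script passes --NotebookApp.open_browser=False and port 8880 to Jupyter, the notebook will not open a browser on its own. A hedged example of launching and then connecting (the host name below is just an illustration):

    # with /path/to/spark/bin on your PATH (step 4), the script can be called by name;
    # here it is also pointed at a local master, purely as an example
    start_pyspark.sh --master "local[4]"

    # Jupyter prints a URL to the console; with the options above it listens on
    # all interfaces on port 8880, so open http://<driver-host>:8880 in a browser
    # and paste the token from the console if prompted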

Hope it works out for you.


...