Reading this and this makes me think it is possible to have a Python file executed by spark-submit, but I couldn't get it to work.
My setup is a bit complicated: I need several different jars to be submitted together with my Python files for everything to function. The pyspark command that works for me is the following:
IPYTHON=1 ./pyspark --jars jar1.jar,/home/local/ANT/bogoyche/dev/rhine_workspace/env/Scala210-1.0/runtime/Scala2.10/scala-library.jar,jar2.jar --driver-class-path jar1.jar:jar2.jar
from sys import path
path.append('my-module')    # directory that holds the module's code
from my_module import myfn  # hyphens are not valid in Python module names
myfn(myargs)
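As an aside, the same thing can be done from inside an already-running pyspark session; a minimal sketch, assuming the egg lives at path/to/my/egg.egg and sc is the SparkContext that pyspark creates:

sc.addPyFile('path/to/my/egg.egg')  # ships the egg to the executors and adds it to sys.path
from my_module import myfn          # now importable on the driver and the workers
myfn(myargs)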
I have packaged my Python files inside an egg, and the egg contains the main file, which makes the egg executable by running python myegg.egg.
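For reference, what makes an egg (or any zip archive) directly runnable with python myegg.egg is a __main__.py at the top level of the archive; a minimal sketch, with placeholder names:

# __main__.py, at the root of myegg.egg
import sys
from my_module import myfn

if __name__ == '__main__':
    myfn(*sys.argv[1:])  # forward the command-line arguments to the entry function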
I am now trying to form my spark-submit command, and I can't seem to get it right. Here's where I am:
./spark-submit --jars jar1.jar,jar2.jar --py-files path/to/my/egg.egg arg1 arg
Error: Cannot load main class from JAR file:/path/to/pyspark/directory/arg1
Run with --help for usage help or --verbose for debug output
Instead of executing my .egg file, spark-submit takes the first argument after the egg, treats it as a jar file, and tries to load a class from it. What am I doing wrong?
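As far as I can tell, spark-submit treats the first bare (non-option) argument as the application to run, and --py-files only lists dependencies to distribute, which is why arg1 is being picked up as an application jar. A sketch of the shape spark-submit expects, assuming a hypothetical thin driver script main.py that just calls into the egg:

./spark-submit --jars jar1.jar,jar2.jar --py-files path/to/my/egg.egg main.py arg1 arg2

# main.py (hypothetical driver that spark-submit actually runs)
import sys
from my_module import myfn

myfn(*sys.argv[1:])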