pyspark mysql jdbc load An error occurred while calling o23.load No suitable driver

Question

Welcome To Ask or Share your Answers For Others

pyspark mysql jdbc load An error occurred while calling o23.load No suitable driver

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

pyspark mysql jdbc load An error occurred while calling o23.load No suitable driver

I use docker image sequenceiq/spark on my Mac to study these spark examples, during the study process, I upgrade the spark inside that image to 1.6.1 according to this answer, and the error occurred when I start the Simple Data Operations example, here is what happened:

when I run df = sqlContext.read.format("jdbc").option("url",url).option("dbtable","people").load() it raise a error, and the full stack with the pyspark console is as followed:

Python 2.6.6 (r266:84292, Jul 23 2015, 15:22:56)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
16/04/12 22:45:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _ / _ / _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_   version 1.6.1
      /_/

Using Python version 2.6.6 (r266:84292, Jul 23 2015 15:22:56)
SparkContext available as sc, HiveContext available as sqlContext.
>>> url = "jdbc:mysql://localhost:3306/test?user=root;password=myPassWord"
>>> df = sqlContext.read.format("jdbc").option("url",url).option("dbtable","people").load()
16/04/12 22:46:05 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/04/12 22:46:06 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/04/12 22:46:11 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/04/12 22:46:11 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/04/12 22:46:16 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/04/12 22:46:17 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/spark/python/pyspark/sql/readwriter.py", line 139, in load
    return self._df(self._jreader.load())
  File "/usr/local/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/usr/local/spark/python/pyspark/sql/utils.py", line 45, in deco
    return f(*a, **kw)
  File "/usr/local/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o23.load.
: java.sql.SQLException: No suitable driver
    at java.sql.DriverManager.getDriver(DriverManager.java:278)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:50)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:50)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createConnectionFactory(JdbcUtils.scala:49)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:120)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
    at org.apache.spark.sql.execution.datasources.jdbc.DefaultSource.createRelation(DefaultSource.scala:57)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:744)

>>>

Here is what I have tried till now:

Download mysql-connector-java-5.0.8-bin.jar, and put it in to /usr/local/spark/lib/. It still the same error.

Create t.py like this:

from pyspark import SparkContext  
from pyspark.sql import SQLContext  

sc = SparkContext(appName="PythonSQL")  
sqlContext = SQLContext(sc)  
df = sqlContext.read.format("jdbc").option("url",url).option("dbtable","people").load()  

df.printSchema()  
countsByAge = df.groupBy("age").count()  
countsByAge.show()  
countsByAge.write.format("json").save("file:///usr/local/mysql/mysql-connector-java-5.0.8/db.json")

then, I tried spark-submit --conf spark.executor.extraClassPath=mysql-connector-java-5.0.8-bin.jar --driver-class-path mysql-connector-java-5.0.8-bin.jar --jars mysql-connector-java-5.0.8-bin.jar --master local[4] t.py. The result is still the same.

Then I tried pyspark --conf spark.executor.extraClassPath=mysql-connector-java-5.0.8-bin.jar --driver-class-path mysql-connector-java-5.0.8-bin.jar --jars mysql-connector-java-5.0.8-bin.jar --master local[4] t.py, both with and without the following t.py, still the same.

During all of this, the mysql is running. And here is my os info:

# rpm --query centos-release  
centos-release-6-5.el6.centos.11.2.x86_64

And the hadoop version is 2.6.

Now I don't where to go next, so I hope some one can help give some advice, thanks!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-17T01:15:10+0000

I ran into "java.sql.SQLException: No suitable driver" when I tried to have my script write to MySQL.

Here's what I did to fix that.

In script.py

df.write.jdbc(url="jdbc:mysql://localhost:3333/my_database"
                  "?user=my_user&password=my_password",
              table="my_table",
              mode="append",
              properties={"driver": 'com.mysql.jdbc.Driver'})

Then I ran spark-submit this way

SPARK_HOME=/usr/local/Cellar/apache-spark/1.6.1/libexec spark-submit --packages mysql:mysql-connector-java:5.1.39 ./script.py

Note that SPARK_HOME is specific to where spark is installed. For your environment this https://github.com/sequenceiq/docker-spark/blob/master/README.md might help.

In case all the above is confusing, try this:
In t.py replace

sqlContext.read.format("jdbc").option("url",url).option("dbtable","people").load()

with

sqlContext.read.format("jdbc").option("dbtable","people").option("driver", 'com.mysql.jdbc.Driver').load()

And run that with

spark-submit --packages mysql:mysql-connector-java:5.1.39 --master local[4] t.py

Categories

pyspark mysql jdbc load An error occurred while calling o23.load No suitable driver

pyspark mysql jdbc load An error occurred while calling o23.load No suitable driver

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags