I'm working with the combination of spark_with_hadoop2.7 (2.4.3), hadoop (3.2.0), and Ceph Luminous. When I try to use Spark to access Ceph (for example, by starting spark-sql in the shell), an exception like the one below is thrown:
INFO impl.MetricsSystemImpl: s3a-file-system metrics system started
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.security.ProviderUtils.excludeIncompatibleCredentialProviders(Lorg/apache/hadoop/conf/Configuration;Ljava/lang/Class;)Lorg/apache/hadoop/conf/Configuration;
at org.apache.hadoop.fs.s3a.S3AUtils.getAWSAccessKeys(S3AUtils.java:740)
at org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider.<init>(SimpleAWSCredentialsProvider.java:58)
at org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:600)
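For reference, a spark-sql invocation against a Ceph RADOS Gateway would look roughly like this (the endpoint, access key, and secret key here are placeholders, not my real values):
spark-sql \
  --conf spark.hadoop.fs.s3a.endpoint=http://ceph-rgw.example.com:7480 \
  --conf spark.hadoop.fs.s3a.access.key=PLACEHOLDER_ACCESS_KEY \
  --conf spark.hadoop.fs.s3a.secret.key=PLACEHOLDER_SECRET_KEY \
  --conf spark.hadoop.fs.s3a.path.style.access=true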
As for the NoSuchMethodError, it's most likely because the class version used at compile time differs from the class version found at run time, according to how-do-i-fix-a-nosuchmethoderror.
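One way to verify this is to check which hadoop-common jar Spark actually loads, and whether its ProviderUtils class has the two-argument excludeIncompatibleCredentialProviders overload that the stack trace expects at all (the jar version below assumes the stock spark-with-hadoop2.7 bundle; adjust it to whatever ls shows):
# See which hadoop-common jar ships with Spark's prebuilt package
ls $SPARK_HOME/jars/hadoop-common-*.jar
# Dump ProviderUtils' public methods from that jar; if the overload from the
# stack trace is absent, the version mismatch is confirmed
javap -cp $SPARK_HOME/jars/hadoop-common-2.7.3.jar org.apache.hadoop.security.ProviderUtils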
To access Ceph, the AWS-related jars aws-java-sdk-bundle-1.11.375.jar and hadoop-aws-3.2.0.jar under $HADOOP_HOME/share/hadoop/tools/lib are actually used. I did the operations below (a quick sanity check follows the list):
1. Copy those two jars to $SPARK_HOME/jars.
2. Modify $HADOOP_HOME/etc/hadoop/hadoop-env.sh to add the line below:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/*
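A quick way to confirm both steps took effect:
# Sanity check: the AWS jars are visible to Spark, and tools/lib
# now appears on Hadoop's classpath
ls $SPARK_HOME/jars | grep -E 'aws-java-sdk-bundle|hadoop-aws'
hadoop classpath | tr ':' '\n' | grep 'tools/lib'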
After doing the steps above, I can start HDFS and access Ceph; for example, I can use hdfs dfs -ls to list folders under a Ceph bucket. This proves that the AWS-related jars work fine (at least as far as I understand).
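For example (the bucket name here is a placeholder):
# Listing a Ceph bucket through the S3A connector from the Hadoop side
hdfs dfs -ls s3a://my-bucket/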
But why are the AWS s3a exceptions thrown when I invoke Spark?