Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

scala - java.lang.NoClassDefFoundError: org/apache/spark/streaming/twitter/TwitterUtils$ while running TwitterPopularTags

I am a beginner in Spark Streaming and Scala. For a project requirement I was trying to run the TwitterPopularTags example from GitHub. Since SBT assembly was not working for me and I was not familiar with SBT, I tried building with Maven instead. After a lot of initial hiccups I was able to create the jar file, but when I try to execute it I get the following error. Can anybody help me resolve this?

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/twitter/TwitterUtils$
    at TwitterPopularTags$.main(TwitterPopularTags.scala:43)
    at TwitterPopularTags.main(TwitterPopularTags.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:331)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterUtils$
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 9 more

I have added the following dependencies: spark-streaming_2.10:1.1.0, spark-core_2.10:1.1.0, and spark-streaming-twitter_2.10:1.1.0.

I even tried version 1.2.0 of spark-streaming-twitter, but that gave me the same error.

Thanks for the help in advance.

Regards, vpv

1 Answer


Thank you for your suggestions. In the end I was only able to resolve this issue by using SBT assembly. The details of how I did it are below.

Spark - Already present in the Cloudera VM.
Scala - Not sure whether this is present in the Cloudera VM; if not, it can be installed.
SBT - This also needs to be installed.

I did both installs on my local machine and transferred the jar to the VM. For the installation I used the following link:

https://gist.github.com/visenger/5496675

1) Once all of these are installed, create the parent folder for the project. I created a folder called Twitter.

2) Create the folder structure Twitter/src/main/scala and, inside it, create a file named TwitterPopularTags.scala. Its contents differ slightly from the code on GitHub: I had to change the import statements to the following:

import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.SparkContext._
import org.apache.spark.streaming.twitter._
import org.apache.spark.SparkConf
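
For context, here is roughly how those imports are used. This is a condensed sketch based on the Spark 1.x TwitterPopularTags example, not the exact file from this answer. It assumes Spark and spark-streaming-twitter are on the classpath and that the twitter4j.oauth.* system properties have already been set.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext._
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.twitter._

object TwitterPopularTags {
  def main(args: Array[String]): Unit = {
    // Assumption: the first four arguments are the Twitter credentials and
    // have already been copied into the twitter4j.oauth.* system properties.
    val filters = args.drop(4)

    val sparkConf = new SparkConf().setAppName("TwitterPopularTags")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // TwitterUtils comes from spark-streaming-twitter. If that jar is not
    // available at run time, its first use fails with NoClassDefFoundError.
    val stream = TwitterUtils.createStream(ssc, None, filters)

    // Keep only the words that look like hashtags.
    val hashTags = stream.flatMap(status => status.getText.split(" ").filter(_.startsWith("#")))

    // Count each hashtag over a 60-second window and sort by count, descending.
    val topCounts60 = hashTags.map((_, 1))
      .reduceByKeyAndWindow(_ + _, Seconds(60))
      .map { case (tag, count) => (count, tag) }
      .transform(_.sortByKey(false))

    // Print the ten most frequent hashtags for each batch.
    topCounts60.foreachRDD { rdd =>
      val topList = rdd.take(10)
      println("\nPopular topics in last 60 seconds (%s total):".format(rdd.count()))
      topList.foreach { case (count, tag) => println("%s (%s tweets)".format(tag, count)) }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```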

3) After this, create another folder under the parent folder with the following name:

Twitter/project

In that folder, create a file named assembly.sbt. This file tells sbt where to find the assembly plugin. Its full contents are:

resolvers += Resolver.url("sbt-plugin-releases-scalasbt", url("http://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/"))

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.12.0")

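4) Once the above two are created, create a file named build.sbt in the parent directory of the project (Twitter). This is where we give the name of the jar file to create and list the dependencies. The two core Spark dependencies are marked "provided" because the Spark installation supplies them at run time; spark-streaming-twitter and twitter4j-stream are not marked that way, so they are bundled into the assembled jar. Note that the blank lines between the settings in this file are significant: older sbt versions require them as separators.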

name := "TwitterPopularTags"

version := "1.0"

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
   {
    case PathList("META-INF", xs @ _*) => MergeStrategy.discard
    case x => MergeStrategy.first
   }
}

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.1.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.2.0" 

libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3" 

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
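
Note that the `<<=` operator used above was removed in sbt 1.0, so on a newer toolchain the merge strategy has to be written differently. A minimal sketch, assuming sbt 1.x with a recent sbt-assembly release (those versions are an assumption, not what this answer was built with). Only the merge-strategy setting is shown:

```scala
// build.sbt fragment (assumes sbt 1.x and a recent sbt-assembly release)
assembly / assemblyMergeStrategy := {
  // Drop everything under META-INF, as in the original setting.
  case PathList("META-INF", _*) => MergeStrategy.discard
  // For any other conflicting path, keep the first copy found.
  case _                        => MergeStrategy.first
}
```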

5) Finally, open a terminal and go to the parent folder of the project (Twitter). From there, run the following command:

sbt assembly

This will download the dependencies and create the jar file we need.
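
Since the original error was a missing TwitterUtils$ class, it can be worth confirming that the assembled jar actually contains it before submitting the job. A minimal sketch using only the JDK's java.util.jar API. The object name JarCheck and the command-line argument are illustrative and not part of the original setup:

```scala
import java.util.jar.JarFile

object JarCheck {
  // Returns true if the jar at jarPath has a .class entry for className.
  def containsClass(jarPath: String, className: String): Boolean = {
    // Class files are stored under their package path, e.g. a/b/C$.class
    val entryName = className.replace('.', '/') + ".class"
    val jar = new JarFile(jarPath)
    try jar.getEntry(entryName) != null
    finally jar.close()
  }

  def main(args: Array[String]): Unit = {
    // args(0): path to the assembled jar (hypothetical; pass your own path)
    val wanted = "org.apache.spark.streaming.twitter.TwitterUtils$"
    val found = containsClass(args(0), wanted)
    println(s"$wanted present in jar: $found")
  }
}
```

If this reports false, the Twitter dependency was not bundled and spark-submit would be expected to fail in the same way as before.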

6) To run the program we need a Twitter app created under our own account, and we have to supply its consumer key, access token, and the other credentials. Detailed steps for creating the app are at the following link:

http://ampcamp.berkeley.edu/3/exercises/realtime-processing-with-spark-streaming.html

7) Once all of the above is done, we can use the spark-submit command from the VM to run the job. An example command is:

./bin/spark-submit \
  --class TwitterPopularTags \
  --master local[4] \
  /path/to/TwitterPopularTags.jar \
  consumerkey consumersecret accesstoken accesssecret
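
The four trailing arguments are the Twitter credentials. In the Spark example they are copied into twitter4j's OAuth system properties before the stream is created. A minimal sketch of that step in plain Scala. The object name TwitterCredentials and the configure helper are illustrative, not names from the original code:

```scala
object TwitterCredentials {
  // Copies the first four arguments into the twitter4j OAuth system
  // properties and returns any remaining arguments (optional filter words).
  def configure(args: Array[String]): Array[String] = {
    require(args.length >= 4,
      "Usage: TwitterPopularTags <consumer key> <consumer secret> " +
        "<access token> <access token secret> [<filters>]")

    val Array(consumerKey, consumerSecret, accessToken, accessTokenSecret) = args.take(4)

    System.setProperty("twitter4j.oauth.consumerKey", consumerKey)
    System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret)
    System.setProperty("twitter4j.oauth.accessToken", accessToken)
    System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret)

    args.drop(4)
  }
}
```

The order of the arguments matters: consumer key, consumer secret, access token, access token secret.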

8) The program prints its output to the console, so to make the output easier to monitor it is better to adjust the code so that it prints less often.

Please let me know if any more details are required.

Thanks & Regards,

VPV

