scala - Apache Spark - java.lang.NoClassDefFoundError


I have a Maven-based mixed Scala/Java application that can submit Spark jobs. The application jar, "myapp.jar", has some nested jars inside its lib folder, one of which is "common.jar". I have defined the Class-Path attribute in the manifest file as Class-Path: lib/common.jar. The Spark executor throws a java.lang.NoClassDefFoundError: com/myapp/common/myclass error when I submit the application in yarn-client mode. The class (com/myapp/common/myclass.class) and the jar (common.jar) are there, nested inside the main myapp.jar. The fat jar is created using the spring-boot-maven-plugin, which nests the other jars inside the lib folder of the parent jar. I prefer not to create a shaded flat jar, as that would create other issues. Is there any way the Spark executor JVM can load nested jars here?

Edit: Spark (the JVM classloader) can find the classes that are flat inside myapp.jar itself, i.e. com/myapp/abc.class, com/myapp/xyz.class, etc.

Edit 2: The Spark executor classloader can find some classes from the nested jar, but it throws NoClassDefFoundError for other classes in the same nested jar! Here's the error:

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, host4.local): java.lang.NoClassDefFoundError: com/myapp/common/myclass
    at com.myapp.userprofilerdd$.parse(userprofilerddinit.scala:111)
    at com.myapp.userprofilerddinit$$anonfun$generateuserprofilerdd$1.apply(userprofilerddinit.scala:87)
    at com.myapp.userprofilerddinit$$anonfun$generateuserprofilerdd$1.apply(userprofilerddinit.scala:87)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
    at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:172)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:79)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassNotFoundException: com.myapp.common.myclass
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 14 more

I submit myapp.jar with sparkConf.setJars(new String[] {"myapp.jar"}) and also tried setting it on spark.yarn.executor.extraClassPath.

Edit 3: As a workaround, I extracted myapp.jar and set sparkConf.setJars(new String[] {"myapp.jar", "lib/common.jar"}) manually, and the error went away. However, having to do this for each nested jar is not desirable.

You can use the --jars option, which takes a comma-separated list of jars, when starting the Spark application.

Something like:

spark-submit --jars lib/abc.jar,lib/xyz.jar --class <classname> myapp.jar 
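
The standard JVM URLClassLoader does not read jars that are packed inside another jar, so the nested jars have to exist as separate files before they can be listed in --jars. As a rough sketch of how the manual extraction in Edit 3 could be automated, the helper below copies every lib/*.jar entry out of the fat jar and prints them as a comma-separated list. The class name NestedJarExtractor and its methods are hypothetical and not part of Spark; the sketch assumes the nested jars live under lib/ as described in the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper (not a Spark API): unpacks the jars nested under lib/
// in a fat jar so that they can be handed to spark-submit --jars.
final class NestedJarExtractor {

    // Copies every lib/*.jar entry of fatJar into targetDir and returns the extracted paths.
    static List<Path> extractNestedJars(Path fatJar, Path targetDir) throws IOException {
        List<Path> extracted = new ArrayList<>();
        Files.createDirectories(targetDir);
        try (JarFile jar = new JarFile(fatJar.toFile())) {
            for (JarEntry entry : Collections.list(jar.entries())) {
                String name = entry.getName();
                if (entry.isDirectory() || !name.startsWith("lib/") || !name.endsWith(".jar")) {
                    continue;
                }
                // Keep only the file name so an odd entry name cannot escape targetDir.
                Path out = targetDir.resolve(Paths.get(name).getFileName().toString());
                try (InputStream in = jar.getInputStream(entry)) {
                    Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                }
                extracted.add(out);
            }
        }
        return extracted;
    }

    // Joins the paths with commas, which is the format --jars expects.
    static String toJarsArgument(List<Path> jars) {
        StringBuilder sb = new StringBuilder();
        for (Path p : jars) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(p.toString());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the fat jar path is the first argument, defaulting to myapp.jar.
        Path fatJar = Paths.get(args.length > 0 ? args[0] : "myapp.jar");
        Path targetDir = Files.createTempDirectory("nested-jars");
        System.out.println(toJarsArgument(extractNestedJars(fatJar, targetDir)));
    }
}
```

The printed value can be pasted after --jars in the command above. If the job is configured in code instead, the same paths could be passed to SparkConf.setJars together with myapp.jar, as in the Edit 3 workaround.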
