I have a Spark master and worker running on a remote Ubuntu Linux machine.
I'm trying to run the JavaDirectKafkaWordCount example, but when I submit from my Windows machine to the Linux cluster I get:
C:/spark-1.6.0-bin-hadoop2.6/bin/spark-submit --master spark://10.123.45.67:6066 --deploy-mode cluster --class com.company.spark.app.JavaDirectKafkaWordCount "C:/Dev/spark-app/target/spark-app-1.0-SNAPSHOT.jar" kafka-server:9092 topic1
Running Spark using the REST application submission protocol. Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/02/11 11:53:20 INFO RestSubmissionClient: Submitting a request to launch an application in spark://10.123.45.67:6066.
16/02/11 11:53:21 INFO RestSubmissionClient: Submission successfully created as driver-20160211115129-0009. Polling submission state...
16/02/11 11:53:21 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20160211115129-0009 in spark://10.123.45.67:6066.
16/02/11 11:53:21 INFO RestSubmissionClient: State of driver driver-20160211115129-0009 is now ERROR.
16/02/11 11:53:21 INFO RestSubmissionClient: Driver is running on worker worker-20160211111114-172.18.0.8-59825 at 172.18.0.8:59825.
16/02/11 11:53:21 ERROR RestSubmissionClient: Exception from the cluster:
java.io.IOException: No FileSystem for scheme: C
org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:79)
16/02/11 11:53:21 INFO RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20160211115129-0009",
  "serverSparkVersion" : "1.6.0",
  "submissionId" : "driver-20160211115129-0009",
  "success" : true
}
It looks like Spark is interpreting C: as the URI scheme. Try changing the command to
C:/spark-1.6.0-bin-hadoop2.6/bin/spark-submit --master spark://10.123.45.67:6066 --deploy-mode cluster --class com.company.spark.app.JavaDirectKafkaWordCount "file:///C:/Dev/spark-app/target/spark-app-1.0-SNAPSHOT.jar" kafka-server:9092 topic1
By prefixing the path with file:/// you tell Spark explicitly that the scheme is file, so it won't get confused and treat C as the scheme. Jars submitted to Spark with the file scheme get hosted by Spark's file server so the cluster can fetch them.
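To see why the bare Windows path fails, note that jar locations are resolved as URIs, and in a URI everything before the first colon is the scheme. Spark actually resolves paths through Hadoop's Path/FileSystem machinery rather than plain java.net.URI, but the parsing rule is the same, so a minimal sketch with the standard library illustrates it (the paths below are just the ones from the question):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SchemeDemo {
    public static void main(String[] args) throws URISyntaxException {
        // "C" before the colon parses as the URI scheme, which Hadoop then
        // tries (and fails) to look up as a FileSystem implementation.
        URI bare = new URI("C:/Dev/spark-app/target/spark-app-1.0-SNAPSHOT.jar");
        System.out.println(bare.getScheme()); // prints "C"

        // With an explicit file:/// prefix the scheme is unambiguous.
        URI prefixed = new URI("file:///C:/Dev/spark-app/target/spark-app-1.0-SNAPSHOT.jar");
        System.out.println(prefixed.getScheme()); // prints "file"
    }
}
```

This is exactly the "No FileSystem for scheme: C" message in the error above: Hadoop looked for a FileSystem registered under the scheme "C" and found none.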
The application jar path you pass to the submit command uses a Windows location, but according to the official Spark documentation:
application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes.
So the file must exist and be accessible from every member of the cluster; if you use the local file system, you must make sure the file is present at the same path on every node.
In my local environment, file:///opt/spark-2.0.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.0.1.jar passes this test.