Submit a spark application from Windows to a Linux cluster

DarVar

I have a Spark Master and Worker running on a remote Ubuntu Linux machine.

I'm trying to run the JavaDirectKafkaWordCount example, but when I submit it from my Windows machine to the Linux cluster I get the following:

C:/spark-1.6.0-bin-hadoop2.6/bin/spark-submit --master spark:// --deploy-mode cluster --class "C:/Dev/spark-app/target/spark-app-1.0-SNAPSHOT.jar" kafka-server:9092 topic1

Running Spark using the REST application submission protocol.
Using Spark's default log4j profile: org/apache/spark/
16/02/11 11:53:20 INFO RestSubmissionClient: Submitting a request to launch an application in spark://
16/02/11 11:53:21 INFO RestSubmissionClient: Submission successfully created as driver-20160211115129-0009. Polling submission state...
16/02/11 11:53:21 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20160211115129-0009 in spark://
16/02/11 11:53:21 INFO RestSubmissionClient: State of driver driver-20160211115129-0009 is now ERROR.
16/02/11 11:53:21 INFO RestSubmissionClient: Driver is running on worker worker-20160211111114- at
16/02/11 11:53:21 ERROR RestSubmissionClient: Exception from the cluster: No FileSystem for scheme: C
org.apache.spark.deploy.worker.DriverRunner$$anon$
16/02/11 11:53:21 INFO RestSubmissionClient: Server responded with
"action" : "CreateSubmissionResponse",
"message" : "Driver successfully submitted as driver-20160211115129-0009",
"serverSparkVersion" : "1.6.0",
"submissionId" : "driver-20160211115129-0009",
"success" : true



answered 3 years ago Michael Lloyd Lee mlk #1

It looks like Spark is treating C: as the URI scheme. Try changing the command to:

C:/spark-1.6.0-bin-hadoop2.6/bin/spark-submit --master spark:// --deploy-mode cluster --class "file:///C:/Dev/spark-app/target/spark-app-1.0-SNAPSHOT.jar" kafka-server:9092 topic1

By adding file:/// to the file path you are telling Spark that the scheme is file, so it will not get confused and use C as the scheme. Jars submitted to Spark using file should get hosted by Spark so the cluster can see them.
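The drive-letter confusion can be reproduced outside Spark. Here is a minimal sketch, assuming the path is parsed with generic URI rules (shown with `java.net.URI`; the class name `SchemeDemo` is made up for illustration, and Spark's own handling goes through Hadoop's filesystem classes, which this does not replicate exactly):

```java
import java.net.URI;

public class SchemeDemo {
    // Returns the scheme that generic URI parsing assigns to a location string.
    public static String schemeOf(String location) {
        return URI.create(location).getScheme();
    }

    public static void main(String[] args) {
        String bare = "C:/Dev/spark-app/target/spark-app-1.0-SNAPSHOT.jar";
        String prefixed = "file:///" + bare;

        // Everything before the first ':' is read as the scheme.
        System.out.println(schemeOf(bare));      // C
        System.out.println(schemeOf(prefixed));  // file

        // With the prefix, the drive letter becomes part of the path instead.
        System.out.println(URI.create(prefixed).getPath());
    }
}
```

This only shows why the error message mentions a scheme named C. It says nothing about whether the Linux worker can actually read a file that lives on the Windows machine.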

answered 2 years ago Joshua #2

The application jar you pass to the submit command uses a Windows location, but according to the official Spark documentation:

application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes.

So the file must exist on, or be accessible from, every member of the cluster. If you use the local file system, you must make sure the file exists on every node.

In my local environment I use file:///opt/spark-2.0.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.0.1.jar, and it passes the test.
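As a rough illustration of the "globally visible" requirement, the sketch below classifies a jar location by its scheme before you submit. `JarLocationCheck` and `hint` are hypothetical names, not part of Spark, and the list of schemes is an assumption based on the documentation excerpt quoted above; it cannot verify that a file really exists on any node.

```java
import java.net.URI;
import java.util.Locale;

public class JarLocationCheck {
    // Hypothetical helper, not a Spark API: returns a rough hint about
    // how the cluster nodes would have to reach the given jar location.
    public static String hint(String location) {
        String scheme = URI.create(location).getScheme();
        if (scheme == null) {
            return "no scheme: treated as a plain path";
        }
        if (scheme.length() == 1) {
            // A one-letter "scheme" is almost certainly a Windows drive letter.
            return "looks like a Windows drive letter, not a URI scheme";
        }
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "hdfs":
            case "http":
            case "https":
                return "shared location: nodes fetch it over the network";
            case "file":
                return "local path: the file must exist on every node";
            default:
                return "unrecognised scheme: " + scheme;
        }
    }

    public static void main(String[] args) {
        System.out.println(hint("C:/Dev/spark-app/target/spark-app-1.0-SNAPSHOT.jar"));
        System.out.println(hint("file:///opt/spark-2.0.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.0.1.jar"));
    }
}
```

A check like this would flag the original command's jar path as a drive letter, and would remind you that a file: location only works if the same path is present on each node.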
