spark.master configuration via REST job submission in standalone cluster is ignored

Asked by kans

I have a standalone Spark cluster in HA mode (2 masters) with a couple of workers registered.

I submitted a Spark job via the REST interface with the following details:

{
    "sparkProperties": {
        "spark.app.name": "TeraGen3",
        "spark.default.parallelism": "40",
        "spark.executor.memory": "512m",
        "spark.driver.memory": "512m",
        "spark.task.maxFailures": "3",
        "spark.jars": "file:///tmp//test//spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar",
        "spark.eventLog.enabled": "false",
        "spark.submit.deployMode": "cluster",
        "spark.driver.supervise": "true",
        "spark.master": "spark://spark-hn0:7077,spark-hn1:7077"
    },
    "mainClass": "com.github.ehiggs.spark.terasort.TeraGen",
    "environmentVariables": {
        "SPARK_ENV_LOADED": "1"
    },
    "action": "CreateSubmissionRequest",
    "appArgs": ["4g", "file:///tmp/data/teradata4g/"],
    "appResource": "file:///tmp//test//spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar",
    "clientSparkVersion": "2.1.1"
}

This request is submitted to the active Spark master via the REST interface (http://spark-hn1:6066/v1/submissions/create).
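For reference, a minimal sketch of how such a request might be built from Python, assuming only the standard library. The function name build_submission_request is hypothetical; the endpoint path is the one quoted above. The request is only constructed here, and sending it requires a reachable master.

```python
import json
import urllib.request


def build_submission_request(master_rest_url, payload):
    """Build (but do not send) a POST request for the standalone REST endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=master_rest_url.rstrip("/") + "/v1/submissions/create",
        data=body,
        headers={"Content-Type": "application/json;charset=UTF-8"},
        method="POST",
    )


# Hypothetical usage, with `payload` being the JSON document shown above:
# req = build_submission_request("http://spark-hn1:6066", payload)
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```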

When the driver was launched, -Dspark.master was set to "spark://spark-hn1:7077" instead of the value passed in sparkProperties, which is "spark://spark-hn0:7077,spark-hn1:7077".

Logs from the worker node where the driver is running:

17/12/18 13:29:49 INFO worker.DriverRunner: Launch Command: "/usr/lib/jvm/java-8-openjdk-amd64/bin/java" "-Dhdp.version=2.6.99.200-0" "-cp" "/usr/hdp/current/spark2-client/conf/:/usr/hdp/current/spark2-client/jars/*:/etc/hadoop/conf/" "-Xmx512M" "-Dspark.driver.memory=512m" "-Dspark.master=spark://spark-hn1:7077" "-Dspark.executor.memory=512m" "-Dspark.submit.deployMode=cluster" "-Dspark.app.name=TeraGen3" "-Dspark.default.parallelism=40" "-Dspark.jars=file:///tmp//test//spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar" "-Dspark.task.maxFailures=3" "-Dspark.driver.supervise=true" "-Dspark.eventLog.enabled=false" "org.apache.spark.deploy.worker.DriverWrapper" "spark://[email protected]:40803" "/var/spark/work/driver-20171218132949-0001/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar" "com.github.ehiggs.spark.terasort.TeraGen" "4g" "file:///tmp/data/teradata4g/"
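To check which master URL a driver was actually launched with, the launch command can be parsed for its -D options. This is a small sketch, assuming the command portion of the log line (everything after "Launch Command: ") has been copied into a string; the helper name driver_system_properties is hypothetical.

```python
import shlex


def driver_system_properties(launch_command, name):
    """Return every value passed as -D<name>=... in a launch command, in order."""
    prefix = "-D" + name + "="
    # shlex.split strips the double quotes around each argument.
    return [
        token[len(prefix):]
        for token in shlex.split(launch_command)
        if token.startswith(prefix)
    ]


# Shortened sample of the command above, for illustration only.
sample = '"/usr/bin/java" "-Xmx512M" "-Dspark.master=spark://spark-hn1:7077" "-Dspark.app.name=TeraGen3"'
print(driver_system_properties(sample, "spark.master"))
```

A list is returned rather than a single value because the same property can appear more than once on the command line.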

This causes a problem when the active master goes down during job execution and the other master becomes active. Since the driver knows only one master (the old one), it cannot reach the new master and continue the job execution (since spark.driver.supervise=true).

What is the right way to pass multiple master URLs through the Spark REST interface?

apache-spark apache-spark-standalone

Answers

answered 9 months ago kans #1

This looks like a bug in the RestServer implementation, where spark.master is replaced: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala#L147

We can still work around this by setting spark.master in spark.driver.extraJavaOptions when submitting the job via the REST interface, as follows:

"sparkProperties": {
        "spark.app.name": "TeraGen3",
        ...
        "spark.driver.extraJavaOptions": "-Dspark.master=spark://spark-hn0:7077,spark-hn1:7077"
    }

This worked for me.
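If the payload is generated by a script, the workaround can be applied programmatically. The following is only a sketch, assuming the properties are held in a plain dict; with_master_override is a hypothetical helper name, not part of Spark.

```python
def with_master_override(spark_properties, master_urls):
    """Return a copy of sparkProperties with -Dspark.master appended to the driver's Java options."""
    props = dict(spark_properties)  # leave the caller's dict untouched
    override = "-Dspark.master=" + master_urls
    existing = props.get("spark.driver.extraJavaOptions", "").strip()
    # Keep any Java options that were already set and append the override.
    props["spark.driver.extraJavaOptions"] = (existing + " " + override).strip()
    return props


props = with_master_override(
    {"spark.app.name": "TeraGen3"},
    "spark://spark-hn0:7077,spark-hn1:7077",
)
print(props["spark.driver.extraJavaOptions"])
```

Existing values of spark.driver.extraJavaOptions are preserved so that other JVM flags are not lost.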
