java - How do I set an environment variable in a YARN Spark job?


I have a Spark job (written in Java) that accesses Accumulo through an AccumuloInputFormat via newAPIHadoopRDD. To do this, I have to tell the AccumuloInputFormat where to locate ZooKeeper by calling the setZooKeeperInstance method. This method takes a ClientConfiguration object which specifies the various relevant properties.
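For reference, the call shape is roughly this (a sketch, not my full job; the instance name and ZooKeeper hosts are placeholders, and withInstance/withZkHosts set them explicitly instead of relying on the loaded defaults):

    ClientConfiguration clientConf = ClientConfiguration.loadDefault()
            .withInstance("myInstance")                    // placeholder instance name
            .withZkHosts("zkhost1:2181,zkhost2:2181");     // placeholder ZooKeeper quorum
    AccumuloInputFormat.setZooKeeperInstance(accumuloJob, clientConf);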

I am creating the ClientConfiguration object using its static loadDefault method. This method is supposed to look for a client.conf file in several different locations and load its defaults from it. One of the places it is supposed to check is $ACCUMULO_CONF_DIR/client.conf.
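For reference, a minimal client.conf might look something like this (the property names follow Accumulo's client configuration; the values are placeholders):

    # hypothetical $ACCUMULO_CONF_DIR/client.conf
    instance.name=myinstance
    instance.zookeeper.host=zkhost1:2181,zkhost2:2181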

Therefore, I am trying to set the ACCUMULO_CONF_DIR environment variable in such a way that it is visible when Spark runs the job (for reference, I am trying to run in the yarn-cluster deployment mode). I have not found a way to do this yet.

So far I have tried:

  • Calling sparkConf.setExecutorEnv("ACCUMULO_CONF_DIR", "/etc/accumulo/conf") (see the full example below)
  • Exporting ACCUMULO_CONF_DIR in spark-env.sh (see the snippets after this list)
  • Setting spark.executorEnv.ACCUMULO_CONF_DIR in spark-defaults.conf
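Concretely, the second and third attempts looked like this (the path is from my setup; the first attempt appears in the code example further down):

    # in conf/spark-env.sh
    export ACCUMULO_CONF_DIR=/etc/accumulo/conf

    # in conf/spark-defaults.conf
    spark.executorEnv.ACCUMULO_CONF_DIR  /etc/accumulo/conf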

None of them have worked. If I print the environment before calling setZooKeeperInstance, ACCUMULO_CONF_DIR does not appear.
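The environment dump itself is nothing special, just a loop over System.getenv() right before the setZooKeeperInstance call:

    for (java.util.Map.Entry<String, String> entry : System.getenv().entrySet()) {
        System.out.println(entry.getKey() + "=" + entry.getValue());
    }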

If it's relevant, here's an example of what I'm trying to do (imports and exception handling omitted for brevity):

    public class MySparkJob {
        public static void main(String[] args) {
            SparkConf sparkConf = new SparkConf();
            sparkConf.setAppName("MySparkJob");
            sparkConf.setExecutorEnv("ACCUMULO_CONF_DIR", "/etc/accumulo/conf");
            JavaSparkContext sc = new JavaSparkContext(sparkConf);
            Job accumuloJob = Job.getInstance(sc.hadoopConfiguration());
            // foreach loop to print the environment; ACCUMULO_CONF_DIR does not show up
            ClientConfiguration accumuloConfiguration = ClientConfiguration.loadDefault();
            AccumuloInputFormat.setZooKeeperInstance(accumuloJob, accumuloConfiguration);
            // other calls to AccumuloInputFormat static functions to configure it properly
            JavaPairRDD<Key, Value> accumuloRDD =
                sc.newAPIHadoopRDD(accumuloJob.getConfiguration(),
                                   AccumuloInputFormat.class, Key.class, Value.class);
        }
    }

So I'm answering my own question (sorry, reputation seekers). The problem is that CDH5 uses Spark 1.0.0, and I was running the job via YARN. Apparently, YARN mode ignores the executor environment and instead uses the environment variable SPARK_YARN_USER_ENV to control its environment. So ensuring that SPARK_YARN_USER_ENV contains ACCUMULO_CONF_DIR=/etc/accumulo/conf works, and makes ACCUMULO_CONF_DIR visible in the environment at the indicated point in the question's source example.
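In shell form, the fix looks like this (SPARK_YARN_USER_ENV takes a comma-separated list of KEY=VALUE pairs; the jar and class names below are placeholders):

    export SPARK_YARN_USER_ENV="ACCUMULO_CONF_DIR=/etc/accumulo/conf"
    spark-submit --class MySparkJob --master yarn-cluster my-spark-job.jar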

This inconsistency between how standalone mode and YARN mode work has been fixed in Spark 1.1.0.
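If I read the fix correctly, from Spark 1.1.0 onward the spark.executorEnv.* properties should be honored under YARN as well, and spark.yarn.appMasterEnv.* covers the driver side in yarn-cluster mode, so something like the following in spark-defaults.conf should suffice (a sketch based on the later YARN docs, not something I've tested on 1.1.0):

    spark.executorEnv.ACCUMULO_CONF_DIR        /etc/accumulo/conf
    spark.yarn.appMasterEnv.ACCUMULO_CONF_DIR  /etc/accumulo/conf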
