java - How do I set an environment variable in a YARN Spark job?
I am trying to access Accumulo from a Spark job (written in Java) by using an AccumuloInputFormat with newAPIHadoopRDD. To do this, I have to tell the AccumuloInputFormat where to locate ZooKeeper by calling the setZooKeeperInstance method. This method takes a ClientConfiguration object which specifies the various relevant properties.

I am creating my ClientConfiguration object by calling the static loadDefault method. This method is supposed to look in various locations for a client.conf file to load its defaults from. One of the places it looks is $ACCUMULO_CONF_DIR/client.conf.
Therefore, I am trying to set the ACCUMULO_CONF_DIR environment variable in such a way that it will be visible when Spark runs the job (for reference, I am trying to run in the yarn-cluster deployment mode). I have not yet found a way to do that.
So far I have tried:
- Calling setExecutorEnv("ACCUMULO_CONF_DIR", "/etc/accumulo/conf") on the SparkConf
- Exporting ACCUMULO_CONF_DIR in spark-env.sh
- Setting spark.executorEnv.ACCUMULO_CONF_DIR in spark-defaults.conf
None of them have worked. If I print the environment before calling setZooKeeperInstance, ACCUMULO_CONF_DIR is not visible.
If it is relevant, I am using the CDH5 versions of everything. Here is an example of what I am trying to do (imports and exception handling left out for brevity):

public class MySparkJob {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf();
        sparkConf.setAppName("MySparkJob");
        sparkConf.setExecutorEnv("ACCUMULO_CONF_DIR", "/etc/accumulo/conf");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);
        Job accumuloJob = Job.getInstance(sc.hadoopConfiguration());
        // Foreach loop to print the environment here shows no ACCUMULO_CONF_DIR
        ClientConfiguration accumuloConfiguration = ClientConfiguration.loadDefault();
        AccumuloInputFormat.setZooKeeperInstance(accumuloJob, accumuloConfiguration);
        // Other calls to AccumuloInputFormat static functions to configure it properly
        JavaPairRDD<Key, Value> accumuloRDD =
            sc.newAPIHadoopRDD(accumuloJob.getConfiguration(),
                               AccumuloInputFormat.class, Key.class, Value.class);
    }
}
So I discovered the answer while writing the question (sorry, reputation seekers). The problem is that CDH5 uses Spark 1.0.0, and that I was running the job via YARN. Apparently, YARN mode ignores the executor environment entirely and instead uses the environment variable SPARK_YARN_USER_ENV to control its environment. Ensuring that SPARK_YARN_USER_ENV contains ACCUMULO_CONF_DIR=/etc/accumulo/conf works, and makes ACCUMULO_CONF_DIR visible in the environment at the indicated point in the question's source example.

This difference between how standalone mode and YARN mode work is reported as fixed in Spark 1.1.0.
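As a minimal sketch of the fix above, the variable can be exported in the shell that launches the job (or in spark-env.sh), assuming the client configuration lives at /etc/accumulo/conf as in the question:

```shell
# YARN mode in Spark 1.0.0 ignores the executor environment set via
# SparkConf.setExecutorEnv or spark.executorEnv.*, so the variable has
# to be passed through SPARK_YARN_USER_ENV instead. Its value is a
# comma-separated list of KEY=VALUE pairs.
export SPARK_YARN_USER_ENV="ACCUMULO_CONF_DIR=/etc/accumulo/conf"
```

With this exported before running spark-submit in yarn-cluster mode, ClientConfiguration.loadDefault() in the question's example can find $ACCUMULO_CONF_DIR/client.conf on the executors.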