Discussion:
[jira] [Created] (MAHOUT-1762) Pick up $SPARK_HOME/conf/spark-defaults.conf on startup
Sergey Tryuber (JIRA)
2015-08-10 15:18:45 UTC
Sergey Tryuber created MAHOUT-1762:
--------------------------------------

Summary: Pick up $SPARK_HOME/conf/spark-defaults.conf on startup
Key: MAHOUT-1762
URL: https://issues.apache.org/jira/browse/MAHOUT-1762
Project: Mahout
Issue Type: Wish
Components: spark
Reporter: Sergey Tryuber


[spark-defaults.conf|http://spark.apache.org/docs/latest/configuration.html#dynamically-loading-spark-properties] is intended to hold the global configuration for a Spark cluster. For example, in our HDP 2.2 environment it contains:
{noformat}
spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041
spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041
{noformat}
among many other useful settings. A user naturally expects that a Spark shell will pick these up on startup and just work. Unfortunately, this does not happen with the Mahout Spark shell: it ignores the Spark configuration, so the user has to copy-paste many options into _MAHOUT_OPTS_.
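To illustrate the workaround users face today (a sketch: SparkConf picks up JVM system properties prefixed with {{spark.}}, so every setting from spark-defaults.conf has to be duplicated by hand as a -D flag; the exact properties shown are just the HDP example above):

```shell
# Hypothetical manual workaround: replicate spark-defaults.conf entries
# as JVM system properties in MAHOUT_OPTS before launching the shell.
export MAHOUT_OPTS="-Dspark.driver.extraJavaOptions=-Dhdp.version=2.2.0.0-2041 \
-Dspark.yarn.am.extraJavaOptions=-Dhdp.version=2.2.0.0-2041"
```

This has to be repeated for every property, which is exactly the duplication this issue asks to eliminate.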

This happens because [org.apache.mahout.sparkbindings.shell.Main|https://github.com/apache/mahout/blob/master/spark-shell/src/main/scala/org/apache/mahout/sparkbindings/shell/Main.scala] is executed directly by the [initialization script|https://github.com/apache/mahout/blob/master/bin/mahout]:
{code}
"$JAVA" $JAVA_HEAP_MAX $MAHOUT_OPTS -classpath "$CLASSPATH" "org.apache.mahout.sparkbindings.shell.Main" $@
{code}
In contrast, the Spark shell is invoked indirectly through spark-submit in the [spark-shell|https://github.com/apache/spark/blob/master/bin/spark-shell] script:
{code}
"$FWDIR"/bin/spark-submit --class org.apache.spark.repl.Main "$@"
{code}
[SparkSubmit|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala] contains an additional initialization layer that loads this properties file (see the SparkSubmitArguments#mergeDefaultSparkProperties method).
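For a rough idea of what that layer does, here is a launcher-side approximation (illustrative only; spark-submit performs this merge natively, and the function name is invented for the sketch). It turns each "key value" line of spark-defaults.conf into a --conf key=value argument:

```shell
#!/bin/sh
# Sketch of the spark-defaults.conf merge, done in shell: read the
# properties file, skip blanks and comments, and emit one
# "--conf key=value" line per property.
spark_defaults_to_conf_args() {
  conf_file="$1"
  [ -f "$conf_file" ] || return 0
  awk '!/^[ \t]*(#|$)/ {
    key = $1            # first field is the property name
    $1 = ""             # drop it; the rest of the line is the value
    sub(/^[ \t]+/, "")  # trim the leading separator
    printf "--conf %s=%s\n", key, $0
  }' "$conf_file"
}
```

A launcher could then splice the output into the java (or spark-submit) command line.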

So there are two possible solutions:
* use proper Spark-like initialization logic in the Mahout launcher
* use a thin wrapper around spark-submit, as H2O Sparkling Water does ([sparkling-shell|https://github.com/h2oai/sparkling-water/blob/master/bin/sparkling-shell])
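The second option can be sketched roughly as follows (hypothetical wrapper; the default SPARK_HOME path and the MAHOUT_SHELL_JARS placeholder are assumptions, while the class name matches the Main class referenced above). Instead of running java on the shell's Main class, it builds a spark-submit invocation, so spark-defaults.conf is honored by spark-submit itself:

```shell
#!/bin/sh
# Hypothetical thin wrapper in the spirit of sparkling-shell.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"                          # assumed default
MAHOUT_SHELL_JARS="${MAHOUT_SHELL_JARS:-mahout-spark-shell.jar}" # placeholder

# Build the spark-submit command line for the Mahout shell's Main class.
build_shell_cmd() {
  printf '%s/bin/spark-submit --class %s --jars %s' \
    "$SPARK_HOME" org.apache.mahout.sparkbindings.shell.Main "$MAHOUT_SHELL_JARS"
}

# A real wrapper would end with: exec $(build_shell_cmd) "$@"
build_shell_cmd
```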




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Dmitriy Lyubimov
2015-08-10 16:56:13 UTC
This is very reasonable. Please feel free to submit a PR for this JIRA, and
put MAHOUT-1762 in the description of the PR. It is quite OK to PR a WIP.
Pat Ferrel (JIRA)
2015-11-05 17:47:27 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992093#comment-14992093 ]

Pat Ferrel commented on MAHOUT-1762:
------------------------------------

Very good point. We need to move to spark-submit and away from directly creating the Spark context IMHO. I'd vote to put reworking the launcher code for the shell and drivers on the roadmap for 0.12.0.
Pat Ferrel (JIRA)
2015-11-05 17:49:27 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pat Ferrel updated MAHOUT-1762:
-------------------------------
Fix Version/s: 0.12.0
Pat Ferrel (JIRA)
2015-11-05 18:00:31 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pat Ferrel updated MAHOUT-1762:
-------------------------------
Fix Version/s: (was: 0.12.0)
1.0.0
Suneel Marthi (JIRA)
2015-11-05 18:17:27 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1762:
----------------------------------
Issue Type: Improvement (was: Wish)
Suneel Marthi (JIRA)
2015-11-06 23:17:11 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1762:
----------------------------------
Assignee: Pat Ferrel
Jonathan Kelly (JIRA)
2016-02-08 22:51:39 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15137921#comment-15137921 ]

Jonathan Kelly commented on MAHOUT-1762:
----------------------------------------

Just curious, has any work been done on this yet? If not, what is the timeline for Mahout 0.12.0? Thanks!
Suneel Marthi (JIRA)
2016-03-17 14:17:33 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199583#comment-15199583 ]

Suneel Marthi commented on MAHOUT-1762:
---------------------------------------

[~pferrel] I don't think we will have time to address this for 0.12.0; it should be punted to a subsequent release. But we really need this fixed in the minor release following 0.12.0.
Pat Ferrel (JIRA)
2016-03-17 15:06:33 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199672#comment-15199672 ]

Pat Ferrel commented on MAHOUT-1762:
------------------------------------

I agree with the reasoning here, but the drivers already have a pass-through to Spark for arbitrary key=value pairs, and switching to spark-submit was voted down, so it was never done. If you are using Mahout as a library, you can set anything you want in the SparkConf. So I'm not sure what remains here beyond a more than reasonable complaint about how the launcher scripts are structured.
Pat Ferrel (JIRA)
2016-03-17 15:18:33 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199694#comment-15199694 ]

Pat Ferrel commented on MAHOUT-1762:
------------------------------------

Do you know of something that is blocked by this? Not sure what is being asked for.
Jonathan Kelly (JIRA)
2016-03-17 15:21:33 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199696#comment-15199696 ]

Jonathan Kelly commented on MAHOUT-1762:
----------------------------------------

Why was using spark-submit voted down? (And where? On a JIRA or on the mailing list?) Was it only voted down for now (e.g., due to a time constraint), or is there no plan to ever switch?

I think spark-submit is Spark's recommended way of launching Spark applications, even for something like Mahout on Spark. Zeppelin and spark-jobserver used to do something similar to what Mahout on Spark does now, but both have long since switched to spark-submit. I'm not too familiar with Hive on Spark, but a quick glance at the source suggests it also uses spark-submit.

In short, I'd really suggest using spark-submit for Mahout as well, at least in order to match what most other apps are doing and in order to follow best practices.
Pat Ferrel (JIRA)
2016-03-17 15:22:33 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pat Ferrel resolved MAHOUT-1762.
--------------------------------
Resolution: Won't Fix

We don't know of anything this blocks, and moving to spark-submit was voted down; it would only apply to the Mahout CLI drivers anyway. All CLI drivers support pass-through of arbitrary key=value pairs, which go into the SparkConf, and when using Mahout as a library you can create any SparkConf you like.

Will not fix unless someone can explain the need.
Suneel Marthi (JIRA)
2016-03-17 15:36:33 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1762:
----------------------------------
Comment: was deleted

(was: [~pferrel] I don't think we would have time to address this for 0.12.0 and should be punned to a subsequent release. But we really need this fixed in the minor release following 0.12.0. )
Suneel Marthi (JIRA)
2016-03-17 15:57:33 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1762:
----------------------------------
Fix Version/s: (was: 1.0.0)
0.12.0
Trevor Grant (JIRA)
2017-03-06 04:57:32 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trevor Grant reassigned MAHOUT-1762:
------------------------------------

Assignee: Trevor Grant (was: Pat Ferrel)
Fix Version/s: (was: 0.12.0)
0.13.0
ASF GitHub Bot (JIRA)
2017-03-06 05:33:32 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896764#comment-15896764 ]

ASF GitHub Bot commented on MAHOUT-1762:
----------------------------------------

GitHub user rawkintrevo opened a pull request:

https://github.com/apache/mahout/pull/292

MAHOUT-1762 Utilize spark-submit in bin/mahout script



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rawkintrevo/mahout mahout-1762

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/mahout/pull/292.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #292

----
commit c5451287b54d55ce586ecfbb340c9bb023385765
Author: rawkintrevo <***@gmail.com>
Date: 2017-03-06T05:31:42Z

MAHOUT-1762 Utilize spark-submit in bin/mahout script

----
Andrew Palumbo (JIRA)
2017-03-06 06:02:32 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo reassigned MAHOUT-1762:
--------------------------------------

Assignee: Andrew Palumbo (was: Trevor Grant)
Post by Sergey Tryuber (JIRA)
Pick up $SPARK_HOME/conf/spark-defaults.conf on startup
-------------------------------------------------------
Key: MAHOUT-1762
URL: https://issues.apache.org/jira/browse/MAHOUT-1762
Project: Mahout
Issue Type: Improvement
Components: spark
Reporter: Sergey Tryuber
Assignee: Andrew Palumbo
Fix For: 0.13.0
Andrew Palumbo (JIRA)
2017-03-06 06:03:32 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo reassigned MAHOUT-1762:
--------------------------------------

Assignee: Trevor Grant (was: Andrew Palumbo)
ASF GitHub Bot (JIRA)
2017-03-06 17:01:32 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897648#comment-15897648 ]

ASF GitHub Bot commented on MAHOUT-1762:
----------------------------------------

Github user pferrel commented on the issue:

https://github.com/apache/mahout/pull/292

@rawkintrevo the real problem is missing classes from the dependency-reduced jar, right? How does this solve that?
ASF GitHub Bot (JIRA)
2017-03-06 23:05:32 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898336#comment-15898336 ]

ASF GitHub Bot commented on MAHOUT-1762:
----------------------------------------

Github user rawkintrevo commented on the issue:

https://github.com/apache/mahout/pull/292

@pferrel no, the real problem was that the classes were not being shipped out to the cluster. This change solves it by leveraging spark-submit to ship the jars.
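In other words, the fix leans on spark-submit's dependency distribution: jars listed in --jars (plus the application jar itself) are copied to the cluster and placed on executor classpaths. A hypothetical invocation, with every path illustrative and only the --class value taken from Mahout:

```shell
#!/bin/sh
# Sketch of how spark-submit would ship Mahout's jars to the cluster.
# Paths are made up for illustration; the sketch only assembles and
# prints the argument list.
APP_JAR=/opt/mahout/mahout-spark-shell.jar
DEP_JARS=/opt/mahout/mahout-math-scala.jar,/opt/mahout/mahout-spark.jar

ARGS="--master yarn --jars $DEP_JARS --class org.apache.mahout.sparkbindings.shell.Main $APP_JAR"
echo "spark-submit $ARGS"
```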