Discussion:
[jira] [Created] (MAHOUT-1894) Add support for Spark 2x backend
Suneel Marthi (JIRA)
2016-12-12 20:31:59 UTC
Suneel Marthi created MAHOUT-1894:
-------------------------------------

Summary: Add support for Spark 2x backend
Key: MAHOUT-1894
URL: https://issues.apache.org/jira/browse/MAHOUT-1894
Project: Mahout
Issue Type: Task
Components: spark
Affects Versions: 0.12.0
Reporter: Suneel Marthi
Fix For: 1.0.0


Add support for Spark 2.x as a backend execution engine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Trevor Grant (JIRA)
2017-02-01 23:04:51 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trevor Grant updated MAHOUT-1894:
---------------------------------
Sprint: Jan/Feb-2017
Affects Version/s: (was: 0.12.0)
0.13.0
Fix Version/s: (was: 0.13.1)
0.13.0
Post by Suneel Marthi (JIRA)
Add support for Spark 2x backend
--------------------------------
Key: MAHOUT-1894
URL: https://issues.apache.org/jira/browse/MAHOUT-1894
Project: Mahout
Issue Type: Task
Components: spark
Affects Versions: 0.13.0
Reporter: Suneel Marthi
Priority: Critical
Fix For: 1.0.0, 0.13.0, 0.14.0
add support for Spark 2.x as backend execution engine.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
Trevor Grant (JIRA)
2017-02-02 02:33:52 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849309#comment-15849309 ]

Trevor Grant commented on MAHOUT-1894:
--------------------------------------

When building Mahout against Spark 2.0.2, I get one warning:
```
[WARNING] /home/rawkintrevo/gits/mahout-local/spark/src/test/scala/org/apache/mahout/sparkbindings/drm/DrmLikeSuite.scala:127: warning: constructor SQLContext in class SQLContext is deprecated: Use SparkSession.builder instead
[WARNING] val sqlContext= new org.apache.spark.sql.SQLContext(sc)
```

And then in the tests a lot of whining about
```
Exception encountered when invoking run on a nested suite - requirement failed: MAHOUT_HOME is required to spawn mahout-based spark jobs *** ABORTED ***
java.lang.IllegalArgumentException: requirement failed: MAHOUT_HOME is required to spawn mahout-based spark jobs
```
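
The abort message above says the suites need `MAHOUT_HOME`. A minimal sketch of exporting it before the build, assuming the checkout root is the right value; the thread does not confirm that this alone makes the tests pass, and `resolve_mahout_home` is a hypothetical helper name, not part of Mahout:

```shell
# Hypothetical helper (not part of Mahout): pick a value for MAHOUT_HOME.
# Uses the caller's MAHOUT_HOME if it is set and non-empty,
# otherwise falls back to the directory passed as the first argument.
resolve_mahout_home() {
  printf '%s\n' "${MAHOUT_HOME:-$1}"
}

# Assumption: the current directory is the Mahout checkout root.
MAHOUT_HOME="$(resolve_mahout_home "$PWD")"
export MAHOUT_HOME
echo "MAHOUT_HOME=$MAHOUT_HOME"

# The build would then be run from the same shell, e.g.:
#   mvn clean package -Dspark.version=2.0.2
```

Exporting matters here: a variable that is only set, not exported, is not passed to the processes that Maven starts.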
Trevor Grant (JIRA)
2017-02-02 02:55:52 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849309#comment-15849309 ]

Trevor Grant edited comment on MAHOUT-1894 at 2/2/17 2:55 AM:
--------------------------------------------------------------

When building Mahout against Spark 2.0.2, I get one warning:
```
[WARNING] /home/rawkintrevo/gits/mahout-local/spark/src/test/scala/org/apache/mahout/sparkbindings/drm/DrmLikeSuite.scala:127: warning: constructor SQLContext in class SQLContext is deprecated: Use SparkSession.builder instead
[WARNING] val sqlContext= new org.apache.spark.sql.SQLContext(sc)
```

And then in the tests a lot of whining about
```
Exception encountered when invoking run on a nested suite - requirement failed: MAHOUT_HOME is required to spawn mahout-based spark jobs *** ABORTED ***
java.lang.IllegalArgumentException: requirement failed: MAHOUT_HOME is required to spawn mahout-based spark jobs
```


Hard to say for sure because I had to skip the tests (re: the above issue), but with `-DskipTests`, Mahout compiles up until the shell. Taking the resulting jars to Zeppelin and running some basic function checks like linear regression, it seems to work.




Trevor Grant (JIRA)
2017-02-02 03:04:52 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849318#comment-15849318 ]

Trevor Grant commented on MAHOUT-1894:
--------------------------------------

Spark 2.1.0 works the same.

Same warning re: SQLContext and errors in the shell, but the compiled jars pass the basic function tests (the test suites otherwise fail because MAHOUT_HOME is not set).
ASF GitHub Bot (JIRA)
2017-02-02 06:33:51 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849507#comment-15849507 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

GitHub user rawkintrevo opened a pull request:

https://github.com/apache/mahout/pull/271

[MAHOUT-1894] Add Support for Spark 2.x

As long as we're sticking to Scala 2.10, running Mahout on Spark 2.x is simply a matter of

`mvn clean package -Dspark.version=2.0.2`
or
`mvn clean package -Dspark.version=2.1.0`

The trouble comes with the shell...

I checked Apache Zeppelin to see how they handle multiple spark/scala versions...
[a brief preview of the descent into hell that is having a shell that handles multiple spark/scala versions](https://github.com/apache/zeppelin/blob/master/spark/src/main/java/org/apache/zeppelin/spark/SparkInterpreter.java)

So I took an alternate route. I dropped the Mahout shell altogether, changed the mahout bin file to load the Spark shell directly, and passed it a Scala script that takes care of our imports.
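
The approach described above can be sketched as a launcher that hands an init script to the stock Spark shell. This is an illustration only, not the script from the pull request: `spark_shell_command` is a hypothetical helper, the default paths are placeholders, and the script name `bin/load-shell.scala` and the `-i` flag are taken from later comments in this thread. The function only prints the command, so it can be inspected without a Spark install:

```shell
# Hypothetical helper: print (do not run) a spark-shell invocation that
# preloads a Scala init script.
#   $1 = Spark installation directory
#   $2 = Mahout directory containing bin/load-shell.scala
spark_shell_command() {
  printf '%s/bin/spark-shell -i %s/bin/load-shell.scala\n' "$1" "$2"
}

# Placeholder defaults are used when the variables are unset or empty.
spark_shell_command "${SPARK_HOME:-/path/to/spark}" "${MAHOUT_HOME:-/path/to/mahout}"
```

Printing rather than executing keeps the sketch safe to run anywhere; a real launcher would run the resulting command and would also need the Mahout jars on the classpath.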

When building there is a single deprecation warning regarding the sqlContext and how it is created in the spark-bindings.

I think we should add binaries for Spark 2.0 and Spark 2.1 as a matter of convenience and the Zeppelin integration.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rawkintrevo/mahout mahout-1894

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/mahout/pull/271.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #271

----
commit 867cdd0c04d629eaf44a0e2031f447d03bf67bcc
Author: rawkintrevo <***@gmail.com>
Date: 2017-02-02T06:18:21Z

MAHOUT-1894 Add support for spark 2.x

MAHOUT-1894 Add support for spark 2.x

----
Trevor Grant (JIRA)
2017-02-02 13:40:51 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trevor Grant reassigned MAHOUT-1894:
------------------------------------

Assignee: Trevor Grant
Trevor Grant (JIRA)
2017-02-02 13:40:51 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on MAHOUT-1894 started by Trevor Grant.
--------------------------------------------
ASF GitHub Bot (JIRA)
2017-02-02 18:02:51 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15850244#comment-15850244 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user pferrel commented on the issue:

https://github.com/apache/mahout/pull/271

I'm soooo into dropping a special Mahout shell, do your comments mean we just run Mahout classes in the Spark shell for Spark 2.x? Does this work with and without (@andrewpalumbo 's case) Zeppelin?

IF we can compile Mahout with Scala 2.11 fairly easily (excluding the shell) and IF we can run Mahout with some helper scripts in the Spark Shell, we can drop the Mahout Shell code and get all the advantages of using the plain Spark Shell with our extensions. Can/should this be done?

I realize I've asked these before but this seems the best forum.
ASF GitHub Bot (JIRA)
2017-02-04 06:48:51 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852652#comment-15852652 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user rawkintrevo commented on the issue:

https://github.com/apache/mahout/pull/271

@pferrel In short yes. The idea here is we entirely drop the Mahout Shell. It was also the blocker for upgrading to Spark 2.x.

The Zeppelin integration, for all intents and purposes, is a Spark shell plus some imports and the setup of the distributed context.

So that is what we're doing here.

Hopefully removing the shell will also clear the way for the Scala 2.11 upgrade / profile.
ASF GitHub Bot (JIRA)
2017-02-05 00:15:51 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15853002#comment-15853002 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user andrewpalumbo commented on the issue:

https://github.com/apache/mahout/pull/271

hmm.. just tried to launch into `local[4]` and blew it up:

```
AP-RE-X16743C45L:mahout apalumbo$ MASTER=local[4] mahout spark-shell
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.2
/_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_102)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.
Loading /Users/apalumbo/sandbox/mahout/bin/load-shell.scala...
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.sparkbindings._
sdc: org.apache.mahout.sparkbindings.SparkDistributedContext = ***@73e0c775

_ _
_ __ ___ __ _| |__ ___ _ _| |_
'_ ` _ \ / _` | '_ \ / _ \| | | | __|
| | | | (_| | | | | (_) | |_| | |_
_| |_| |_|\__,_|_| |_|\___/ \__,_|\__| version 0.13.0



Exception in thread "main" java.io.FileNotFoundException: spark-shell (Is a directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at scala.reflect.io.File.inputStream(File.scala:97)
at scala.reflect.io.File.inputStream(File.scala:82)
at scala.reflect.io.Streamable$Chars$class.reader(Streamable.scala:93)
at scala.reflect.io.File.reader(File.scala:82)
at scala.reflect.io.Streamable$Chars$class.bufferedReader(Streamable.scala:98)
at scala.reflect.io.File.bufferedReader(File.scala:82)
at scala.reflect.io.Streamable$Chars$class.bufferedReader(Streamable.scala:97)
at scala.reflect.io.File.bufferedReader(File.scala:82)
at scala.reflect.io.Streamable$Chars$class.applyReader(Streamable.scala:103)
at scala.reflect.io.File.applyReader(File.scala:82)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SparkILoop.scala:677)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1$$anonfun$apply$mcV$sp$1.apply(SparkILoop.scala:677)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1$$anonfun$apply$mcV$sp$1.apply(SparkILoop.scala:677)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$savingReplayStack(SparkILoop.scala:162)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1.apply$mcV$sp(SparkILoop.scala:676)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1.apply(SparkILoop.scala:676)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1.apply(SparkILoop.scala:676)
at org.apache.spark.repl.SparkILoop.savingReader(SparkILoop.scala:167)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$interpretAllFrom(SparkILoop.scala:675)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$loadCommand$1.apply(SparkILoop.scala:740)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$loadCommand$1.apply(SparkILoop.scala:739)
at org.apache.spark.repl.SparkILoop.withFile(SparkILoop.scala:733)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loadCommand(SparkILoop.scala:739)

{...}

```
ASF GitHub Bot (JIRA)
2017-02-05 00:37:52 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15853005#comment-15853005 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user rawkintrevo commented on the issue:

https://github.com/apache/mahout/pull/271

Possibly a regression from last night, when I moved and renamed load.scala -> bin/load-shell.scala.
ASF GitHub Bot (JIRA)
2017-02-05 00:50:51 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15853011#comment-15853011 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user rawkintrevo commented on the issue:

https://github.com/apache/mahout/pull/271

Confirmed the shell explosion; fixed by deleting `$MAHOUT_HOME/bin/metastore_db`.

My shell explosion was a slightly different flavor though. Can you try the above?
ASF GitHub Bot (JIRA)
2017-02-05 03:42:42 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15853092#comment-15853092 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user andrewpalumbo commented on the issue:

https://github.com/apache/mahout/pull/271

On a CentOS-based EC2 Spark standalone instance with 3 workers, I'm not even getting a launch of the `$SPARK_HOME/bin/spark-shell`:
```
***@ip-47-108-23-12 spark]$ mahout spark-shell
/vol0/mahout/bin/mahout: line 294: /bin/spark-shell: No such file or directory
```
not sure why that would be.. Its possible that i
```
294 $SPARK_HOME/bin/spark-shell -classpath "$CLASSPATH" -i $MAHOUT_HOME/bin/load-shell.scala --conf spark.kryo.referenceTracking=false --conf spark.kryo
```
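
The path in the error, `/bin/spark-shell`, is what `$SPARK_HOME/bin/spark-shell` expands to when `SPARK_HOME` is empty in the environment of the `mahout` script. Whether that is the cause here is not established in this thread. A defensive check along these lines (a sketch; `require_spark_home` is a hypothetical name, not something in `bin/mahout`) would at least turn the failure into a clear message:

```shell
# Hypothetical guard for a launcher script: fail early with a clear message
# instead of letting "$SPARK_HOME/bin/spark-shell" expand to "/bin/spark-shell".
require_spark_home() {
  if [ -z "${SPARK_HOME:-}" ]; then
    echo "SPARK_HOME is not set (or not exported)" >&2
    return 1
  fi
  if [ ! -x "$SPARK_HOME/bin/spark-shell" ]; then
    echo "No executable spark-shell under $SPARK_HOME/bin" >&2
    return 1
  fi
  return 0
}

# Only report the launcher path when the guard passes.
if require_spark_home; then
  echo "Using $SPARK_HOME/bin/spark-shell"
fi
```

A variable that is set in the interactive shell but not exported is visible to `echo` there and still empty inside a child script, so `export SPARK_HOME=...` is worth checking.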
ASF GitHub Bot (JIRA)
2017-02-05 05:48:41 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15853123#comment-15853123 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user andrewpalumbo commented on a diff in the pull request:

https://github.com/apache/mahout/pull/271#discussion_r99481926

--- Diff: distribution/pom.xml ---
@@ -211,10 +207,6 @@
</dependency>
--- End diff --
ASF GitHub Bot (JIRA)
2017-02-05 07:29:42 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15853151#comment-15853151 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user andrewpalumbo commented on the issue:

https://github.com/apache/mahout/pull/271

rebuilt everything from scratch on linux
```
mahout spark-shell
```
fails with:

```
root mahout]$ mahout spark-shell
/vol0/mahout/bin/mahout: line 294: /bin/spark-shell: No such file or directory
root mahout]$
```

Spark 1.6.1.. maybe `-i` is buggy in that version?
ASF GitHub Bot (JIRA)
2017-02-05 07:56:42 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15853158#comment-15853158 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user andrewpalumbo commented on the issue:

https://github.com/apache/mahout/pull/271

Wait, I'm getting this from the shell on the current master, so the errors may be in `findMahoutJars()`.

@rawkintrevo have you been working off of the current master? Was the shell working for you?

```
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/repl/SparkILoop
at org.apache.mahout.sparkbindings.shell.Main.main(Main.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.repl.SparkILoop
at java.net.URLClassLoader$1.run(URLClassLoader.java:359)
at java.net.URLClassLoader$1.run(URLClassLoader.java:348)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:347)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
```
ASF GitHub Bot (JIRA)
2017-02-05 22:54:41 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15853399#comment-15853399 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user andrewpalumbo commented on the issue:

https://github.com/apache/mahout/pull/271

Rebuilding with this on a remote with:

```
mvn clean package install -Pviennacl-omp -Phadoop2 -Dspark.version=2.1.0 -DskipTests
```
getting:
```
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.4.1:single (job) on project mahout-mr: Failed to create assembly: Error creating assembly archive job: Problem creating jar: jar:file:/vol0/mahout/mr/target/mahout-mr-0.13.0-SNAPSHOT.jar!/org/apache/mahout/cf/taste/impl/similarity/precompute/MultithreadedBatchItemSimilarities$Output.class: JAR entry org/apache/mahout/cf/taste/impl/similarity/precompute/MultithreadedBatchItemSimilarities$Output.class not found in /vol0/mahout/mr/target/mahout-mr-0.13.0-SNAPSHOT.jar -> [Help 1]
```
ASF GitHub Bot (JIRA)
2017-02-05 23:05:42 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15853401#comment-15853401 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user andrewpalumbo commented on the issue:

https://github.com/apache/mahout/pull/271

Can verify that on a clean build, with Spark and Maven both taken from ASF mirrors, the shell is not working:

```
$ echo $SPARK_HOME
/root/spark-2.1.0-bin-hadoop2.6
$ mvn clean install -Pviennacl-omp -Phadoop2 -Dspark.version=2.1.0 -DskipTests
{...}
[INFO] Mahout Build Tools ................................. SUCCESS [ 2.228 s]
[INFO] Apache Mahout ...................................... SUCCESS [ 0.043 s]
[INFO] Mahout Math ........................................ SUCCESS [ 9.467 s]
[INFO] Mahout HDFS ........................................ SUCCESS [ 1.941 s]
[INFO] Mahout Map-Reduce .................................. SUCCESS [ 18.526 s]
[INFO] Mahout Integration ................................. SUCCESS [ 2.992 s]
[INFO] Mahout Examples .................................... SUCCESS [ 18.727 s]
[INFO] Mahout Math Scala bindings ......................... SUCCESS [ 46.060 s]
[INFO] Mahout Spark bindings .............................. SUCCESS [ 52.530 s]
[INFO] Mahout Flink bindings .............................. SUCCESS [ 36.695 s]
[INFO] Mahout Native VienniaCL OpenMP Bindings ............ SUCCESS [ 23.616 s]
[INFO] Mahout Release Package ............................. SUCCESS [ 1.706 s]
[INFO] Mahout H2O backend ................................. SUCCESS [ 20.008 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:55 min
[INFO] Finished at: 2017-02-05T22:58:40+00:00
[INFO] Final Memory: 145M/2588M
[INFO] ------------------------------------------------------------------------
***@ip-123-32-17-97 mahout]$ mahout spark-shell
/vol0/mahout/bin/mahout: line 294: /bin/spark-shell: No such file or directory
```
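One possible reading of the last line above: `/bin/spark-shell` is what `$SPARK_HOME/bin/spark-shell` expands to when `SPARK_HOME` is empty in the environment the `mahout` script sees (for example, set in the shell but not exported). That is an assumption, not something confirmed in this thread. A minimal pre-flight check, using the hypothetical function name `check_spark_home`, might look like:

```sh
#!/bin/sh
# Hypothetical helper, not part of bin/mahout: verify that SPARK_HOME
# points at a Spark install before trying to launch the shell.
check_spark_home() {
  # Empty or unset SPARK_HOME would make the launcher path "/bin/spark-shell".
  if [ -z "${SPARK_HOME:-}" ]; then
    echo "SPARK_HOME is not set or not exported" >&2
    return 1
  fi
  # The launcher must exist and be executable under $SPARK_HOME/bin.
  if [ ! -x "$SPARK_HOME/bin/spark-shell" ]; then
    echo "no executable spark-shell under $SPARK_HOME/bin" >&2
    return 2
  fi
  return 0
}
```

It could be used as `check_spark_home && bin/mahout spark-shell`, so a missing or unexported `SPARK_HOME` is reported before the launcher is invoked.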
ASF GitHub Bot (JIRA)
2017-02-06 02:22:42 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15853443#comment-15853443 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user andrewpalumbo commented on the issue:

https://github.com/apache/mahout/pull/271

@rawkintrevo please disregard the above comments (aside from the line note which breaks the build). I am attributing it to an old AMI on which I tried to test OpenMP *and* Spark 2.0 with the new shell.
ASF GitHub Bot (JIRA)
2017-02-06 18:48:41 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15854531#comment-15854531 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user skanjila commented on the issue:

https://github.com/apache/mahout/pull/271

@rawkintrevo I was wondering how we will evolve the shell when a new Spark version comes out. I am also wondering what the use cases for mahout-shell are: it seems most people use Mahout as an embedded application or a library, so is the shell just for testing out a few things? I would actually be all for removing the shell altogether: less code to maintain in the long run. Let me know if I am missing something here.
ASF GitHub Bot (JIRA)
2017-02-07 00:11:42 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855010#comment-15855010 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user rawkintrevo commented on the issue:

https://github.com/apache/mahout/pull/271

@skanjila the shell is useful enough that I'd like to keep it around if possible. Some reasons off the cuff:
- Good for 'demo-ing' Mahout. Fire up the shell, do some simple stuff.
- Good for sanity checking bugs in Zeppelin (something very close to my heart)
- As we move to add algorithms, I envision Mahout being used for more interactive data science. In that world there is a lot of iterative "try this, see what happens, try that" kind of approach. I do most of that in Zeppelin, but some people may use JetBrains or other IDEs, and the shell is useful in these cases.

To your point: yes, 86ing the entire shell module certainly offers some very attractive advantages. What we're seeing in this PR is an opportunity to get the best of both worlds (no code, but still have a shell). We just need to work out some kinks in getting it working with spark-shell correctly.
ASF GitHub Bot (JIRA)
2017-02-07 00:15:42 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855016#comment-15855016 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user skanjila commented on the issue:

https://github.com/apache/mahout/pull/271

@rawkintrevo the notion of interactive data science is very interesting to me, as that's what I do at work. However, what is the advantage of using Mahout for that versus doing it directly in the Spark shell using Spark SQL or the ML algorithms in Spark? Is that where Samsara comes in? Just trying to understand the tradeoffs between the Spark and the Mahout worlds.
ASF GitHub Bot (JIRA)
2017-02-07 00:20:41 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855027#comment-15855027 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user rawkintrevo commented on the issue:

https://github.com/apache/mahout/pull/271

@skanjila yes. In short: SparkML has a few non-extensible algorithms with limited functionality. Mahout lets you write your own algorithms. At the moment there are some amazing tools to help you do that (distributed SVD) but not many pre-baked algorithms (you still need to go back to SparkML for Random Forests, etc.).

With the new algorithms framework, I hope to see Mahout catch up and exceed SparkML's pre-canned algorithm collection in short order, driven primarily by community involvement.
ASF GitHub Bot (JIRA)
2017-02-07 02:37:42 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855186#comment-15855186 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user rawkintrevo commented on the issue:

https://github.com/apache/mahout/pull/271

@andrewpalumbo checked against spark 1.6.1/2.0.2/2.1.0 on another box- no issues.

Can someone else help test this?
ASF GitHub Bot (JIRA)
2017-02-07 02:47:41 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855199#comment-15855199 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user andrewpalumbo commented on the issue:

https://github.com/apache/mahout/pull/271

Great! Is there anything left to do here?



ASF GitHub Bot (JIRA)
2017-02-07 02:58:42 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855206#comment-15855206 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user andrewpalumbo commented on the issue:

https://github.com/apache/mahout/pull/271

+1



ASF GitHub Bot (JIRA)
2017-02-07 04:27:42 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855300#comment-15855300 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user rawkintrevo commented on the issue:

https://github.com/apache/mahout/pull/271

I'd like someone else to test this.

Also curious if this solves [MAHOUT-1897](https://issues.apache.org/jira/browse/MAHOUT-1897)

This DOES NOT solve [MAHOUT-1892](https://issues.apache.org/jira/browse/MAHOUT-1892) serialization when doing a map block in the shell
ASF GitHub Bot (JIRA)
2017-02-07 17:57:42 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856422#comment-15856422 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user skanjila commented on the issue:

https://github.com/apache/mahout/pull/271

@rawkintrevo I was going to test this on an azure vm, do you guys still need help testing?
ASF GitHub Bot (JIRA)
2017-02-07 18:26:41 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856494#comment-15856494 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user skanjila commented on the issue:

https://github.com/apache/mahout/pull/271

@rawkintrevo here's what I see when testing on an azure vm:
***@dsexperiments:~/code/mahout$ MASTER=local[4] ./bin/mahout spark-shell
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.2
/_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
17/02/07 18:24:40 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/02/07 18:24:40 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
17/02/07 18:24:46 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/02/07 18:24:46 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
SQL context available as sqlContext.
Loading /home/saikan/code/mahout/bin/load-shell.scala...
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.sparkbindings._
sdc: org.apache.mahout.sparkbindings.SparkDistributedContext = ***@7804474a

_ _
_ __ ___ __ _| |__ ___ _ _| |_
'_ ` _ \ / _` | '_ \ / _ \| | | | __|
| | | | (_| | | | | (_) | |_| | |_
_| |_| |_|\__,_|_| |_|\___/ \__,_|\__| version 0.13.0



That file does not exist


scala>


Looks good to me, perhaps we should try some heavy matrix ops on it for further testing
ASF GitHub Bot (JIRA)
2017-02-07 19:08:42 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856579#comment-15856579 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user rawkintrevo commented on the issue:

https://github.com/apache/mahout/pull/271

@skanjila Thanks for testing! I usually run the ols example- though another type of test is probably advisable to truly detect bugs. Could you also confirm that it works in the following ways:
Build mahout with `mvn clean package -Dspark.version=2.0.2` and then set `export SPARK_HOME=/path/to/spark-2.0.2-bin` and then again for spark 2.1.0?

Thanks again!
ASF GitHub Bot (JIRA)
2017-02-07 19:23:41 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856608#comment-15856608 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user skanjila commented on the issue:

https://github.com/apache/mahout/pull/271

Here are the results with Spark 2.0.2:
***@dsexperiments:~/code/mahout$ MASTER=local[4] ./bin/mahout spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
17/02/07 19:22:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/02/07 19:22:21 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://10.4.9.4:4040
Spark context available as 'sc' (master = local[4], app id = local-1486495341246).
Spark session available as 'spark'.
Loading /home/saikan/code/mahout/bin/load-shell.scala...
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.sparkbindings._
sdc: org.apache.mahout.sparkbindings.SparkDistributedContext = ***@43f44d37

_ _
_ __ ___ __ _| |__ ___ _ _| |_
'_ ` _ \ / _` | '_ \ / _ \| | | | __|
| | | | (_| | | | | (_) | |_| | |_
_| |_| |_|\__,_|_| |_|\___/ \__,_|\__| version 0.13.0



That file does not exist

Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.0.2
/_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
ASF GitHub Bot (JIRA)
2017-02-07 19:29:41 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856618#comment-15856618 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user skanjila commented on the issue:

https://github.com/apache/mahout/pull/271

And here are the results for Spark 2.1.0:
***@dsexperiments:~/code/mahout$ MASTER=local[4] ./bin/mahout spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/02/07 19:27:59 WARN SparkContext: Support for Java 7 is deprecated as of Spark 2.0.0
17/02/07 19:27:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/02/07 19:28:05 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://10.4.9.4:4040
Spark context available as 'sc' (master = local[4], app id = local-1486495680739).
Spark session available as 'spark'.
Loading /home/saikan/code/mahout/bin/load-shell.scala...
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.sparkbindings._
sdc: org.apache.mahout.sparkbindings.SparkDistributedContext = ***@3ea6753e

_ _
_ __ ___ __ _| |__ ___ _ _| |_
'_ ` _ \ / _` | '_ \ / _ \| | | | __|
| | | | (_| | | | | (_) | |_| | |_
_| |_| |_|\__,_|_| |_|\___/ \__,_|\__| version 0.13.0



That file does not exist

Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.1.0
/_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
I would highly recommend we come up with a beefy set of tests to validate the shell further, thoughts on which set of operations to consider other than OLS?
ASF GitHub Bot (JIRA)
2017-02-08 15:34:41 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858128#comment-15858128 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user skanjila commented on the issue:

https://github.com/apache/mahout/pull/271

@rawkintrevo is there any other help I can provide on this, maybe run through some example mscala scripts, let me know
Trevor Grant (JIRA)
2017-02-17 14:41:41 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871928#comment-15871928 ]

Trevor Grant commented on MAHOUT-1894:
--------------------------------------

@apalumbo is still reporting issues wherever he tries this.

Want to make a general call for testers to see where the 'gotcha' is.

Here are instructions for testing- please help.

Step 1. Clone Mahout-1894

```sh
$ git clone https://github.com/rawkintrevo/mahout
$ cd mahout
$ git checkout mahout-1894
```

Step 2. Download various Sparks
```sh
$ wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.3-bin-hadoop2.6.tgz
$ wget http://d3kbcqa49mib13.cloudfront.net/spark-2.0.2-bin-hadoop2.7.tgz
$ wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
$ for f in *.tgz; do tar -xzf "$f"; done
```
(only if those are the only tgz's in the directory)

Step 3. Iteratively Build Mahout and Test Shell

A) Spark 1.6.3
```sh
$ mvn clean package -DskipTests -Dspark.version=1.6.3
$ export SPARK_HOME=/path/to/spark/1.6.3
$ bin mahout spark-shell
```
In the shell...
```scala
scala> :load examples/bin/SparseSparseDrmTimer.mscala
```
^^ Should run without error...
Ctrl+C to close.

B) Spark 2.0.2

```sh
$ mvn clean package -DskipTests -Dspark.version=2.0.2
$ export SPARK_HOME=/path/to/spark/2.0.2
$ bin mahout spark-shell
```
In the shell...
```scala
scala> :load examples/bin/SparseSparseDrmTimer.mscala
```
^^ Should run without error...
Ctrl+C to close.


C) Spark 2.1.0
```sh
$ mvn clean package -DskipTests -Dspark.version=2.1.0
$ export SPARK_HOME=/path/to/spark/2.1.0
$ bin mahout spark-shell
```
In the shell...

```scala
scala> :load examples/bin/SparseSparseDrmTimer.mscala
```
^^ Should run without error...
Ctrl+C to close.
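
The three build-and-launch rounds above differ only in the Spark version, so they can be sketched as one loop. This is a sketch rather than part of the original instructions: the names `run`, `test_spark_version`, `run_all`, `SPARK_ROOT` and `DRY_RUN` are hypothetical, and it assumes each download was unpacked to `$SPARK_ROOT/<version>`. It calls the launcher as `bin/mahout`, the script's path inside the Mahout checkout. By default it only prints the commands.

```sh
#!/bin/sh
# Sketch only: SPARK_ROOT, DRY_RUN and the function names are hypothetical,
# and the layout $SPARK_ROOT/<version> is an assumption.
SPARK_ROOT="${SPARK_ROOT:-/path/to/spark}"
DRY_RUN="${DRY_RUN:-1}"

# Print a command, and execute it unless DRY_RUN=1.
run() {
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# Build Mahout against one Spark version, then start the shell on it.
test_spark_version() {
  version="$1"
  run mvn clean package -DskipTests "-Dspark.version=$version" || return 1
  SPARK_HOME="$SPARK_ROOT/$version"
  export SPARK_HOME
  printf '+ export SPARK_HOME=%s\n' "$SPARK_HOME"
  run bin/mahout spark-shell
}

# Walk through the three versions named in the instructions.
run_all() {
  for v in 1.6.3 2.0.2 2.1.0; do
    test_spark_version "$v" || return 1
  done
}
```

Run from the Mahout checkout, `run_all` prints the plan. After setting `DRY_RUN=0`, it performs each build and opens the shell, where the `:load examples/bin/SparseSparseDrmTimer.mscala` step is still done by hand before closing the shell to move on to the next version.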
Saikat Kanjilal (JIRA)
2017-02-17 17:27:41 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872157#comment-15872157 ]

Saikat Kanjilal commented on MAHOUT-1894:
-----------------------------------------

[~rawkintrevo] I've already done all this without any errors, is there other testing I can help with on this?
Andrew Weienr (JIRA)
2017-02-18 00:08:44 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872778#comment-15872778 ]

Andrew Weienr edited comment on MAHOUT-1894 at 2/18/17 12:08 AM:
-----------------------------------------------------------------

I followed the instructions from [~rawkintrevo] here: https://issues.apache.org/jira/browse/MAHOUT-1894?focusedCommentId=15871928&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15871928

Was able to run the scala shell against all 3 versions of Spark without errors.
My system:
OS X El Capitan 10.11.6 (15G1217)
Maven 3.3.9
Java 1.8.0_101

One other note. Where the instructions say "$ bin mahout spark-shell" they should actually say
"$ bin/mahout spark-shell" (just in case there are any newbies helping test)



Trevor Grant (JIRA)
2017-02-18 16:22:44 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871928#comment-15871928 ]

Trevor Grant edited comment on MAHOUT-1894 at 2/18/17 4:21 PM:
---------------------------------------------------------------

@apalumbo is still reporting issues wherever he tries this.

I want to make a general call for testers to see where the 'gotcha' is.

Here are instructions for testing; please help.

Step 1. Clone Mahout-1894

```sh
$ git clone https://github.com/rawkintrevo/mahout
$ cd mahout
$ git checkout mahout-1894
```

Step 2. Download various Sparks
```sh
$ wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.3-bin-hadoop2.6.tgz
$ wget http://d3kbcqa49mib13.cloudfront.net/spark-2.0.2-bin-hadoop2.7.tgz
$ wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
$ for f in *.tgz; do tar -xzf "$f"; done
```
(only if those are the only tgz's in the directory)

Step 3. Iteratively Build Mahout and Test Shell

A) Spark 1.6.3
```sh
$ mvn clean package -DskipTests -Dspark.version=1.6.3
$ export SPARK_HOME=/path/to/spark/1.6.3
$ bin/mahout spark-shell
```
In the shell...
```scala
scala> :load examples/bin/SparseSparseDrmTimer.mscala
```
^^ Should run without error.
Ctrl+C to close.

B) Spark 2.0.2

```sh
$ mvn clean package -DskipTests -Dspark.version=2.0.2
$ export SPARK_HOME=/path/to/spark/2.0.2
$ bin/mahout spark-shell
```
In the shell...
```scala
scala> :load examples/bin/SparseSparseDrmTimer.mscala
```
^^ Should run without error.
Ctrl+C to close.


C) Spark 2.1.0
```sh
$ mvn clean package -DskipTests -Dspark.version=2.1.0
$ export SPARK_HOME=/path/to/spark/2.1.0
$ bin/mahout spark-shell
```
In the shell...

```scala
scala> :load examples/bin/SparseSparseDrmTimer.mscala
```
^^ Should run without error.
Ctrl+C to close.


Trevor Grant (JIRA)
2017-02-18 18:44:44 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871928#comment-15871928 ]

Trevor Grant edited comment on MAHOUT-1894 at 2/18/17 6:44 PM:
---------------------------------------------------------------

@apalumbo is still reporting issues wherever he tries this.

I want to make a general call for testers to see where the 'gotcha' is.

Here are instructions for testing; please help.

Step 1. Clone Mahout-1894

```sh
$ git clone https://github.com/rawkintrevo/mahout
$ cd mahout
$ git checkout mahout-1894
```

Step 2. Download various Sparks
```sh
$ wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.3-bin-hadoop2.6.tgz
$ wget http://d3kbcqa49mib13.cloudfront.net/spark-2.0.2-bin-hadoop2.7.tgz
$ wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
$ for f in *.tgz; do tar -xzf "$f"; done
```
(only if those are the only tgz's in the directory)

Step 3. Iteratively Build Mahout and Test Shell

A) Spark 1.6.3
```sh
$ mvn clean package -DskipTests -Dspark.version=1.6.3
$ export SPARK_HOME=/path/to/spark/1.6.3
$ bin/mahout spark-shell
```
In the shell...
```scala
scala> :load examples/bin/SparseSparseDrmTimer.mscala
scala> timeSparseDRMMMul(200, 200, 200, 2, .2, 1234L)
```
^^ Should run without error.
Ctrl+C to close.

B) Spark 2.0.2

```sh
$ mvn clean package -DskipTests -Dspark.version=2.0.2
$ export SPARK_HOME=/path/to/spark/2.0.2
$ bin/mahout spark-shell
```
In the shell...
```scala
scala> :load examples/bin/SparseSparseDrmTimer.mscala
scala> timeSparseDRMMMul(200, 200, 200, 2, .2, 1234L)
```
^^ Should run without error.
Ctrl+C to close.


C) Spark 2.1.0
```sh
$ mvn clean package -DskipTests -Dspark.version=2.1.0
$ export SPARK_HOME=/path/to/spark/2.1.0
$ bin/mahout spark-shell
```
In the shell...

```scala
scala> :load examples/bin/SparseSparseDrmTimer.mscala
scala> timeSparseDRMMMul(200, 200, 200, 2, .2, 1234L)
```
^^ Should run without error.
Ctrl+C to close.
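
One possible way to keep the three passes of Step 3 consistent is to generate the commands from a single list of versions. This is only a sketch: `plan_spark_tests` and the `SPARK_ROOT` variable are hypothetical names that do not exist in the Mahout repo, and the function only prints the commands instead of running them.

```sh
# Hypothetical helper (not part of Mahout): print the Step 3 commands
# for each Spark version so they can be reviewed before running them.
# SPARK_ROOT is an assumed directory that holds the unpacked Spark builds.
plan_spark_tests() {
  spark_root=${SPARK_ROOT:-/path/to/spark}
  for v in 1.6.3 2.0.2 2.1.0; do
    echo "mvn clean package -DskipTests -Dspark.version=$v"
    echo "export SPARK_HOME=$spark_root/$v"
    echo "bin/mahout spark-shell"
  done
}
```

Calling `plan_spark_tests` prints nine lines, three per Spark version. The `:load` step still has to be typed inside each shell.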


Saikat Kanjilal (JIRA)
2017-02-22 19:20:44 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879014#comment-15879014 ]

Saikat Kanjilal commented on MAHOUT-1894:
-----------------------------------------

[~rawkintrevo] I think we should author a set of mscala scripts that will: 1) certify each build against the various Spark backends; 2) produce a report that outlines typical SLAs for various operations before and after upgrading to a new version of a tech stack; 3) add any Zeppelin visualizations to the report as necessary. I was wondering if we should come up with a candidate list of expensive operations that we want to test based on the , any ideas what these would be in the Samsara world? Maybe we can have a perf test suite that iterates over a set of algorithms in Samsara for this.

Thoughts?
Saikat Kanjilal (JIRA)
2017-02-22 19:36:44 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879014#comment-15879014 ]

Saikat Kanjilal edited comment on MAHOUT-1894 at 2/22/17 7:35 PM:
------------------------------------------------------------------

[~rawkintrevo] I think we should author a set of mscala scripts that will: 1) certify each build against the various Spark backends; 2) produce a report that outlines typical SLAs for various operations before and after upgrading to a new version of a tech stack; 3) add any Zeppelin visualizations to the report as necessary. I was wondering if we should come up with a candidate list of expensive operations that we want to test based on the , any ideas what these would be in the Samsara world? Maybe we can have a perf test suite that iterates over a set of algorithms in Samsara for this. The SparseSparseDrmTimer can be the starting point for this.

Thoughts?
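
As a rough starting point for the report idea above, a small wrapper could record the exit status and wall-clock time of whatever command is being certified. `run_timed` is a hypothetical helper, not something that exists in Mahout, and which commands to time (for example the per-version build) is an assumption left to the caller.

```sh
# Hypothetical helper (not part of Mahout): run a command and print one
# CSV line "label,exit_status,elapsed_seconds" for a simple report.
# Assumes a `date` that supports +%s (GNU and BSD date both do).
run_timed() {
  label=$1
  shift
  start=$(date +%s)
  # Capture the exit status without aborting if the command fails.
  "$@" >/dev/null 2>&1 && status=0 || status=$?
  end=$(date +%s)
  echo "$label,$status,$((end - start))"
}

# Example (assumed usage): time the build for one Spark version.
# run_timed spark-2.1.0 mvn clean package -DskipTests -Dspark.version=2.1.0
```

Collecting these lines for each Spark version would give a before/after table; timing individual Samsara operations would need the corresponding commands swapped in.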


Andrew Musselman (JIRA)
2017-02-24 00:59:44 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881664#comment-15881664 ]

Andrew Musselman commented on MAHOUT-1894:
------------------------------------------

I'm getting "That file does not exist" after the shell opens with spark 1.6.3.
ASF GitHub Bot (JIRA)
2017-02-24 01:17:44 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881697#comment-15881697 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user andrewmusselman commented on the issue:

https://github.com/apache/mahout/pull/271

Yeah, I'm getting a result for all three versions of spark, but the welcome banner situation could use some work; I'd like to remove the "This file does not exist" message, and with 1.6.3 the spark banner shows up before the mahout banner, while with 2.x the mahout banner shows up first. Perhaps suppressing the mahout banner makes sense.
ASF GitHub Bot (JIRA)
2017-02-24 13:35:44 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15882675#comment-15882675 ]

ASF GitHub Bot commented on MAHOUT-1894:
----------------------------------------

Github user asfgit closed the pull request at:

https://github.com/apache/mahout/pull/271
Trevor Grant (JIRA)
2017-02-24 14:01:44 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trevor Grant resolved MAHOUT-1894.
----------------------------------
Resolution: Fixed
Hudson (JIRA)
2017-02-24 14:45:44 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15882808#comment-15882808 ]

Hudson commented on MAHOUT-1894:
--------------------------------

FAILURE: Integrated in Jenkins build Mahout-Quality #3419 (See [https://builds.apache.org/job/Mahout-Quality/3419/])
MAHOUT-1894 Add Support for Spark 2.x closes apache/mahout#271 (rawkintrevo: rev 5afdc68e0a25e9f66a0d707a7f76d46d9603b614)
* (delete) spark-shell/src/main/scala/org/apache/mahout/sparkbindings/shell/Main.scala
* (add) bin/load-shell.scala
* (edit) pom.xml
* (delete) spark-shell/src/main/scala/org/apache/mahout/sparkbindings/shell/MahoutSparkILoop.scala
* (edit) bin/mahout
* (edit) distribution/pom.xml
* (delete) spark-shell/pom.xml
* (delete) spark-shell/src/test/mahout/simple.mscala
* (edit) distribution/src/main/assembly/bin.xml
