Discussion:
Zeppelin Integration PR
Trevor Grant
2016-05-31 21:22:41 UTC
Hey folks,

It looks like we're making progress on the Mahout-Zeppelin integration.

Anyone who is interested, check out:
https://github.com/apache/incubator-zeppelin/pull/928

Regarding Moon's last comments:
Does anyone know off hand if anything will break if we roll back the
conflicting packages to the Spark 1.6 version?

Also regarding the pom.xml and:
"Packaging
If mahout requires to be loaded in spark executor's classpath, then adding
mahout dependency in pom.xml will not be enough to work with Spark cluster.
Could you clarify if mahout need to be loaded in spark executor?"

All we need to do is load the appropriate Mahout jars. I'm not familiar
enough with the Spark Interpreter, Spark, or Java to know exactly what
would happen. Any thoughts on this?

Tonight I might just try removing the Mahout dependencies from pom.xml and
seeing what happens. I think that would solve all of these problems. As
long as the user has run 'mvn install' on Mahout, we should be good to go?

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*
Trevor Grant
2016-06-01 00:31:40 UTC
For what it is worth, simply removing the dependencies from pom.xml breaks
the Mahout interpreter.

Upon a little further testing in cluster mode: so long as the dependencies
are included in pom.xml, the appropriate Mahout jars are shipped off to the
cluster and everything works swimmingly. (In Zeppelin there is a local Spark
Interpreter internal to Zeppelin, and then the 'real' cluster that
everything gets shipped off to. Sometimes you can make things work in local
mode that won't work in cluster mode.)

The moral of this story is that the patch DOES in fact work in both local
and cluster mode, so we just need to work out the dependencies and the
licensing (plus a couple of fail-safes to make sure the user is running
Spark version > 1.5.2) and we should be good to go.
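
To make the fail-safe idea concrete, here is a minimal sketch of a version
guard. It is written in Python purely for illustration and is not code from
the PR; the function names are hypothetical, and the only thing taken from
the message above is the "greater than 1.5.2" threshold. Versions are
compared as integer tuples so that, for example, 1.10.0 sorts after 1.5.2.

```python
import re

# Threshold from the message above: the Spark version must be > 1.5.2.
MIN_EXCLUSIVE = (1, 5, 2)


def parse_version(version):
    """Turn '1.6.1', '1.6' or '2.0.0-SNAPSHOT' into a tuple of three ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError("unrecognized Spark version: %r" % version)
    # A missing patch component (e.g. '1.6') is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def spark_version_supported(version):
    """Return True when the given Spark version is strictly newer than 1.5.2."""
    return parse_version(version) > MIN_EXCLUSIVE


if __name__ == "__main__":
    for candidate in ["1.1.0", "1.5.2", "1.6.1", "2.0.0-SNAPSHOT"]:
        print(candidate, spark_version_supported(candidate))
```

Where the version string comes from (the Spark context, an environment
variable, or the interpreter settings) is left open here.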


Trevor Grant
2016-06-01 00:41:15 UTC
As a follow-up to this: it would be nice to remove the dependencies from
the pom.xml...

All we REALLY need to do is make sure we can get to the required jars and
load them. Including them in the pom ensures they are available, but there
is surely some other way to get hold of them. Since we have assumed that
Mahout is installed on the system and MAHOUT_HOME=... is set, we can
probably leverage that...
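
A rough sketch of what "leveraging MAHOUT_HOME" could look like, in Python
for illustration only. It assumes nothing about the directory layout of a
Mahout install beyond "the jars live somewhere under MAHOUT_HOME", so it
simply collects every .jar file below that directory. The function name is
hypothetical, and how the resulting paths get handed to the Spark
interpreter is not shown.

```python
import os
from pathlib import Path


def find_mahout_jars(mahout_home=None):
    """Return a sorted list of jar paths found under MAHOUT_HOME.

    `mahout_home` overrides the MAHOUT_HOME environment variable, which is
    mainly useful for testing.
    """
    home = mahout_home or os.environ.get("MAHOUT_HOME")
    if not home:
        raise RuntimeError("MAHOUT_HOME is not set")
    root = Path(home)
    if not root.is_dir():
        raise RuntimeError("MAHOUT_HOME is not a directory: %s" % root)
    # rglob also descends into subdirectories such as a lib/ folder.
    return sorted(str(path) for path in root.rglob("*.jar") if path.is_file())


if __name__ == "__main__":
    try:
        for jar in find_mahout_jars():
            print(jar)
    except RuntimeError as err:
        print("cannot locate Mahout jars:", err)
```

In practice the list would probably need filtering down to the jars the
interpreter actually requires, which is exactly the part still to be worked
out.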



Trevor Grant
2016-06-01 12:07:39 UTC
Can we use kryo v2.21 (instead of 2.24)

and fasterxml 2.4.4 instead of 2.7.2? (This is what worries me.)

I am also working on fishing the Mahout jars directly out of
MAHOUT_HOME=..../ ... instead of including them in the pom, so let's not
get too carried away with pruning dependencies.

At this point it's more of a "can anyone think of a specific reason
this shouldn't work?"


Mahout

com.esotericsoftware.kryo:kryo:jar:2.24:compile
com.fasterxml.jackson.core:jackson-core:jar:2.7.2:compile

Spark (1.6)

com.esotericsoftware.kryo:kryo:jar:2.21:compile
com.fasterxml.jackson.core:jackson-core:jar:2.4.4:compile
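
For anyone who wants to repeat this comparison on a longer dependency list,
here is a small illustrative sketch in Python. It assumes each line has
exactly the five-field form shown above (group:artifact:type:version:scope);
coordinates that carry a classifier have an extra field and are not handled.
The helper names are hypothetical.

```python
def parse_coordinate(coord):
    """Split 'group:artifact:type:version:scope' into ((group, artifact), version)."""
    group, artifact, _packaging, version, _scope = coord.strip().split(":")
    return (group, artifact), version


def version_conflicts(left, right):
    """Map each artifact present in both lists to its two differing versions."""
    left_versions = dict(parse_coordinate(c) for c in left)
    right_versions = dict(parse_coordinate(c) for c in right)
    shared = sorted(left_versions.keys() & right_versions.keys())
    return {
        key: (left_versions[key], right_versions[key])
        for key in shared
        if left_versions[key] != right_versions[key]
    }


# The two lists from the message above.
mahout = [
    "com.esotericsoftware.kryo:kryo:jar:2.24:compile",
    "com.fasterxml.jackson.core:jackson-core:jar:2.7.2:compile",
]
spark_1_6 = [
    "com.esotericsoftware.kryo:kryo:jar:2.21:compile",
    "com.fasterxml.jackson.core:jackson-core:jar:2.4.4:compile",
]

conflicts = version_conflicts(mahout, spark_1_6)
for (group, artifact), (mahout_ver, spark_ver) in conflicts.items():
    print("%s:%s  Mahout=%s  Spark=%s" % (group, artifact, mahout_ver, spark_ver))
```

This only reports mismatches; it says nothing about whether the older
version is actually safe to run against.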


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*
Which dependencies need to be removed?
I saw that the Kryo version on the PR was also conflicting; that may be a
Flink thing. I think the version was, at one point, being enforced in the
Flink module at least.
Suneel Marthi
2016-06-01 12:15:04 UTC
I think we should be fine with Jackson 2.4.4 (it's not fasterxml :))
But what's the reason for the massive downgrade? Elasticsearch 2.x and
above need at least Jackson 2.6.2 to function.

As for downgrading Kryo, I have no opinion and would let others weigh in.
Trevor Grant
2016-06-01 12:43:34 UTC
Well, it's a conflict with Spark... and keep in mind Zeppelin is supporting
Spark all the way back to 1.1, so maybe something in there?

I'm going to make a serious effort to try the local jar loading from
MAHOUT_HOME as outlined last night, because if that works, then all of
these problems magically go away and are much less likely to haunt us in
the future.

tg


Suneel Marthi
2016-06-01 12:45:30 UTC
I guessed so after my last email. Well, it's OK to downgrade Jackson then.
Hopefully the next Spark version that Zeppelin supports uses the latest
Jackson version.
Suneel Marthi
2016-06-01 12:47:25 UTC
As regards downgrading Kryo, it may break stuff on the Flink end of things.
I would let Andy speak to that; I can't recall off the top of my head the
issues we ran into when trying to balance Spark and Flink dependencies.