Discussion:
Mahout distro Size
Dmitriy Lyubimov
2016-09-07 00:24:29 UTC
I dunno. They build a shaded assembly artifact, it seems, and are happy with
this approach. It would seem we'd just need the legacy deps in a similar
case.
bq.
4: Other projects do something too. Spark (at least it used to) produces
tons of lib-managed deps as the result of its build; they probably still
do?
Do you mean using something like Spark's dependency resolver?
________________________________
Sent: Tuesday, September 6, 2016 4:46:24 PM
Subject: Re: Mahout distro Size
2: +1
3: +1
4: Other projects do something too. Spark (at least it used to) produces
tons of lib-managed deps as the result of its build; they probably still
do?
On the other hand, the Samsara-only dependencies are really light. Backends
are really always "provided", and the rest of it is small enough not to be
an issue either way. But we probably definitely should drop local support
for MR stuff (MR local mode didn't work correctly anyway, last time I
checked).
The current apache-mahout-distribution-0.12.2.tar.gz<http://mirror.stjschools.org/public/apache/mahout/0.12.2/apache-mahout-distribution-0.12.2.tar.gz>
is 224M. We need to look for ways to get this size down.
2. Drop h2o (binary only) from Distro? (18M - unused)
3. Remove Hadoop 1 support. Could save us some space.
4. Remove dependency jars from /lib in mahout binary distribution. Should
also save space.
5. Having dropped support for MAHOUT_LOCAL we can now likely set a lot
of dependencies to <provided> scope. We can revisit MAHOUT-1705<https://issues.apache.org/jira/browse/MAHOUT-1705>:
Verify dependencies in job jar for mahout-examples.
* 16M ./lib/hadoop
* 85M ./lib/
* Many of the jars in /lib/ and possibly /lib/hadoop are already
packaged into the mahout-examples jar, so adding them to the classpath
from /lib/ is redundant. Many may also be provided.
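One rough way to check the redundancy claim above, as a sketch only: list the jars under lib/ whose classes all already appear in the job jar. This is not part of the Mahout build; the function names are hypothetical, and the directory and job jar paths are whatever the unpacked distribution actually contains.

```python
import zipfile
from pathlib import Path


def class_entries(jar_path):
    """Return the set of .class entry names in a jar (a jar is a zip file)."""
    with zipfile.ZipFile(jar_path) as jar:
        return {name for name in jar.namelist() if name.endswith(".class")}


def redundant_jars(lib_dir, job_jar):
    """Yield (jar_path, size_bytes) for each jar directly under lib_dir
    whose classes are all already present in job_jar.

    Assumption: matching on class entry names is a good enough proxy for
    "already packaged". It does not compare versions or bytecode.
    """
    bundled = class_entries(job_jar)
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        classes = class_entries(jar)
        # Skip resource-only jars; require every class to be in the job jar.
        if classes and classes <= bundled:
            yield jar, jar.stat().st_size
```

Called with the lib directory and the path to the mahout-examples job jar, summing the yielded sizes gives an upper bound on what dropping those jars from /lib/ could save. Shaded or relocated classes would not match by name, so the result may undercount.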
Dmitriy Lyubimov
2016-09-06 20:47:22 UTC
PS: I probably should not say "probably definitely" next to each other.
Definitely just definitely :)
Suneel Marthi
2016-09-06 20:54:09 UTC
Post by Dmitriy Lyubimov
PS i probably should not say "probably definitely" next to each other.
Definitely just definitely :)
That's fine.

"Openly Closed" is now officially part of the Apache Lexicon, so why not add
"Definitely Probable"?
Suneel Marthi
2016-09-06 20:55:10 UTC
+1 to all of them. 2 and 3 are very trivial to do. Definitely consider
doing #5.