So I've been messing with this all night.
Maven really doesn't seem to like the idea of tacking a string onto a
version number. I can make it work, but it's sloppy and really fattens up
the POMs. (We end up with things like 0.13.2-SNAPSHOT-spark_1.6-SNAPSHOT,
or a lot of plugins that I think would eventually work, but that I was
unable to wrangle.)
The alternative, Maven's preferred method as far as I can tell, is adding a
classifier.
This gives Maven coordinates of:
org.apache.mahout:mahout-spark_2.10:jar:spark_1.6:0.13.2
org.apache.mahout:mahout-spark_2.11:jar:spark_2.1:0.13.2
The jars come out looking like:
mahout-spark_2.10-0.13.2-spark_1.6.jar
mahout-spark_2.11-0.13.2-spark_2.1.jar
If one were importing this into their POM, it would look like this:
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-spark_2.10</artifactId>
  <classifier>spark_1.6</classifier>
  <version>0.13.2</version>
</dependency>
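For sbt users, the same classified artifact can be declared with sbt's `classifier` method. A sketch, assuming the coordinates end up published as described above:

```scala
// build.sbt fragment (sketch): pulling the Spark 1.6 flavor of the
// Scala 2.10 artifact via its classifier, mirroring the POM snippet above.
libraryDependencies +=
  "org.apache.mahout" % "mahout-spark_2.10" % "0.13.2" classifier "spark_1.6"
```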
I have a provisional PR out implementing this:
https://github.com/apache/mahout/pull/330
Feel free to respond here or on the PR, but does anyone have any specific
objection to this method? It _seems_ like the "Maven" way to do things,
though I am not certain that is correct, as I had never come across
classifiers before.
From [1], "As a motivation for this element, consider for example a project
that offers an artifact targeting JRE 1.5 but at the same time also an
artifact that still supports JRE 1.4. The first artifact could be equipped
with the classifier jdk15 and the second one with jdk14 such that clients
can choose which one to use." That seems to be our use case (sort of),
though I entirely concede that I may be wrong here.
[1] https://maven.apache.org/pom.html
+1 if so (sbt naming, re: Pat's comment).
Also +1 on Zeppelin integration being non-trivial.
-------- Original message --------
Date: 07/07/2017 10:35 PM (GMT-08:00)
Subject: Re: [DISCUSS] Naming convention for multiple spark/scala combos
IIRC these all fit sbt's conventions?
So, to tie all of this together:
org.apache.mahout:mahout-spark_2.10:0.13.1_spark_1_6
org.apache.mahout:mahout-spark_2.10:0.13.1_spark_2_0
org.apache.mahout:mahout-spark_2.10:0.13.1_spark_2_1
org.apache.mahout:mahout-spark_2.11:0.13.1_spark_1_6
org.apache.mahout:mahout-spark_2.11:0.13.1_spark_2_0
org.apache.mahout:mahout-spark_2.11:0.13.1_spark_2_1
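The full matrix is just the cross product of the Scala binary versions and the Spark version suffixes. A small sketch in plain Scala (names are hypothetical) that generates the six coordinates above:

```scala
// Sketch: enumerate the proposed coordinate for every Scala/Spark combo.
val scalaBinaries = Seq("2.10", "2.11")
val sparkSuffixes = Seq("1_6", "2_0", "2_1")

val coords: Seq[String] = for {
  sv <- scalaBinaries
  sp <- sparkSuffixes
} yield s"org.apache.mahout:mahout-spark_$sv:0.13.1_spark_$sp"

coords.foreach(println)
```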
(Will jars compiled with 2.1 dependencies run on 2.0? I assume not, but I
don't know. Afaik, Mahout compiled for Spark 1.6.x tends to work with
Spark 1.6.y, but that's anecdotal.)
A non-trivial motivation here is that we would like all of these available
to tighten up the Apache Zeppelin integration, where the user could have a
number of different Spark/Scala combos going on and we want it to 'just
work' out of the box (which means a wide array of binaries available, to
Dmitriy's point).
I'm +1 on this, and as RM I will begin cutting a provisional RC, just to
figure out how all of this will work (it's my first time as release
master, and this is a new thing we're doing).
72-hour lazy consensus. (It will probably take me 72 hours to figure out
anyway ;) )
If there are no objections, expect an RC on Monday evening.
tg
Post by Holden Karau
Trevor looped me in on this since I hadn't had a chance to subscribe to
the list yet (on now :)).
Artifacts from cross-Spark-version building aren't super standardized (and
there are two sort of very different types of cross-building).
For folks who just need to build for the 1.X and 2.X branches, appending
_spark1 & _spark2 to the version string is indeed pretty common, and the
DL4J folks do something pretty similar, as Trevor pointed out.
The folks over at hammerlab have made some sbt-specific tooling to make
this easier to do on the publishing side (see
https://github.com/hammerlab/sbt-parent).
It is true some people build Scala 2.10 artifacts for the Spark 1.X series
and 2.11 artifacts for the Spark 2.X series only, and use that to
differentiate (I don't personally like this approach since it is super
opaque, and someone could upgrade their Scala version and then accidentally
be using a different version of Spark, which would likely not go very
well).
For folks who need to hook into internals and cross-build against
different minor versions, there is much less of a consistent pattern, e.g.:
[artifactname]_[scalaversion]:[sparkversion]_[artifact releaseversion]
But this really only makes sense when you have to cross-build for lots of
different Spark versions (which should be avoidable for Mahout).
Since you are likely not depending on the internals of different point
releases, I'd think _spark1 / _spark2 is probably the right way (or
_spark_1 / _spark_2 is fine too).
---------- Forwarded message ----------
Date: Fri, Jul 7, 2017 at 12:28 PM
Subject: Re: [DISCUSS] Naming convention for multiple spark/scala combos
mahout-spark-2.11_2.10-0.13.1.jar
mahout-spark-2.11_2.11-0.13.1.jar
mahout-math-scala-2.11_2.10-0.13.1.jar
i.e. <module>-<spark version>_<scala version>-<mahout version>.jar
Not exactly pretty... I somewhat prefer Trevor's idea of the DL4J
convention.
________________________________
Sent: Friday, July 7, 2017 11:57:53 AM
Subject: [DISCUSS] Naming convention for multiple spark/scala combos
Hey all,
Working on releasing 0.13.1 with multiple spark/scala combos.
Afaik, there is no 'standard' for multiple Spark versions (but I may be
wrong; I don't claim expertise here).
Spark-1.6 + Scala 2.10
Spark-2.1 + Scala 2.11
OR
We could do it like DL4J:
org.apache.mahout:mahout-spark_2.10:0.13.1_spark_1
org.apache.mahout:mahout-spark_2.11:0.13.1_spark_1
org.apache.mahout:mahout-spark_2.10:0.13.1_spark_2
org.apache.mahout:mahout-spark_2.11:0.13.1_spark_2
OR
some other option I don't know of.
--
Cell : 425-233-8271