Discussion:
[DISCUSS] More meaningful error when running on Spark 2.0
Trevor Grant
2016-11-14 14:49:43 UTC
Hi,

Currently, when running on Spark 2.0, the user will hit some sort of error.
One such error is:

java.util.NoSuchElementException: next on empty iterator
  at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
  at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
  at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:63)
  at scala.collection.IterableLike$class.head(IterableLike.scala:107)
  at scala.collection.mutable.ArrayOps$ofRef.scala$collection$IndexedSeqOptimized$$super$head(ArrayOps.scala:186)
  at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimized.scala:126)
  at scala.collection.mutable.ArrayOps$ofRef.head(ArrayOps.scala:186)
  at org.apache.mahout.math.scalabindings.package$$anonfun$1.apply(package.scala:155)
  at org.apache.mahout.math.scalabindings.package$$anonfun$1.apply(package.scala:133)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
  at org.apache.mahout.math.scalabindings.package$.dense(package.scala:133)
  at org.apache.mahout.sparkbindings.SparkEngine$.drmSampleKRows(SparkEngine.scala:289)
  at org.apache.mahout.math.drm.package$.drmSampleKRows(package.scala:149)
  at org.apache.mahout.math.drm.package$.drmSampleToTSV(package.scala:165)
  ... 58 elided

With the recent Zeppelin-Mahout integration, there are going to be a lot of
users unknowingly attempting to run Mahout on Spark 2.0. I think something
like the following would be simple to implement and would save a lot of time
on the Zeppelin and Mahout mailing lists:

if sc.version > 1.6.2 then:
  error("Spark version ${sc.version} isn't supported. Please see
  MAHOUT-... (appropriate jira info)")
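
For what it's worth, a minimal Scala sketch of that kind of check (the hook
point, the exact cutoff, and the wording are placeholders rather than a
settled design; the JIRA reference would still need to be filled in):

// Hypothetical guard run when the Mahout distributed context is created;
// the name and error message are illustrative only.
def assertSupportedSparkVersion(sc: org.apache.spark.SparkContext): Unit = {
  // Mahout 0.12.x targets the Spark 1.x line, so fail fast on 2.x and later.
  val major = sc.version.takeWhile(_ != '.').toInt
  require(major < 2,
    s"Spark version ${sc.version} isn't supported by this Mahout release. " +
      "Please see the corresponding MAHOUT JIRA for details.")
}

Called as soon as the SparkContext is available, that would turn the opaque
NoSuchElementException above into an explicit, searchable version error.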

I'd like to put something together and, depending on how many issues people
have on the Zeppelin list, be prepared to do a hotfix on 0.12.2 if it becomes
prudent. Everyone complaining that Zeppelin doesn't work because of some
mystical error is bad PR. It DOES say in the notebook and elsewhere that
we're not 2.0 compliant; however, one of the advantages/drawbacks of Zeppelin
is that you can get a functional local cluster of Flink, Spark, etc. going
without really having to know what you're doing.

So we could easily end up with someone who has read none of the docs and is
whining. Surely few, if any, would ever do such a thing, but I still think
it's a prudent fix to have in the back pocket.

tg

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*
Dmitriy Lyubimov
2016-11-15 17:23:10 UTC
+1 on version checking.
And there's a little bug as well: this error is technically generated by
something like

dense(Set.empty[Vector]),

i.e., it cannot form a matrix out of an empty collection of vectors. While
this is true, I suppose it needs a `require(...)` there to generate a more
meaningful message instead of letting Scala complain about an empty
collection.
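
For illustration only (not a claim about Mahout's actual code), a sketch of
that kind of guard, written here as a wrapper in front of the existing
`dense()` factory; the wrapper name is made up, and it assumes `dense()`
keeps its generic varargs signature:

import org.apache.mahout.math.scalabindings._

// Hypothetical wrapper: fail fast with a clear message instead of letting
// dense() call .head on an empty collection and throw NoSuchElementException.
def denseChecked[R](rows: R*) = {
  require(rows.nonEmpty,
    "dense() requires a non-empty collection of row vectors")
  dense(rows: _*)
}

The same one-line `require` could simply go at the top of `dense()` itself.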

-d
+1
Sent from my Verizon Wireless 4G LTE smartphone
Andrew Musselman
2016-11-15 19:30:06 UTC
+1