Discussion:
[jira] [Created] (MAHOUT-1950) Unread Block Data in Spark Shell Pseudo Cluster
Trevor Grant (JIRA)
2017-03-05 01:11:32 UTC
Permalink
Trevor Grant created MAHOUT-1950:
------------------------------------

Summary: Unread Block Data in Spark Shell Pseudo Cluster
Key: MAHOUT-1950
URL: https://issues.apache.org/jira/browse/MAHOUT-1950
Project: Mahout
Issue Type: Bug
Components: Mahout spark shell
Affects Versions: 0.13.0
Environment: Spark 1.6.3 Cluster / Pseudo Cluster / YARN Cluster (all observed)
Reporter: Trevor Grant
Assignee: Trevor Grant
Priority: Blocker


When performing an operation in the Spark Shell on a Pseudo Cluster, a `java.lang.IllegalStateException: unread block data` error is thrown.

Research and the stack trace imply a serialization issue. Reports of similar problems with Spark in cluster mode hint that the Kryo jars aren't being shipped to the executors.

Experimentation has shown that the following invocation:
`$SPARK_HOME/bin/spark-shell --jars "/opt/mahout/math-scala/target/mahout-math-scala_2.10-0.13.0-SNAPSHOT.jar,/opt/mahout/math/target/mahout-math-0.13.0-SNAPSHOT.jar,/opt/mahout/spark/target/mahout-spark_2.10-0.13.0-SNAPSHOT.jar,/opt/mahout/spark/target/mahout-spark_2.10-0.13.0-SNAPSHOT-dependency-reduced.jar" -i $MAHOUT_HOME/bin/load-shell.scala --conf spark.kryo.referenceTracking=false --conf spark.kryo.registrator=org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator --conf spark.kryoserializer.buffer=32k --conf spark.kryoserializer.buffer.max=600m --conf spark.serializer=org.apache.spark.serializer.KryoSerializer`

works, and should be used in place of:
https://github.com/apache/mahout/blob/master/bin/mahout#L294
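
The working one-liner above can be hard to read; the sketch below breaks it into pieces to make the fix clearer. The `/opt/mahout` install path and the 0.13.0-SNAPSHOT jar names are assumptions taken from the reporter's environment, not a general recipe; adjust them for your own build. Note that passing the Mahout jars via `--jars` is what ships them to the executors, so Kryo deserialization can find the Mahout classes on the worker side.

```shell
# Sketch of the working spark-shell invocation, assembled piecewise.
# Assumes a Mahout source tree built under /opt/mahout (per the report).
MAHOUT_HOME=${MAHOUT_HOME:-/opt/mahout}

# Jars that must travel to the executors alongside the driver.
MAHOUT_JARS="$MAHOUT_HOME/math-scala/target/mahout-math-scala_2.10-0.13.0-SNAPSHOT.jar"
MAHOUT_JARS="$MAHOUT_JARS,$MAHOUT_HOME/math/target/mahout-math-0.13.0-SNAPSHOT.jar"
MAHOUT_JARS="$MAHOUT_JARS,$MAHOUT_HOME/spark/target/mahout-spark_2.10-0.13.0-SNAPSHOT.jar"
MAHOUT_JARS="$MAHOUT_JARS,$MAHOUT_HOME/spark/target/mahout-spark_2.10-0.13.0-SNAPSHOT-dependency-reduced.jar"

# Kryo settings from the working invocation in the report.
KRYO_CONF="--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.kryo.registrator=org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator \
--conf spark.kryo.referenceTracking=false \
--conf spark.kryoserializer.buffer=32k \
--conf spark.kryoserializer.buffer.max=600m"

# Print the full command rather than executing it, so it can be inspected.
echo "$SPARK_HOME/bin/spark-shell --jars \"$MAHOUT_JARS\" \
-i $MAHOUT_HOME/bin/load-shell.scala $KRYO_CONF"
```

This mirrors what the linked `bin/mahout` change does: build the jar list and Kryo `--conf` flags in the launcher script instead of expecting the user to supply them by hand.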



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
ASF GitHub Bot (JIRA)
2017-03-05 03:11:32 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896035#comment-15896035 ]

ASF GitHub Bot commented on MAHOUT-1950:
----------------------------------------

GitHub user rawkintrevo opened a pull request:

https://github.com/apache/mahout/pull/291

MAHOUT-1950 Fix block unread error in shell



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rawkintrevo/mahout mahout-1950

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/mahout/pull/291.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #291

----
commit a30d241a67d3b90673924c4332ce432f335a7d05
Author: rawkintrevo <***@gmail.com>
Date: 2017-03-05T03:08:33Z

MAHOUT-1950 Fix block unread error in shell

----
ASF GitHub Bot (JIRA)
2017-03-06 05:25:32 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896758#comment-15896758 ]

ASF GitHub Bot commented on MAHOUT-1950:
----------------------------------------

Github user asfgit closed the pull request at:

https://github.com/apache/mahout/pull/291
Trevor Grant (JIRA)
2017-03-06 05:45:32 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trevor Grant resolved MAHOUT-1950.
----------------------------------
Resolution: Fixed
Hudson (JIRA)
2017-03-06 06:07:32 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896777#comment-15896777 ]

Hudson commented on MAHOUT-1950:
--------------------------------

SUCCESS: Integrated in Jenkins build Mahout-Quality #3442 (See [https://builds.apache.org/job/Mahout-Quality/3442/])
MAHOUT-1950 Fix block unread error in shell closes apache/mahout#291 (rawkintrevo: rev ca24f0c44931aaf6ea57ef97384e12e39ccc561d)
* (edit) spark/pom.xml
* (edit) bin/mahout
* (edit) flink/pom.xml
* (edit) .gitignore
* (edit) math-scala/pom.xml
* (edit) math/pom.xml
* (edit) h2o/pom.xml
* (edit) mr/pom.xml
* (edit) hdfs/pom.xml
Hudson (JIRA)
2017-03-09 20:36:38 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903807#comment-15903807 ]

Hudson commented on MAHOUT-1950:
--------------------------------

SUCCESS: Integrated in Jenkins build Mahout-Quality #3446 (See [https://builds.apache.org/job/Mahout-Quality/3446/])
MAHOUT-1950 fixes CLI drivers missing classes, need to make sure this (pat: rev f8f8f127231781bca0b981c26e2387bfad3958c2)
* (edit) spark/src/main/scala/org/apache/mahout/sparkbindings/package.scala