Discussion:
[jira] [Created] (MAHOUT-1786) Make classes implements Serializable for Spark 1.5+
Michel Lemay (JIRA)
2015-11-06 15:08:27 UTC
Michel Lemay created MAHOUT-1786:
------------------------------------

Summary: Make classes implements Serializable for Spark 1.5+
Key: MAHOUT-1786
URL: https://issues.apache.org/jira/browse/MAHOUT-1786
Project: Mahout
Issue Type: Improvement
Components: Math
Affects Versions: 0.11.0
Reporter: Michel Lemay
Priority: Minor


Spark 1.5 comes with a new, very efficient serializer that uses code generation. It is twice as fast as Kryo. When using Mahout, we have to set KryoSerializer because some classes aren't serializable otherwise.

I suggest declaring Math classes as "implements Serializable" where needed. For instance, to use the cooccurrence package in Spark 1.5, we had to modify AbstractMatrix, AbstractVector, DenseVector and SparseRowMatrix to make it work without Kryo.
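To make the request concrete, here is a minimal sketch in plain Java (no Spark or Mahout on the classpath) of what Java serialization does with a class that lacks the marker interface, and how adding it changes the outcome. `PlainVector` and `SerializableVector` are hypothetical stand-ins invented for this illustration; they are not Mahout's AbstractVector or DenseVector.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

class SerializableDemo {
    // Serialize with plain Java serialization and return the bytes.
    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        }
        return buf.toByteArray();
    }

    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    // True when Java serialization refuses the object.
    static boolean rejected(Object o) throws IOException {
        try {
            toBytes(o);
            return false;
        } catch (NotSerializableException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        double[] data = {1.0, 0.0, 2.5};
        System.out.println("PlainVector rejected: " + rejected(new PlainVector(data)));
        SerializableVector copy =
            (SerializableVector) fromBytes(toBytes(new SerializableVector(data)));
        System.out.println("round trip ok: " + Arrays.equals(data, copy.values));
    }
}

// Hypothetical stand-in for a math class without the marker interface.
class PlainVector {
    final double[] values;
    PlainVector(double[] values) { this.values = values; }
}

// Same shape, but declared Serializable so Java serialization accepts it.
class SerializableVector implements Serializable {
    private static final long serialVersionUID = 1L;
    final double[] values;
    SerializableVector(double[] values) { this.values = values; }
}
```

The sketch only shows that the marker interface makes the class acceptable to Java serialization; it says nothing about whether the result is compact or fast.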



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
ASF GitHub Bot (JIRA)
2015-11-06 15:37:27 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993824#comment-14993824 ]

ASF GitHub Bot commented on MAHOUT-1786:
----------------------------------------

GitHub user michellemay opened a pull request:

https://github.com/apache/mahout/pull/174

MAHOUT-1786: Make classes implements Serializable for Spark 1.5+

Add some "implements Serializable" for Apache Spark 1.5+
There might be other classes that would benefit from the same modification.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/michellemay/mahout fix-mahout-1786

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/mahout/pull/174.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #174

----
commit a0aad6a716d001c5bbe1ee08e8865dbb83b6f1e4
Author: michellemay <***@gmail.com>
Date: 2015-11-06T15:34:04Z

implements Serializable

----
Post by Michel Lemay (JIRA)
Make classes implements Serializable for Spark 1.5+
---------------------------------------------------
Key: MAHOUT-1786
URL: https://issues.apache.org/jira/browse/MAHOUT-1786
Project: Mahout
Issue Type: Improvement
Components: Math
Affects Versions: 0.11.0
Reporter: Michel Lemay
Priority: Minor
Labels: performance
Spark 1.5 comes with a new, very efficient serializer that uses code generation. It is twice as fast as Kryo. When using Mahout, we have to set KryoSerializer because some classes aren't serializable otherwise.
I suggest declaring Math classes as "implements Serializable" where needed. For instance, to use the cooccurrence package in Spark 1.5, we had to modify AbstractMatrix, AbstractVector, DenseVector and SparseRowMatrix to make it work without Kryo.
ASF GitHub Bot (JIRA)
2015-11-06 17:17:11 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994009#comment-14994009 ]

ASF GitHub Bot commented on MAHOUT-1786:
----------------------------------------

Github user dlyubimov commented on the pull request:

https://github.com/apache/mahout/pull/174#issuecomment-154476424

-1. Unfortunately, I think just adding "Serializable" to every matrix class is not going to cut it.

If custom Java serialization were implemented a bit more efficiently (as is currently done in the Writable and Kryo serializations), I would vote -0.1 (per Apache voting guidelines).

The reason is that Java serialization is still not the best way to pack tensor data.

Another reason is that there is no motivation in Mahout for Java serialization support. All of our supported backends already support both the Kryo and Writable protocols.

There are other minor reasons not to use Java serialization as well (such as class compatibility checks, etc.).

There admittedly may be an external reason, but I feel that Kryo serialization should be a good enough answer for external reasons as well.
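The point about packing efficiency can be illustrated with a small sketch, assuming a hypothetical sparse vector backed by a `TreeMap` (not Mahout's actual storage). `DefaultSparseVector` relies on default Java serialization, which writes every boxed key and value as an object; `CompactSparseVector` uses `java.io.Externalizable` to write only a count followed by raw (int, double) pairs. Both class names are invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.TreeMap;

class CompactSerializationDemo {
    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        }
        return buf.toByteArray();
    }

    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    // Build n non-zero entries at indices 0, 7, 14, ...
    static TreeMap<Integer, Double> sample(int n) {
        TreeMap<Integer, Double> m = new TreeMap<>();
        for (int i = 0; i < n; i++) m.put(7 * i, i + 0.5);
        return m;
    }

    public static void main(String[] args) throws Exception {
        TreeMap<Integer, Double> data = sample(1000);
        System.out.println("default bytes: " + toBytes(new DefaultSparseVector(data)).length);
        System.out.println("compact bytes: " + toBytes(new CompactSparseVector(data)).length);
    }
}

// Default path: the TreeMap field is written by built-in serialization.
class DefaultSparseVector implements Serializable {
    private static final long serialVersionUID = 1L;
    final TreeMap<Integer, Double> entries;
    DefaultSparseVector(TreeMap<Integer, Double> entries) { this.entries = entries; }
}

// Custom path: a count followed by raw (index, value) pairs.
class CompactSparseVector implements Externalizable {
    private static final long serialVersionUID = 1L;
    TreeMap<Integer, Double> entries = new TreeMap<>();

    public CompactSparseVector() { }  // Externalizable needs a public no-arg constructor
    CompactSparseVector(TreeMap<Integer, Double> entries) { this.entries = entries; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(entries.size());
        for (Map.Entry<Integer, Double> e : entries.entrySet()) {
            out.writeInt(e.getKey());
            out.writeDouble(e.getValue());
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        int n = in.readInt();
        entries = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            int index = in.readInt();
            entries.put(index, in.readDouble());
        }
    }
}
```

In this sketch the compact form should come out noticeably smaller because it skips the per-object headers of boxed `Integer` and `Double` values. How that compares with Mahout's Kryo or Writable encodings is not something the example measures.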
ASF GitHub Bot (JIRA)
2015-11-06 17:27:11 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994036#comment-14994036 ]

ASF GitHub Bot commented on MAHOUT-1786:
----------------------------------------

Github user dlyubimov commented on the pull request:

https://github.com/apache/mahout/pull/174#issuecomment-154478671

That said, I am not sure about the efficiency point made in the original issue (something bytecode-generated in Spark 1.5...). I doubt it will work well without custom serialization algorithms. It needs a benchmark, IMO. There may still be a significant difference between serializing a data structure directly and serializing its iterators and then rebuilding the data structure.
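A rough timing harness for the distinction drawn above might look like the following sketch. It assumes a hypothetical sparse vector held as parallel `int[]` / `double[]` arrays and compares two paths: copying the backing arrays in bulk, versus writing (index, value) pairs one by one and rebuilding a `TreeMap` on the read side. All names are invented for illustration, and `System.nanoTime` loops like this are only indicative; a serious comparison would need a proper benchmark framework and a real Spark job.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.TreeMap;

class RebuildBenchmarkSketch {
    // Holder for the unpacked arrays.
    static final class Packed {
        final int[] idx;
        final double[] vals;
        Packed(int[] idx, double[] vals) { this.idx = idx; this.vals = vals; }
    }

    // Path 1: copy the backing arrays in bulk; nothing is rebuilt per element.
    static byte[] packArrays(int[] idx, double[] vals) {
        ByteBuffer buf = ByteBuffer.allocate(4 + idx.length * 12);
        buf.putInt(idx.length);
        buf.asIntBuffer().put(idx);           // view starts at position 4
        buf.position(4 + idx.length * 4);     // move past the index array
        buf.asDoubleBuffer().put(vals);
        return buf.array();
    }

    static Packed unpackArrays(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int n = buf.getInt();
        int[] idx = new int[n];
        double[] vals = new double[n];
        buf.asIntBuffer().get(idx);
        buf.position(4 + n * 4);
        buf.asDoubleBuffer().get(vals);
        return new Packed(idx, vals);
    }

    // Path 2: iterate over the entries, then rebuild a map element by element.
    static byte[] writePairs(int[] idx, double[] vals) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(idx.length);
            for (int i = 0; i < idx.length; i++) {
                out.writeInt(idx[i]);
                out.writeDouble(vals[i]);
            }
        }
        return bytes.toByteArray();
    }

    static TreeMap<Integer, Double> readPairs(byte[] data) throws IOException {
        TreeMap<Integer, Double> map = new TreeMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                int index = in.readInt();
                map.put(index, in.readDouble());
            }
        }
        return map;
    }

    public static void main(String[] args) throws IOException {
        int n = 200_000;
        int[] idx = new int[n];
        double[] vals = new double[n];
        for (int i = 0; i < n; i++) { idx[i] = 3 * i; vals[i] = i + 0.5; }

        for (int round = 0; round < 5; round++) {  // early rounds act as crude warm-up
            long t0 = System.nanoTime();
            Packed p = unpackArrays(packArrays(idx, vals));
            long t1 = System.nanoTime();
            TreeMap<Integer, Double> m = readPairs(writePairs(idx, vals));
            long t2 = System.nanoTime();
            System.out.printf("round %d: arrays %d us, rebuild %d us (%d / %d entries)%n",
                round, (t1 - t0) / 1000, (t2 - t1) / 1000, p.idx.length, m.size());
        }
    }
}
```

Both paths produce the same number of bytes here, so any timing gap comes from the per-element work and the rebuild, which is the effect the comment asks to have measured.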
ASF GitHub Bot (JIRA)
2015-11-06 17:37:11 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994051#comment-14994051 ]

ASF GitHub Bot commented on MAHOUT-1786:
----------------------------------------

Github user michellemay commented on the pull request:

https://github.com/apache/mahout/pull/174#issuecomment-154481034

Forcing the use of Kryo might not be a valid status quo for Spark 1.5+, though.

For reference, here is project Tungsten:
https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html

"The above chart compares the performance of shuffling 8 million complex rows in one thread using the Kryo serializer and a code generated custom serializer. The code generated serializer exploits the fact that all rows in a single shuffle have the same schema and generates specialized code for that. This made the generated version over 2X faster to shuffle than the Kryo version."
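The idea in the Tungsten quote, that a serializer specialized to one known schema can skip work a generic serializer must do, can be sketched by hand as follows. This is not Spark's generated code, which is produced at runtime; it is a hand-written illustration with an assumed two-column schema (int id, double score), and the names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class SchemaSpecializedSketch {
    // Generic path: the schema is unknown, so every row carries a field count
    // and every field carries a type tag that is dispatched on at runtime.
    static byte[] writeGeneric(List<Object[]> rows) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(rows.size());
            for (Object[] row : rows) {
                out.writeInt(row.length);
                for (Object field : row) {
                    if (field instanceof Integer) {
                        out.writeByte(0);
                        out.writeInt((Integer) field);
                    } else if (field instanceof Double) {
                        out.writeByte(1);
                        out.writeDouble((Double) field);
                    } else {
                        throw new IOException("unsupported field: " + String.valueOf(field));
                    }
                }
            }
        }
        return bytes.toByteArray();
    }

    // Specialized path: the schema (int, double) is fixed for all rows,
    // so only the values are written and no type dispatch is needed.
    static byte[] writeSpecialized(int[] ids, double[] scores) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(ids.length);
            for (int i = 0; i < ids.length; i++) {
                out.writeInt(ids[i]);
                out.writeDouble(scores[i]);
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        int n = 1000;
        List<Object[]> rows = new ArrayList<>();
        int[] ids = new int[n];
        double[] scores = new double[n];
        for (int i = 0; i < n; i++) {
            ids[i] = i;
            scores[i] = i * 0.25;
            rows.add(new Object[] {ids[i], scores[i]});
        }
        System.out.println("generic bytes:     " + writeGeneric(rows).length);
        System.out.println("specialized bytes: " + writeSpecialized(ids, scores).length);
    }
}
```

Note that this advantage depends on rows sharing a flat, known schema. Whether it carries over to opaque objects such as matrices and vectors that are merely marked Serializable is exactly what the thread goes on to question.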
ASF GitHub Bot (JIRA)
2015-11-06 17:38:11 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994053#comment-14994053 ]

ASF GitHub Bot commented on MAHOUT-1786:
----------------------------------------

Github user michellemay commented on the pull request:

https://github.com/apache/mahout/pull/174#issuecomment-154481151

I leave it to you to find a way to use the new codegen optimizations in Spark.
ASF GitHub Bot (JIRA)
2015-11-06 21:13:10 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994434#comment-14994434 ]

ASF GitHub Bot commented on MAHOUT-1786:
----------------------------------------

Github user dlyubimov commented on the pull request:

https://github.com/apache/mahout/pull/174#issuecomment-154542945

Michel:

Just to be clear: can you or can you not confirm that this change

(1) is known to work correctly (as in all unit tests pass in a
java-serialized session); and

(2) that it is actually indeed faster than the same things in a kryo
session?

I think those are the main things to confirm in this issue.
ASF GitHub Bot (JIRA)
2015-11-09 12:40:11 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996455#comment-14996455 ]

ASF GitHub Bot commented on MAHOUT-1786:
----------------------------------------

Github user michellemay commented on the pull request:

https://github.com/apache/mahout/pull/174#issuecomment-155052769

I have a hard time testing vanilla 'master' on Windows (my dev environment).
I get tons of NPEs (92 in total) at __randomizedtesting.SeedInfo.seed, at java.lang.ProcessBuilder.start
ASF GitHub Bot (JIRA)
2015-11-10 15:13:10 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998721#comment-14998721 ]

ASF GitHub Bot commented on MAHOUT-1786:
----------------------------------------

Github user michellemay commented on the pull request:

https://github.com/apache/mahout/pull/174#issuecomment-155447141

I'm not 100% sure, and I still have other tests to do, but it looks like the default serializer of Spark 1.5.1 is 19% SLOWER than Kryo when performing AtA on a large matrix.

That is really sad since I see huge gains elsewhere in our Spark tasks.

Is there any possibility of having the best of both worlds?
Dmitriy Lyubimov (JIRA)
2015-11-15 00:01:10 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov updated MAHOUT-1786:
-------------------------------------
Sprint: Jan/Feb-2016
Dmitriy Lyubimov (JIRA)
2016-03-30 17:25:26 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov updated MAHOUT-1786:
-------------------------------------
Sprint: Mar/Apr-2016 (was: Jan/Feb-2016)
Dmitriy Lyubimov (JIRA)
2016-03-30 17:26:26 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov updated MAHOUT-1786:
-------------------------------------
Sprint: Jan/Feb-2016 (was: Mar/Apr-2016)
ASF GitHub Bot (JIRA)
2016-09-10 02:40:21 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15478973#comment-15478973 ]

ASF GitHub Bot commented on MAHOUT-1786:
----------------------------------------

Github user andrewmusselman commented on the issue:

https://github.com/apache/mahout/pull/174
Pat Ferrel (JIRA)
2016-12-19 16:36:58 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15761631#comment-15761631 ]

Pat Ferrel commented on MAHOUT-1786:
------------------------------------

It sounds like we could remove Kryo altogether and improve performance by using the new Spark serializer. It also sounds like this uses the more standard approach of extending Serializable, which is built into many Scala classes IIRC.

Removing Kryo with a performance gain seems like a big win. Kryo causes many config problems for new users.
Andrew Palumbo (JIRA)
2016-12-19 06:07:59 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15760270#comment-15760270 ]

Andrew Palumbo commented on MAHOUT-1786:
----------------------------------------

This (MAHOUT-1786) may be part of the broken broadcast issues.
Andrew Palumbo (JIRA)
2016-12-20 18:48:58 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo updated MAHOUT-1786:
-----------------------------------
Sprint: Jan/Feb-2017
Andrew Palumbo (JIRA)
2017-01-11 02:29:58 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo updated MAHOUT-1786:
-----------------------------------
Assignee: Pat Ferrel
Pat Ferrel (JIRA)
2017-01-14 18:26:26 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pat Ferrel updated MAHOUT-1786:
-------------------------------
Assignee: Andrew Palumbo (was: Pat Ferrel)

Hmm, removing Kryo altogether is probably a good idea, but I have never touched this code and do not maintain classes that need this. All my classes use either data that is in the above types or base Scala types that are serializable.

I'm sending this back to [~Andrew_Palumbo] for reassignment or further discussion.

If the new serializer is better than Kryo, by all means let's move there ASAP.
Andrew Palumbo (JIRA)
2017-01-14 20:21:26 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo updated MAHOUT-1786:
-----------------------------------
Priority: Minor (was: Blocker)
Andrew Palumbo (JIRA)
2017-01-14 20:21:26 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo updated MAHOUT-1786:
-----------------------------------
Fix Version/s: 0.13.1 (was: 0.13.0)
Andrew Palumbo (JIRA)
2017-01-14 20:21:26 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo updated MAHOUT-1786:
-----------------------------------
Assignee: Pat Ferrel (was: Andrew Palumbo)
Andrew Palumbo (JIRA)
2017-01-16 02:29:26 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo updated MAHOUT-1786:
-----------------------------------
Sprint: (was: Jan/Feb-2017)
ASF GitHub Bot (JIRA)
2017-02-02 13:57:51 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849929#comment-15849929 ]

ASF GitHub Bot commented on MAHOUT-1786:
----------------------------------------

Github user rawkintrevo commented on the issue:

https://github.com/apache/mahout/pull/174

@michellemay let me know if there is anything I can do to help push this forward.

Thanks for your hard work!
ASF GitHub Bot (JIRA)
2017-02-09 01:25:41 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858824#comment-15858824 ]

ASF GitHub Bot commented on MAHOUT-1786:
----------------------------------------

Github user dlyubimov commented on the issue:

https://github.com/apache/mahout/pull/174

@rawkintrevo I don't believe, at this point, that this issue contains a valid assertion w.r.t. Java-serializable classes triggering the codegen serializer. Nor is there any experimental evidence of this. And there is Spark 2.0.0 guidance to the contrary, as I mentioned.
Trevor Grant (JIRA)
2017-06-23 04:22:02 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060396#comment-16060396 ]

Trevor Grant commented on MAHOUT-1786:
--------------------------------------

[~dlyubimov] Close this issue as "Won't Fix"?
Trevor Grant (JIRA)
2017-06-23 04:22:02 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trevor Grant updated MAHOUT-1786:
---------------------------------
Fix Version/s: 0.13.2 (was: 0.13.1)
