Discussion:
[jira] [Created] (MAHOUT-1572) blockify() to detect (naively) the data sparsity in the loaded data
Dmitriy Lyubimov (JIRA)
2014-06-04 21:59:01 UTC
Dmitriy Lyubimov created MAHOUT-1572:
----------------------------------------

Summary: blockify() to detect (naively) the data sparsity in the loaded data
Key: MAHOUT-1572
URL: https://issues.apache.org/jira/browse/MAHOUT-1572
Project: Mahout
Issue Type: Bug
Reporter: Dmitriy Lyubimov
Fix For: 1.0


per [~ssc]:
bq. a dense matrix is converted into a SparseRowMatrix with dense row vectors by blockify(); after serialization this becomes a dense matrix in sparse format (triggering OOMs)!

I guess we can look at the first row vector and go on to either DenseMatrix or SparseRowMatrix.
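
The first-row heuristic proposed above can be sketched roughly as follows. This is a minimal illustration only, not the code from the pull request: `DenseVec`, `SparseVec`, and `blockify_naive` are hypothetical stand-ins written in plain Python, and they do not reproduce Mahout's actual vector or matrix classes.

```python
class DenseVec:
    """Hypothetical stand-in for a dense row vector (stores every cell)."""

    def __init__(self, values):
        self.values = list(values)

    def is_dense(self):
        return True

    def size(self):
        return len(self.values)

    def get(self, i):
        return self.values[i]


class SparseVec:
    """Hypothetical stand-in for a sparse row vector (stores non-zeros only)."""

    def __init__(self, size, entries):
        self._size = size
        self.entries = dict(entries)  # index -> non-zero value

    def is_dense(self):
        return False

    def size(self):
        return self._size

    def get(self, i):
        return self.entries.get(i, 0.0)


def blockify_naive(rows):
    """Choose the block layout by looking only at the first row vector."""
    if not rows:
        return "sparse", []
    if rows[0].is_dense():
        # Dense block: every cell of every row is materialized.
        return "dense", [[r.get(i) for i in range(r.size())] for r in rows]
    # Sparse block: keep only the non-zero cells of each row.
    return "sparse", [
        {i: r.get(i) for i in range(r.size()) if r.get(i) != 0.0} for r in rows
    ]
```

Note that the decision rests entirely on what the first vector reports about itself, so a single unrepresentative first row decides the layout of the whole block.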




--
This message was sent by Atlassian JIRA
(v6.2#6252)
ASF GitHub Bot (JIRA)
2014-06-04 22:13:02 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018266#comment-14018266 ]

ASF GitHub Bot commented on MAHOUT-1572:
----------------------------------------

GitHub user dlyubimov opened a pull request:

https://github.com/apache/mahout/pull/10

MAHOUT-1572 blockify() to detect (naively) the data sparsity in the loaded data

naive initial fix (?)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dlyubimov/mahout MAHOUT-1572

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/mahout/pull/10.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10

----
commit 162c5ca36e00af91a9599075332c577d9b1a13c4
Author: Dmitriy Lyubimov <***@apache.org>
Date: 2014-06-04T22:10:11Z

initial fix (?)

----
Dmitriy Lyubimov (JIRA)
2014-06-04 22:15:02 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov updated MAHOUT-1572:
-------------------------------------

Status: Patch Available (was: Open)
ASF GitHub Bot (JIRA)
2014-06-09 22:43:02 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025870#comment-14025870 ]

ASF GitHub Bot commented on MAHOUT-1572:
----------------------------------------

Github user dlyubimov commented on the pull request:

https://github.com/apache/mahout/pull/10#issuecomment-45553612

i will commit it soon then
ASF GitHub Bot (JIRA)
2014-06-10 18:35:01 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026809#comment-14026809 ]

ASF GitHub Bot commented on MAHOUT-1572:
----------------------------------------

Github user asfgit closed the pull request at:

https://github.com/apache/mahout/pull/10
Dmitriy Lyubimov (JIRA)
2014-06-10 18:37:01 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov updated MAHOUT-1572:
-------------------------------------

Resolution: Fixed
Status: Resolved (was: Patch Available)
Hudson (JIRA)
2014-06-10 19:15:03 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026865#comment-14026865 ]

Hudson commented on MAHOUT-1572:
--------------------------------

SUCCESS: Integrated in Mahout-Quality #2649 (See [https://builds.apache.org/job/Mahout-Quality/2649/])
MAHOUT-1572: blockify() to detect (naively) the data sparsity in the loaded data (dlyubimov: rev 8c529ccff23d419c4cb5191b0435de40d6a9831c)
* spark/src/main/scala/org/apache/mahout/sparkbindings/drm/package.scala
* CHANGELOG
* spark/src/test/scala/org/apache/mahout/sparkbindings/drm/DrmLikeSuite.scala
ASF GitHub Bot (JIRA)
2016-04-14 20:21:25 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241823#comment-15241823 ]

ASF GitHub Bot commented on MAHOUT-1572:
----------------------------------------

Github user mariusmuja commented on the pull request:

https://github.com/apache/mahout/pull/10#issuecomment-210132385

For me, this caused ```spark-rowsimilarity``` to always stop with a ```java.lang.OutOfMemoryError: Java heap space```. Reverting this allowed spark-rowsimilarity to successfully complete.
Post by Dmitriy Lyubimov (JIRA)
blockify() to detect (naively) the data sparsity in the loaded data
--------------------------------------------------------------------
Key: MAHOUT-1572
URL: https://issues.apache.org/jira/browse/MAHOUT-1572
Project: Mahout
Issue Type: Bug
Reporter: Dmitriy Lyubimov
Fix For: 0.10.0
.bq a dense matrix is converted into a SparseRowMatrix with dense row vectors by blockify(), after serialization this becomes a dense matrix in sparse format (triggering OOMs)!
i guess we can look at first row vector and go on to either DenseMatrix or SparseRowMatrix
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
ASF GitHub Bot (JIRA)
2016-04-19 15:38:25 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247992#comment-15247992 ]

ASF GitHub Bot commented on MAHOUT-1572:
----------------------------------------

Github user dlyubimov commented on the pull request:

https://github.com/apache/mahout/pull/10#issuecomment-211983401

Yes, this is not the way to do it.
Actually, it seems vectors are not reporting density faithfully; I had to do a more thorough analysis of data density elsewhere before doing things. This probably needs to be patched with a more thorough technique.
+1
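
The comment above does not say what the "more thorough technique" would be. One possible direction, sketched here purely as an illustration, is to count non-zero cells directly instead of trusting what a vector reports about itself. The function names, the plain list-of-floats row representation, and the 0.25 threshold are all assumptions made for this sketch; none of them come from Mahout or from this thread.

```python
def estimate_density(rows, sample=None):
    """Fraction of non-zero cells among the first `sample` rows (all rows if None)."""
    sampled = rows if sample is None else rows[:sample]
    cells = sum(len(r) for r in sampled)
    if cells == 0:
        return 0.0
    nonzero = sum(1 for r in sampled for x in r if x != 0.0)
    return nonzero / cells


def choose_layout(rows, threshold=0.25, sample=None):
    """Pick "dense" only when the measured density exceeds the threshold.

    The threshold is an arbitrary placeholder, not a tuned value.
    """
    return "dense" if estimate_density(rows, sample) > threshold else "sparse"
```

With this approach a row that is stored in a dense container but holds mostly zeros still counts as sparse, because the measurement looks at the values rather than at the storage type. The trade-off is an extra pass over the sampled rows before the block is built.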