Discussion:
[jira] [Created] (MAHOUT-1837) Sparse/Dense Matrix analysis for Matrix Multiplication
Andrew Palumbo (JIRA)
2016-04-27 18:56:12 UTC
Andrew Palumbo created MAHOUT-1837:
--------------------------------------

Summary: Sparse/Dense Matrix analysis for Matrix Multiplication
Key: MAHOUT-1837
URL: https://issues.apache.org/jira/browse/MAHOUT-1837
Project: Mahout
Issue Type: Improvement
Components: Math
Affects Versions: 0.12.0
Reporter: Andrew Palumbo
Fix For: 0.12.1


In matrix multiplication, sparse matrices can easily turn dense and bloat memory: one fully dense column and one fully dense row can cause a sparse %*% sparse operation to have a dense result.
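To make the densification concrete, here is a minimal sketch in plain Scala. It does not use Mahout: the `Sparse` type alias, the helper names, and the naive `multiply` are assumptions made for illustration only. It builds a matrix with a single dense column, another with a single dense row, and measures the density before and after the product.

```scala
object DensificationSketch {
  // Hypothetical sparse representation: only non-zero cells are stored.
  type Sparse = Map[(Int, Int), Double]

  // n x n matrix whose only non-zeros are in column 0.
  def denseColumn(n: Int): Sparse = (0 until n).map(i => (i, 0) -> 1.0).toMap

  // n x n matrix whose only non-zeros are in row 0.
  def denseRow(n: Int): Sparse = (0 until n).map(j => (0, j) -> 1.0).toMap

  // Naive sparse product: C(i, j) = sum over k of A(i, k) * B(k, j).
  def multiply(a: Sparse, b: Sparse): Sparse = {
    val bByRow = b.groupBy { case ((k, _), _) => k }
    val terms = for {
      ((i, k), av) <- a.toSeq
      ((_, j), bv) <- bByRow.getOrElse(k, Map.empty[(Int, Int), Double]).toSeq
    } yield ((i, j), av * bv)
    terms
      .groupBy(_._1)
      .map { case (cell, vs) => cell -> vs.map(_._2).sum }
      .filter(_._2 != 0.0)
  }

  // Fraction of cells that are non-zero.
  def density(m: Sparse, rows: Int, cols: Int): Double =
    m.size.toDouble / (rows.toLong * cols)

  def main(args: Array[String]): Unit = {
    val n = 4
    val a = denseColumn(n)
    val b = denseRow(n)
    val c = multiply(a, b)
    println(s"density(A) = ${density(a, n, n)}")
    println(s"density(B) = ${density(b, n, n)}")
    println(s"density(A %*% B) = ${density(c, n, n)}")
  }
}
```

Each operand stores only n of its n * n cells, but the product is an outer product of a dense column and a dense row, so every cell of the result is non-zero.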

There are two issues here, one with a quick fix and one a bit more involved:
# In {{ABt.scala}}, check the {{MatrixFlavor}} of the block and create the combiner as a sparse or dense matrix to match:
{code}
val comb = if (block.getFlavor == MatrixFlavor.SPARSELIKE) {
  new SparseMatrix(prodNCol, block.nrow).t
} else {
  new DenseMatrix(prodNCol, block.nrow).t
}
{code}
A similar check needs to be made in the {{blockify}} transformation.

# More importantly, and more involved, is to do an actual analysis of the resulting matrix data in the in-core {{mmul}} class and use a matrix of the appropriate structure as the result.
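The second item could be sketched roughly as follows. This is plain Scala rather than Mahout code: `DenseResult`, `SparseResult`, `chooseStructure`, and the 0.30 cut-off are all hypothetical names and values chosen for illustration, not the structure of the in-core {{mmul}} class.

```scala
object ResultStructureSketch {
  // Hypothetical stand-ins for a dense and a sparse in-core result.
  sealed trait Result
  final case class DenseResult(cells: Array[Array[Double]]) extends Result
  final case class SparseResult(rows: Int, cols: Int, cells: Map[(Int, Int), Double])
      extends Result

  // Assumed cut-off: results with a larger fraction of non-zeros stay dense.
  val DensityThreshold = 0.30

  // Inspect the actual product data and pick the backing structure.
  def chooseStructure(product: Array[Array[Double]]): Result = {
    val rows = product.length
    val cols = if (rows == 0) 0 else product(0).length
    val nonZeros = for {
      i <- 0 until rows
      j <- 0 until cols
      if product(i)(j) != 0.0
    } yield (i, j) -> product(i)(j)
    val total = rows.toLong * cols
    if (total > 0 && nonZeros.size.toDouble / total > DensityThreshold)
      DenseResult(product)
    else
      SparseResult(rows, cols, nonZeros.toMap)
  }

  def main(args: Array[String]): Unit = {
    val mostlyFull = Array(Array(1.0, 2.0), Array(3.0, 0.0))
    val mostlyEmpty = Array(Array(1.0, 0.0, 0.0, 0.0), Array(0.0, 0.0, 0.0, 0.0))
    println(chooseStructure(mostlyFull).getClass.getSimpleName)
    println(chooseStructure(mostlyEmpty).getClass.getSimpleName)
  }
}
```

The point of the sketch is only that the decision is driven by the data in the result, not by the flavor of either operand.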



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Suneel Marthi (JIRA)
2016-04-28 18:42:12 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262703#comment-15262703 ]

Suneel Marthi commented on MAHOUT-1837:
---------------------------------------

Wouldn't it be easier to separate the two out into their respective JIRAs?
Andrew Palumbo (JIRA)
2016-04-28 19:53:12 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262827#comment-15262827 ]

Andrew Palumbo commented on MAHOUT-1837:
----------------------------------------

The first fix is trivial (that's the actual fix in the description) and does not actually check the data for sparsity (it does help in the dense/dense case, which is currently being returned as sparse). Also, the combiner above, {{comb}}, is the result of the in-core {{mmul}} class, which is where we would actually check for densified data. Since they're tightly coupled, it seemed to make sense to knock them out all at once. Take a look at https://github.com/apache/mahout/pull/228.

Maybe the {{blockify}} part belongs in another issue though. I have not looked into that yet.
Andrew Palumbo (JIRA)
2016-05-03 18:52:13 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo reassigned MAHOUT-1837:
--------------------------------------

Assignee: Andrew Palumbo
ASF GitHub Bot (JIRA)
2016-05-04 00:01:23 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15269875#comment-15269875 ]

ASF GitHub Bot commented on MAHOUT-1837:
----------------------------------------

Github user andrewpalumbo commented on a diff in the pull request:

https://github.com/apache/mahout/pull/228#discussion_r61976030

--- Diff: math-scala/src/main/scala/org/apache/mahout/math/scalabindings/package.scala ---
@@ -410,4 +412,34 @@ package object scalabindings {

def dist(mxX: Matrix, mxY: Matrix): Matrix = sqDist(mxX, mxY) := sqrt _

+ /**
+ * Check the density of an in-core matrix based on supplied criteria.
+ *
+ * @param mxX The matrix to check density of.
+ * @param rowSparsityThreshold the proportion of the rows which must be dense.
+ * @param elementSparsityThreshold the proportion of the rows in the random sample of the matrix which must be dense.
+ * @param sample how much of the matrix to sample.
+ */
+ def isMatrixDense(mxX: Matrix, rowSparsityThreshold: Double = .30, elementSparsityThreshold: Double = .30, sample: Double = .25): Boolean = {
+ val rand = RandomUtils.getRandom
+ val m = mxX.numRows()
+ val numRowToTest: Int = (sample * m).toInt
+
+ var numDenseRows: Int = 0
+
+ for (i <- 0 until numRowToTest) {
+ // select a row at random
+ val row: Vector = mxX(rand.nextInt(m), ::)
+ // check the sparsity of that row; if it is greater than the set sparsity threshold, count this row as dense
+ if (row.getNumNonZeroElements / row.size().toDouble > elementSparsityThreshold) {
+ numDenseRows = numDenseRows + 1
+ }
+ }
+
+ // return the number of denserows/tested rows > rowSparsityThreshold
+ numDenseRows/numRowToTest > rowSparsityThreshold
+ }
+
--- End diff --

@dlyubimov does this seem like a decent test for matrix density? I've put in both an `elementSparsityThreshold` to determine if a Vector itself is sparse, and a `rowSparsityThreshold` as a threshold for the entire matrix. I've also added in a `Vector.mean()` method but am not sure if it is needed in this case.
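For readers following the two thresholds, here is a minimal standalone sketch of the same sampling idea in plain Scala. It is not the code from the pull request: it works on `IndexedSeq[IndexedSeq[Double]]` instead of a Mahout `Matrix`, the seeded `Random` default is an assumption added for repeatability, and the sketch always tests at least one row. It also uses floating-point division in the final comparison, since dividing one `Int` by another in Scala truncates and can only yield 0 or 1 here.

```scala
import scala.util.Random

object DensityCheckSketch {
  // Sample a fraction of the rows at random; a row counts as dense when its
  // share of non-zero elements exceeds elementSparsityThreshold, and the matrix
  // counts as dense when the share of dense sampled rows exceeds
  // rowSparsityThreshold.
  def isMatrixDense(
      rows: IndexedSeq[IndexedSeq[Double]],
      rowSparsityThreshold: Double = 0.30,
      elementSparsityThreshold: Double = 0.30,
      sample: Double = 0.25,
      rand: Random = new Random(42L)): Boolean = {
    val m = rows.length
    if (m == 0) false
    else {
      // Always test at least one row so the final division is well defined.
      val numRowsToTest = math.max(1, (sample * m).toInt)
      var numDenseRows = 0
      for (_ <- 0 until numRowsToTest) {
        val row = rows(rand.nextInt(m))
        val nonZeros = row.count(_ != 0.0)
        if (row.nonEmpty && nonZeros.toDouble / row.length > elementSparsityThreshold)
          numDenseRows += 1
      }
      // Floating-point division; Int / Int would truncate.
      numDenseRows.toDouble / numRowsToTest > rowSparsityThreshold
    }
  }

  def main(args: Array[String]): Unit = {
    val full = Vector.fill(8)(Vector.fill(4)(1.0))
    val empty = Vector.fill(8)(Vector.fill(4)(0.0))
    println(s"full matrix dense?  ${isMatrixDense(full)}")
    println(s"empty matrix dense? ${isMatrixDense(empty)}")
  }
}
```

Because rows are drawn at random with replacement, the answer for a matrix of mixed density can vary between runs unless the random source is seeded.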
ASF GitHub Bot (JIRA)
2016-05-04 00:33:12 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15269923#comment-15269923 ]

ASF GitHub Bot commented on MAHOUT-1837:
----------------------------------------

Github user andrewpalumbo commented on a diff in the pull request:

https://github.com/apache/mahout/pull/228#discussion_r61978602

--- Diff: spark/src/main/scala/org/apache/mahout/sparkbindings/blas/ABt.scala ---
@@ -116,7 +118,12 @@ object ABt {
// Empty combiner += value
createCombiner = (t: (Array[K], Array[Int], Matrix)) => {
val (rowKeys, colKeys, block) = t
- val comb = new SparseMatrix(prodNCol, block.nrow).t
--- End diff --

I see now that there is more to this than I'd originally thought. We have to be careful in operations like `drmA %*% drmB` to have both sparse and dense blocks in the returned DRM.
ASF GitHub Bot (JIRA)
2016-05-04 03:58:12 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15270089#comment-15270089 ]

ASF GitHub Bot commented on MAHOUT-1837:
----------------------------------------

Github user andrewpalumbo commented on a diff in the pull request:

https://github.com/apache/mahout/pull/228#discussion_r61988305

--- Diff: spark/src/main/scala/org/apache/mahout/sparkbindings/drm/package.scala ---
@@ -60,19 +60,27 @@ package object drm {
val keys = data.map(t => t._1).toArray[K]
val vectors = data.map(t => t._2).toArray

- val block = if (vectors(0).isDense) {
- val block = new DenseMatrix(vectors.length, blockncol)
- var row = 0
- while (row < vectors.length) {
- block(row, ::) := vectors(row)
- row += 1
- }
+ // create the block by default as sparse.
+ // would probably be better to sample a subset of these
+ // vectors first before creating the entire matrix.
+ // so that we don't have the overhead of creating a full second matrix in
+ // the case that the matrix is not Sparse
+ val block = new DenseMatrix(vectors.length, blockncol)
+ var row = 0
+ while (row < vectors.length) {
+ block(row, ::) := vectors(row)
+ row += 1
+ }
+
+ // Test the density of the data. If the matrix does not meet the
+ // requirements for sparsity, convert the Vectors to a dense Matrix.
+ val resBlock = if (isMatrixDense(block)) {
block
--- End diff --

Currently fails this test (when testing samples of the full matrix) with a density threshold of .30 rows/matrix, .30 nonZeroElements/row, and a sample size of .25 (with a minimum of one row to test):
```scala
test("DRM blockify sparse -> SRM") {

  val inCoreA = sparse(
    (1, 2, 3),
    0 -> 3 :: 2 -> 5 :: Nil
  )
  val drmA = drmParallelize(inCoreA, numPartitions = 2)

  (inCoreA - drmA.mapBlock() {
    case (keys, block) =>
      if (!block.isInstanceOf[SparseRowMatrix])
        throw new AssertionError("Block must be dense.")
      keys -> block
  }).norm should be < 1e-4
}
```
ASF GitHub Bot (JIRA)
2016-05-04 08:11:13 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15270296#comment-15270296 ]

ASF GitHub Bot commented on MAHOUT-1837:
----------------------------------------

Github user andrewpalumbo commented on the pull request:

https://github.com/apache/mahout/pull/228#issuecomment-216776902

Changed only this test (when testing samples of the full matrix) with a density threshold of .30 rows/matrix, .30 nonZeroElements/row, and a sample size of .25 (with a minimum of one row to test). It seems that the test *should* be returning a `DenseMatrix` (there is only a single missing element in the entire matrix).

```scala
test("DRM blockify sparse -> SRM") {

  val inCoreA = sparse(
    (1, 2, 3),
    0 -> 3 :: 2 -> 5 :: Nil
  )
  val drmA = drmParallelize(inCoreA, numPartitions = 2)

  (inCoreA - drmA.mapBlock() {
    case (keys, block) =>
      // --> changed from: if (!block.isInstanceOf[SparseRowMatrix])
      if (block.isInstanceOf[SparseRowMatrix])
        throw new AssertionError("Block must be dense.")
      keys -> block
  }).norm should be < 1e-4
}
```
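As a quick check of the "single missing element" observation, the densities of the test matrix can be worked out by hand. The sketch below is plain Scala, not Mahout code, and assumes the `sparse(...)` call above yields the 2 x 3 matrix with rows (1, 2, 3) and (3, 0, 5); the object and function names are made up for illustration.

```scala
object TestMatrixDensity {
  // The 2 x 3 matrix from the test, written out densely (assumed layout).
  val inCoreA: Vector[Vector[Double]] = Vector(
    Vector(1.0, 2.0, 3.0),
    Vector(3.0, 0.0, 5.0)
  )

  // Fraction of non-zero elements in each row.
  def rowDensities(m: Vector[Vector[Double]]): Vector[Double] =
    m.map(row => row.count(_ != 0.0).toDouble / row.length)

  // Fraction of non-zero elements in the whole matrix.
  def overallDensity(m: Vector[Vector[Double]]): Double =
    m.map(_.count(_ != 0.0)).sum.toDouble / m.map(_.length).sum

  def main(args: Array[String]): Unit = {
    println(s"row densities:   ${rowDensities(inCoreA)}")
    println(s"overall density: ${overallDensity(inCoreA)}")
  }
}
```

Under that assumption the rows have 3 of 3 and 2 of 3 non-zero elements, so both are above a .30 element threshold, and 5 of the 6 cells overall are non-zero. Whichever row is sampled, a density check with these thresholds would classify the block as dense.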
ASF GitHub Bot (JIRA)
2016-05-04 09:03:12 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15270352#comment-15270352 ]

ASF GitHub Bot commented on MAHOUT-1837:
----------------------------------------

Github user andrewpalumbo commented on the pull request:

https://github.com/apache/mahout/pull/228#issuecomment-216799830

Performance comparison:
We can see some speedups, though also some performance hits. The performance hits are likely due to the sampling and testing of the data, rather than just the examination of the matrix structure, and at first glance seem to be somewhat constant.

Performance gains are significant in some places; I will try to examine them more closely later.

There is also likely something to gain by tuning the parameters of the `isMatrixDense(...)` function.

Master
-
```
Ad %*% Bd: (231.66666666666666,51.0)
Ad(::,::) %*% Bd: (288.0,33.0)
Ad' %*% Bd: (440.6666666666667,35.666666666666664)
Ad %*% Bd': (168.0,38.666666666666664)
Ad' %*% Bd': (300.6666666666667,37.333333333333336)
Ad'' %*% Bd'': (276.3333333333333,35.666666666666664)

Asr %*% Bsr: (47.666666666666664,24.333333333333332)
Asr' %*% Bsr: (1200.3333333333333,20.0)
Asr %*% Bsr': (119.0,22.0)
Asr' %*% Bsr': (1000.0,16.666666666666668)
Asr'' %*% Bsr'': (15.666666666666666,18.666666666666668)

Asm %*% Bsm: (1228.3333333333333,27.333333333333332)
Asm' %*% Bsm: (1633.6666666666667,29.333333333333332)
Asm %*% Bsm': (1021.3333333333334,34.333333333333336)
Asm' %*% Bsm': (1367.6666666666667,17.666666666666668)
Asm'' %*% Bsm'': (1314.6666666666667,17.0)

Asm %*% Bsr: (1049.6666666666667,18.333333333333332)
Asm' %*% Bsr: (1507.3333333333333,23.333333333333332)
Asm %*% Bsr': (957.3333333333334,20.666666666666668)
Asm' %*% Bsr': (1401.0,16.333333333333332)
Asm'' %*% Bsr'': (1051.3333333333333,17.333333333333332)

Asr %*% Bsm: (18.333333333333332,20.333333333333332)
Asr' %*% Bsm: (1610.3333333333333,21.666666666666668)
Asr %*% Bsm': (118.33333333333333,26.333333333333332)
Asr' %*% Bsm': (1306.0,18.0)
Asr'' %*% Bsm'': (18.0,17.666666666666668)

Ad %*% Bsr: (701.0,66.0)
Ad' %*% Bsr: (900.0,52.333333333333336)
Ad %*% Bsr': (620.0,50.0)
Ad' %*% Bsr': (819.0,49.666666666666664)
Ad'' %*% Bsr'': (711.0,55.333333333333336)

Asr %*% Bd: (54.333333333333336,50.0)
Asr' %*% Bd: (707.0,53.666666666666664)
Asr %*% Bd': (112.33333333333333,74.66666666666667)
Asr' %*% Bd': (779.3333333333334,80.0)
Asr'' %*% Bd'': (63.666666666666664,71.33333333333333)

Ad %*% Bsm: (964.3333333333334,172.33333333333334)
Ad' %*% Bsm: (1183.0,171.66666666666666)
Ad %*% Bsm': (704.0,175.0)
Ad' %*% Bsm': (891.6666666666666,77.33333333333333)
Ad'' %*% Bsm'': (952.6666666666666,172.0)

Asm %*% Bd: (514.6666666666666,72.33333333333333)
Asm' %*% Bd: (814.3333333333334,77.0)
Asm %*% Bd': (541.6666666666666,73.33333333333333)
Asm' %*% Bd': (842.6666666666666,165.0)
Asm'' %*% Bd'': (507.3333333333333,71.0)

Ad %*% D: (213.0,13.666666666666666)
Asr %*% D: (57.333333333333336,19.333333333333332)
Asm %*% D: (378.6666666666667,15.0)
D %*% Ad: (10.333333333333334,2.6666666666666665)
D %*% Asr: (8.0,2.0)
D %*% Asm: (1.0,1.3333333333333333)

Ad' %*% D: (409.3333333333333,2.6666666666666665)
Asr' %*% D: (571.0,1.3333333333333333)
Asm' %*% D: (658.3333333333334,1.3333333333333333)
D %*% Ad': (13.666666666666666,5.666666666666667)
D %*% Asr': (16.333333333333332,6.333333333333333)
D %*% Asm': (10.333333333333334,6.333333333333333)

```

This Branch
-
```
Ad %*% Bd: (193.33333333333334,55.666666666666664)
Ad(::,::) %*% Bd: (272.0,53.333333333333336)
Ad' %*% Bd: (438.0,40.666666666666664)
Ad %*% Bd': (169.0,32.333333333333336)
Ad' %*% Bd': (291.6666666666667,33.333333333333336)
Ad'' %*% Bd'': (230.0,41.666666666666664)

Asr %*% Bsr: (42.333333333333336,31.666666666666668)
Asr' %*% Bsr: (1183.6666666666667,24.0)
Asr %*% Bsr': (98.0,23.333333333333332)
Asr' %*% Bsr': (993.3333333333334,21.333333333333332)
Asr'' %*% Bsr'': (15.666666666666666,22.666666666666668)

Asm %*% Bsm: (1267.6666666666667,23.333333333333332)
Asm' %*% Bsm: (1635.3333333333333,25.666666666666668)
Asm %*% Bsm': (1012.3333333333334,24.333333333333332)
Asm' %*% Bsm': (1428.3333333333333,19.0)
Asm'' %*% Bsm'': (1254.6666666666667,29.333333333333332)

Asm %*% Bsr: (1037.6666666666667,17.333333333333332)
Asm' %*% Bsr: (1480.6666666666667,22.666666666666668)
Asm %*% Bsr': (954.0,20.666666666666668)
Asm' %*% Bsr': (1383.3333333333333,25.666666666666668)
Asm'' %*% Bsr'': (1034.6666666666667,16.666666666666668)

Asr %*% Bsm: (17.666666666666668,17.0)
Asr' %*% Bsm: (1563.3333333333333,25.333333333333332)
Asr %*% Bsm': (111.66666666666667,24.0)
Asr' %*% Bsm': (1271.3333333333333,19.333333333333332)
Asr'' %*% Bsm'': (17.0,23.333333333333332)

Ad %*% Bsr: (692.3333333333334,61.666666666666664)
Ad' %*% Bsr: (875.6666666666666,49.0)
Ad %*% Bsr': (623.6666666666666,54.333333333333336)
Ad' %*% Bsr': (795.0,48.333333333333336)
Ad'' %*% Bsr'': (693.0,50.0)

Asr %*% Bd: (55.0,47.666666666666664)
Asr' %*% Bd: (681.3333333333334,49.0)
Asr %*% Bd': (110.0,66.33333333333333)
Asr' %*% Bd': (758.6666666666666,74.0)
Asr'' %*% Bd'': (59.666666666666664,59.0)

Ad %*% Bsm: (940.3333333333334,173.0)
Ad' %*% Bsm: (1164.0,174.33333333333334)
Ad %*% Bsm': (698.6666666666666,168.33333333333334)
Ad' %*% Bsm': (881.3333333333334,65.66666666666667)
Ad'' %*% Bsm'': (939.3333333333334,162.0)

Asm %*% Bd: (527.3333333333334,60.666666666666664)
Asm' %*% Bd: (808.3333333333334,66.66666666666667)
Asm %*% Bd': (577.6666666666666,62.333333333333336)
Asm' %*% Bd': (870.6666666666666,166.66666666666666)
Asm'' %*% Bd'': (512.0,61.0)

Ad %*% D: (229.66666666666666,14.333333333333334)
Asr %*% D: (52.666666666666664,19.666666666666668)
Asm %*% D: (429.3333333333333,14.666666666666666)
D %*% Ad: (8.666666666666666,2.3333333333333335)
D %*% Asr: (6.333333333333333,1.6666666666666667)
D %*% Asm: (1.0,1.3333333333333333)

Ad' %*% D: (437.0,4.333333333333333)
Asr' %*% D: (570.3333333333334,2.0)
Asm' %*% D: (664.6666666666666,2.3333333333333335)
D %*% Ad': (13.666666666666666,7.0)
D %*% Asr': (19.666666666666668,5.666666666666667)
D %*% Asm': (10.333333333333334,7.0)

```
Andrew Palumbo (JIRA)
2016-05-04 09:44:12 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo updated MAHOUT-1837:
-----------------------------------
Attachment: compareDensityTest.ods

Spreadsheet of time comparisons for density calculation in {{MMul}} class vs. {{MatrixFlavor}}.
ASF GitHub Bot (JIRA)
2016-05-04 09:50:12 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15270400#comment-15270400 ]

ASF GitHub Bot commented on MAHOUT-1837:
----------------------------------------

Github user andrewpalumbo commented on the pull request:

https://github.com/apache/mahout/pull/228#issuecomment-216814874

See the Spreadsheet attached to the Jira:

https://issues.apache.org/jira/secure/attachment/12802153/compareDensityTest.ods

for performance comparison.
Andrew Palumbo (JIRA)
2016-05-18 15:40:12 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo updated MAHOUT-1837:
-----------------------------------
Fix Version/s: (was: 0.12.1)
0.13.0
Hudson (JIRA)
2016-06-16 03:48:05 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333013#comment-15333013 ]

Hudson commented on MAHOUT-1837:
--------------------------------

FAILURE: Integrated in Mahout-Quality #3379 (See [https://builds.apache.org/job/Mahout-Quality/3379/])
MAHOUT-1837: Sparse/Dense Matrix analysis for Matrix Multiplication. (apalumbo: rev d9940489d2f849d36af396d603f6170ab560e505)
* spark/src/main/scala/org/apache/mahout/sparkbindings/drm/package.scala
* spark/src/test/scala/org/apache/mahout/sparkbindings/drm/DrmLikeSuite.scala
* math-scala/src/main/scala/org/apache/mahout/math/scalabindings/package.scala
* math-scala/src/main/scala/org/apache/mahout/math/drm/package.scala
* math-scala/src/test/scala/org/apache/mahout/math/scalabindings/MathSuite.scala
* math-scala/src/main/scala/org/apache/mahout/math/scalabindings/MMul.scala
* spark/src/main/scala/org/apache/mahout/sparkbindings/blas/ABt.scala
Andrew Palumbo (JIRA)
2016-06-16 15:55:05 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo resolved MAHOUT-1837.
------------------------------------
Resolution: Fixed

committed to master
ASF GitHub Bot (JIRA)
2016-06-24 20:54:16 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348602#comment-15348602 ]

ASF GitHub Bot commented on MAHOUT-1837:
----------------------------------------

GitHub user andrewpalumbo opened a pull request:

https://github.com/apache/mahout/pull/244

MAHOUT-1837 flip <= threshold to > at the final return for dense

fix for the incorrect threshold analysis


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrewpalumbo/mahout MAHOUT-1837-b

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/mahout/pull/244.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #244

----
commit 1388d8f2d3bbdc50bf0e554b6b9176da2231f7d1
Author: Andrew Palumbo <***@apache.org>
Date: 2016-06-24T20:51:36Z

flip <= threshold to > at the final return for dense

----
ASF GitHub Bot (JIRA)
2016-06-24 21:26:16 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348678#comment-15348678 ]

ASF GitHub Bot commented on MAHOUT-1837:
----------------------------------------

Github user asfgit closed the pull request at:

https://github.com/apache/mahout/pull/244
Hudson (JIRA)
2016-06-24 22:17:16 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348777#comment-15348777 ]

Hudson commented on MAHOUT-1837:
--------------------------------

SUCCESS: Integrated in Mahout-Quality #3380 (See [https://builds.apache.org/job/Mahout-Quality/3380/])
MAHOUT-1837: fix incorrect <= threshold to > threshold to indicate a (apalumbo: rev 727e5be85c0326d9c009d9cdc361fe47ffa201ad)
* math-scala/src/test/scala/org/apache/mahout/math/scalabindings/MathSuite.scala
* spark/src/main/scala/org/apache/mahout/sparkbindings/drm/package.scala
* math-scala/src/main/scala/org/apache/mahout/math/scalabindings/MMul.scala
* math-scala/src/main/scala/org/apache/mahout/math/scalabindings/package.scala
Andrew Palumbo (JIRA)
2016-08-25 16:21:20 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo reopened MAHOUT-1837:
------------------------------------

Reopening: we may need to use {{SparseMatrix}} as the default in {{Drm.blockify()}}. Users are seeing OOM errors when {{DenseMatrix}} is used as the default block:

{code:title=Drm.blockify()}
{...}
val block = new DenseMatrix(vectors.length, blockncol)
var row = 0
while (row < vectors.length) {
  block(row, ::) := vectors(row)
  row += 1
}

// Test the density of the data. If the matrix does not meet the
// requirements for density, convert the Vectors to a sparse Matrix.
val resBlock = if (densityAnalysis(block)) {
  block
} else {
  new SparseRowMatrix(vectors.length, blockncol, vectors, true, false)
}
{code}

I propose using {{SparseMatrix}} as the default, then testing sparsity and copying into a {{DenseMatrix}} only if necessary.
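
A minimal sketch of that ordering, in plain Scala rather than against the Mahout classes: {{SparseBlock}}, {{DenseBlock}}, {{density}} and the 0.25 threshold below are hypothetical names and values chosen for illustration. The block starts out sparse, density is measured on the sparse form, and a dense array is allocated only when the threshold is exceeded.

```scala
// Illustrative only; these types and the threshold are assumptions,
// not Mahout's blockify() implementation.
object BlockifySketch {
  sealed trait Block
  // Each row maps a column index to a non-zero value.
  final case class SparseBlock(ncol: Int, rows: Vector[Map[Int, Double]]) extends Block
  final case class DenseBlock(data: Array[Array[Double]]) extends Block

  // Density is computed on the sparse form, so deciding costs no dense allocation.
  def density(b: SparseBlock): Double =
    if (b.rows.isEmpty || b.ncol == 0) 0.0
    else b.rows.map(_.size.toLong).sum.toDouble / (b.rows.length.toLong * b.ncol)

  def blockify(ncol: Int, rows: Vector[Map[Int, Double]], threshold: Double = 0.25): Block = {
    val sparse = SparseBlock(ncol, rows)
    if (density(sparse) > threshold) {
      // Only now pay for rows * ncol doubles.
      val data = Array.ofDim[Double](rows.length, ncol)
      for ((row, i) <- rows.zipWithIndex; (j, v) <- row) data(i)(j) = v
      DenseBlock(data)
    } else sparse
  }
}
```

The worst case becomes a sparse block plus one dense copy for genuinely dense data, instead of a dense allocation for every block regardless of content.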
ASF GitHub Bot (JIRA)
2016-08-25 19:09:21 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437473#comment-15437473 ]

ASF GitHub Bot commented on MAHOUT-1837:
----------------------------------------

Github user andrewpalumbo commented on the pull request:

https://github.com/apache/mahout/commit/727e5be85c0326d9c009d9cdc361fe47ffa201ad#commitcomment-18781772

Re-opened MAHOUT-1837. Will get a fix out soon.
ASF GitHub Bot (JIRA)
2016-08-26 01:06:21 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438237#comment-15438237 ]

ASF GitHub Bot commented on MAHOUT-1837:
----------------------------------------

GitHub user andrewpalumbo opened a pull request:

https://github.com/apache/mahout/pull/252

MAHOUT-1837: Fixed dense bug in drm/package.blockify()

Create a `SparseRowMatrix` by default in order to keep `OOM` errors from occurring in `blockify()`, per the conversation in: https://github.com/apache/mahout/commit/727e5be85c0326d9c009d9cdc361fe47ffa201ad#commitcomment-18771603. Run `densityAnalysis()` on that block and convert it to dense if the density requirements are met.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrewpalumbo/mahout MAHOUT-1837-dense-bug

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/mahout/pull/252.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #252

----
commit 5d2f5cc9746f968d0c776f869070dd9a439de9f1
Author: Andrew Palumbo <***@apache.org>
Date: 2016-08-26T00:58:57Z

fixed dense bug in drm/package.blockify()

----
ASF GitHub Bot (JIRA)
2016-08-26 18:44:20 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15439547#comment-15439547 ]

ASF GitHub Bot commented on MAHOUT-1837:
----------------------------------------

Github user dlyubimov commented on the issue:

https://github.com/apache/mahout/pull/252

Looks good to me
ASF GitHub Bot (JIRA)
2016-08-27 03:04:20 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440540#comment-15440540 ]

ASF GitHub Bot commented on MAHOUT-1837:
----------------------------------------

Github user andrewpalumbo commented on a diff in the pull request:

https://github.com/apache/mahout/pull/252#discussion_r76509011

--- Diff: spark/src/main/scala/org/apache/mahout/sparkbindings/drm/package.scala ---
@@ -60,26 +60,22 @@ package object drm {
val keys = data.map(t => t._1).toArray[K]
val vectors = data.map(t => t._2).toArray

- // create the block by default as dense.
- // would probably be better to sample a subset of these
- // vectors first before creating the entire matrix.
- // so that we don't have the overhead of creating a full second matrix in
- // the case that the matrix is not dense.
- val block = new DenseMatrix(vectors.length, blockncol)
- var row = 0
- while (row < vectors.length) {
- block(row, ::) := vectors(row)
- row += 1
- }
+ // create the block by default as Sparse.
+ val block = new SparseRowMatrix(vectors.length, blockncol, vectors, true, false)

- // Test the density of the data. If the matrix does not meet the
- // requirements for density, convert the Vectors to a sparse Matrix.
+ // Test the density of the data. If the matrix does meets the
+ // requirements for density, convert the Vectors to a DenseMatrix.
val resBlock = if (densityAnalysis(block)) {
- block
+ val dBlock = new DenseMatrix(vectors.length, blockncol)
+ var row = 0
+ while (row < vectors.length) {
+ dBlock(row, ::) := vectors(row)
+ row += 1
+ }
+ dBlock
} else {
- new SparseRowMatrix(vectors.length, blockncol, vectors, true, false)
--- End diff --

Should the default here be a `SparseRowMatrix` of random-access sparse vectors? It seems so, i.e. this line should probably read:
```scala
new SparseRowMatrix(vectors.length, blockncol, vectors, true, true)
```
rather than:
```scala
new SparseRowMatrix(vectors.length, blockncol, vectors, true, false)
```
as it is now, correct?
ASF GitHub Bot (JIRA)
2016-08-27 04:23:20 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440647#comment-15440647 ]

ASF GitHub Bot commented on MAHOUT-1837:
----------------------------------------

Github user dlyubimov commented on a diff in the pull request:

https://github.com/apache/mahout/pull/252#discussion_r76509896

No, I believe sequential should stay, as it will be more natural (ordered)
when converted to CSR, which hopefully will be our most common modus operandi
for exponential algorithms.

In fact, random access rarely makes sense at all for block-wise algorithms.
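
As a rough illustration of why ordered rows suit CSR, here is a plain-Scala sketch. The {{Csr}} case class and {{toCsr}} are hypothetical and not Mahout code. When each row already yields its entries in ascending column order, as a sequential-access vector would, the CSR arrays can be built by simple concatenation; hash-backed random-access rows would, as far as I understand, need a sort per row first.

```scala
// Illustrative sketch; Csr and toCsr are hypothetical names.
object CsrSketch {
  // Compressed sparse row layout: values and column indices are stored row
  // by row, and rowPtr(i) until rowPtr(i + 1) delimits the entries of row i.
  final case class Csr(values: Array[Double], colIdx: Array[Int], rowPtr: Array[Int])

  // Each input row is a sequence of (column, value) pairs that is assumed to
  // be sorted by column already, so no per-row sort is performed.
  def toCsr(rows: Seq[Seq[(Int, Double)]]): Csr = {
    val rowPtr = rows.scanLeft(0)(_ + _.length).toArray
    val flat = rows.flatten
    Csr(flat.map(_._2).toArray, flat.map(_._1).toArray, rowPtr)
  }
}
```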
ASF GitHub Bot (JIRA)
2016-08-27 16:21:20 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15441848#comment-15441848 ]

ASF GitHub Bot commented on MAHOUT-1837:
----------------------------------------

Github user andrewpalumbo commented on a diff in the pull request:

https://github.com/apache/mahout/pull/252#discussion_r76518811

thx.
ASF GitHub Bot (JIRA)
2016-08-27 16:23:21 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15441852#comment-15441852 ]

ASF GitHub Bot commented on MAHOUT-1837:
----------------------------------------

Github user asfgit closed the pull request at:

https://github.com/apache/mahout/pull/252
ASF GitHub Bot (JIRA)
2016-08-27 16:28:20 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15441858#comment-15441858 ]

ASF GitHub Bot commented on MAHOUT-1837:
----------------------------------------

Github user andrewpalumbo commented on the issue:

https://github.com/apache/mahout/pull/252
Hudson (JIRA)
2016-08-27 17:00:22 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15441908#comment-15441908 ]

Hudson commented on MAHOUT-1837:
--------------------------------

FAILURE: Integrated in Jenkins build Mahout-Quality #3387 (See [https://builds.apache.org/job/Mahout-Quality/3387/])
MAHOUT-1837: fix bug in drm.blockify(): use SparseRowMatrix by default (apalumbo: rev f4a71d084958f2e1865efc8ac8115cd51e1e57d9)
* (edit) spark/src/main/scala/org/apache/mahout/sparkbindings/drm/package.scala
Andrew Palumbo (JIRA)
2016-08-28 00:02:20 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo resolved MAHOUT-1837.
------------------------------------
Resolution: Fixed

Fixed the default {{DenseMatrix}} bug, which was throwing OOM errors, via https://github.com/apache/mahout/commit/f4a71d084958f2e1865efc8ac8115cd51e1e57d9
ASF GitHub Bot (JIRA)
2016-08-28 13:23:20 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15443444#comment-15443444 ]

ASF GitHub Bot commented on MAHOUT-1837:
----------------------------------------

Github user AddictedCS commented on the issue:

https://github.com/apache/mahout/pull/252