Discussion:
[jira] [Created] (MAHOUT-1582) Create simpler row and column aggregation API at local level
Ted Dunning (JIRA)
2014-06-16 03:35:01 UTC
Ted Dunning created MAHOUT-1582:
-----------------------------------

Summary: Create simpler row and column aggregation API at local level
Key: MAHOUT-1582
URL: https://issues.apache.org/jira/browse/MAHOUT-1582
Project: Mahout
Issue Type: Bug
Reporter: Ted Dunning


The issue is that the current row and column aggregation API makes it difficult to do anything but row-by-row aggregation using anonymous classes. There is no scope for being aware of locality, nor for using the well-known function definitions in Functions. This rules out many optimizations, and many of these are optimizations that we want to have. An example would be summing the absolute values of the elements. With the current API, it would be very hard to optimize this for sparse matrices or for the wrong direction of iteration, but with a different API it should be easy.

What I suggest is an API of this form:

{code}
Vector aggregateRows(DoubleDoubleFunction combiner, DoubleFunction mapper)
{code}

This will produce a vector with one element per row in the original. The nice thing here is that if the matrix is row major, we can iterate over rows and accumulate a value for each row using sparsity as available. On the other hand, if the matrix is column major, we can keep a vector of accumulators and still use sparsity as appropriate.
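
As a rough sketch of the two traversal orders described above (this is not Mahout code): the class and method names are hypothetical, plain arrays stand in for Matrix and Vector, and java.util.function.DoubleBinaryOperator and DoubleUnaryOperator stand in for DoubleDoubleFunction and DoubleFunction. Each accumulator is seeded with the first mapped element of its row, which is an assumption, since the proposed signature carries no initial value.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

// Hypothetical illustration only; none of these names exist in Mahout.
final class RowAggregationSketch {

    // Row-major storage (rows[r][c]): finish one row at a time with a single
    // scalar accumulator, so each row's data is read contiguously.
    static double[] aggregateRowsRowMajor(double[][] rows,
                                          DoubleBinaryOperator combiner,
                                          DoubleUnaryOperator mapper) {
        double[] result = new double[rows.length];
        for (int r = 0; r < rows.length; r++) {
            double[] row = rows[r];
            if (row.length == 0) {
                continue; // empty row: leave the result at 0.0 (assumption)
            }
            double acc = mapper.applyAsDouble(row[0]);
            for (int c = 1; c < row.length; c++) {
                acc = combiner.applyAsDouble(acc, mapper.applyAsDouble(row[c]));
            }
            result[r] = acc;
        }
        return result;
    }

    // Column-major storage (columns[c][r]): walk the data in storage order
    // and keep one accumulator per row, updated as each column is visited.
    static double[] aggregateRowsColumnMajor(double[][] columns,
                                             int numRows,
                                             DoubleBinaryOperator combiner,
                                             DoubleUnaryOperator mapper) {
        double[] acc = new double[numRows];
        for (int c = 0; c < columns.length; c++) {
            double[] column = columns[c];
            for (int r = 0; r < numRows; r++) {
                double mapped = mapper.applyAsDouble(column[r]);
                // The first column seeds the accumulators; later columns combine.
                acc[r] = (c == 0) ? mapped : combiner.applyAsDouble(acc[r], mapped);
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        // The same 2x3 matrix stored in both layouts.
        double[][] rows = {{1, -2, 3}, {-4, 5, -6}};
        double[][] columns = {{1, -4}, {-2, 5}, {3, -6}};
        System.out.println(Arrays.toString(
            aggregateRowsRowMajor(rows, Double::sum, Math::abs)));
        System.out.println(Arrays.toString(
            aggregateRowsColumnMajor(columns, 2, Double::sum, Math::abs)));
    }
}
```

Both methods return the same per-row result; only the loop order differs, and that choice is what a caller-supplied anonymous class cannot control.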

The use of sparsity comes in because the matrix code now has control over both of the loops involved and also has visibility into properties of the map and combine functions. For instance, ABS(0) == 0 so if we combine with PLUS, we can use a sparse iterator.
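
A minimal sketch of that decision, under stated assumptions: the class, the PLUS constant and the sparse row layout (sorted column indices plus their non-zero values) are all hypothetical stand-ins, not Mahout's Functions.PLUS or its vector classes. The combiner is recognized by identity against the well-known constant, and the mapper is probed at zero.

```java
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

// Hypothetical illustration only; none of these names exist in Mahout.
final class SparseAggregationSketch {

    // Stand-in for a well-known function constant such as PLUS.
    static final DoubleBinaryOperator PLUS = Double::sum;

    // Zeros can be skipped when the mapper sends 0 to 0 and the combiner is
    // PLUS, because adding a mapped zero never changes the accumulator.
    static boolean canSkipZeros(DoubleBinaryOperator combiner,
                                DoubleUnaryOperator mapper) {
        return combiner == PLUS && mapper.applyAsDouble(0.0) == 0.0;
    }

    // Aggregates one sparse row given its sorted non-zero column indices
    // and the matching values.
    static double aggregateRow(int numCols, int[] indices, double[] values,
                               DoubleBinaryOperator combiner,
                               DoubleUnaryOperator mapper) {
        if (canSkipZeros(combiner, mapper)) {
            // Sparse path: touch only the stored non-zero entries.
            double acc = 0.0;
            for (double v : values) {
                acc = combiner.applyAsDouble(acc, mapper.applyAsDouble(v));
            }
            return acc;
        }
        // Dense fallback: implicit zeros must be mapped and combined too.
        double acc = 0.0;
        int next = 0; // position of the next stored entry
        for (int c = 0; c < numCols; c++) {
            double v = 0.0;
            if (next < indices.length && indices[next] == c) {
                v = values[next++];
            }
            double mapped = mapper.applyAsDouble(v);
            acc = (c == 0) ? mapped : combiner.applyAsDouble(acc, mapped);
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] indices = {1, 3};       // the row is [0, -2, 0, 4, 0]
        double[] values = {-2.0, 4.0};
        // abs(0) == 0, so the sparse path is taken.
        System.out.println(aggregateRow(5, indices, values, PLUS, Math::abs));
        // x + 1 maps 0 to 1, so every column has to be visited.
        System.out.println(aggregateRow(5, indices, values, PLUS, x -> x + 1.0));
    }
}
```

The check is deliberately conservative: any combiner other than the known PLUS constant falls back to the dense walk, even where skipping zeros would also have been safe.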




--
This message was sent by Atlassian JIRA
(v6.2#6252)
Sahil Sharma (JIRA)
2014-06-16 03:56:01 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032103#comment-14032103 ]

Sahil Sharma commented on MAHOUT-1582:
--------------------------------------

Hi,

Could you please point to the specific API (or class, actually) you are referring to? Perhaps with a link from http://grepcode.com/snapshot/repo1.maven.org/maven2/org.apache.mahout/mahout-core/0.9
Ted Dunning (JIRA)
2014-06-16 06:18:02 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032136#comment-14032136 ]

Ted Dunning commented on MAHOUT-1582:
-------------------------------------

The APIs that I have had problems with are Matrix.aggregateRows and Matrix.aggregateColumns.
Sahil Sharma (JIRA)
2014-06-16 11:53:01 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032364#comment-14032364 ]

Sahil Sharma commented on MAHOUT-1582:
--------------------------------------

Hey,

Just to be clear, what you are talking about is this, right?
http://goo.gl/84dDBo
Ted Dunning (JIRA)
2014-06-16 14:40:02 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032478#comment-14032478 ]

Ted Dunning commented on MAHOUT-1582:
-------------------------------------

Thanks for the link. Actually, very close to that. More like this:

http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-math/0.9/org/apache/mahout/math/Matrix.java#Matrix.aggregateColumns%28org.apache.mahout.math.function.VectorFunction%29

or this

http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-math/0.9/org/apache/mahout/math/Matrix.java#Matrix.aggregateRows%28org.apache.mahout.math.function.VectorFunction%29

These three entry points are right next to each other in the javadoc, however, so it is a little tricky to tell them apart.
Andrew Palumbo (JIRA)
2015-03-06 01:45:38 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo updated MAHOUT-1582:
-----------------------------------
Labels: legacy math scala (was: )
Dmitriy Lyubimov (JIRA)
2015-06-18 19:51:02 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov updated MAHOUT-1582:
-------------------------------------
Assignee: Suneel Marthi
Suneel Marthi (JIRA)
2015-10-25 11:45:27 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973196#comment-14973196 ]

Suneel Marthi commented on MAHOUT-1582:
---------------------------------------

Is this still required after Mahout 0.10.x?
Suneel Marthi (JIRA)
2016-10-09 23:07:20 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi resolved MAHOUT-1582.
-----------------------------------
Resolution: Won't Fix
Fix Version/s: 0.13.0

Resolving this as 'Won't Fix'; please feel free to create a new Jira.