Discussion:
[jira] [Created] (MAHOUT-1626) Support for required quasi-algebraic operations and starting with aggregating rows/blocks
Gokhan Capan (JIRA)
2014-11-15 13:12:33 UTC
Permalink
Gokhan Capan created MAHOUT-1626:
------------------------------------

Summary: Support for required quasi-algebraic operations and starting with aggregating rows/blocks
Key: MAHOUT-1626
URL: https://issues.apache.org/jira/browse/MAHOUT-1626
Project: Mahout
Issue Type: New Feature
Components: Math
Affects Versions: 1.0
Reporter: Gokhan Capan
Fix For: 1.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
ASF GitHub Bot (JIRA)
2014-11-15 13:16:33 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213591#comment-14213591 ]

ASF GitHub Bot commented on MAHOUT-1626:
----------------------------------------

GitHub user gcapan opened a pull request:

https://github.com/apache/mahout/pull/62

MAHOUT-1626 Support for required quasi-algebraic operations and starting with aggregating rows/blocks

As discussed on the dev list, we are now in the process of adding an additional set of distributed operations to be performed on checkpointed matrices.

An initial implementation for aggregating rows/blocks is included in this request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gcapan/mahout accumulateblocks

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/mahout/pull/62.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #62

----
commit 6c75dead432c9f2d714834fade51361cf9bd2781
Author: Gokhan <***@gmail.com>
Date: 2014-11-11T20:01:30Z

aggregate call to a drm, aggregateBlocks implementation for Spark backend

commit 060d6f136343bb659902bfed64b93fbc58c9dcfe
Author: Gokhan <***@gmail.com>
Date: 2014-11-11T20:40:27Z

making sure that zeroval is a double in tests

commit 7cf03c40d0dd6a681ecab2a2c68e258ce4016729
Author: Gokhan <***@gmail.com>
Date: 2014-11-14T19:36:33Z

Updates based on Dmitriy's comments

commit 3e012b24c65d5c6646ceb28fb2ea93523345de33
Author: Gokhan <***@gmail.com>
Date: 2014-11-15T13:02:52Z

License

commit 65567a61aee5a65a96a7beda3664e5c7ec5dbada
Author: Gokhan <***@gmail.com>
Date: 2014-11-15T13:03:34Z

Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/mahout into accumulateblocks

----
ASF GitHub Bot (JIRA)
2015-01-26 19:23:35 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292274#comment-14292274 ]

ASF GitHub Bot commented on MAHOUT-1626:
----------------------------------------

Github user dlyubimov commented on the pull request:

https://github.com/apache/mahout/pull/62#issuecomment-71520531

What's the status?

Also, what happens to H2O support for this? Will it just fail?
ASF GitHub Bot (JIRA)
2015-01-26 19:28:34 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292282#comment-14292282 ]

ASF GitHub Bot commented on MAHOUT-1626:
----------------------------------------

Github user dlyubimov commented on a diff in the pull request:

https://github.com/apache/mahout/pull/62#discussion_r23556922

--- Diff: h2o/src/main/scala/org/apache/mahout/h2obindings/H2OEngine.scala ---
@@ -32,6 +32,7 @@ import org.apache.mahout.common.{Hadoop1HDFSUtil, HDFSUtil}
object H2OEngine extends DistributedEngine {
// By default, use Hadoop 1 utils
var hdfsUtils: HDFSUtil = Hadoop1HDFSUtil
+ val operations: DistributedOperations = null
--- End diff --

I guess in the case of H2O this will break with an NPE... Perhaps we can make it an Option[...] and gracefully report that the capability is not supported by the engine if it happens to be None.
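
A minimal sketch of the Option-based approach suggested above. The trait bodies, the `name` member, `requireOperations`, and the two engine objects are hypothetical stand-ins for illustration, not the actual Mahout definitions:

```scala
// Hypothetical stand-in for the interface added in the PR.
trait DistributedOperations {
  def aggregateBlocks(blocks: Seq[Array[Double]]): Double
}

// Hypothetical stand-in for the engine trait.
trait DistributedEngine {
  def name: String

  // None means the engine does not provide the extra operations.
  def operations: Option[DistributedOperations]

  // Fails with a descriptive message instead of a NullPointerException.
  def requireOperations: DistributedOperations =
    operations.getOrElse(
      throw new UnsupportedOperationException(
        s"$name: distributed operations are not supported by this engine"))
}

object SparkLikeEngine extends DistributedEngine {
  val name = "spark-like"
  val operations: Option[DistributedOperations] = Some(new DistributedOperations {
    // Toy aggregation: sum of all elements in all blocks.
    def aggregateBlocks(blocks: Seq[Array[Double]]): Double = blocks.map(_.sum).sum
  })
}

object H2OLikeEngine extends DistributedEngine {
  val name = "h2o-like"
  val operations: Option[DistributedOperations] = None
}
```

Callers that can do without the capability can pattern-match on `operations`; callers that cannot would go through `requireOperations` and get a clear error.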
ASF GitHub Bot (JIRA)
2015-02-03 12:52:35 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303214#comment-14303214 ]

ASF GitHub Bot commented on MAHOUT-1626:
----------------------------------------

Github user gcapan commented on a diff in the pull request:

https://github.com/apache/mahout/pull/62#discussion_r24001140

--- Diff: h2o/src/main/scala/org/apache/mahout/h2obindings/H2OEngine.scala ---
@@ -32,6 +32,7 @@ import org.apache.mahout.common.{Hadoop1HDFSUtil, HDFSUtil}
object H2OEngine extends DistributedEngine {
// By default, use Hadoop 1 utils
var hdfsUtils: HDFSUtil = Hadoop1HDFSUtil
+ val operations: DistributedOperations = null
--- End diff --

You're right, I'll fix it.
ASF GitHub Bot (JIRA)
2015-02-03 13:33:35 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303257#comment-14303257 ]

ASF GitHub Bot commented on MAHOUT-1626:
----------------------------------------

Github user gcapan commented on the pull request:

https://github.com/apache/mahout/pull/62#issuecomment-72650471

The status is that I need to revise the code based on reviews.

But I have some concerns, summarized below:

Here is the story.

I'm going to contribute my recent work on distributed implementation of stochastic optimization to some open source library, and for me, the only reason that accumulating blocks matters is that I require it for averaging-based distributed stochastic gradient descent (DSGD).

I was an advocate of having Mahout as the ML and Matrix Computations core for distributed processing engines, and was thinking that the Matrix DSL would be sufficient for implementing such algorithms (such as DSGD) in an engine-agnostic way.

It seems that for implementing most optimization algorithms and ML models, one requires other-than-DSL operations. And those operations are highly engine-specific.

Repeating the aggregating operation in Mahout is duplicate work, just like MLlib's having some of Mahout's Matrix DSL capabilities duplicated in uglier ways. Plus, having an algorithm in Mahout but not in MLlib (or vice versa) really bothers me because the other library's users cannot benefit from it.

Considering your recent codebase refactoring effort, @dlyubimov, I imagine the best way to use the DSL is by utilizing it inside MLlib (or whatever your favorite ML library is). That is, MLlib depends on Mahout Matrix-DSL implementation, Matrix I/O and computations are handled in Mahout, ML algorithms are handled in MLlib and/or other libraries.

Can we just slow this down and think about what should be contributed to where, and reconsider the ideal Mahout-Spark integration?
ASF GitHub Bot (JIRA)
2015-02-05 02:22:34 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306511#comment-14306511 ]

ASF GitHub Bot commented on MAHOUT-1626:
----------------------------------------

Github user dlyubimov commented on the pull request:

https://github.com/apache/mahout/pull/62#issuecomment-72982881

What's your method for DSGD? I know there's a DSGD paper, but it deals with SGD specifically for matrix factorization, not a general SGD computational scheme. Could you please provide a reference?

In general, computing a batch (non-stochastic) gradient is demonstrably an algebraic task -- but that's likely not what you are doing here, of course.
ASF GitHub Bot (JIRA)
2015-02-05 09:14:34 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306888#comment-14306888 ]

ASF GitHub Bot commented on MAHOUT-1626:
----------------------------------------

Github user gcapan commented on the pull request:

https://github.com/apache/mahout/pull/62#issuecomment-73015659

* The first and simplest will be Zinkevich et al.'s Parallelized Stochastic Gradient Descent [1]: The algorithm basically runs multiple local SGD instances in parallel, then averages their results. For the implementation, I was thinking of running SGD locally on blocks of rows and averaging the resulting models.

* Further, I hope to implement distributed stratified SGD for matrix factorization (Gemulla et al.)[2]: The algorithm is forming strata (where each stratum consists of a set of blocks that do not share any rows or columns), then for each stratum, performing SGD updates in parallel.
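
To make the stratum idea in the second bullet concrete, here is a small sketch that enumerates strata for a d x d grid of blocks using diagonal shifts. This is one simple construction that satisfies the "no shared rows or columns" property; it is an assumption for illustration and not necessarily the schedule used in [2]. The names `Strata` and `strata` are hypothetical:

```scala
object Strata {
  // Blocks are identified by (rowBlock, colBlock) in a d x d grid.
  // Stratum s pairs row block i with column block (i + s) mod d, so no two
  // blocks in the same stratum share a row block or a column block.
  def strata(d: Int): Seq[Seq[(Int, Int)]] =
    (0 until d).map(s => (0 until d).map(i => (i, (i + s) % d)))
}
```

The blocks within one stratum can be updated in parallel without conflicting writes to the factor rows or columns, and the d strata together cover every block of the grid exactly once.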

I am not yet sure if the latter would require additional non-DSL stuff. I will raise my concerns once I get to it.

[1] http://martin.zinkevich.org/publications/nips2010.pdf
[2] http://dl.acm.org/citation.cfm?id=2020426
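
The first scheme (local SGD on blocks of rows, then averaging) can be sketched in plain Scala as follows. This is an in-memory illustration under stated assumptions: least-squares loss, a fixed step size, and sequential evaluation of the blocks. All names (`ParallelSgd`, `LabeledRow`, `localSgd`, `parallelSgd`) are hypothetical and do not come from the PR:

```scala
import scala.util.Random

object ParallelSgd {
  final case class LabeledRow(x: Array[Double], y: Double)

  // Plain SGD for least squares over a single block of rows.
  def localSgd(block: Seq[LabeledRow], dim: Int, stepSize: Double,
               epochs: Int, seed: Long): Array[Double] = {
    val w = Array.fill(dim)(0.0)
    val rnd = new Random(seed)
    for (_ <- 0 until epochs; row <- rnd.shuffle(block)) {
      // Gradient of 0.5 * (w.x - y)^2 with respect to w is (w.x - y) * x.
      val err = (0 until dim).map(j => w(j) * row.x(j)).sum - row.y
      for (j <- 0 until dim) w(j) -= stepSize * err * row.x(j)
    }
    w
  }

  // Element-wise mean of the local models.
  def average(models: Seq[Array[Double]]): Array[Double] =
    Array.tabulate(models.head.length)(j => models.map(_(j)).sum / models.size)

  // Split the rows into blocks, run SGD on each block, average the models.
  def parallelSgd(rows: Seq[LabeledRow], blockSize: Int, dim: Int,
                  stepSize: Double, epochs: Int): Array[Double] = {
    val blocks = rows.grouped(blockSize).toSeq
    val locals = blocks.zipWithIndex.map { case (b, i) =>
      localSgd(b, dim, stepSize, epochs, seed = i)
    }
    average(locals)
  }
}
```

In a distributed setting the `blocks.map(...)` step would run on the workers and only the per-block weight vectors would be sent back for averaging, which is why a block-level aggregate operation is the only engine primitive this scheme needs.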
Andrew Palumbo (JIRA)
2015-03-06 02:12:39 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo updated MAHOUT-1626:
-----------------------------------
Labels: DSL scala spark (was: )
Andrew Musselman (JIRA)
2015-03-28 17:05:53 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Musselman updated MAHOUT-1626:
-------------------------------------
Fix Version/s: (was: 1.0)
0.10.0
Andrew Musselman (JIRA)
2015-03-28 17:06:53 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Musselman updated MAHOUT-1626:
-------------------------------------
Assignee: Gokhan Capan
Anand Avati (JIRA)
2015-04-05 07:02:33 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14396120#comment-14396120 ]

Anand Avati commented on MAHOUT-1626:
-------------------------------------

Is this still targeted for 0.10.0? If so I can wire up an implementation of aggregate() on H2O backend sooner rather than later..
Suneel Marthi
2015-04-05 07:03:25 UTC
Permalink
go for it, regardless of the target release.
ASF GitHub Bot (JIRA)
2015-04-05 07:14:33 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14396124#comment-14396124 ]

ASF GitHub Bot commented on MAHOUT-1626:
----------------------------------------

Github user avati commented on the pull request:

https://github.com/apache/mahout/pull/62#issuecomment-89729815

Is this still being worked on? I can at least help with the H2O implementation of aggregate(), if there is interest in this PR..
Andrew Musselman (JIRA)
2015-04-07 17:02:13 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483506#comment-14483506 ]

Andrew Musselman commented on MAHOUT-1626:
------------------------------------------

Status for freeze today?

Move to 0.10.1?
Suneel Marthi
2015-04-07 17:02:46 UTC
Permalink
Move to 0.10.1
Andrew Musselman (JIRA)
2015-04-07 17:07:13 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Musselman updated MAHOUT-1626:
-------------------------------------
Fix Version/s: (was: 0.10.0)
0.10.1
ASF GitHub Bot (JIRA)
2015-04-29 20:55:06 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520216#comment-14520216 ]

ASF GitHub Bot commented on MAHOUT-1626:
----------------------------------------

Github user gcapan commented on the pull request:

https://github.com/apache/mahout/pull/62#issuecomment-97582966

Under some conditions, which are satisfied in the case of linear and logistic regression, the solution of a statistical optimization problem (parameter estimation) over i.i.d. data, computed in a distributed fashion as either:

1. the average of the local estimates, or
2. a combination of the average of the local estimates and the average of the estimates on subsamples of the local sample sets

converges in mean to the optimal risk minimizer, as described in [1]. Given that, these methods are not only a way to distribute machine learning; they also provide a _justification for machine learning on Big Data_ (that is, these algorithms converge to the true risk minimizer as if the whole data were processed on a single computer).

With this motivation, I propose to add the two distribution schemes for machine learning: averaging and bootstrap-averaging. These would be abstracted away from the actual loss minimization algorithms, and the backend engines would only provide these two simple functions. Users can plug in their favourite (in-core) optimization algorithm, and of course we would want to provide some of them out of the box.
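
A rough sketch of what such an abstraction could look like, using plain Scala collections in place of a distributed matrix. Everything here is hypothetical (`AveragingSchemes`, `Block`, `Solver`, both function names), and the combination rule in `bootstrapAveraging`, (avg1 - r * avg2) / (1 - r) with subsampling ratio r, reflects one reading of [1] and should be checked against the paper before being relied on:

```scala
import scala.util.Random

object AveragingSchemes {
  type Block = Seq[Array[Double]]       // rows of one partition
  type Solver = Block => Array[Double]  // any in-core estimator

  private def mean(estimates: Seq[Array[Double]]): Array[Double] =
    Array.tabulate(estimates.head.length)(j => estimates.map(_(j)).sum / estimates.size)

  // Scheme 1: average of the local estimates.
  def averaging(blocks: Seq[Block], solve: Solver): Array[Double] =
    mean(blocks.map(solve))

  // Scheme 2: combine the average over the full blocks with the average over
  // subsamples of each block, where r in (0, 1) is the subsampling ratio.
  def bootstrapAveraging(blocks: Seq[Block], solve: Solver,
                         r: Double, seed: Long): Array[Double] = {
    require(r > 0.0 && r < 1.0, "subsampling ratio must lie in (0, 1)")
    val rnd = new Random(seed)
    val full = averaging(blocks, solve)
    val sub = averaging(blocks.map { b =>
      val k = math.max(1, math.ceil(r * b.size).toInt)
      rnd.shuffle(b).take(k)
    }, solve)
    full.zip(sub).map { case (t1, t2) => (t1 - r * t2) / (1.0 - r) }
  }
}
```

The solver is passed in as a function, so the loss minimization algorithm stays independent of how the blocks are distributed; an engine would only need to supply the map over blocks and the averaging step.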

Very soon, I am hoping to submit a patch for that. The current patch would be obsolete then, so there is no need to replicate this. Once I submit it, I'll close the current PR.

[1] http://arxiv.org/abs/1209.4129
(The short version in NIPS: http://stanford.edu/~jduchi/projects/ZhangDuWa12_nips.pdf)
ASF GitHub Bot (JIRA)
2015-04-29 21:20:06 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520275#comment-14520275 ]

ASF GitHub Bot commented on MAHOUT-1626:
----------------------------------------

Github user dlyubimov commented on the pull request:

https://github.com/apache/mahout/pull/62#issuecomment-97588410

Let me root for that.
ASF GitHub Bot (JIRA)
2015-06-08 20:32:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577785#comment-14577785 ]

ASF GitHub Bot commented on MAHOUT-1626:
----------------------------------------

Github user dlyubimov commented on the pull request:

https://github.com/apache/mahout/pull/62#issuecomment-110128740

OK, I introduced allreduceBlock() in #135, but that, like Spark's reduce, ends up collected in the driver, and it supports only tensor types, which is what I think we should really be constraining ourselves to (otherwise it would impose serialization requirements on the engines). Everything non-tensor and non-key will, I guess, have to be native engine code.

now, very speculatively, allreduceBlock (think spark's map+reduce on matrix blocks) should cover what you are talking about.
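
The "map + reduce on matrix blocks" idea can be illustrated with plain Scala collections. This is only an analogy and not the signature of allreduceBlock() in #135; `AllReduceSketch`, `allreduceBlocks`, `colSums`, and `add` are hypothetical names:

```scala
object AllReduceSketch {
  type Block = Array[Array[Double]] // an in-core slice of consecutive rows

  // Map every block to a tensor-typed partial result (here a vector),
  // then reduce the partial results into a single value on the "driver".
  def allreduceBlocks(blocks: Seq[Block])(bmf: Block => Array[Double])
                     (rf: (Array[Double], Array[Double]) => Array[Double]): Array[Double] =
    blocks.map(bmf).reduce(rf)

  // Example block-map function: per-block column sums.
  def colSums(b: Block): Array[Double] =
    Array.tabulate(b.head.length)(j => b.map(_(j)).sum)

  // Example reduce function: element-wise addition.
  def add(a: Array[Double], b: Array[Double]): Array[Double] =
    a.zip(b).map { case (x, y) => x + y }
}
```

Averaging locally trained models fits this shape: the block-map function would train a model on its block and return the weight vector, and the reduce function would sum the vectors before a final division by the number of blocks.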
ASF GitHub Bot (JIRA)
2015-06-09 15:43:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579109#comment-14579109 ]

ASF GitHub Bot commented on MAHOUT-1626:
----------------------------------------

Github user gcapan commented on the pull request:

https://github.com/apache/mahout/pull/62#issuecomment-110407430

It seems it does, let me try.

Thanks, Dmitriy

Suneel Marthi (JIRA)
2015-10-25 18:11:27 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1626:
----------------------------------
Affects Version/s: (was: 1.0.0)
0.10.0
Andrew Musselman (JIRA)
2015-11-05 01:01:22 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990859#comment-14990859 ]

Andrew Musselman commented on MAHOUT-1626:
------------------------------------------

Punt to 0.12.0?

Gokhan, will you be able to pick this back up?
Suneel Marthi (JIRA)
2015-11-05 23:20:27 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1626:
----------------------------------
Fix Version/s: (was: 0.11.1)
1.0.0
Suneel Marthi (JIRA)
2015-11-05 23:20:27 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992694#comment-14992694 ]

Suneel Marthi commented on MAHOUT-1626:
---------------------------------------

Moving this to the "Ultimate" release :)
ASF GitHub Bot (JIRA)
2016-09-10 02:58:21 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15479002#comment-15479002 ]

ASF GitHub Bot commented on MAHOUT-1626:
----------------------------------------

Github user andrewmusselman commented on the issue:

https://github.com/apache/mahout/pull/62
ASF GitHub Bot (JIRA)
2016-09-12 18:53:20 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484955#comment-15484955 ]

ASF GitHub Bot commented on MAHOUT-1626:
----------------------------------------

Github user dlyubimov commented on the issue:

https://github.com/apache/mahout/pull/62
Suneel Marthi (JIRA)
2016-09-12 19:06:20 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi resolved MAHOUT-1626.
-----------------------------------
Resolution: Won't Fix
ASF GitHub Bot (JIRA)
2016-09-12 20:50:21 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485257#comment-15485257 ]

ASF GitHub Bot commented on MAHOUT-1626:
----------------------------------------

Github user sscdotopen commented on the issue:

https://github.com/apache/mahout/pull/62
ASF GitHub Bot (JIRA)
2016-09-13 19:14:20 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15488131#comment-15488131 ]

ASF GitHub Bot commented on MAHOUT-1626:
----------------------------------------

Github user asfgit closed the pull request at:

https://github.com/apache/mahout/pull/62
ASF GitHub Bot (JIRA)
2016-09-19 07:46:21 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15502615#comment-15502615 ]

ASF GitHub Bot commented on MAHOUT-1626:
----------------------------------------

Github user gcapan commented on the issue:

https://github.com/apache/mahout/pull/62

Yeah, the required primitives are now included. It is fine to close this.