Discussion:
[jira] [Created] (MAHOUT-1936) FactorMap finds column maximums incorrectly on large data sets
Trevor Grant (JIRA)
2017-02-03 16:45:52 UTC
Trevor Grant created MAHOUT-1936:
------------------------------------

Summary: FactorMap finds column maximums incorrectly on large data sets
Key: MAHOUT-1936
URL: https://issues.apache.org/jira/browse/MAHOUT-1936
Project: Mahout
Issue Type: Bug
Components: Algorithms
Affects Versions: 0.13.0
Reporter: Trevor Grant
Fix For: 0.13.0


FactorMap's fit method does not correctly find the maximum of each column.

Likely due to an improper allreduceBlock call here
https://github.com/apache/mahout/blob/master/math-scala/src/main/scala/org/apache/mahout/math/algorithms/preprocessing/AsFactor.scala#L40

Also, factorMap in this instance might be more appropriately named "factorMax"
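To illustrate why a per-block column maximum can come out wrong only on larger inputs, here is a plain-Scala sketch. It does not use the Mahout DRM API: each "block" is a hypothetical `Array[Array[Double]]` standing in for one partition, and the flawed combine step is assumed, purely for illustration, to be elementwise addition. The thread itself only says the max was being taken in the map phase.

```scala
// Plain-Scala sketch, NOT the Mahout DRM API: each "block" is a row-major
// Array[Array[Double]] standing in for one partition of a distributed matrix.
object ColumnMaxSketch {
  type Block = Array[Array[Double]]

  // Column-wise maximum of a single block (the per-block "map" step).
  def colMax(block: Block): Array[Double] =
    block.transpose.map(_.max)

  // Hypothetical flawed combine (an assumption): partial maxima are summed.
  def sumCombine(a: Array[Double], b: Array[Double]): Array[Double] =
    a.zip(b).map { case (x, y) => x + y }

  // Correct combine: the max of the partial maxima is the global maximum.
  def maxCombine(a: Array[Double], b: Array[Double]): Array[Double] =
    a.zip(b).map { case (x, y) => math.max(x, y) }

  def main(args: Array[String]): Unit = {
    val blocks: Seq[Block] = Seq(
      Array(Array(0.0, 2.0), Array(1.0, 0.0)),
      Array(Array(2.0, 1.0), Array(0.0, 1.0))
    )
    val partial = blocks.map(colMax)       // per-block maxima: [1.0, 2.0] and [2.0, 1.0]
    val wrong = partial.reduce(sumCombine) // [3.0, 3.0]
    val right = partial.reduce(maxCombine) // [2.0, 2.0]
    println(s"wrong = ${wrong.mkString(", ")}; right = ${right.mkString(", ")}")
  }
}
```

With a single block, `reduce` never invokes the combine function, so both variants agree; the discrepancy appears only once the data spans several blocks, which is consistent with the "large data sets" wording in the summary.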



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
ASF GitHub Bot (JIRA)
2017-02-04 05:53:51 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852612#comment-15852612 ]

ASF GitHub Bot commented on MAHOUT-1936:
----------------------------------------

GitHub user rawkintrevo opened a pull request:

https://github.com/apache/mahout/pull/278

MAHOUT-1936 fix AsFactor allReduce block

The issue in the AsFactor fit method was that the maximum was being computed in the "map" phase rather than in the reduce phase.
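Moving the maximum into the reduce phase can be sketched in plain Scala as follows. This only mirrors the general shape of the change under stated assumptions: blocks are hypothetical `Array[Array[Double]]` values rather than Mahout matrices, the map step passes each block through unchanged, and the reduce step stacks its two inputs (analogous to `rbind`) and keeps only their column maxima. Names such as `mapPhase`, `reducePhase`, and `columnMax` are illustrative, not Mahout functions.

```scala
// Plain-Scala sketch, NOT the Mahout API: blocks are row-major Array[Array[Double]].
object ReduceMaxSketch {
  type Block = Array[Array[Double]]

  // Column-wise maximum, returned as a one-row block so it can be reduced again.
  def colMaxRow(block: Block): Block =
    Array(block.transpose.map(_.max))

  // Map phase (hypothetical): pass each block through unchanged.
  def mapPhase(block: Block): Block = block

  // Reduce phase (hypothetical): stack both inputs row-wise, like rbind,
  // then keep only the column maxima.
  def reducePhase(oldM: Block, newM: Block): Block =
    colMaxRow(oldM ++ newM)

  // A final colMaxRow keeps the result to one row even when there is only one
  // block (reducePhase is then never invoked); otherwise it changes nothing.
  def columnMax(blocks: Seq[Block]): Array[Double] =
    colMaxRow(blocks.map(mapPhase).reduce(reducePhase)).head

  def main(args: Array[String]): Unit = {
    val blocks: Seq[Block] = Seq(
      Array(Array(0.0, 2.0), Array(1.0, 0.0)),
      Array(Array(2.0, 1.0), Array(0.0, 1.0)),
      Array(Array(1.0, 3.0))
    )
    println(columnMax(blocks).mkString(", ")) // 2.0, 3.0
  }
}
```

Because max is associative and commutative, the order in which partitions are merged does not change the result, which is the property a distributed reduce relies on.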


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rawkintrevo/mahout mahout-1936

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/mahout/pull/278.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #278

----
commit 14c795c9b2ab0868e5acc281e9ce5f9710534df0
Author: rawkintrevo <***@gmail.com>
Date: 2017-02-04T05:50:07Z

MAHOUT-1936 fix AsFactor allReduce block

----
Trevor Grant (JIRA)
2017-02-04 06:09:52 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trevor Grant reassigned MAHOUT-1936:
------------------------------------

Assignee: Trevor Grant
ASF GitHub Bot (JIRA)
2017-02-04 23:14:52 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852977#comment-15852977 ]

ASF GitHub Bot commented on MAHOUT-1936:
----------------------------------------

Github user andrewpalumbo commented on a diff in the pull request:

https://github.com/apache/mahout/pull/278#discussion_r99477362

--- Diff: math-scala/src/main/scala/org/apache/mahout/math/algorithms/preprocessing/AsFactor.scala ---
@@ -38,11 +38,13 @@ class AsFactor extends PreprocessorFitter {

import org.apache.mahout.math.function.VectorFunction
val factorMap = input.allreduceBlock(
- { case (keys, block: Matrix) =>
+ { case (keys, block: Matrix) => block },
+ { case (oldM: Matrix, newM: Matrix) =>
// someday we'll replace this with block.max: Vector
// or better yet- block.distinct
- dense(block.aggregateColumns( new VectorFunction {
- def apply(f: Vector): Double = f.max
+
+ dense((oldM rbind newM).aggregateColumns( new VectorFunction {
--- End diff --

+1
ASF GitHub Bot (JIRA)
2017-02-04 23:16:51 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852979#comment-15852979 ]

ASF GitHub Bot commented on MAHOUT-1936:
----------------------------------------

Github user andrewpalumbo commented on the issue:

https://github.com/apache/mahout/pull/278

+1 LGTM - Do we have any unit tests for the pipeline?
ASF GitHub Bot (JIRA)
2017-02-07 04:06:41 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855282#comment-15855282 ]

ASF GitHub Bot commented on MAHOUT-1936:
----------------------------------------

Github user asfgit closed the pull request at:

https://github.com/apache/mahout/pull/278
Trevor Grant (JIRA)
2017-02-07 04:08:41 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trevor Grant resolved MAHOUT-1936.
----------------------------------
Resolution: Fixed
Hudson (JIRA)
2017-02-07 04:32:41 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855309#comment-15855309 ]

Hudson commented on MAHOUT-1936:
--------------------------------

FAILURE: Integrated in Jenkins build Mahout-Quality #3416 (See [https://builds.apache.org/job/Mahout-Quality/3416/])
MAHOUT-1936 fix AsFactor allReduce block closes apache/mahout#278 (rawkintrevo: rev 60bb751926524b62be52f9b4c9d1c70d735a0afc)
* (edit) math-scala/src/main/scala/org/apache/mahout/math/algorithms/preprocessing/AsFactor.scala