Discussion:
[jira] [Assigned] (MAHOUT-1866) Add matrix-to-tsv string function
Suneel Marthi (JIRA)
2016-05-29 15:27:13 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi reassigned MAHOUT-1866:
-------------------------------------

Assignee: Suneel Marthi
Add matrix-to-tsv string function
---------------------------------
Key: MAHOUT-1866
URL: https://issues.apache.org/jira/browse/MAHOUT-1866
Project: Mahout
Issue Type: Sub-task
Reporter: Trevor Grant
Assignee: Suneel Marthi
Fix For: 0.13.0
Need a function to convert a matrix to a TSV string, which can then be:
- Plotted by Zeppelin %table visualization packages
- Passed to R / Python via Zeppelin Resource Manager
It has been noted that a matrix can be registered as an RDD and passed across contexts directly in Spark; however, this breaks the 'backend-agnostic' philosophy. Until H2O and Flink both also support Python / R environments, it is more reasonable to use tab-separated-value strings.
Further, matrices might be extremely large and unfit for direct conversion to TSVs. It may be wise to introduce some sort of safety valve to prevent excessively large matrices from being materialized into local memory (e.g., when the user hasn't called their own sampling method on a matrix).
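To make the request concrete, here is a minimal sketch of the idea, not the code that was eventually committed. It assumes a plain `Array[Array[Double]]` as a stand-in for an in-core matrix; the names `MatrixTsv`, `matrixToTsv`, and `maxRows` are hypothetical. The `maxRows` cap plays the role of the "safety valve" described above.

```scala
// Hypothetical sketch: a plain Array[Array[Double]] stands in for an in-core matrix.
object MatrixTsv {

  /** Render at most `maxRows` rows as tab-separated values, one matrix row per line.
    * The cap is the assumed safety valve: rows beyond it are never turned into text.
    */
  def matrixToTsv(rows: Array[Array[Double]], maxRows: Int = 1000): String = {
    require(maxRows >= 0, "maxRows must be non-negative")
    rows.iterator
      .take(maxRows)           // stop before materializing more rows than allowed
      .map(_.mkString("\t"))   // columns separated by tabs
      .mkString("\n")          // rows separated by newlines
  }
}
```

A string in this shape (tab-separated columns, newline-separated rows) is what Zeppelin's %table display expects after a header row, so a caller would typically prepend "%table" and the column names before printing it.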
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Suneel Marthi (JIRA)
2016-05-29 15:27:13 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on MAHOUT-1866 started by Suneel Marthi.
---------------------------------------------
ASF GitHub Bot (JIRA)
2016-05-29 16:28:12 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305980#comment-15305980 ]

ASF GitHub Bot commented on MAHOUT-1866:
----------------------------------------

GitHub user smarthi opened a pull request:

https://github.com/apache/mahout/pull/237

MAHOUT-1866: Add matrix-to-tsv string function



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/smarthi/mahout MAHOUT-1866

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/mahout/pull/237.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #237

----
commit 16dbcdd63e457c8a1956394a9fefdc5aae3f2d18
Author: smarthi <***@apache.org>
Date: 2016-05-29T16:27:50Z

MAHOUT-1866: Add matrix-to-tsv string function

----
Suneel Marthi (JIRA)
2016-05-29 16:34:12 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1866:
----------------------------------
Affects Version/s: 0.12.1
Component/s: visiualization
ASF GitHub Bot (JIRA)
2016-05-29 16:53:12 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305990#comment-15305990 ]

ASF GitHub Bot commented on MAHOUT-1866:
----------------------------------------

Github user andrewpalumbo commented on a diff in the pull request:

https://github.com/apache/mahout/pull/237#discussion_r65007959

--- Diff: math-scala/src/main/scala/org/apache/mahout/math/drm/package.scala ---
@@ -148,6 +148,42 @@ package object drm {
def drmSampleKRows[K](drmX: DrmLike[K], numSamples: Int, replacement: Boolean = false): Matrix =
drmX.context.engine.drmSampleKRows(drmX, numSamples, replacement)

+ /**
+ * Convert a sampled DRM into a Tab Separated Vector (TSV) to be loaded into an R-DataFrame
+ * for plotting and sketching
+ * @param drmX - DRM
+ * @param samplePercent - Percentage of Sample elements from the DRM to be fished out for plotting
+ * @tparam K
+ * @return TSV String
+ */
+ def sampleMatrixToTSV[K](drmX: DrmLike[K], samplePercent: Double = 1): String = {
+
--- End diff --

Minor point: maybe rename to `drmSampleToTSV` or something along those lines so that it is obvious that it's a DRM and not a Matrix? Other than that, +1 from me.
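For readers without the full diff, the sample-then-serialize idea behind the function under review can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the function body is not shown in the excerpt above, `samplePercent` is assumed here to range from 0 to 100, an in-memory `IndexedSeq[Array[Double]]` stands in for the DRM, and `SampleTsv`, `sampleCount`, and `sampleRowsToTsv` are hypothetical names.

```scala
import scala.util.Random

// Hypothetical sketch: an in-memory IndexedSeq stands in for a distributed row matrix.
object SampleTsv {

  /** Number of rows to draw for a percentage in [0, 100] (assumed scale), rounded up. */
  def sampleCount(totalRows: Int, samplePercent: Double): Int = {
    require(samplePercent >= 0 && samplePercent <= 100, "samplePercent must be in [0, 100]")
    math.ceil(totalRows * samplePercent / 100.0).toInt
  }

  /** Sample rows without replacement, keep their original order, and render them as TSV. */
  def sampleRowsToTsv(rows: IndexedSeq[Array[Double]],
                      samplePercent: Double,
                      seed: Long = 42L): String = {
    val k = sampleCount(rows.length, samplePercent)
    // Shuffle the row indices with a fixed seed, take the first k, restore row order.
    val picked = new Random(seed).shuffle(rows.indices.toList).take(k).sorted
    picked.map(i => rows(i).mkString("\t")).mkString("\n")
  }
}
```

In the actual API the sampling step would presumably go through `drmSampleKRows`, which appears in the diff context and returns an in-core `Matrix`; only the sampled rows would then be converted to text.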
ASF GitHub Bot (JIRA)
2016-05-29 17:07:12 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305995#comment-15305995 ]

ASF GitHub Bot commented on MAHOUT-1866:
----------------------------------------

Github user asfgit closed the pull request at:

https://github.com/apache/mahout/pull/237
Suneel Marthi (JIRA)
2016-05-29 17:08:12 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi resolved MAHOUT-1866.
-----------------------------------
Resolution: Implemented
Hudson (JIRA)
2016-05-29 18:06:12 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306012#comment-15306012 ]

Hudson commented on MAHOUT-1866:
--------------------------------

SUCCESS: Integrated in Mahout-Quality #3360 (See [https://builds.apache.org/job/Mahout-Quality/3360/])
MAHOUT-1866: Add matrix-to-tsv string function, this closes (smarthi: rev 8f4ee88fb40710d983ea3fb6ad008317f6c00936)
* math-scala/src/main/scala/org/apache/mahout/math/drm/package.scala
Trevor Grant (JIRA)
2016-05-29 03:13:13 UTC
Trevor Grant created MAHOUT-1866:
------------------------------------

Summary: Add matrix-to-tsv string funciton
Key: MAHOUT-1866
URL: https://issues.apache.org/jira/browse/MAHOUT-1866
Project: Mahout
Issue Type: Sub-task
Reporter: Trevor Grant


Need a function to convert a matrix to a TSV string, which can then be:
- Plotted by Zeppelin %table visualization packages
- Passed to R / Python via Zeppelin Resource Manager

It has been noted that a matrix can be registered as an RDD and passed across contexts directly in Spark; however, this breaks the 'backend-agnostic' philosophy. Until H2O and Flink both also support Python / R environments, it is more reasonable to use tab-separated-value strings.

Further, matrices might be extremely large and unfit for direct conversion to TSVs. It may be wise to introduce some sort of safety valve to prevent excessively large matrices from being materialized into local memory (e.g., when the user hasn't called their own sampling method on a matrix).
Trevor Grant (JIRA)
2016-05-29 03:15:12 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trevor Grant updated MAHOUT-1866:
---------------------------------
Summary: Add matrix-to-tsv string function (was: Add matrix-to-tsv string funciton)