Suneel Marthi (JIRA)
2016-05-29 15:27:13 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Suneel Marthi reassigned MAHOUT-1866:
-------------------------------------
Assignee: Suneel Marthi
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Add matrix-to-tsv string function
---------------------------------
Key: MAHOUT-1866
URL: https://issues.apache.org/jira/browse/MAHOUT-1866
Project: Mahout
Issue Type: Sub-task
Reporter: Trevor Grant
Assignee: Suneel Marthi
Fix For: 0.13.0
Need a function to convert a matrix to a TSV string, which can then be
- plotted by Zeppelin %table visualization packages
- passed to R / Python via Zeppelin Resource Manager
It has been noted that a matrix can be registered as an RDD and passed across contexts directly in Spark; however, this breaks the 'backend agnostic' philosophy. Until H2O and Flink both also support Python / R environments, it is more reasonable to use tab-separated-value strings.
Further, matrices might be extremely large and unfit for direct conversion to TSV. It may be wise to introduce some sort of safety valve to prevent excessively large matrices from being materialized into local memory (e.g., when the user hasn't called their own sampling method on a matrix).
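For illustration only, the requested behavior might look roughly like the sketch below. It is written in plain Python over a list of rows, not against Mahout's matrix API. The function name matrix_to_tsv, the max_cells parameter, and its default of 10,000 are all assumptions made for this example. In a distributed setting the size check would presumably use the matrix's shape metadata before anything is collected to the driver.

```python
from typing import Sequence


def matrix_to_tsv(rows: Sequence[Sequence[float]], max_cells: int = 10_000) -> str:
    """Render an in-memory matrix as a tab-separated string.

    Hypothetical helper: `max_cells` acts as the "safety valve" and makes the
    call fail fast instead of building a huge string in local memory.
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if n_rows else 0  # assumes a rectangular matrix
    n_cells = n_rows * n_cols
    if n_cells > max_cells:
        raise ValueError(
            f"matrix has {n_cells} cells, above the limit of {max_cells}; "
            "sample it before converting"
        )
    # One line per row, cells separated by tabs.
    return "\n".join("\t".join(str(v) for v in row) for row in rows)


if __name__ == "__main__":
    small = [[1.0, 2.0], [3.0, 4.0]]
    print(matrix_to_tsv(small))
```

Raising an error rather than silently truncating keeps the caller aware that sampling is needed. Whether the real function should truncate, sample, or fail is left open by the issue.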