Discussion:
[jira] [Created] (MAHOUT-1884) Allow specification of dimensions of a DRM
Sebastian Schelter (JIRA)
2016-10-03 06:51:20 UTC
Sebastian Schelter created MAHOUT-1884:
------------------------------------------

Summary: Allow specification of dimensions of a DRM
Key: MAHOUT-1884
URL: https://issues.apache.org/jira/browse/MAHOUT-1884
Project: Mahout
Issue Type: Improvement
Affects Versions: 0.12.2
Reporter: Sebastian Schelter
Assignee: Sebastian Schelter
Priority: Minor


Currently, in many cases, a DRM must be read to compute its dimensions when a user calls nrow or ncol. This also implicitly caches the corresponding DRM.

In some cases, the user actually knows the matrix dimensions (e.g., when the matrices are synthetically generated, or when some metadata about them is known). In such cases, the user should be able to specify the dimensions upon creating the DRM and the caching should be avoided.
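The cost described above can be shown with a minimal single-process sketch in Python. `LazyDrm`, `load_rows`, `passes`, and `is_cached` are hypothetical names, not Mahout's API; Mahout's DRMs are Scala objects backed by Spark RDDs. The sketch models only the behavior the issue describes. Asking for `nrow` or `ncol` when the dimensions are unknown forces one full read and keeps the rows around. Supplying the dimensions up front avoids both the read and the caching.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Row = Tuple[int, List[float]]  # (row key, dense row values)


class LazyDrm:
    """Toy stand-in for a distributed row matrix (hypothetical, not Mahout's API)."""

    def __init__(self, load_rows: Callable[[], Iterable[Row]],
                 nrow: Optional[int] = None, ncol: Optional[int] = None):
        self._load_rows = load_rows   # deferred "read from DFS"
        self._nrow = nrow             # None means "unknown, compute lazily"
        self._ncol = ncol
        self._cached: Optional[List[Row]] = None
        self.passes = 0               # number of full reads performed

    def _materialize(self) -> List[Row]:
        # One full pass over the input; the result is kept, i.e. cached.
        if self._cached is None:
            self.passes += 1
            self._cached = list(self._load_rows())
        return self._cached

    @property
    def nrow(self) -> int:
        if self._nrow is None:
            rows = self._materialize()
            # Assumes 0-based integer row keys; gaps (missing rows) still count.
            self._nrow = max((k for k, _ in rows), default=-1) + 1
        return self._nrow

    @property
    def ncol(self) -> int:
        if self._ncol is None:
            rows = self._materialize()
            self._ncol = max((len(v) for _, v in rows), default=0)
        return self._ncol

    @property
    def is_cached(self) -> bool:
        return self._cached is not None


def synthetic_rows() -> Iterable[Row]:
    # A synthetically generated 1000 x 50 matrix, whose size the caller knows.
    for i in range(1000):
        yield (i, [float(i)] * 50)


unknown = LazyDrm(synthetic_rows)
print(unknown.nrow, unknown.ncol, unknown.passes, unknown.is_cached)  # 1000 50 1 True

known = LazyDrm(synthetic_rows, nrow=1000, ncol=50)
print(known.nrow, known.ncol, known.passes, known.is_cached)  # 1000 50 0 False
```

The first handle pays one pass and ends up holding the data. The second never touches the input because the caller already knew its shape.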



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Dmitriy Lyubimov
2016-10-03 21:00:05 UTC
This has been covered by the drmWrap() signature from the very beginning.
I vote to treat this as a non-issue.
Dmitriy Lyubimov (JIRA)
2016-10-03 21:12:20 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543437#comment-15543437 ]

Dmitriy Lyubimov commented on MAHOUT-1884:
------------------------------------------



Which API is this about, specifically?

Wrapping an existing RDD (the drmWrap() API) supports this.

Also note that for DRMs read off disk, these are one-pass computations that cost no more than RDD$count(). Since the obvious intent of calling dfsRead() on any dataset is to use it, loading and caching does no harm, as that's what would happen anyway.

Also, matrix dimensions are the most obvious properties, but not the only ones the optimizer may need to analyze (lazily) about a dataset. There are more dataset heuristics that drmWrap() accepts (and even more that it doesn't).

If we are talking about cases where drmWrap() cannot be used for some reason, we should probably request metadata equivalent to what drmWrap() accepts, not just ncol and nrow.
Sebastian Schelter (JIRA)
2016-10-04 17:47:20 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546112#comment-15546112 ]

Sebastian Schelter commented on MAHOUT-1884:
--------------------------------------------

I know that this is already supported internally; I want to expose it as optional parameters to drmDfsRead. I disagree that users always intend an input matrix they read to be cached. At the very least, I want to retain control over what is cached and what is not.
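Here is a rough sketch of the proposal as a Python model, under stated assumptions. `drm_dfs_read`, `DrmHandle`, `CacheHint`, `needs_dimension_pass`, and `decide_caching` are all hypothetical stand-ins, and the real Scala `drmDfsRead` signature may differ. The dimensions are optional and default to unknown. The cache hint is recorded, but it is only a hint: nothing in this thread promises a "no-cache" contract, so the toy optimizer policy may still cache a matrix that the plan reuses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CacheHint(Enum):
    NONE = "none"           # caller would prefer no caching
    MEMORY_ONLY = "memory"  # caller suggests caching in memory


@dataclass
class DrmHandle:
    path: str
    nrow: Optional[int]     # None: unknown until computed by a full pass
    ncol: Optional[int]
    cache_hint: CacheHint   # advisory only; see decide_caching below


def drm_dfs_read(path: str, nrow: Optional[int] = None,
                 ncol: Optional[int] = None,
                 cache_hint: CacheHint = CacheHint.NONE) -> DrmHandle:
    """Hypothetical reader with optional, caller-supplied dimensions.

    Nothing is read here; the handle only records what the caller knows.
    """
    if nrow is not None and nrow < 0:
        raise ValueError("nrow must be non-negative")
    if ncol is not None and ncol < 0:
        raise ValueError("ncol must be non-negative")
    return DrmHandle(path, nrow, ncol, cache_hint)


def needs_dimension_pass(drm: DrmHandle) -> bool:
    # A full pass over the data is needed only if some dimension is unknown.
    return drm.nrow is None or drm.ncol is None


def decide_caching(drm: DrmHandle, times_used: int) -> bool:
    """Toy optimizer policy: the hint is a hint, not a guarantee.

    Caches when the caller asks for it, or when the plan uses the matrix
    more than once, even if the hint says NONE.
    """
    return drm.cache_hint is CacheHint.MEMORY_ONLY or times_used > 1


a = drm_dfs_read("/data/A")                          # dimensions unknown
b = drm_dfs_read("/data/B", nrow=10_000, ncol=100)   # dimensions supplied
print(needs_dimension_pass(a), needs_dimension_pass(b))                  # True False
print(decide_caching(b, times_used=1), decide_caching(b, times_used=3))  # False True
```

Keeping the hint separate from the caching decision mirrors the trade-off in the thread. Callers can say what they know, and the optimizer keeps the final say.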
Dmitriy Lyubimov (JIRA)
2016-10-04 21:07:20 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546663#comment-15546663 ]

Dmitriy Lyubimov commented on MAHOUT-1884:
------------------------------------------

drmWrap is not internal in the least (which is why it is not package-private). It is public and intended for plugging general external sources into the optimizer's input barrier.

Loading into memory would happen anyway. Caching would not necessarily happen, but it is not guaranteed not to happen either; there's no such contract.

Materially, it only makes a difference if the input is larger than the available cluster capacity, which I have yet to encounter, as algebraic tasks are CPU- and IO-bound, not memory-bound. Usually we run out of IO and CPU much sooner than we run out of memory, which makes this situation pragmatically unrealistic.

Note that the optimizer should, and will, retain control over caching. We don't have an explicit caching API except for checkpoint "hints", and even those are only hints, not guarantees. Giving it some heuristics about a dataset doesn't guarantee that it won't compute others, or won't cache or sample for some other reason, now or in the future.

This situation is fine, as it is one of the functions of the optimizer, just like choosing degrees of parallelization, product task sizes, or operators to execute. Making those choices automatically is, actually, the point. As long as the optimizer gets enough of these choices right, that should be OK.

Bottom line, I don't see harm in adding ncol and nrow to drmDfsRead specifically. There's possibly only a slight benefit right now (there is no no-cache or no-sample guarantee), which will likely only decrease in the future. I am fine with it as long as it is understood that there's no "no-cache" contract anywhere.
Dmitriy Lyubimov (JIRA)
2016-10-04 21:09:20 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546663#comment-15546663 ]

Dmitriy Lyubimov edited comment on MAHOUT-1884 at 10/4/16 9:08 PM:
-------------------------------------------------------------------

drmWrap is not internal in the least (which is why it is not package-private). It is public and intended for plugging general external sources into the optimizer's input barrier.

Loading into memory would happen anyway. Caching would not necessarily happen, but it is not guaranteed not to happen either; there's no such contract.

Materially, it only makes a difference if the input is larger than the available cluster capacity, which I have yet to encounter, as algebraic tasks are CPU- and IO-bound, not memory-bound. Usually we run out of IO and CPU much sooner than we run out of memory, which makes this situation pragmatically unrealistic.

Note that the optimizer should, and will, retain control over caching. We don't have an explicit caching API except for checkpoint "hints", and even those are only hints, not guarantees. Giving it some heuristics about a dataset doesn't guarantee that it won't compute others, or won't cache or sample for some other reason, now or in the future.

This situation is fine, as it is one of the functions of the optimizer, just like choosing degrees of parallelization, product task sizes, or operators to execute. Making those choices automatically is, actually, the point. As long as the optimizer gets enough of these choices right, that should be OK.

Bottom line, I don't see harm in adding _optional_ ncol and nrow to drmDfsRead specifically. But I do not see a tangible benefit either. There's possibly only a slight benefit right now (there is no no-cache or no-sample guarantee), which will likely only decrease in the future. I am fine with it as long as it is understood that there's no "no-cache" contract anywhere.



Dmitriy Lyubimov (JIRA)
2016-10-04 21:10:22 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546663#comment-15546663 ]

Dmitriy Lyubimov edited comment on MAHOUT-1884 at 10/4/16 9:09 PM:
-------------------------------------------------------------------

drmWrap is not internal in the least (which is why it is not package-private). It is public and intended for plugging general external sources into the optimizer's input barrier.

Loading into memory would happen anyway. Caching would not necessarily happen, but it is not guaranteed not to happen either; there's no such contract.

Materially, it only makes any difference if the input is larger than the available cluster capacity, which I have yet to encounter, as algebraic tasks are CPU- and IO-bound, not memory-bound. Usually we run out of IO and CPU much sooner than we run out of memory, which makes this situation pragmatically unrealistic.

Note that the optimizer should, and will, retain control over caching. We don't have an explicit caching API except for checkpoint "hints", and even those are only hints, not guarantees. Giving it some heuristics about a dataset doesn't guarantee that it won't compute others, or won't cache or sample for some other reason, now or in the future.

This situation is fine, as it is one of the functions of the optimizer, just like choosing degrees of parallelization, product task sizes, or operators to execute. Making those choices automatically is, actually, the point. As long as the optimizer gets enough of these choices right, that should be OK.

Bottom line, I don't see harm in adding _optional_ ncol and nrow to drmDfsRead specifically. But I do not see a tangible benefit either. There's possibly only a slight benefit right now (there is no no-cache or no-sample guarantee), which will likely only decrease in the future. I am fine with it as long as it is understood that there's no "no-cache" contract anywhere.



Dmitriy Lyubimov (JIRA)
2016-10-04 21:11:20 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546663#comment-15546663 ]

Dmitriy Lyubimov edited comment on MAHOUT-1884 at 10/4/16 9:10 PM:
-------------------------------------------------------------------

Suneel Marthi (JIRA)
2016-10-14 22:49:20 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1884:
----------------------------------
Fix Version/s: 0.13.0
Andrew Palumbo (JIRA)
2016-12-20 18:48:58 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo updated MAHOUT-1884:
-----------------------------------
Sprint: Jan/Feb-2017
Andrew Palumbo (JIRA)
2017-01-16 02:28:28 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo updated MAHOUT-1884:
-----------------------------------
Sprint: Jan/Feb-2016 (was: Jan/Feb-2017)
Andrew Palumbo (JIRA)
2017-01-16 02:28:29 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo updated MAHOUT-1884:
-----------------------------------
Sprint: Jan/Feb-2017 (was: Jan/Feb-2016)
Andrew Palumbo (JIRA)
2017-01-16 03:28:26 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15823402#comment-15823402 ]

Andrew Palumbo commented on MAHOUT-1884:
----------------------------------------

[~ssc], [~dlyubimov] this seems like a good candidate to bump to 0.13.1. I'll assign it to myself for now. LMK if this is something you'd like to get into 0.13.0.
Andrew Palumbo (JIRA)
2017-01-16 03:29:26 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo reassigned MAHOUT-1884:
--------------------------------------

Assignee: Andrew Palumbo (was: Sebastian Schelter)
Andrew Palumbo (JIRA)
2017-02-01 22:52:51 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo updated MAHOUT-1884:
-----------------------------------
Fix Version/s: (was: 0.13.0)
0.13.1
Andrew Palumbo (JIRA)
2017-02-01 22:54:52 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Palumbo updated MAHOUT-1884:
-----------------------------------
Sprint: (was: Jan/Feb-2017)
Trevor Grant (JIRA)
2017-06-23 04:23:00 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trevor Grant updated MAHOUT-1884:
---------------------------------
Fix Version/s: (was: 0.13.1)
0.13.2