Discussion:
[jira] [Created] (MAHOUT-1788) spark-itemsimilarity integration test script cleanup
Pat Ferrel (JIRA)
2015-11-06 23:45:10 UTC
Pat Ferrel created MAHOUT-1788:
----------------------------------

Summary: spark-itemsimilarity integration test script cleanup
Key: MAHOUT-1788
URL: https://issues.apache.org/jira/browse/MAHOUT-1788
Project: Mahout
Issue Type: Bug
Components: cooccurrence
Affects Versions: 0.11.0
Reporter: Pat Ferrel
Assignee: Pat Ferrel
Priority: Trivial
Fix For: 0.12.0


The binary release does not contain the data for the itemsimilarity tests, and neither the binary nor the source version will run on a cluster unless the data is hand-copied to HDFS.

Clean this up so the script copies the data if needed and the data is included in both versions.
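
A minimal sketch of the "copy the data if needed" step, for illustration only. The actual Mahout test script is not shown in this thread, so the function name, the file name, and the HDFS directory below are hypothetical; the only external commands assumed are the standard `hdfs dfs -test -e`, `hdfs dfs -mkdir -p`, and `hdfs dfs -put`. The command runner is passed in as a parameter so the logic can be exercised without a cluster.

```python
import posixpath
import subprocess
from typing import Callable, List, Sequence

# A runner takes an argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def ensure_on_hdfs(local_file: str, hdfs_dir: str,
                   run: Runner = subprocess.call) -> List[List[str]]:
    """Copy local_file into hdfs_dir unless it is already there.

    Returns the list of commands that were issued, in order.
    (Hypothetical helper; not part of Mahout.)
    """
    target = posixpath.join(hdfs_dir, posixpath.basename(local_file))
    issued: List[List[str]] = []

    def step(cmd: List[str]) -> int:
        issued.append(cmd)
        return run(cmd)

    # `hdfs dfs -test -e PATH` exits with 0 when PATH exists.
    if step(["hdfs", "dfs", "-test", "-e", target]) == 0:
        return issued  # data already present, nothing to copy

    # Otherwise create the directory and upload the file.
    for cmd in (["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
                ["hdfs", "dfs", "-put", local_file, target]):
        if step(cmd) != 0:
            raise RuntimeError("command failed: " + " ".join(cmd))
    return issued
```

Calling `ensure_on_hdfs("/tmp/data/items.csv", "/user/test/input")` on a machine with a configured Hadoop client would only upload the file when the existence check fails, which is the behaviour the issue asks for.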



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Pat Ferrel (JIRA)
2016-03-17 15:43:33 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pat Ferrel updated MAHOUT-1788:
-------------------------------
Fix Version/s: 1.0.0 (was: 0.12.0)
Issue Type: Improvement (was: Bug)

Work on this as time is available; it is not blocking anything, IMO.
shashi bushan dongur (JIRA)
2016-03-30 16:05:25 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218216#comment-15218216 ]

shashi bushan dongur commented on MAHOUT-1788:
----------------------------------------------

Hello. I would like to start contributing to Mahout. Can I work on this issue?
Dmitriy Lyubimov
2016-03-30 17:12:37 UTC
Oh, but of course! Please do!

You may work on any issue, this or any other of your choice, or even on any
new issue you can think of. (For sizeable contributions it is recommended to
start a discussion on the @dev list first, though, so you can benefit
from the experience of others. Please file any new issue in JIRA first.)

Khurrum Nasim
2016-03-30 18:43:00 UTC
Thanks, Dmitriy.

I'll take a look and see where I can start pitching in. Do I need contributor access? How would I create a feature branch for my work?

Khurrum
Dmitriy Lyubimov
2016-03-30 18:52:26 UTC
Khurrum,

You only need a GitHub account. Create a fork of Mahout's master in your
GitHub space and keep it in sync with master, as far as possible, as you go
(by doing regular pulls). That gives you the best chance of having
the fewest conflicts.

At any point in time (I recommend when you feel you are about 50
to 70% done, or when you just need code advice), you can create a GitHub pull
request against the apache/mahout master. Make sure to include the MAHOUT-XXX
issue key in the head of the pull request; that way ASF will automatically
propagate code comments to JIRA, so all discussion can be done entirely on GitHub.

Again, if you take on a significant contribution (such as a new numerical
method), I recommend discussing the proposal on the @dev list.

thanks.
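
The workflow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the remote name `upstream`, the helper names, and the use of the issue key as the branch name are hypothetical choices, and the repository URL assumes the apache/mahout GitHub mirror with a `master` branch, as mentioned in the message. The second helper checks that a pull request title mentions a MAHOUT issue key.

```python
import re
from typing import List

# Matches an issue key such as "MAHOUT-1788" anywhere in a title.
ISSUE_KEY = re.compile(r"\bMAHOUT-\d+\b")


def sync_and_branch_commands(
        issue_key: str,
        upstream: str = "https://github.com/apache/mahout.git") -> List[str]:
    """Git commands to sync a fork's master and start a feature branch.

    Hypothetical helper: it only builds the command strings, it runs nothing.
    """
    return [
        # One-time setup: point a remote at the main repository.
        f"git remote add upstream {upstream}",
        # Regular sync: bring the local master up to date.
        "git checkout master",
        "git pull upstream master",
        # Do the work on a branch named after the issue (an assumption).
        f"git checkout -b {issue_key}",
    ]


def title_mentions_issue_key(title: str) -> bool:
    """True if the pull request title contains a MAHOUT-<number> key."""
    return ISSUE_KEY.search(title) is not None
```

For example, `sync_and_branch_commands("MAHOUT-1788")` lists the commands for this issue, and `title_mentions_issue_key("MAHOUT-1788: copy test data to HDFS if needed")` returns `True`, while a title without a key returns `False`.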
Dmitriy Lyubimov
2016-03-30 18:57:06 UTC
PS: You may also want to sign up with ASF JIRA so we can assign issues to
you.
Khurrum Nasim
2016-03-30 19:05:42 UTC
Thanks for the advice, Dmitriy. I’m already signed up on ASF JIRA. My handle is “nasimk”.

Do I need to be a linear algebra expert and/or a math PhD to contribute?
I have 10-plus years of computer programming experience. My background is comp sci.

Khurrum
Suneel Marthi
2016-03-30 19:10:54 UTC
Thanks, Khurrum, for stepping up.

You just need basic programming skills (Java/Scala) to be able to
contribute. We can help you with the algorithms and linear algebra stuff.

Welcome aboard!!
Khurrum Nasim
2016-03-31 15:28:42 UTC
Thanks everyone - I’m glad to be a part of this.

Khurrum
Khurrum Nasim
2016-04-18 18:49:47 UTC
Hi Guys,

Can Mahout be used for things like face detection? Also, which unit tests or integration tests do you recommend I run to get a better feel for the execution flow?

I’m still slowly acclimating to the project, but I should hopefully come up to speed soon.


Many Thanks,

Khurrum
Dmitriy Lyubimov
2016-04-19 00:18:34 UTC
Khurrum,

mahout is so much a library at this point.

If you mean whether it can be used to build networks with 2D inputs, yes, I did
some of that. Multi-epoch SGD-based systems should be easy enough to build
and will probably have reasonable performance, although I think
dedicated CNN systems like Caffe would still run faster at this point. Full-batch
trainers are somewhat slow for larger problems, though; my
investigation suggests that there are architectural problems in Spark that
are hard to overcome at this point for high-IO algorithms.
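
For readers unfamiliar with the term, "multi-epoch SGD" here means making several full passes (epochs) over the data while updating the model after every sample, rather than once per full batch. The sketch below illustrates that loop on a toy one-dimensional linear regression in plain Python. It is not Mahout code and does not use any Mahout API; the function name, data, learning rate, and epoch count are all made up for illustration, and a real trainer would normally shuffle the samples each epoch.

```python
from typing import List, Tuple


def sgd_linear_fit(samples: List[Tuple[float, float]],
                   epochs: int = 2000,
                   lr: float = 0.05) -> Tuple[float, float]:
    """Fit y = w * x + b with per-sample gradient steps over several epochs."""
    w, b = 0.0, 0.0
    for _ in range(epochs):            # one epoch = one full pass over the data
        for x, y in samples:           # update after every single sample
            err = (w * x + b) - y      # gradient of 0.5 * err**2 w.r.t. the prediction
            w -= lr * err * x
            b -= lr * err
    return w, b


# Toy, noise-free data generated from y = 2x + 1 (illustrative only).
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = sgd_linear_fit(samples)
```

A full-batch trainer would instead accumulate the gradient over all samples before each update, which is the variant described above as slow for larger problems.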
Dmitriy Lyubimov
2016-04-19 00:19:00 UTC
I meant "not so much a library".
Post by Dmitriy Lyubimov
Khurrum,
mahout is so much a library at this point.
if you mean if it can be used to build networks with 2d inputs, yes i did
some of that. multi-epoch SGD based systems should be easy enough to build,
and will probably have a reasonable performance -- although I think
dedicated CNN systems like Caffe would still run faster at this point. Full
batch trainers are somewhat slow for larger problems though, my
investigation points that there are architectural problems in spark that
are hard to overcome at this point for high IO algorithms.
Dmitriy Lyubimov
2016-04-19 00:22:55 UTC
Permalink
I am not sure of your question about tests...

There are in-memory tests, which you can run with 'mvn test' in the /math-scala
module; distributed tests are done per engine, under the 'spark', 'h2o' or
'flink' modules.
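As a rough illustration of the commands described above (a sketch, not part of the original message): assuming a Mahout source checkout with `math-scala`, `spark`, `h2o` and `flink` directories at its top level and Maven on the PATH, the tests could be driven per module roughly as follows. `MAHOUT_SRC`, `DRY_RUN` and `run_module_tests` are hypothetical names introduced only for this example.

```shell
#!/bin/sh
# Sketch only: run `mvn test` inside one Mahout module directory.
# MAHOUT_SRC, DRY_RUN and run_module_tests are hypothetical names for this example.
MAHOUT_SRC="${MAHOUT_SRC:-.}"   # assumed path to the Mahout source checkout
DRY_RUN="${DRY_RUN:-1}"         # 1 = only print the command, 0 = actually run it

run_module_tests() {
    module="$1"
    if [ "$DRY_RUN" = "1" ]; then
        # Show what would be executed without invoking Maven.
        echo "(cd $MAHOUT_SRC/$module && mvn test)"
    else
        # Run the module's tests in a subshell so the caller's directory is unchanged.
        (cd "$MAHOUT_SRC/$module" && mvn test)
    fi
}

# In-memory tests live in the math-scala module.
run_module_tests math-scala

# Distributed tests are run per engine module.
for engine in spark h2o flink; do
    run_module_tests "$engine"
done
```

With `DRY_RUN=1` (the default here) the script only prints the commands; setting `DRY_RUN=0` would actually invoke Maven. Depending on the build, sibling modules may first need to be installed into the local repository before a single module's tests will resolve.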
Post by Dmitriy Lyubimov
I meant "not so much a library".
Khurrum Nasim
2016-04-19 15:38:52 UTC
Permalink
Okay, thanks - I’ll run those tests. I actually ran a few others as well, like the MatrixWritableTest.
Post by Dmitriy Lyubimov
I am not sure of your question about tests...
There are in-memory tests, which you can run with 'mvn test' in the /math-scala
module; distributed tests are done per engine, under the 'spark', 'h2o' or
'flink' modules.
Khurrum Nasim
2016-04-19 15:08:21 UTC
Permalink
Thank you, Dmitriy.

So is there an architectural blueprint for Mahout? What I mean is, how can I get the 1000-foot overview, or the bird's-eye view, of the project?
I do see Mahout is very modularized - however, I’m still trying to make heads or tails of it :)

@Dmitriy -
"my investigation points that there are architectural problems in spark that
are hard to overcome at this point for high IO algorithms." - Can you share some more details about this? I’m just curious.
Post by Dmitriy Lyubimov
Khurrum,
mahout is so much a library at this point.
if you mean if it can be used to build networks with 2d inputs, yes i did
some of that. multi-epoch SGD based systems should be easy enough to build,
and will probably have a reasonable performance -- although I think
dedicated CNN systems like Caffe would still run faster at this point. Full
batch trainers are somewhat slow for larger problems though, my
investigation points that there are architectural problems in spark that
are hard to overcome at this point for high IO algorithms.
Suneel Marthi
2016-04-19 15:20:54 UTC
Permalink
Post by Khurrum Nasim
Thank you, Dmitriy.
So is there an architectural blueprint for Mahout? What I mean is, how
can I get the 1000-foot overview, or the bird's-eye view, of the project?
I do see Mahout is very modularized - however, I’m still trying to make
heads or tails of it :)
@Dmitriy -
"my investigation points that there are architectural problems in spark that
are hard to overcome at this point for high IO algorithms." - Can you
share some more details about this? I’m just curious.
Long story short - "Distributed != Scalable"
Suneel Marthi (JIRA)
2016-03-30 17:25:26 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218389#comment-15218389 ]

Suneel Marthi commented on MAHOUT-1788:
---------------------------------------

[~shashidongur] Thanks for taking this up; please make a PR once you are done.
Dmitriy Lyubimov (JIRA)
2016-03-30 18:29:25 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218531#comment-15218531 ]

Dmitriy Lyubimov commented on MAHOUT-1788:
------------------------------------------

[~shashidongur]

Oh, but of course! Please do!
Suneel Marthi (JIRA)
2016-04-03 13:21:25 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15223282#comment-15223282 ]

Suneel Marthi commented on MAHOUT-1788:
---------------------------------------

[~shashidongur] Any progress on this yet? Do you need any help?
shashi bushan dongur (JIRA)
2016-04-04 21:36:25 UTC
Permalink
[ https://issues.apache.org/jira/browse/MAHOUT-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15225099#comment-15225099 ]

shashi bushan dongur commented on MAHOUT-1788:
----------------------------------------------

[~smarthi] I currently have Mahout installed and set up on my VM. I am digging into the source code to understand how it works. I will post an update when I start editing the code.

Is there any resource I can look at to learn how to efficiently edit and run Mahout? I have followed the instructions on GitHub, but I am having a hard time understanding how I can run and test the code. Any resource regarding that would help hugely!

P.S.: I am new to Apache open source, and to open source in general.
Suneel Marthi (JIRA)
2016-10-14 03:36:20 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574059#comment-15574059 ]

Suneel Marthi commented on MAHOUT-1788:
---------------------------------------

Is someone still working on this?
Andrew Palumbo (JIRA)
2016-12-19 06:03:58 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15760262#comment-15760262 ]

Andrew Palumbo commented on MAHOUT-1788:
----------------------------------------

[~pferrel], [~shashidongur] Is there anything left to be done here? Can we close it out, or should we bump it to 0.14/1.0.0?
