Aditya
2017-03-31 23:47:51 UTC
Hi everyone,
I've been talking with Trevor over email and he shared some documents with
me. They contained content that he (along with a few others) was
developing to make Mahout easily accessible to newbies like myself.
I've gone through the planned blog posts titled "Why Mahout", "Getting
Started with Mahout", "Algorithms Framework" and "Building Apache Mahout
from Source" and I have to say, I've got a lot of questions. Since Trevor
is on vacation and the deadline for final proposal submission is fast
approaching, I thought I'd post my questions on the dev forum.
So here goes the big list of my questions. I hope those of you who were
/ are involved in the development of these blog posts will be able to help
me. Some of the questions are vague / abstract, so I suggest you answer
them as if you were explaining to a layman.
1. Could you explain the high-level structure of Mahout to me?
2. What plans are in the pipeline for Mahout's development in the coming
months?
3. How does contributing a new algorithm work in Mahout? When I was
reading the doc "Getting Started with Mahout", the example implemented
Ordinary Least Squares Regression in Samsara, Mahout's DSL.
I had something different in mind before reading the blog posts. I had
thought that I would be contributing the distributed algorithm to Mahout
from scratch, written in Scala, and making it available as a package that
Mahout users can import and use.
4. In general, is there a plan for all algorithms that become part of
Mahout in the future to be written in Samsara only? If so, what will be
the limitations and advantages of this decision?
5. What are the building blocks of Mahout that enable distributed
processing? The blog post mentions the Distributed Row Matrix. Are there
any other distributed data structures available? If not, won't that limit
which algorithms can become part of the Mahout framework in the future,
excluding those that cannot be reduced to a linear algebra problem?
6. What is expected of a newbie in the community? What is the learning
curve to become an active contributor to Mahout? Are there any specific
books / blog posts that I can read that will make the process easier?
7. Could you also give me some background on how the development of
Mahout has been going? I don't mean the motivation / inspiration that led
to Mahout's conception, but rather what work has gone on between the
previous release and the current release candidate.
8. What was the high-level motivation for developing Mahout's own DSL,
Samsara?
Regards,
Aditya