There might be a concept of a "contrib" subproject with a totally separate
code tree; some ASF projects do that. That way it is easy to keep it around.
Post by Khurrum Nasim
I agree with Andrew. Mahout should remain indigenous.
Prakash - you may want to create your own project on github using the mahout library.
I don't think that this sort of integration work would be a good fit
directly to the Mahout project. Mahout is more about math, algorithms and
an environment to develop algorithms. We stay away from direct platform
integration. In the past we did have some elasticsearch/mahout integration
work that is not in the code base for this exact reason. I would suggest
that better places to contribute something like this may be PIO
(https://prediction.io/), or even directly as a package for Spark
(http://spark-packages.org/).
https://github.com/PredictionIO/template-scala-parallel-universal-recommendation
.
I think that the project that you are proposing would be a better fit
there.
Thanks,
Andy
________________________________________
Sent: Thursday, April 28, 2016 1:50 PM
Subject: Re: Mahout contributions
I want to start with social data, for example data returned from the
FB Graph API as well as user Twitter data. I will send some samples
later if you're interested.
Post by Khurrum Nasim
What type of JSON payload size are we talking about here?
Post by Saikat Kanjilal
Because EL gives you the visualization and non-Lucene-type query
constructs as well, and it already has a REST API that I plan on
tying into Mahout. I plan on wrapping some of the clustering algorithms
that I implement using Mahout and Spark as a service, which can then make
calls into other services (namely Elasticsearch and the neo4j graph service).
Post by Khurrum Nasim
@Saikat - why use EL instead of Lucene directly?
Post by Saikat Kanjilal
This is great information, thank you. Based on this recommendation I
won't create a JIRA but will start work on my project, and when the code
approaches the percentages you are describing I will create the appropriate
JIRAs and put together a proposal to send to the list. Sound ok? Based on
your latest updates to the wiki I will work on a handful of the clustering
algorithms, since I see that the Spark implementations for these are not yet
complete.
Thank you again
Subject: Re: Mahout contributions
Date: Thu, 28 Apr 2016 01:31:09 +0000
Saikat,
One other thing that I should say is that you do not need clearance
or input from the committers to begin work on your project, and the
interest can and should come from the community as a whole. You can write
a proposal as you've done, and if you don't see any "+1"s or responses from
the community as a whole within a few days, you may want to explain in more
detail and give examples and use cases. If you are still not seeing +1s or
any responses from others, then I think you can assume that there may not be
interest; this is usually how things work.
like you can deliver this should not stop you. People do not always
with your proposed contribution by following the steps laid out in my
http://mahout.apache.org/developers/how-to-contribute.html
and create a JIRA. When you have reached a significant amount of
completion (around 70-80%), open a PR for review; this way you can explain
in more detail.
is some expectation of a commitment on your part to complete it.
features. I have spent a good deal of time this week and last already and
am even mocking up code as a sketch of what may become an implementation
before I open a "New Feature" JIRA for it.
opening JIRAs for new features, rather to let you know that when you open
a JIRA for a new issue, it tells others that you are working on it, and
thus may discourage another with a similar idea from contributing this
feature. So it is best to open it once you've begun your work and are
committed to it.
Andy
________________________________________
Sent: Wednesday, April 27, 2016 8:24 PM
Subject: RE: Mahout contributions
Andrew,
Thank you very much for your input. I actually want to start
a new set of JIRAs. Here's what I want to work on: I want to build a
framework that ties together search/visualization capability with some
machine learning algorithms. Essentially, think of it as tying
Elasticsearch and Kibana into Mahout: the user can search for their data
with Elasticsearch, and for deeper analysis they can feed that
data into one or more Mahout backends. Another interesting
tie-in might be to hack Kibana to render ggplot-like graphics based on the
output of Mahout algorithms (assuming this can be a Kibana plugin).
if there's interest in this initiative. The tool will bring together the
ELK stack with dynamic machine learning algorithms. I can go into a lot
more detail around use cases if there's enough interest.
Looking forward to your and other committers' input.
Thanks
Subject: Re: Mahout contributions
Date: Wed, 27 Apr 2016 20:16:38 +0000
Hello Saikat,
#1 and #2 above are already implemented. #4 is tricky, so I would
not recommend it without a strong knowledge of the codebase, and #5 is now
deprecated. (I've just updated the algorithms grid to reflect this). The
algorithms page includes both algorithms implemented in the math-scala
library and algorithms which have CLI drivers written for them.
http://mahout.apache.org/developers/how-to-contribute.html
best interest to keep messages on list; contacting committers directly is
discouraged.
issue) would be for you to pick a single open issue in the mahout JIRA
which is not already assigned, and start work on it. When your work is
ready for review, just open up a PR and the committers will review it.
Please note that if you do pick up an issue to work on, we do expect some
amount of responsibility and reliability and a tangible amount of
satisfactory work, since once you've marked a JIRA as something you're
working on, others will pass on it.
that could make to existing code not necessarily open JIRAs that need to be
assigned to you. For example please see the recent contribution and
workflow on: https://issues.apache.org/jira/browse/MAHOUT-1833 .
start a new JIRA issue and begin work on it. In this case, when you have
some code that is ready for review, you can simply open up a PR for it and
committers will review it. For new implementations, we generally say that
you should do this when you are at least 70-80% finished with your coding.
Thank You,
Andy
________________________________________
Sent: Tuesday, April 26, 2016 7:17 PM
Subject: RE: Mahout contributions
Hello,
Following up on my last email with more specifics: I've
looked through the wiki (
https://mahout.apache.org/users/basics/algorithms.html) and I'm
interested in implementing one or more of the following algorithms with
Mahout using Spark:
1) Matrix Factorization with ALS
2) Naive Bayes
3) Weighted Matrix Factorization, SVD++
4) Sparse TF-IDF Vectors from Text
5) Lucene integration
where is there the greatest need?
2) Should I fork the repo and create branches for each of the above
implementations?
3) Should I go ahead and create some JIRAs for these?
Would love to have some pointers to get started.
Regards
Subject: Mahout contributions
Date: Wed, 30 Mar 2016 10:23:45 -0700
Hello Committers,
I was looking through the current JIRA tickets
and was wondering if there's a particular area of Mahout that needs
more help than others. Should I focus on contributing some algorithms using
the DSL or on Samsara-related efforts? I've finally got some bandwidth to do
some work and would love some guidance before assigning myself some
tickets.
Regards