Discussion:
[jira] [Created] (MAHOUT-1940) Implementing similarity analysis using co-occurence matrix in java
James Mackey (JIRA)
2017-02-12 15:49:41 UTC
James Mackey created MAHOUT-1940:
------------------------------------

Summary: Implementing similarity analysis using co-occurence matrix in java
Key: MAHOUT-1940
URL: https://issues.apache.org/jira/browse/MAHOUT-1940
Project: Mahout
Issue Type: New Feature
Components: Algorithms, cooccurrence
Reporter: James Mackey


We want to port the functionality from org.apache.mahout.math.cf.SimilarityAnalysis.scala to Java for easy integration with a Java project we will be creating that derives a similarity measure from the co-occurrence matrix.
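To make "deriving a similarity measure from the co-occurrence matrix" concrete, here is a minimal single-machine Java sketch. It assumes a small in-memory boolean user-by-item matrix (not Mahout's distributed matrix types) and scores item pairs with Dunning's log-likelihood ratio (LLR), the measure Mahout's cooccurrence code uses. The entropy formulation is modeled on Mahout's `LogLikelihood.logLikelihoodRatio`; the class `CooccurrenceSketch` and its methods are hypothetical names, not part of Mahout.

```java
import java.util.Arrays;
import java.util.Locale;

/**
 * Minimal single-machine sketch (hypothetical names, not Mahout's API):
 * co-occurrence counts C = A^T A from a boolean user-by-item matrix A,
 * scored with Dunning's log-likelihood ratio (LLR).
 */
public class CooccurrenceSketch {

    /** c[i][j] = users who interacted with both item i and item j; c[i][i] = users of item i. */
    public static long[][] cooccurrence(boolean[][] a) {
        int nItems = a[0].length;
        long[][] c = new long[nItems][nItems];
        for (boolean[] user : a) {
            for (int i = 0; i < nItems; i++) {
                if (!user[i]) {
                    continue;
                }
                for (int j = 0; j < nItems; j++) {
                    if (user[j]) {
                        c[i][j]++;
                    }
                }
            }
        }
        return c;
    }

    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalized entropy: xLogX(total) minus the sum of xLogX(count). */
    private static double entropy(long... counts) {
        long total = 0;
        double parts = 0.0;
        for (long k : counts) {
            total += k;
            parts += xLogX(k);
        }
        return xLogX(total) - parts;
    }

    /** Dunning's LLR for a 2x2 contingency table, clamped at 0 against round-off. */
    public static double llr(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double colEntropy = entropy(k11 + k21, k12 + k22);
        double matEntropy = entropy(k11, k12, k21, k22);
        return Math.max(0.0, 2.0 * (rowEntropy + colEntropy - matEntropy));
    }

    /** Builds the contingency table for items i and j from C and scores it. */
    public static double itemLlr(long[][] c, int i, int j, long numUsers) {
        long both = c[i][j];
        long withI = c[i][i];
        long withJ = c[j][j];
        return llr(both, withJ - both, withI - both, numUsers - withI - withJ + both);
    }

    public static void main(String[] args) {
        // Illustrative data: 4 users x 3 items.
        boolean[][] a = {
            {true,  true,  true },
            {true,  true,  false},
            {false, false, true },
            {false, false, false},
        };
        long[][] c = cooccurrence(a);
        System.out.println(Arrays.deepToString(c));
        System.out.printf(Locale.ROOT, "LLR(0,1) = %.4f%n", itemLlr(c, 0, 1, a.length));
        System.out.printf(Locale.ROOT, "LLR(0,2) = %.4f%n", itemLlr(c, 0, 2, a.length));
    }
}
```

Running `main` prints `[[2, 2, 1], [2, 2, 1], [1, 1, 2]]`, then `LLR(0,1) = 5.5452` (items 0 and 1 always appear together, giving 8 ln 2) and `LLR(0,2) = 0.0000` (items 0 and 2 are independent in this data).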



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
Pat Ferrel (JIRA)
2017-02-12 16:26:41 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862862#comment-15862862 ]

Pat Ferrel commented on MAHOUT-1940:
------------------------------------

This would be awesome! Let me know if you need help. Some of what is there is no longer required; I just duplicated some methods to maintain backward compatibility while adding new features.

I also implemented some new helper-object `apply` functions, which are alternative constructors, outside of Mahout in the PredictionIO Universal Recommender Template. They will be available when version 0.5.1 of the Template is released, concurrently with PIO 0.11.0 and Mahout 0.13.0. The ones in the Template code are all you will need for porting the Template to Java.

To make SimilarityAnalysis complete and accepted into Mahout, you'd probably need to port the whole SimilarityAnalysis class as well as IndexedDatasetSpark.
James Mackey (JIRA)
2017-02-12 21:53:42 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862990#comment-15862990 ]

James Mackey commented on MAHOUT-1940:
--------------------------------------

Hi Pat! Thanks for the offer; we would really appreciate some guidance from you. Would you mind meeting with us virtually over the next couple of days to go over the file structure and what exactly we have to implement to make this happen?
Pat Ferrel (JIRA)
2017-02-13 18:25:41 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pat Ferrel updated MAHOUT-1940:
-------------------------------
Summary: Provide a Java API to SimilarityAnalysis and any other needed APIs (was: Implementing similarity analysis using co-occurence matrix in java)
Pat Ferrel (JIRA)
2017-02-13 18:26:41 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pat Ferrel updated MAHOUT-1940:
-------------------------------
Description: We want to port the functionality from org.apache.mahout.math.cf.SimilarityAnalysis.scala to Java for easy integration with a Java project we will be creating that derives a similarity measure from the co-occurrence and cross-occurrence matrix. (was: We want to port the functionality from org.apache.mahout.math.cf.SimilarityAnalysis.scala to java for easy integration with a java project we will be creating that derives a similarity measure from the co-occurence matrix.)
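The updated description mentions both co-occurrence and cross-occurrence. In matrix terms they can be written as below. The notation is ours, assuming $A$ is the binary users-by-items matrix for the primary action and $B$ the binary matrix for a secondary action over the same $m$ users; co-occurrence is the special case $B = A$. The score is Dunning's log-likelihood ratio ($G^2$) on the 2×2 contingency table for one item pair, the measure Mahout's SimilarityAnalysis uses to rank and downsample item pairs.

```latex
% m users; A \in \{0,1\}^{m \times n_A}, B \in \{0,1\}^{m \times n_B}
C_{AA} = A^{\top} A, \qquad C_{AB} = A^{\top} B, \qquad
(C_{AB})_{ij} = \sum_{u=1}^{m} A_{ui} B_{uj}

% contingency table for primary item i and secondary item j,
% with n^A_i = \sum_u A_{ui} and n^B_j = \sum_u B_{uj}
k_{11} = (C_{AB})_{ij}, \quad
k_{12} = n^B_j - k_{11}, \quad
k_{21} = n^A_i - k_{11}, \quad
k_{22} = m - n^A_i - n^B_j + k_{11}

\mathrm{LLR}_{ij} = 2 \sum_{r=1}^{2} \sum_{s=1}^{2} k_{rs} \ln \frac{k_{rs}\, m}{R_r\, S_s},
\qquad R_r = k_{r1} + k_{r2}, \quad S_s = k_{1s} + k_{2s}, \quad 0 \ln 0 := 0
```

The four cells sum to $m$, so the table partitions all users by whether they took the secondary action on $j$ and the primary action on $i$.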
Dmitriy Lyubimov (JIRA)
2017-02-14 22:42:41 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866853#comment-15866853 ]

Dmitriy Lyubimov commented on MAHOUT-1940:
------------------------------------------

Normally, someone writing in Java does not really have to port anything from Scala.
For example, Spark's Java APIs are in fact implemented in Scala.

There are normally two ways of going about this:
(1) write the API in Java and implement it in Scala (the way Spark does), or
(2) write Java-compatible traits in Scala and implement them in Scala as well (which is what I do, as it reduces complexity a bit).

To take approach (2), the APIs should use only Java-compatible types: no Scala libraries (such as collections) and no incompatible language constructs (such as implicits, curried functions, generic context bounds, etc.). Writing the API interfaces in Java just verifies this a bit better and allows avoiding a mixed build (which can sometimes be a problem due to circular dependencies between Java and Scala code).
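The idea of keeping the API boundary Java-compatible can be sketched in plain Java as follows. All names (`JavaCooccurrenceApi`, `SimpleCooccurrence`, `JavaApiSketch`) and the sample data are hypothetical. The ranking uses raw co-occurrence counts rather than the LLR scoring SimilarityAnalysis performs, only to keep the example short; under either approach described above, the implementing class would be Scala code delegating to the existing implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical Java-facing API. Every type in the signature is a plain
 * java.util type, so Java callers never see Scala collections, implicits,
 * or curried methods.
 */
interface JavaCooccurrenceApi {
    /**
     * @param interactions {userId, itemId} pairs
     * @param maxPerItem   maximum number of related items kept per item
     * @return for each item, the other items it co-occurs with, most frequent first
     */
    Map<String, List<String>> indicators(List<String[]> interactions, int maxPerItem);
}

/**
 * Stand-in implementation in plain Java so the sketch runs on its own.
 * In the real design this class would be written in Scala.
 */
class SimpleCooccurrence implements JavaCooccurrenceApi {
    @Override
    public Map<String, List<String>> indicators(List<String[]> interactions, int maxPerItem) {
        // Group distinct items by user (a set ignores repeated interactions).
        Map<String, Set<String>> itemsByUser = new HashMap<>();
        for (String[] pair : interactions) {
            itemsByUser.computeIfAbsent(pair[0], u -> new TreeSet<>()).add(pair[1]);
        }
        // Count how many users have each ordered pair of distinct items.
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (Set<String> items : itemsByUser.values()) {
            for (String a : items) {
                for (String b : items) {
                    if (!a.equals(b)) {
                        counts.computeIfAbsent(a, k -> new TreeMap<>()).merge(b, 1, Integer::sum);
                    }
                }
            }
        }
        // Rank by count (descending), break ties by item id, keep the top maxPerItem.
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            List<Map.Entry<String, Integer>> ranked = new ArrayList<>(e.getValue().entrySet());
            ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                    .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
            List<String> top = new ArrayList<>();
            for (int i = 0; i < Math.min(maxPerItem, ranked.size()); i++) {
                top.add(ranked.get(i).getKey());
            }
            result.put(e.getKey(), Collections.unmodifiableList(top));
        }
        return result;
    }
}

public class JavaApiSketch {
    public static void main(String[] args) {
        JavaCooccurrenceApi api = new SimpleCooccurrence();
        List<String[]> interactions = Arrays.asList(
                new String[] {"u1", "iphone"}, new String[] {"u1", "ipad"},
                new String[] {"u2", "iphone"}, new String[] {"u2", "ipad"},
                new String[] {"u2", "galaxy"},
                new String[] {"u3", "galaxy"}, new String[] {"u3", "iphone"});
        System.out.println(api.indicators(interactions, 2));
    }
}
```

Running `main` prints `{galaxy=[iphone, ipad], ipad=[iphone, galaxy], iphone=[galaxy, ipad]}`. Because the interface mentions only `java.util` types, it compiles the same way whether the implementing class is Java or Scala.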