Discussion:
[jira] [Updated] (MAHOUT-1958) CityBlockSimilarity.itemSimilarities can overflow
Hao Zhong (JIRA)
2017-03-23 01:26:41 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hao Zhong updated MAHOUT-1958:
------------------------------
Description:
The CityBlockSimilarity.itemSimilarities method has the following code:
{code:title=CityBlockSimilarity.java|borderStyle=solid}
int preferring2 = dataModel.getNumUsersWithPreferenceFor(itemID2s[i]);
int intersection = dataModel.getNumUsersWithPreferenceFor(itemID1, itemID2s[i]);
{code}

Here, the two methods return long values, so storing their results in int variables can overflow. Indeed, LogLikelihoodSimilarity.doItemSimilarity once had the same problem. The fixed code is:

{code:title=LogLikelihoodSimilarity.java|borderStyle=solid}
long preferring1 = dataModel.getNumUsersWithPreferenceFor(itemID1);
long numUsers = dataModel.getNumUsers();
{code}
Please refer to MAHOUT-738 for details.
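
To make the failure mode concrete, here is a minimal sketch of what happens when a long count is narrowed to an int. It assumes, as the description states, that the count arrives as a long; the class NarrowingDemo and its methods are hypothetical names used only for illustration and are not part of Mahout.

```java
// Hypothetical illustration; none of these names come from Mahout.
class NarrowingDemo {

    // Plain narrowing cast: keeps only the low 32 bits, so large values wrap silently.
    static int narrowUnchecked(long count) {
        return (int) count;
    }

    // Checked narrowing: Math.toIntExact throws ArithmeticException if the value does not fit.
    static int narrowChecked(long count) {
        return Math.toIntExact(count);
    }

    public static void main(String[] args) {
        long count = 3_000_000_000L; // larger than Integer.MAX_VALUE (2147483647)
        System.out.println(narrowUnchecked(count)); // wraps to a negative number
        try {
            narrowChecked(count);
        } catch (ArithmeticException e) {
            System.out.println("does not fit in an int: " + e.getMessage());
        }
    }
}
```

Keeping the variable as a long, as in the LogLikelihoodSimilarity fix, avoids the narrowing step altogether.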

was:
The CityBlockSimilarity.itemSimilarities method has the following code:
{code:title=CityBlockSimilarity.java|borderStyle=solid}
int preferring2 = dataModel.getNumUsersWithPreferenceFor(itemID2s[i]);
int intersection = dataModel.getNumUsersWithPreferenceFor(itemID1, itemID2s[i]);
{code}

Here, the two methods return long values, and can overflow. Indeed, LogLikelihoodSimilaritydoItemSimilarity once had the same problem. The fixed code is

{code:title=CityBlockSimilarity.java|borderStyle=solid}
long preferring1 = dataModel.getNumUsersWithPreferenceFor(itemID1);
long numUsers = dataModel.getNumUsers();
{code}
Please refer to MAHOUT-738 for details.
CityBlockSimilarity.itemSimilarities can overflow
-------------------------------------------------
Key: MAHOUT-1958
URL: https://issues.apache.org/jira/browse/MAHOUT-1958
Project: Mahout
Issue Type: Bug
Components: Math
Affects Versions: 1.0.0
Reporter: Hao Zhong
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
Hao Zhong (JIRA)
2017-03-23 01:27:41 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hao Zhong updated MAHOUT-1958:
------------------------------
Status: Patch Available (was: Open)

diff --git a/mr/src/main/java/org/apache/mahout/cf/taste/impl/similarity/CityBlockSimilarity.java b/mr/src/main/java/org/apache/mahout/cf/taste/impl/similarity/CityBlockSimilarity.java
index 88fbe58..760f10c 100644
--- a/mr/src/main/java/org/apache/mahout/cf/taste/impl/similarity/CityBlockSimilarity.java
+++ b/mr/src/main/java/org/apache/mahout/cf/taste/impl/similarity/CityBlockSimilarity.java
@@ -53,9 +53,9 @@
   @Override
   public double itemSimilarity(long itemID1, long itemID2) throws TasteException {
     DataModel dataModel = getDataModel();
-    int preferring1 = dataModel.getNumUsersWithPreferenceFor(itemID1);
-    int preferring2 = dataModel.getNumUsersWithPreferenceFor(itemID2);
-    int intersection = dataModel.getNumUsersWithPreferenceFor(itemID1, itemID2);
+    long preferring1 = dataModel.getNumUsersWithPreferenceFor(itemID1);
+    long preferring2 = dataModel.getNumUsersWithPreferenceFor(itemID2);
+    long intersection = dataModel.getNumUsersWithPreferenceFor(itemID1, itemID2);
     return doSimilarity(preferring1, preferring2, intersection);
   }

@@ -90,8 +90,8 @@
    * @param pref2 number of non-zero values in right vector
    * @param intersection number of overlapping non-zero values
    */
-  private static double doSimilarity(int pref1, int pref2, int intersection) {
-    int distance = pref1 + pref2 - 2 * intersection;
+  private static double doSimilarity(long pref1, long pref2, long intersection) {
+    long distance = pref1 + pref2 - 2 * intersection;
     return 1.0 / (1.0 + distance);
   }
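
The effect of widening doSimilarity can be sketched in isolation. The class below copies the arithmetic from the patch into two hypothetical methods, similarityInt and similarityLong (names invented here, not taken from Mahout), so the int and long variants can be compared on the same inputs. Note that the int version can overflow in the sum even when each individual count fits in an int.

```java
// Hypothetical side-by-side comparison; only the arithmetic is taken from the patch.
class CityBlockDistanceDemo {

    // Pre-patch arithmetic: the sum pref1 + pref2 is computed in 32 bits and can wrap.
    static double similarityInt(int pref1, int pref2, int intersection) {
        int distance = pref1 + pref2 - 2 * intersection;
        return 1.0 / (1.0 + distance);
    }

    // Post-patch arithmetic: the same expression evaluated in 64 bits.
    static double similarityLong(long pref1, long pref2, long intersection) {
        long distance = pref1 + pref2 - 2 * intersection;
        return 1.0 / (1.0 + distance);
    }

    public static void main(String[] args) {
        int big = 1_500_000_000; // fits in an int, but big + big does not
        System.out.println(similarityInt(big, big, 0));  // negative, which is not a valid similarity
        System.out.println(similarityLong(big, big, 0)); // small positive value
        System.out.println(similarityInt(5, 7, 3) == similarityLong(5, 7, 3)); // true for small inputs
    }
}
```

For small counts the two variants agree, so the change only matters for data sets large enough to push the distance past Integer.MAX_VALUE.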