Discussion:
[jira] [Created] (MAHOUT-1976) Add Canopy Clustering Algorithm
Trevor Grant (JIRA)
2017-05-03 23:24:04 UTC
Trevor Grant created MAHOUT-1976:
------------------------------------

Summary: Add Canopy Clustering Algorithm
Key: MAHOUT-1976
URL: https://issues.apache.org/jira/browse/MAHOUT-1976
Project: Mahout
Issue Type: Bug
Components: Algorithms
Affects Versions: 0.13.2
Reporter: Trevor Grant


Primarily, we need to lay out the clustering section of the Algorithms Framework.

The Canopy Clustering Algorithm is very simple and yet very useful as a preprocessing step for more advanced clustering algorithms such as KMeans and Hierarchical Clustering.

https://en.wikipedia.org/wiki/Canopy_clustering_algorithm

The majority of the "work" on this PR will be creating the framework.

It is also one of the Legacy MR algorithms that would be nice to port.
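
For readers unfamiliar with the technique, the canopy procedure described on the Wikipedia page linked above can be sketched roughly as follows. This is a plain-Python illustration, not the Mahout implementation: the function names `canopy` and `euclidean`, the threshold names `t1` (loose) and `t2` (tight), and the choice to take the first remaining point as the next center are all assumptions made for the example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Straight-line distance; any other metric could be passed instead."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy(
    points: List[Point],
    t1: float,
    t2: float,
    distance: Callable[[Point, Point], float] = euclidean,
) -> List[Tuple[int, List[int]]]:
    """Return a list of (center_index, member_indices) canopies.

    t1 is the loose threshold and t2 the tight one (hypothetical names).
    """
    if t1 <= t2:
        raise ValueError("t1 (loose) must be greater than t2 (tight)")

    remaining = list(range(len(points)))  # indices still eligible as centers
    canopies: List[Tuple[int, List[int]]] = []

    while remaining:
        # Assumption: take the first remaining point as the next center.
        center = remaining.pop(0)
        members = [center]
        still_remaining = []
        for i in remaining:
            d = distance(points[center], points[i])
            if d < t1:
                members.append(i)          # inside the loose radius: joins canopy
            if d >= t2:
                still_remaining.append(i)  # outside the tight radius: stays eligible
        remaining = still_remaining
        canopies.append((center, members))

    return canopies


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.5, 0.0), (1.5, 0.0), (10.0, 10.0), (10.4, 10.0)]
    for center, members in canopy(data, t1=2.0, t2=1.0):
        print(center, members)
```

Points that fall between the two thresholds stay in the candidate set, so they can end up in more than one canopy. The resulting canopy centers are what a later algorithm such as KMeans would typically use as starting points.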



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
Trevor Grant (JIRA)
2017-05-03 23:40:04 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trevor Grant reassigned MAHOUT-1976:
------------------------------------

Assignee: Trevor Grant
Trevor Grant (JIRA)
2017-05-06 18:10:04 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trevor Grant updated MAHOUT-1976:
---------------------------------
Issue Type: Improvement (was: Bug)
ASF GitHub Bot (JIRA)
2017-05-06 20:55:04 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15999581#comment-15999581 ]

ASF GitHub Bot commented on MAHOUT-1976:
----------------------------------------

GitHub user rawkintrevo opened a pull request:

https://github.com/apache/mahout/pull/314

MAHOUT-1976 Add CanopyClustering

MAHOUT-1976 Add Canopy Clustering

### Purpose of PR:
1. Primarily, this PR adds CanopyClustering to the Algorithms Framework.
2. This PR introduces the "clustering" section of the Algorithms Framework.
3. This PR introduces distance metrics and ports two metrics from the old MR code base.
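
To make the idea of a pluggable distance metric concrete, here is a minimal Python sketch. The PR text does not say which two metrics were ported, so Chebyshev and cosine distance are used here purely as illustrations; the function names are hypothetical and are not Mahout's API.

```python
import math
from typing import Sequence


def chebyshev(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest absolute difference along any single coordinate."""
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus the cosine of the angle between the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(chebyshev((0, 0), (3, -4)))
    print(cosine_distance((1, 0), (0, 1)))
```

Because both functions share the same two-vector signature, a clustering routine can accept either one as a parameter without knowing which metric it is using.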

### Important ToDos
Please mark each with an "x"
- [x] Opening PR against `develop` NOT `master` (OR `feature-name` if this is part of an ongoing feature development). **need to delete this requirement, JIRA needed**
- [x] A JIRA ticket exists (if not, please create this first)[https://issues.apache.org/jira/browse/ZEPPELIN/]
- [x] Title of PR is "MAHOUT-XXXX Brief Description of Changes" where XXXX
is the JIRA number.
- [x] Created unit tests where appropriate
- [x] Added correct licenses on newly added files
- [x] Assigned JIRA to self
- [x] Added documentation in scala docs/java docs, (and website once that
is merged to dev)
- [x] Successfully built and ran all unit tests, verified that all tests
pass locally.


Oh by the way, does this change break earlier versions?
No

Is this the beginning of a larger project for which a feature branch should be made?
No

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rawkintrevo/mahout mahout-1976

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/mahout/pull/314.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #314

----
commit 7f18775afae639c1b291fb0273d92dc71de24884
Author: rawkintrevo <***@gmail.com>
Date: 2017-05-04T14:25:42Z

MAHOUT-1976 Add CanopyClustering

MAHOUT-1976 Add Canopy Clustering

forgot unit tests

----
ASF GitHub Bot (JIRA)
2017-05-21 03:37:04 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018681#comment-16018681 ]

ASF GitHub Bot commented on MAHOUT-1976:
----------------------------------------

GitHub user asfgit closed the pull request at:

https://github.com/apache/mahout/pull/314
Trevor Grant (JIRA)
2017-05-21 03:50:04 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trevor Grant resolved MAHOUT-1976.
----------------------------------
Resolution: Fixed
Hudson (JIRA)
2017-05-21 04:41:04 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018690#comment-16018690 ]

Hudson commented on MAHOUT-1976:
--------------------------------

SUCCESS: Integrated in Jenkins build Mahout-Quality #3488 (See [https://builds.apache.org/job/Mahout-Quality/3488/])
MAHOUT-1976 Canopy Clustering closes apache/mahout#314 (rawkintrevo: rev c29496cb11372baddbb76acdee51530347525645)
* (add) math-scala/src/main/scala/org/apache/mahout/math/algorithms/clustering/ClusteringModel.scala
* (add) flink/src/test/scala/org/apache/mahout/flinkbindings/standard/ClusteringSuite.scala
* (add) math-scala/src/main/scala/org/apache/mahout/math/algorithms/common/distance/DistanceMetrics.scala
* (add) website/docs/algorithms/clustering/canopy/SampleData.png
* (add) website/docs/algorithms/clustering/index.md
* (edit) website/docs/_includes/algo_navbar.html
* (add) spark/src/test/scala/org/apache/mahout/math/algorithms/ClusteringSuite.scala
* (add) website/docs/algorithms/clustering/canopy/index.md
* (add) math-scala/src/test/scala/org/apache/mahout/math/algorithms/ClusteringSuiteBase.scala
* (add) h2o/src/test/scala/org/apache/mahout/math/algorithms/ClusteringSuite.scala
* (add) website/docs/algorithms/clustering/canopy/Canopy10.png
* (add) website/docs/algorithms/clustering/canopy/Canopy.png
* (add) math-scala/src/main/scala/org/apache/mahout/math/algorithms/clustering/Canopy.scala
* (add) website/docs/algorithms/clustering/distance-metrics.md
* (edit) website/docs/algorithms/map-reduce/clustering/canopy-clustering.md
Trevor Grant (JIRA)
2017-06-23 04:36:02 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trevor Grant updated MAHOUT-1976:
---------------------------------
Fix Version/s: 0.13.1
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
