About reuters-fkmeans-centroids

Discussion:

Prakash Poudyal

2016-04-27 17:57:41 UTC

Hi!

I am using fuzzy clustering, but I do not understand the option "-c
reuters-fkmeans-centroids". How is this calculated?

$ /bin/mahout fkmeans -i reuters-vectors/tfidf-vectors/ \
    -c reuters-fkmeans-centroids -o reuters-fkmeans-clusters \
    -cd 1.0 -k 21 -m 2 -ow -x 10 \
    -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure

--
Regards
Prakash Poudyal

Suneel Marthi

2016-04-28 17:26:57 UTC

First, most of this code is legacy MapReduce and is no longer supported,
which is why you are not seeing answers.

Back to your question: -c specifies the folder for the initial centroids,
which are randomly generated. IIRC, the centroids are generated when you
execute the clustering driver.
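
To make "randomly generated initial centroids" concrete, here is a minimal Python sketch. It is not Mahout's code. It assumes that the seeding step simply samples k of the input vectors and uses them as starting centroids; the function name pick_initial_centroids and the toy data are hypothetical.

```python
import random


def pick_initial_centroids(vectors, k, seed=None):
    """Pick k distinct input vectors to serve as initial centroids.

    Illustrative only: assumes random sampling of the input vectors.
    """
    if k > len(vectors):
        raise ValueError("k cannot exceed the number of input vectors")
    rng = random.Random(seed)
    # Copy each chosen vector so later centroid updates do not mutate the input.
    return [list(v) for v in rng.sample(vectors, k)]


# Toy 2-D data standing in for the TF-IDF vectors.
points = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [9.9, 0.1]]
centroids = pick_initial_centroids(points, k=2, seed=42)
```

Under this assumption nothing has to be calculated by hand for the -c folder: the starting centroids are drawn from the data, and the clustering iterations then refine them.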

Prakash Poudyal

2016-04-28 17:54:37 UTC

Dear Suneel,

Thank you so much for your reply; I had been waiting for a long time.

Actually, I need to use fuzzy clustering to cluster sentences in my
research. I found the fuzzy k-means clustering algorithm in Apache Mahout,
so I am trying to use it for my purpose.

Regarding the "first thing" in your reply: if I am not seeing answers, then
I may be heading in the wrong direction. Can you give me some guidance for
the requirement I mentioned above?

Next, about the -c centroids: so we get the -c centroids only after we
execute the Clustering Driver? If you know of a helpful link, could you
share it?

Thank you so much. I have been stuck for the last two days. I hope you will
reply soon.

Prakash

Ted Dunning

2016-04-28 17:58:51 UTC

Post by Prakash Poudyal
Actually, I need to use fuzzy clustering to cluster sentences in my
research. I found the fuzzy k-means clustering algorithm in Apache Mahout,
so I am trying to use it for my purpose.

That's great.

But that code is no longer supported.

Prakash Poudyal

2016-04-28 18:05:54 UTC

Hi Ted,

Do you mean that Mahout no longer supports fuzzy k-means clustering for
sentences? Can you clarify in more detail? :(

Prakash

Suneel Marthi

2016-04-28 18:09:12 UTC

Yes, the entire MapReduce code (which includes the fuzzy clustering that you
are looking at) has not been supported as of Mahout 0.10.0 (I suggest
reading the release notes on mahout.apache.org).

Dmitriy Lyubimov

2016-04-28 18:10:38 UTC

Prakash,

If you are using any Mahout MapReduce algorithm for research, please make
sure to make this disclosure:

All Mahout MapReduce algorithms are officially unsupported and have been
deprecated since February 2014 (IIRC). I can dig up a specific issue
regarding this. There has also been an announcement.

So before you really start drawing any comparisons, please be advised that
you are starting with algorithms that are 2+ years past their EOL (let
alone their inception).

Thanks.
-D

Suneel Marthi

2016-04-28 18:13:35 UTC

That's correct: deprecated as of Feb 2014, and it will be completely purged
in one of the upcoming releases (0.13.0).

Prakash Poudyal

2016-04-28 19:02:27 UTC

Hi!

Thank you for your emails!

Actually, I need to use fuzzy clustering to cluster sentences in my
research. This is my goal.

I started using Mahout's fuzzy k-means clustering last week. I found
several blog links and many other helpful documents. Being new, I went
through them because I think this is the best, easiest and fastest way to
learn how Mahout works. In my opinion, many newcomers do the same as I do.
Only after getting used to the tools do people focus on the work and go
deeper.

I have gone through many blogs and sites to learn about Mahout; some of
them are below:

http://technobium.com/introduction-to-clustering-using-apache-mahout/

http://tuxdna.github.io/pages/mahout.html

https://github.com/tdunning/MiA/blob/master/src/main/java/mia/clustering/ch09/FuzzyKMeansExample.java

http://www.programering.com/a/MDNwgTMwATI.html

https://www.safaribooksonline.com/library/view/apache-mahout-clustering/9781783284436/ch04.html

https://ymnliu.wordpress.com/2015/11/05/install-apache-mahout-in-eclipse/

https://mahout.apache.org/

What do you say about these sites? Are these sites not appropriate?

I raised my problem several times, on the mailing list and even on IRC, but
I got a response only today :(

So finally, it would be great if you could answer the following questions.

Is Apache Mahout an appropriate tool for clustering sentences through
fuzzy clustering?

If the answer is "YES":

Which version of Mahout?

Can you write the steps that I need to follow, or point me to appropriate
documentation (links)?

Thanks
Prakash Poudyal
Portugal

Prakash Poudyal

2016-04-28 20:37:59 UTC

Dear Suneel, Dmitriy and Ted,

This is just a gentle reminder to address the confusion that I mentioned in
my previous email. It would be great if you could respond soon, so that I
can go ahead.

Thank you so much.

Prakash

Khurrum Nasim

2016-04-28 20:51:51 UTC

@Prakash - Although I’m a Mahout noob: if you can represent your problem as a network with 2D input, then yes, Mahout can be used (so I’ve heard).
IMO, every machine-based computation problem can be represented as a graph, although this may not always be optimal.

Taking this notion of fuzzy clustering a bit further: can it be applied to topics such as demand prediction?

Dmitriy Lyubimov

2016-04-28 21:13:32 UTC

Prakash,

(1) To be clear, the ASF trademark and branding policy is not to endorse the
views of 3rd-party publications and to ask 3rd-party writers to disclose
that their views are not endorsed by the ASF project. To that end, an ASF
project can't really tell you that some publication is "(in)appropriate".
3rd-party publications speak on their own account and cannot by default be
tied to the ASF's views. That said, committers have their own opinions,
which of course vary, and some things do get linked on the site or
mentioned on Twitter via the Mahout account. But some do not. Best practice
is always to ask for pointers on the list first.

(2) I am not sure what your definition of "appropriate" is, but on a
personal note, most of these links were quite "appropriate" at the time, in
the sense that they were published before release 0.10 and before 2/2014,
and therefore described what was in the project at that time. For example,
the MIA fuzzy k-means example in your link dates back to June 2011 and is
relevant to releases circa 0.6 or 0.7. So if you are asking whether those
algorithms were "in the fold" back then, the answer is yes, they were. I
see no contradiction between these publications and the current reality.

(3) If something deprecated reasonably works for a particular purpose, I
think there's no reason not to use/write about it.

*However, I just don't think most of these particular deprecated Java-based
MR algorithms work as an established benchmark or a standard in research --
modern, cutting-edge ML is usually much faster (and often more convenient,
too).*

I don't mean to come across as preachy, but research is usually held to
quite a different standard when it comes to claims than an ad-hoc
industrial application or a blog entry. I simply can't see how any of the
MR stuff can work for that purpose today.

(4) If your "appropriate"-ness question is really about why they were
deprecated, well, there are two main reasons. First, it seems that the
realization of MR's limitations w.r.t. iterative applications quickly
caught up with both users and contributors, and, second, most contributors
abandoned their MR contributions (most likely for the same reason). I
contributed a couple of MR algorithms back in 2010-2011, but I am
absolutely fine with them being deprecated and written off the books. If
something is not being used, or people (exactly as your case has
demonstrated) don't get answers to their questions, or bugs are not being
fixed, it is difficult to justify keeping the code. It is much easier to
focus on what is actually being used and maintained instead. That is the
very banal and boring reason for the deprecations.

(5) Finally, if your goal is simply to learn "how the project works", then,
just as Suneel said, I'd suggest following the release notes and the
project site (news and howtos) -- your last link in fact should perhaps be
your first. And the list, of course.

As you can probably tell from the release notes, the last two years were
almost exclusively about multiplatform Mahout, with Spark, Flink and H2O
backends, as well as the Samsara environment for general numeric analysis
(but no MR stuff beyond very nominal fixes).

I also agree that the Mahout site should perhaps be clearer about the
status of the MR algorithms (it used to be clearer, I think, but all news
eventually becomes old news).

Hope this clarifies.

-d

Prakash Poudyal

2016-04-28 21:43:32 UTC

Dear Dmitriy,

I really appreciate you writing at such length to clarify my confusion.
Thank you so much :)

Regards
Prakash Poudyal

Suneel Marthi

2016-04-28 18:08:10 UTC

Post by Prakash Poudyal
Dear Suneel,
Thank you so much for your reply; I had been waiting for a long time.
Actually, I need to use fuzzy clustering to cluster sentences in my
research. I found the fuzzy k-means clustering algorithm in Apache Mahout,
so I am trying to use it for my purpose.
Regarding the "first thing" in your reply: if I am not seeing answers, then
I may be heading in the wrong direction. Can you give me some guidance for
the requirement I mentioned above?

What I meant to convey was that you have not been seeing responses to your
question because this is all legacy MR code that's not supported anymore.

Post by Prakash Poudyal
Next, about the -c centroids: so we get the -c centroids only after we
execute the Clustering Driver? If you know of a helpful link, could you
share it?

I suggest you look at the code as opposed to just reading someone's blog
instructions. It should give you a better understanding of the
implementation details.

In the CLI command that you are running, -c is a folder for the generated
centroids. I suggest you look at the code to see how that's being done.

Feel free to pose more questions.
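
For readers who want to see how initial centroids are then refined, the following is a minimal Python sketch of the textbook fuzzy k-means (fuzzy c-means) iteration. It is not Mahout's implementation. The parameters m, max_iter and delta are assumed to play the roles of the -m, -x and -cd options in the command above, and the sketch uses plain Euclidean distance rather than the SquaredEuclideanDistanceMeasure named there. All function names and the toy data are hypothetical.

```python
import math


def memberships(point, centroids, m=2.0):
    """Fuzzy membership of one point in each cluster; the values sum to 1."""
    dists = [math.dist(point, c) for c in centroids]
    # A point sitting exactly on a centroid belongs fully to that centroid.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exp for d_j in dists) for d_i in dists]


def fuzzy_kmeans(points, centroids, m=2.0, max_iter=10, delta=1e-3):
    """Refine the given initial centroids; return (centroids, memberships)."""
    dims = len(points[0])
    for _ in range(max_iter):
        u = [memberships(p, centroids, m) for p in points]
        new_centroids = []
        for j, old in enumerate(centroids):
            # Each point pulls the centroid with weight membership ** m.
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            if total == 0.0:
                new_centroids.append(list(old))  # no point supports this cluster
                continue
            new_centroids.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dims)
            ])
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < delta:  # stop once no centroid moved more than delta
            break
    return centroids, [memberships(p, centroids, m) for p in points]


# Toy 2-D data and hand-picked starting centroids.
points = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
final_centroids, u = fuzzy_kmeans(points, [[0.0, 0.0], [5.0, 5.0]])
```

The starting centroids only need to be a rough guess, because each pass recomputes every centroid as a membership-weighted mean of all points.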
