Discussion:
[jira] [Created] (MAHOUT-1871) Kmeans - java.lang.IllegalStateException: No input clusters found..... Check your -c argument
Juan Carlos Sipan Robles (JIRA)
2016-06-12 16:40:20 UTC
Juan Carlos Sipan Robles created MAHOUT-1871:
------------------------------------------------

Summary: Kmeans - java.lang.IllegalStateException: No input clusters found..... Check your -c argument
Key: MAHOUT-1871
URL: https://issues.apache.org/jira/browse/MAHOUT-1871
Project: Mahout
Issue Type: Question
Components: Clustering
Affects Versions: 0.12.1
Environment: OS: CentOS 6.5
hadoop 2.7.2

Reporter: Juan Carlos Sipan Robles
Priority: Critical
Fix For: 0.12.1


Running kmeans with the parameters below gives the following error.

16/06/12 17:35:43 INFO KMeansDriver: convergence: 0.5 max Iterations: 10
16/06/12 17:35:43 INFO CodecPool: Got brand-new decompressor [.deflate]
Exception in thread "main" java.lang.IllegalStateException: No input clusters found in /mdb/clustered_data/part-randomSeed. Check your -c argument.
at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:213)
at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:147)
at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:110)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
[SSH] exit-status: 1
Finished: FAILURE


Command Execution:
hdfs dfs -rm -R /mdb/mahout_vectors/
hdfs dfs -rm -R /mdb/mahout_seq/
hdfs dfs -rm -R /mdb/mahout_data/
hdfs dfs -rm -R /mdb/clustered_data/
echo "##### HDFS folders removed #####"
hdfs dfs -mkdir /mdb/mahout_vectors/
hdfs dfs -mkdir /mdb/mahout_seq/
hdfs dfs -mkdir /mdb/mahout_data/
hdfs dfs -mkdir /mdb/clustered_data/
echo "##### upload the file #####"
hdfs dfs -put $fichero /mdb/mahout_data/
echo "##### generate the sequence files #####"
mahout seqdirectory -i /mdb/mahout_data/ -o /mdb/mahout_seq -c UTF-8 -chunk 64 -xm sequential
echo "##### generate the vectors #####"
mahout seq2sparse -i /mdb/mahout_seq/ -o /mdb/mahout_vectors/ --namedVector
echo "##### run kmeans #####"
mahout kmeans -i /mdb/mahout_vectors/tfidf-vectors/ -c /mdb/clustered_data -o /mdb/mahout_data -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure -x 10 -k 20 -ow --clustering






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Suneel Marthi (JIRA)
2016-09-07 04:36:20 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1871:
----------------------------------
Fix Version/s: 0.13.0 (was: 0.12.1)
Suneel Marthi (JIRA)
2016-10-14 03:35:20 UTC
[ https://issues.apache.org/jira/browse/MAHOUT-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi resolved MAHOUT-1871.
-----------------------------------
Resolution: Not A Bug

You need to provide initial centroids. IIRC, you can either supply the initial centroids yourself or pass a folder with the -c option, in which random initial centroids will be generated. You may want to check the KMeans section of examples/bin/cluster-reuters.sh to see how it is done.
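
As a rough illustration of the second option (a folder passed with -c plus -k so that random initial centroids are generated), here is a dry-run sketch that only prints the kmeans invocation rather than running it. The flags and the vector and centroid paths are taken from the reporter's command; OUTPUT_DIR and the kmeans_cmd helper are hypothetical names introduced here. The separate output directory is an assumption: the original command points -o at /mdb/mahout_data together with -ow, which would overwrite the directory holding the uploaded data.

```shell
# Dry-run sketch: prints the command, does not call Mahout.
# Paths come from the original report, except OUTPUT_DIR (hypothetical).
VECTORS_DIR=/mdb/mahout_vectors/tfidf-vectors
CENTROIDS_DIR=/mdb/clustered_data
OUTPUT_DIR=/mdb/kmeans_output   # hypothetical: kept apart from /mdb/mahout_data

# kmeans_cmd K
# Prints a kmeans invocation that asks for K random initial centroids.
# With -k set, k input vectors are picked at random as centroids and
# written under the -c path, so that folder need not hold centroids yet.
kmeans_cmd() {
  printf 'mahout kmeans -i %s -c %s -o %s -dm %s -x 10 -k %s -ow --clustering\n' \
    "$VECTORS_DIR" "$CENTROIDS_DIR" "$OUTPUT_DIR" \
    "org.apache.mahout.common.distance.EuclideanDistanceMeasure" "$1"
}

kmeans_cmd 20
```

If the same error still appears with -k set, one thing worth checking (an assumption, not confirmed in this thread) is whether the tfidf-vectors folder actually contains vectors, since random centroids can only be drawn from existing input.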