Name

sxk_means_groups - determine 'best' number of clusters in the data using K-means classification of a set of images

Usage

Usage in command lines:

sxk_means_groups.py stackfile output_file <maskfile> --K1=Min_number_of_Clusters --K2=Max_number_of_Clusters --trials=Number_of_trials_of_K-means --CTF --rand_seed=1000 --maxit=Maximum_number_of_iterations --MPI --debug

Usage in python programming:

k_means_groups(stack, out_file, maskname,"SSE", K1, K2, rand_seed, maxit, trials, CTF, MPI=False, DEBUG=False, flagnorm=False)

MPI note: the MPI version is parallelized over the trials (one trial per process).

Example:

sxk_means_groups.py hri_stack.hdf RES mask2d_23.hdf --K1=2 --K2=10 --maxit=500 --trials=5

mpirun -np 5 sxk_means_groups.py bdd:hri_stack RES mask2d_23.hdf --K1=2 --K2=10 --maxit=1000 --rand_seed=100 --MPI

Note: if the 2D input images have been aligned (see sxali2d), the program applies the 2D alignment parameters (xform.align2d) stored in the image headers before clustering.
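
The two command lines above can also be assembled programmatically, for instance when scripting several runs. The following is only a sketch: build_command is a hypothetical helper (not part of the package), and it uses only the options listed on this page. It returns the argument list without executing anything.

```python
def build_command(stack, out_file, mask=None, k1=2, k2=10, maxit=None,
                  rand_seed=None, trials=None, ctf=False, mpi_procs=None):
    """Assemble an argument list for sxk_means_groups.py (hypothetical helper)."""
    cmd = []
    if mpi_procs is not None:
        # MPI runs are launched through mpirun; each process runs one trial.
        cmd += ["mpirun", "-np", str(mpi_procs)]
    cmd += ["sxk_means_groups.py", stack, out_file]
    if mask is not None:
        cmd.append(mask)
    cmd += ["--K1=%d" % k1, "--K2=%d" % k2]
    if maxit is not None:
        cmd.append("--maxit=%d" % maxit)
    if rand_seed is not None:
        cmd.append("--rand_seed=%d" % rand_seed)
    # --trials is ignored by the MPI version, so emit it for serial runs only.
    if trials is not None and mpi_procs is None:
        cmd.append("--trials=%d" % trials)
    if ctf:
        cmd.append("--CTF")
    if mpi_procs is not None:
        cmd.append("--MPI")
    return cmd


serial = build_command("hri_stack.hdf", "RES", "mask2d_23.hdf",
                       k1=2, k2=10, maxit=500, trials=5)
parallel = build_command("bdd:hri_stack", "RES", "mask2d_23.hdf",
                         k1=2, k2=10, maxit=1000, rand_seed=100, mpi_procs=5)
print(" ".join(serial))
print(" ".join(parallel))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where the program is installed.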

Input

stackfile
The input stack of images
output_file
text file in which the values of the clustering criteria are stored
mask
filename of the input image mask. Only pixels for which the mask value is > 0.5 are considered. The mask must have the same dimensions as the input images (default = None, the entire images are used)

K1
minimum requested number of clusters
K2
maximum requested number of clusters
trials
number of trials of K-means (see description below) (default: one trial). In the MPI version, the program ignores the --trials option and internally sets the number of trials to the number of CPUs used.
CTF
if set, CTF information stored in file headers will be used (default no CTF).
rand_seed
the seed used to generate random numbers (if set to -1, a different pseudo-random seed is used each time)
MPI
if set, the MPI version of k-means groups is used
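
The mask rule described above (only pixels with mask value > 0.5 are used) can be illustrated with a short pure-Python sketch. This is not the program's own code: masked_pixels is a hypothetical function, and images are represented as plain nested lists rather than image objects.

```python
def masked_pixels(image, mask, threshold=0.5):
    """Return the pixels of `image` whose mask value exceeds `threshold`.

    `image` and `mask` are 2D nested lists (an assumption for this sketch).
    """
    same_shape = len(image) == len(mask) and all(
        len(row) == len(mrow) for row, mrow in zip(image, mask))
    if not same_shape:
        raise ValueError("mask must have the same dimensions as the image")
    return [pixel
            for row, mrow in zip(image, mask)
            for pixel, m in zip(row, mrow)
            if m > threshold]


image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0]]
mask = [[0.0, 1.0, 1.0],
        [0.4, 0.6, 0.5]]

features = masked_pixels(image, mask)
print(features)  # [2.0, 3.0, 5.0]; a mask value of exactly 0.5 is excluded
```

Each image is thus reduced to a one-dimensional vector of the selected pixels, and all images must share the mask's dimensions.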

Output

output_file
text file containing columns according to the criteria chosen. For example, if crit='CHD', the columns are: (1) number of clusters, (2) values of the Coleman criterion, (3) values of the Harabasz criterion, and (4) values of the Davies-Bouldin criterion
output_file.p
file containing a gnuplot script that plots the values of all criteria on the same range. Use this command in gnuplot: load 'output_file.p'
WATCH_GRP_KMEANS or WATCH_MPI_GRP_KMEANS
file containing the progress of k-means groups. It can be read in real time to watch the evolution of the criteria.
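
As a rough sketch of how the criteria table in output_file might be inspected outside gnuplot, the code below reads the four columns listed above for crit='CHD'. It assumes the columns are separated by whitespace, which this page does not specify. read_criteria and best_k are hypothetical helpers, and the sample numbers are invented for illustration only.

```python
def read_criteria(lines, names=("K", "Coleman", "Harabasz", "Davies-Bouldin")):
    """Parse whitespace-separated numeric rows into a dict of columns."""
    table = {name: [] for name in names}
    for line in lines:
        fields = line.split()
        if len(fields) != len(names):
            continue  # skip blank lines or rows with an unexpected layout
        try:
            values = [float(field) for field in fields]
        except ValueError:
            continue  # skip non-numeric rows such as a header
        for name, value in zip(names, values):
            table[name].append(value)
    return table


def best_k(table, criterion, pick=max):
    """Return the number of clusters at the max (or min) of one criterion."""
    values = table[criterion]
    return int(table["K"][values.index(pick(values))])


# Invented sample rows, not real program output.
sample = ["2 0.10 12.5 1.90",
          "3 0.25 20.1 1.20",
          "4 0.22 18.7 1.35"]
table = read_criteria(sample)
print(best_k(table, "Harabasz", max))
print(best_k(table, "Davies-Bouldin", min))
```

For a real run, the lines would come from the file itself, for example with open("RES") as f: table = read_criteria(f). Whether a criterion should be maximized or minimized is left to the caller through pick; by the usual convention the Harabasz index is higher for better partitions and the Davies-Bouldin index is lower.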

Description

Reference

Author / Maintainer

Julien Bert, Guozhi Tao

Keywords

category 1
APPLICATIONS

Files

statistics.py, sxk_means_groups.py

See also

sxk_means sxk_means_stable

Maturity

alpha
works, even if slowly.

Bugs

None. It is perfect.

sxk_means_groups (last edited 2015-05-28 20:12:01 by penczek)