Name
sxk_means_stable - Collect stable class averages with several independent runs of k-means
Usage
Usage in command line:
sxk_means_stable.py stack outdir <maskfile> --K=2 --nb_part=5 --th_nobj=10 --rand_seed=10 --maxit=1000 --normalize --CTF --MPI
Usage in python programming:
Normal version:
k_means_stab_stream(stack, outdir, maskfile, K, nb_part, th_nobj, rand_seed, CTF)
- SSE is the optimization method we recommend.
MPI version:
k_means_stab_MPI_stream(stack, outdir, maskfile, K, nb_part, th_nobj, rand_seed, CTF)
Note: if the 2D input images have been aligned (see sxali2d), the program applies the 2D alignment parameters (xform.align2d) stored in the headers before clustering.
MPI note: the MPI version is under development.
To use the MPI (parallel) version:
- 1. set the flag --MPI on the command line
- 2. mpirun -np 32 sxk_means_stable.py followed by the remaining parameters
- The above example is for mympi.
Examples:
sxk_means_stable.py data.hdf kmeans_stab mask2d_26.hdf --K=8 --nb_part=5
mpirun -np 5 sxk_means_stable.py 'bdb:data' kmeans_stab mask2d_26.hdf --K=8 --th_nobj=5 --MPI
Input
- stack
- the input image stack (bdb, hdf or txt)
- outdir
- name of directory where the results are written
- maskfile
- optional mask file to be used (bdb or hdf)
Parameters preceded by -- are optional; their default values are given in parentheses.
- --th_nobj
- minimum number of images per class average required in the stable partition. Classes with fewer than th_nobj images are not transferred to the final partition (default 1, meaning all averages are kept)
Output
- outdir/main_log.txt
- the main log file; every step is recorded so that the progress of the program can be followed
- outdir/averages.hdf
- the final stable class averages
- outdir/averages_**.hdf
- intermediate class averages produced by the independent clustering runs. '**' is the index of the partition in two-digit format ('00'). For example, if the number of partitions is 5, the directory outdir/ will contain averages_00.hdf through averages_04.hdf
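The numbering of the intermediate files can be sketched in a few lines of Python. This only illustrates the two-digit, zero-based scheme described above; the helper name intermediate_average_files is hypothetical and is not part of the program.

```python
import os

def intermediate_average_files(outdir, nb_part):
    """List the per-partition average files expected in outdir (hypothetical helper)."""
    # Partitions are numbered from 0 with a two-digit, zero-padded index.
    return [os.path.join(outdir, "averages_%02d.hdf" % i) for i in range(nb_part)]

for name in intermediate_average_files("kmeans_stab", 5):
    print(name)
```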
Txt case
The program can also take a text file as input. The file must contain one 'image' per line, with the values of that image separated by spaces, e.g.:
0.34 5.46 2.34 6.78
3.78 2.23 1.78 5.67
This file contains 2 'images' of 4 'pixels'. Because the data are converted to an image structure, an hdf file can still be used to mask them. The output directory, however, will contain only text files giving the membership of each group for all partitions:
- outdir/k_means_grp_***.txt
- the final stable membership. For each group *** (three-digit format, '000'), the file stores the list of ids of the images in the group. If K is equal to 3, there will be three files: k_means_grp_000.txt, k_means_grp_001.txt and k_means_grp_002.txt
- outdir/k_means_part_**_grp_***.txt
- intermediate membership produced by the independent clustering runs. For each partition ** (two-digit format, '00'), one file is produced per group, containing the list of ids of the images in that group.
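As an illustration of the input text format described above, here is a minimal Python sketch that writes and reads such a file. The helper names write_txt_stack and read_txt_stack are hypothetical and are not part of the program; the sketch assumes only what is stated here, namely one 'image' per line with space-separated values.

```python
import os
import tempfile

def write_txt_stack(path, images):
    """Write one 'image' per line, values separated by a single space."""
    with open(path, "w") as f:
        for pixels in images:
            f.write(" ".join(str(v) for v in pixels) + "\n")

def read_txt_stack(path):
    """Read the file back as a list of lists of floats, one list per 'image'."""
    with open(path) as f:
        return [[float(v) for v in line.split()] for line in f if line.strip()]

# The two 'images' of four 'pixels' from the example above.
images = [[0.34, 5.46, 2.34, 6.78], [3.78, 2.23, 1.78, 5.67]]
path = os.path.join(tempfile.mkdtemp(), "data.txt")
write_txt_stack(path, images)
stack = read_txt_stack(path)
print(len(stack), "images of", len(stack[0]), "pixels")
```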
Description
K_means_stable uses the function k_means to run repeated, independent clusterings of the data set. For complete details of the k-means parameters (K, method, ...) see the page for sxk_means. The random seed of each clustering is written to the main_log.txt file. After nb_part clusterings have been run, all partitions are matched against each other to compare their memberships. If there are two partitions, the matching uses the optimal Hungarian algorithm; if there are more than two, it uses an in-house branching algorithm. Images that fall in the same cluster in every partition are kept to create the stable averages. A stability coefficient, in percent, is written to the main_log.txt file; its value reflects the similarity between the memberships. If the number of images in a stable group is below th_nobj, those images are considered unused and the corresponding average is removed from the final stable class averages. th_nobj thus makes it possible to discard averages that contain too few images. The number of final averages is also written to the main_log.txt file.
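The matching step for two partitions can be sketched in Python as follows. This is not the program's implementation. For brevity it finds the best class-to-class matching by brute force over all K! relabelings, a stand-in that reaches the same optimum as the Hungarian algorithm for small K. The function names and the definition of the stability coefficient (percentage of images assigned to matched classes in both partitions, computed before the th_nobj filter) are assumptions made for this example.

```python
from itertools import permutations

def match_two_partitions(part_a, part_b, K):
    """Return match, where match[i] is the class of part_b paired with class i of part_a.

    Brute force over all K! pairings (stand-in for the Hungarian algorithm).
    """
    # overlap[i][j] = number of images in class i of part_a and class j of part_b
    overlap = [[0] * K for _ in range(K)]
    for a, b in zip(part_a, part_b):
        overlap[a][b] += 1
    return max(permutations(range(K)),
               key=lambda perm: sum(overlap[i][perm[i]] for i in range(K)))

def stable_groups(part_a, part_b, K, th_nobj=1):
    """Return (kept groups, stability in percent) for two partitions."""
    match = match_two_partitions(part_a, part_b, K)
    # An image is stable if both partitions put it in a pair of matched classes.
    groups = [[n for n, (a, b) in enumerate(zip(part_a, part_b))
               if a == i and b == match[i]] for i in range(K)]
    # Assumed definition: share of stable images among all images.
    stability = 100.0 * sum(len(g) for g in groups) / len(part_a)
    # Groups with fewer than th_nobj images are dropped from the final result.
    kept = [g for g in groups if len(g) >= th_nobj]
    return kept, stability

# Class assignment of 8 images in two independent runs (labels are arbitrary).
part_a = [0, 0, 0, 1, 1, 1, 2, 2]
part_b = [1, 1, 0, 2, 2, 2, 0, 0]
kept, stability = stable_groups(part_a, part_b, K=3, th_nobj=3)
print(kept, stability)
```

With th_nobj=3 only the group of three images survives in this toy example; with the default th_nobj=1 all three matched groups are kept.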
Reference
This program uses the Munkres algorithm (also known as the Hungarian algorithm) when the number of partitions is two: http://www.clapper.org/bmc, BSD-like licence, copyright (c) 2008 Brian M. Clapper. Otherwise the in-house branching algorithm is used.
Author / Maintainer
Julien Bert
Keywords
Files
applications.py
See also
Maturity
- beta
- works for author, often works for others.
Bugs
HDF files limit the number of attributes that can be stored; see the Bugs section of sxk_means.