As the representative partition clustering method, K-Means is well known and widely used in many fields because it is easy to implement and highly efficient. However, the choice of initial centers can affect the final clustering result, which may even contain empty clusters. In this paper, a new K-Means initialization method is proposed that combines statistical information with distance computation. The statistical information comprises the mean, the median, and a Gaussian kernel density estimate. First, high-density points are selected in each dimension. Then distance and density are used to evaluate every candidate initial center. This process proceeds from the highest-variance dimension to the lowest-variance one, after which the final initial cluster centers are constructed using the K nearest neighbors. Experiments on public datasets show that this method achieves results comparable to those of other conventional methods.
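The abstract only outlines the procedure, so the following is a hypothetical Python/NumPy sketch of the general density-plus-distance seeding idea, not the authors' algorithm. Several details are assumptions: the per-dimension Gaussian KDE uses Silverman's rule-of-thumb bandwidth; the per-dimension densities are combined by a variance-weighted sum as a stand-in for the high-to-low variance pass; candidates are chosen greedily by the product of density and distance to already-chosen seeds; and each seed is replaced by the mean of its `n_neighbors` nearest points. The mean and median statistics mentioned above are not modeled. The names `kde_1d`, `density_distance_init`, and `n_neighbors` are hypothetical.

```python
import numpy as np


def kde_1d(values, bandwidth=None):
    """Gaussian KDE of a 1-D sample, evaluated at the sample points themselves."""
    n = values.size
    if bandwidth is None:
        # Assumption: Silverman's rule of thumb; the paper's bandwidth is not stated.
        std = values.std(ddof=1) if n > 1 else 0.0
        bandwidth = 1.06 * std * n ** (-1 / 5) if std > 0 else 1.0
    diffs = (values[:, None] - values[None, :]) / bandwidth
    return np.exp(-0.5 * diffs ** 2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))


def density_distance_init(X, k, n_neighbors=5):
    """Hypothetical density + distance seeding for K-Means (illustrative only)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape

    # Assumption: weight each dimension's normalized 1-D density by its share of
    # the total variance, so high-variance dimensions dominate the score.
    var = X.var(axis=0)
    weights = var / var.sum() if var.sum() > 0 else np.full(d, 1.0 / d)
    density = np.zeros(n)
    for j in range(d):
        dens = kde_1d(X[:, j])
        density += weights[j] * dens / dens.max()

    # First seed: the highest-density point.
    seeds = [int(np.argmax(density))]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen seed.
        dist = np.linalg.norm(X[:, None, :] - X[seeds][None, :, :], axis=2).min(axis=1)
        # Favor points that are both dense and far from existing seeds.
        seeds.append(int(np.argmax(density * dist)))

    # Replace each seed by the mean of its nearest neighbors (the seed itself included).
    centers = []
    for i in seeds:
        nn = np.argsort(np.linalg.norm(X - X[i], axis=1))[:n_neighbors]
        centers.append(X[nn].mean(axis=0))
    return np.array(centers)
```

Multiplying density by distance keeps the greedy step from picking a second seed inside an already-covered dense region, which is one common way to avoid the empty or merged clusters that poor initial centers can cause. The resulting array can be passed to any K-Means routine that accepts explicit initial centers.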