We introduce a scalable algorithm for clustering lines and affine subspaces of fixed dimensions via data reduction using a coreset approximating scheme. A key characteristic of the proposed algorithm is exploiting a scatter-then-aggregate scheme for data partitioning, which allows to maintain a constant coreset approximation size. This key feature enables a novel algorithm which streams as well as distributes workload. We study how the proposed algorithm scales in a practical setting and compare it with other clustering methods. Results of experiments executed on the PARAM supercomputer confirm that the time complexity of algorithm is log-linear in problem size and decreases linearly as the number of processors increase. Source code for the algorithm, the experiments, and other potential applications is provided.
Discussion(0)
No comments yet. Be the first to comment.