Incremental model-update strategies are widely used in machine learning and data mining.By "incremental update" we refer to models that are updated many times using small subsets of the training data.Two wellknown examples are stochastic gradient and MCMC.Both provide fast sequential performance and have generated many of the best-performing methods for particular problems (logistic regression, SVM, LDA etc.).But these methods are difficult to adapt to parallel or cluster settings because of the overhead of distributing model updates through the network.Updates can be locally batched to reduce communication overhead, but convergence typically suffers as the batch size increases.In this paper we introduce and analyze butterfly mixing, an approach which interleaves communication with computation.We evaluate butterfly mixing on stochastic gradient algorithms for logistic regression and SVM, on two datasets.Results show that butterfly mix steps are fast and failure-tolerant, and overall we achieved a 3.3x speedup over full mix (AllReduce) on an Amazon EC2 cluster.
Discussion(0)
No comments yet. Be the first to comment.