Malicious domain identification is an important task in cyberspace security. However, most existing work on this task relies heavily on expert experience when constructing machine learning features. Worse, these features can be deliberately manipulated by attackers, so such identification methods are easily bypassed by cyber criminals. To solve this problem, we propose a novel method for malicious domain identification that learns time series shapelets, the discriminative local patterns of a time series. More specifically, our method consists of two main components: 1) a model of users' domain access habits, obtained by learning shapelets from domain time series. Because a domain's time series is generated by the crowd visiting that website, the learned access habits can potentially reflect what type of service the domain provides, such as pornography or gambling. 2) an outlier correction algorithm that operates on a single time series, is independent of the model, and enhances the robustness of shapelet initialization. We integrate shapelet learning and outlier correction in our model. Extensive experiments on a real-world dataset demonstrate that our proposed method outperforms state-of-the-art methods.
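To make the two components more concrete, the following is a minimal sketch and not the implementation from the paper. It assumes the standard shapelet distance (the minimum distance between a shapelet and every equal-length window of a series) and substitutes a simple median/MAD rule for the outlier correction step, since the abstract does not specify that algorithm. The function names `shapelet_distance` and `correct_outliers_mad`, the `threshold` parameter, and the sample traffic values are all hypothetical.

```python
import numpy as np


def shapelet_distance(series, shapelet):
    """Minimum mean squared distance between a shapelet and any
    equal-length sliding window of the series."""
    series = np.asarray(series, dtype=float)
    shapelet = np.asarray(shapelet, dtype=float)
    if len(shapelet) > len(series):
        raise ValueError("shapelet must not be longer than the series")
    # All contiguous windows of the shapelet's length, one per row.
    windows = np.lib.stride_tricks.sliding_window_view(series, len(shapelet))
    return float(np.min(np.mean((windows - shapelet) ** 2, axis=1)))


def correct_outliers_mad(series, threshold=3.0):
    """Stand-in outlier correction (an assumption, not the paper's algorithm):
    replace points whose robust z-score exceeds `threshold` with the median.
    It uses only the single input series and no model."""
    series = np.asarray(series, dtype=float)
    median = np.median(series)
    mad = np.median(np.abs(series - median))
    if mad == 0:
        return series.copy()  # no spread, nothing to correct
    robust_z = 0.6745 * (series - median) / mad
    corrected = series.copy()
    corrected[np.abs(robust_z) > threshold] = median
    return corrected


# Hypothetical hourly request counts for one domain, with a single spike.
traffic = [1, 2, 1, 2, 100, 1, 2]
cleaned = correct_outliers_mad(traffic)

# Distance of the cleaned series to a hypothetical candidate shapelet.
candidate = [1, 2, 1]
distance = shapelet_distance(cleaned, candidate)
```

In a shapelet-based classifier, the distances from a series to each learned shapelet typically serve as its feature vector. Correcting outliers first keeps a single spike from dominating the windows that candidate shapelets are initialized from.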