EnClass: Ensemble-Based Classification Model for Network Anomaly Detection in Massive Datasets
Article 2017 en
Authors
Sahil Garg
Amritpal Singh
Shalini Batra
Abstract
With the exponential increase in Internet traffic, there is growing concern about identifying the legitimate users who generate the bulk of that traffic. However, anomalies in network traffic disrupt normal network operations and functions such as traffic classification, resource allocation, and service management. Anomalies therefore need to be detected within a given time frame. The efficiency of any anomaly detection model depends mainly on the selection of relevant features and on the learning algorithms used to classify network traffic patterns. However, owing to the curse of dimensionality, class imbalance, and variation in the types of anomalies, most existing solutions reported in the literature fail to deal with the problems that occur when detecting anomalies in large-scale network data. To address these gaps, we propose a new hybrid anomaly detection scheme called Ensemble-based Classification Model for Network Anomaly Detection (EnClass) to detect anomalies in real-world networking datasets. EnClass has three modules: (i) Hoeffding-bound-based clustering to identify the optimal subset of features for classifying network traffic, (ii) an eigenvalue computation module to refine the feature set by removing unnecessary attributes, and (iii) a Very Fast Decision Tree (VFDT) for network traffic classification. To validate the proposed anomaly detection model, experimental evaluation is performed on the real-world Knowledge Discovery and Data Mining (KDD'99) dataset with respect to detection rate, false positive rate, and F-score. The comparison with existing approaches demonstrates the effectiveness of EnClass in terms of detection rate (98.58%), false positive rate (0.42%), and F-score (96.06%).
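The abstract names the Hoeffding bound and three evaluation metrics without giving formulas. As a minimal sketch, the block below uses the standard textbook definitions, which may differ in detail from what the paper implements. The function names, the choice of "anomaly" as the positive class, and the example counts are all illustrative assumptions and are not taken from the paper.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Standard Hoeffding bound (assumed form, not quoted from the paper).

    With probability 1 - delta, the true mean of a variable with range
    `value_range` lies within the returned epsilon of the mean observed
    over n independent samples.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def detection_metrics(tp, fp, tn, fn):
    """Detection rate, false positive rate, and F-score from confusion counts.

    Assumes the anomalous class is the positive class:
      tp = anomalies flagged as anomalies
      fp = normal records flagged as anomalies
      tn = normal records passed as normal
      fn = anomalies passed as normal
    """
    detection_rate = tp / (tp + fn)          # also called recall
    false_positive_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    f_score = 2 * precision * detection_rate / (precision + detection_rate)
    return detection_rate, false_positive_rate, f_score


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise the functions.
    print(hoeffding_bound(value_range=1.0, delta=1e-7, n=1000))
    print(detection_metrics(tp=90, fp=5, tn=95, fn=10))
```

In a streaming decision tree such as VFDT, a bound of this kind is typically used to decide when enough records have been seen to commit to a split; the bound shrinks as n grows, so decisions become firmer with more data.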