2,979 publications from this institution
This paper introduces the concept and design of decision trees based on information granules, that is, multivariable entities characterized by high homogeneity (low variability). Because these granules are developed via fuzzy clustering and play a pivotal role in growing the trees, the resulting structures are referred to as C-fuzzy decision trees. In contrast with "standard" decision trees, in which one variable (feature) is considered at a time, this form of decision tree considers all variables at each node of the tree. This gives rise to a completely new geometry of the partition of the feature space that is quite different from the guillotine cuts implemented by standard decision trees. The C-fuzzy decision tree is grown by expanding the node of the tree characterized by the highest variability of the information granule residing there. This paper shows how the tree is grown depending on additional node expansion criteria, such as the cardinality (number of data) at a given node and the level of structural dependencies (structurability) of the data existing there. A series of experiments is reported using both synthetic and machine learning data sets. The results are compared with those produced by the "standard" version of the decision tree (namely, C4.5).
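The growth step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of plain fuzzy c-means on the input features only, the membership-weighted spread of the output as the "variability" of a node, and the `min_size` cardinality threshold are all assumptions made here for illustration.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means: returns prototypes V (c, d) and memberships U (n, c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)            # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]   # membership-weighted prototypes
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)  # standard FCM membership update
    return V, U

def split_node(X, y, c=2):
    """Split one node into up to c children, one per cluster (information granule)."""
    _, U = fuzzy_c_means(X, c)
    owner = U.argmax(axis=1)                      # assign each point to its strongest cluster
    children = []
    for i in range(c):
        idx = owner == i
        if not idx.any():
            continue                              # skip clusters that attracted no data
        w = U[idx, i]
        mean_y = np.average(y[idx], weights=w)
        # Assumed variability measure: membership-weighted spread of the output.
        variability = float(np.sum(w * (y[idx] - mean_y) ** 2))
        children.append({"X": X[idx], "y": y[idx], "variability": variability})
    return children

def next_node_to_expand(leaves, min_size=4):
    """Pick the leaf with the highest variability among those with enough data."""
    candidates = [n for n in leaves if len(n["y"]) >= min_size]
    return max(candidates, key=lambda n: n["variability"]) if candidates else None
```

Because every cluster is formed from all features at once, the resulting regions are not axis-aligned, which is the geometric difference from single-variable splits that the abstract points out. The structurability criterion mentioned in the abstract is not modeled in this sketch.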
The accuracy and integrity of actual production data influence the reliability and stability of the sintering process in the steel industry. However, such data may contain various outliers caused by noise, sensor failure, and operator negligence. To tackle this issue, this article develops an original framework for the detection and correction of abnormal production data in the sintering process. First, an improved kernel-based fuzzy C-means algorithm is developed to effectively partition normal production data into multiple operating conditions. Then, a separate one-class support vector machine (SVM) classifier is constructed for each operating condition. Depending on which operating condition the actual production data belong to, the one-class SVM for that operating condition is called to detect abnormal production data. Finally, the most similar normal historical data in that operating condition are retrieved by a k-nearest-neighbor algorithm based on the Mahalanobis distance and used to correct the abnormal data. Simulation results involving actual production data illustrate the effectiveness of the proposed method. Taking two existing models of the sintering process as examples, their prediction performance improves after the abnormal production data are detected and corrected, which indicates that the proposed framework has practical engineering value.
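The final correction step of this framework can be sketched as follows. This is only an illustration under stated assumptions: the function name `mahalanobis_knn_correct`, the choice to replace the abnormal sample by the mean of its k nearest normal neighbors, and the estimation of the covariance from the normal history of a single operating condition are hypothetical and not taken from the paper. Detection (the one-class SVM) and the assignment to an operating condition are assumed to have happened upstream.

```python
import numpy as np

def mahalanobis_knn_correct(x_abnormal, normal_history, k=1):
    """Replace an abnormal sample with the mean of its k nearest normal samples.

    normal_history: array of shape (n, d) with normal samples from the same
    operating condition (assumed to be given). Nearness is measured by the
    Mahalanobis distance, using the covariance of normal_history.
    """
    cov = np.cov(normal_history, rowvar=False)
    cov_inv = np.linalg.pinv(cov)                 # pseudo-inverse tolerates a singular covariance
    diff = normal_history - x_abnormal
    # Squared Mahalanobis distance of every historical sample to x_abnormal.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    d2 = np.maximum(d2, 0.0)                      # guard against tiny negative round-off
    nearest = np.argsort(d2)[:k]
    corrected = normal_history[nearest].mean(axis=0)
    return corrected, np.sqrt(d2[nearest])
```

Unlike the Euclidean distance, the Mahalanobis distance accounts for the scale of each variable and for correlations between variables, which matters when process variables have different units.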