Stacking is a general ensemble method in which several base classifiers are combined by a single meta-classifier that learns from their outputs. This approach offers several advantages: simplicity, performance similar to that of the best classifier, and the ability to combine classifiers induced by different inducers. Its disadvantage is that on multiclass problems stacking seems to perform worse than other meta-learning approaches. In this paper we present Troika, a new stacking method for improving ensemble classifiers. The new scheme is built from three layers of combining classifiers. The method was tested on various datasets, and the results indicate that it outperforms the legacy ensemble schemes Stacking and StackingC, especially when the classification task involves more than two classes.
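To make the idea of a meta-classifier learning from base-classifier outputs concrete, the following is a minimal sketch of plain two-level stacking on a three-class toy problem. It is not Troika's three-layer scheme, which the abstract does not specify. The class names (`NearestCentroid`, `OneNearestNeighbour`, `TableMeta`), the helper functions, and the table-lookup meta-learner are all illustrative assumptions chosen to keep the example dependency-free.

```python
from collections import Counter, defaultdict

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class NearestCentroid:
    """Hypothetical base classifier: predicts the class with the closest mean."""
    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.centroids = {c: [sum(col) / len(rows) for col in zip(*rows)]
                          for c, rows in groups.items()}
        return self

    def predict_one(self, row):
        return min(self.centroids, key=lambda c: sq_dist(row, self.centroids[c]))

class OneNearestNeighbour:
    """Hypothetical base classifier: copies the label of the closest training row."""
    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, row):
        i = min(range(len(self.X)), key=lambda k: sq_dist(row, self.X[k]))
        return self.y[i]

class TableMeta:
    """Assumed meta-classifier: maps each tuple of base predictions
    to the true label most often seen with it during training."""
    def fit(self, Z, y):
        votes = defaultdict(Counter)
        for z, label in zip(Z, y):
            votes[tuple(z)][label] += 1
        self.table = {z: c.most_common(1)[0][0] for z, c in votes.items()}
        self.default = Counter(y).most_common(1)[0][0]
        return self

    def predict_one(self, z):
        return self.table.get(tuple(z), self.default)

def fit_stacking(X, y, base_factories, meta, n_folds=3):
    """Train the meta-classifier on out-of-fold base predictions,
    then refit every base classifier on the full training set."""
    n = len(X)
    Z = [[None] * len(base_factories) for _ in range(n)]
    for fold in range(n_folds):
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        for j, make in enumerate(base_factories):
            model = make().fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            for i in test_idx:
                Z[i][j] = model.predict_one(X[i])
    meta.fit(Z, y)
    bases = [make().fit(X, y) for make in base_factories]
    return bases, meta

def predict_stacking(bases, meta, row):
    """Feed the base predictions for one row to the meta-classifier."""
    return meta.predict_one([b.predict_one(row) for b in bases])

# Toy three-class data: three well-separated clusters.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [0, 10], [0, 11], [1, 10]]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
bases, meta = fit_stacking(X, y, [NearestCentroid, OneNearestNeighbour], TableMeta())
```

Training the meta-classifier on out-of-fold predictions, rather than on predictions for rows the base classifiers have already seen, is the usual way to keep the meta level from simply trusting overfitted base outputs.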
Time series classification is a research area that has drawn much attention over the past decade. A novel approach to classifying time series uses shapelets. A shapelet is a subsequence, extracted from one of the time series in the dataset, that best separates time series belonging to different classes. A disadvantage of current shapelet-based classification approaches is their high time and memory consumption, which results from examining all possible subsequences. In this study, our initial goal was to find an evaluation order of the shapelet space that enables fast generation of an accurate classification model with a small memory footprint. The comparative analysis we conducted clearly indicates that a random evaluation order yields the best results. We present an algorithm for randomized model generation for shapelet-based classification that can generate a model with surprisingly high accuracy after evaluating only an exceedingly small fraction (∼10⁻³) of the shapelet space, and that has modest memory requirements. We propose several methods for estimating the number of shapelets to examine, and present an extensive evaluation on 51 datasets establishing the effectiveness of our approach.
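The random evaluation order described above can be illustrated with a short sketch: instead of enumerating every subsequence, draw a fixed number of candidate shapelets at random and keep the one that best separates the classes. This is not the authors' algorithm. The function names, the use of information gain as the quality measure, and the `n_candidates` budget are assumptions made for illustration.

```python
import math
import random

def subsequence_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any
    same-length window of the series."""
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        d = sum((series[start + i] - shapelet[i]) ** 2 for i in range(m))
        best = min(best, d)
    return math.sqrt(best)

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def best_split_gain(distances, labels):
    """Highest information gain over all thresholds that split the
    series by their distance to a candidate shapelet."""
    pairs = sorted(zip(distances, labels), key=lambda p: p[0])
    ys = [label for _, label in pairs]
    base, n, best = entropy(ys), len(ys), 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal distances
        left, right = ys[:i], ys[i:]
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        best = max(best, gain)
    return best

def random_shapelet_search(dataset, labels, n_candidates, min_len, max_len, seed=0):
    """Evaluate only n_candidates randomly drawn subsequences
    (assumed budget) and return the best one with its gain."""
    rng = random.Random(seed)
    best_gain, best_shapelet = -1.0, None
    for _ in range(n_candidates):
        series = rng.choice(dataset)
        length = rng.randint(min_len, min(max_len, len(series)))
        start = rng.randint(0, len(series) - length)
        candidate = series[start:start + length]
        dists = [subsequence_distance(s, candidate) for s in dataset]
        gain = best_split_gain(dists, labels)
        if gain > best_gain:
            best_gain, best_shapelet = gain, candidate
    return best_shapelet, best_gain

# Toy data: class 1 series contain a bump, class 0 series do not.
data = [
    [0, 0, 0, 5, 5, 0, 0, 0],
    [0, 5, 5, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
]
shapelet, gain = random_shapelet_search(data, [1, 1, 0, 0], n_candidates=50,
                                        min_len=2, max_len=4)
```

Only the current best candidate is kept in memory, which is one simple way to keep the footprint small. How to choose `n_candidates` is the estimation problem the abstract refers to and is not addressed by this sketch.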