The traditional bag-of-words model and the more recent word-sequence kernel are two well-known techniques in text categorization. The bag-of-words representation ignores word order, which can reduce accuracy for some types of documents. The word-sequence kernel takes word order into account but does not capture all of the word-frequency information. The authors previously proposed a weighted kernel model that combines the two [1]. This paper focuses on optimizing the weighting parameters, which are functions of word frequency. Experiments on the Reuters database show that the new weighted kernel achieves better classification accuracy.
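As a rough illustration of how two such kernels can be combined, the following minimal sketch mixes a bag-of-words kernel with an order-sensitive kernel. It is not the model of [1]. The order-sensitive part is a simplified stand-in that counts shared contiguous word n-grams rather than a full word-sequence kernel, and the constant weight `lam` is a hypothetical placeholder for the frequency-dependent weighting parameters, whose form is not given here. All function names are assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def bow_kernel(a, b):
    """Bag-of-words kernel: dot product of word-count vectors (order ignored)."""
    ca, cb = Counter(a), Counter(b)
    return sum(ca[w] * cb[w] for w in ca if w in cb)


def ngram_kernel(a, b, n=2):
    """Simplified order-sensitive kernel: dot product of contiguous
    word n-gram counts (a stand-in for a word-sequence kernel)."""
    ga = Counter(tuple(a[i:i + n]) for i in range(len(a) - n + 1))
    gb = Counter(tuple(b[i:i + n]) for i in range(len(b) - n + 1))
    return sum(ga[g] * gb[g] for g in ga if g in gb)


def normalized(kernel, a, b, **kw):
    """Cosine-style normalization so each kernel lies in [0, 1]."""
    denom = sqrt(kernel(a, a, **kw) * kernel(b, b, **kw))
    return kernel(a, b, **kw) / denom if denom else 0.0


def weighted_kernel(a, b, lam=0.5, n=2):
    """Convex combination of the two normalized kernels.
    `lam` is a hypothetical constant weight, assumed for illustration."""
    return (lam * normalized(bow_kernel, a, b)
            + (1 - lam) * normalized(ngram_kernel, a, b, n=n))


# Two documents with identical word counts but different word order.
doc1 = "the cat sat on the mat".split()
doc2 = "the mat sat on the cat".split()

print(normalized(bow_kernel, doc1, doc2))    # bag-of-words cannot tell them apart
print(normalized(ngram_kernel, doc1, doc2))  # the n-gram kernel can
print(weighted_kernel(doc1, doc2, lam=0.5))
```

In this example the two documents contain the same words, so the bag-of-words kernel treats them as identical, while the n-gram kernel scores them lower because the word order differs. The combined value falls between the two, and `lam` controls how much each view contributes.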