
SCIENCE CHINA Information Sciences, Volume 60, Issue 12: 122103 (2017). https://doi.org/10.1007/s11432-016-9020-4

PSVM: a preference-enhanced SVM model using preference data for classification

More info
  • Received: Oct 26, 2016
  • Accepted: Jan 21, 2017
  • Published: Jul 11, 2017

Abstract

Classification is an essential task in data mining, machine learning, and pattern recognition. Conventional classification models focus on the distinctions between samples from different categories, yet there are also fine-grained differences among data instances within a particular category. These differences form preference information that is essential for human learning and, in our view, could also benefit classification models. In this paper, we propose a preference-enhanced support vector machine (PSVM), which incorporates preference-pair data as a specific type of side information into SVM. Additionally, we propose a two-layer heuristic sampling method to obtain effective preference pairs, and an extended sequential minimal optimization (SMO) algorithm to fit PSVM. To evaluate our model, we use the task of knowledge base acceleration-cumulative citation recommendation (KBA-CCR) on the TREC-KBA-2012 dataset and seven other datasets from UCI, StatLib, and mldata.org. The experimental results show that the proposed PSVM achieves high performance under the official evaluation metrics.
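
To make the role of the preference pairs concrete, the following is a minimal sketch, not the exact formulation from the paper, of how a soft-margin SVM objective can be augmented with RankSVM-style pairwise preference constraints; the pair set P, the trade-off parameter \tilde{C}, and the slack variables \epsilon_{pq} are illustrative notation introduced here, not symbols taken from the paper:

    \min_{w,\,b,\,\xi,\,\epsilon} \;\; \frac{1}{2}\lVert w\rVert^{2} \;+\; C\sum_{i}\xi_{i} \;+\; \tilde{C}\sum_{(p,q)\in P}\epsilon_{pq}
    \text{s.t.}\;\; y_{i}\,(w^{\top}x_{i}+b) \ge 1-\xi_{i},\quad \xi_{i}\ge 0 \qquad \text{(classification constraints on labelled samples)}
    \phantom{\text{s.t.}\;\;} w^{\top}x_{p}-w^{\top}x_{q} \ge 1-\epsilon_{pq},\quad \epsilon_{pq}\ge 0 \qquad \text{(one constraint per preference pair, with } x_{p} \text{ preferred over } x_{q}\text{)}

Dropping the pairwise term recovers the standard soft-margin SVM, while dropping the classification term leaves a RankSVM-style pairwise model; fitting the combined problem requires a solver that handles both families of constraints, which is the role of the extended SMO algorithm proposed above.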


Acknowledgment

This work was supported by National Key Research and Development Program of China (Grant No. 2016YFB1000902), National Basic Research Program of China (973 Program) (Grant No. 2013CB329600), National Natural Science Foundation of China (Grant No. 61472040), and National Natural Science Basic Research Plan in Shaanxi Province of China (Grant No. 2016JM6082).


  • Table 1   Datasets used in experiments
      Name        #Instances   #Attributes   Source
      pyrim               74            27   UCI
      ailerons          8694            40   UCI
      concrete          1030             8   UCI
      bank8fm           8192             9   mldata.org
      Wisconsin          194            33   mldata.org
      bodyfat            252            14   StatLib
      kin8nm            8192             9   mldata.org
  • Table 2   F1 performance of different methods on seven datasets
      Dataset     PSVM-L   PSVM-R   SVM     RankingSVM   SVM+
      pyrim       0.896    0.774    0.758   0.722        0.838
      ailerons    0.833    0.805    0.728   0.717        0.806
      concrete    0.849    0.748    0.744   0.729        0.894
      bank8fm     0.935    0.928    0.851   0.766        0.896
      Wisconsin   0.705    0.647    0.647   0.658        0.647
      bodyfat     0.969    0.913    0.853   0.847        0.968
      kin8nm      0.879    0.814    0.794   0.799        0.877
  • Table 3   Accuracy performance of different methods on seven datasets
      Dataset     PSVM-L   PSVM-R   SVM     RankingSVM   SVM+
      pyrim       0.885    0.731    0.731   0.615        0.807
      ailerons    0.815    0.764    0.765   0.603        0.805
      concrete    0.832    0.772    0.764   0.627        0.886
      bank8fm     0.937    0.930    0.853   0.695        0.896
      Wisconsin   0.667    0.621    0.636   0.591        0.621
      bodyfat     0.965    0.907    0.849   0.791        0.965
      kin8nm      0.889    0.798    0.787   0.755        0.879
  • Table 4   Learning time of evaluated models for all datasets (ms)
      Dataset     PSVM-L   PSVM-R   SVM    RankingSVM      SVM+
      pyrim           14        3      4           91        21
      ailerons      2399     2396   5618        11946   1410733
      concrete        57       60     52          176     28264
      bank8fm       3108     2699   2587         5813     37767
      Wisconsin        1        6      2            8         5
      bodyfat          2        1      5            3         7
      kin8nm        1330     7048   1744        17639     16400
  • Table 5   Four-level relevance estimation scale
      Level      Definition
      Garbage    Not relevant; e.g., spam
      Neutral    Not relevant; nothing can be learned about the target entity
      Relevant   Relates indirectly to the target entity; e.g., mentions topics or events that are likely to have an impact on the entity
      Central    Relates directly to the target entity; e.g., the entity is a central figure in the mentioned topics or events
  • Table 6   Number of training and testing instances labelled
      Level      #Training instances   #Testing instances   #Subtotal
      Garbage                   9382                20439       29821
      Neutral                   1757                 2470        4227
      Relevant                  6500                 8426       14926
      Central                   3525                 5256        8781
      Total (all levels)                                         57755
  • Table 7   Overall results of evaluated methods
                              Central Only                       Central + Relevant
      Method          F1      SU      σ      Time (ms)    F1      SU      σ      Time (ms)
      PSVM-2L_10      0.320   0.344   0.229       1660    0.713   0.710   0.214       2230
      PSVM-2L_30      0.321   0.370   0.227       7620    0.690   0.682   0.277       6270
      PSVM-2L_50      0.322   0.349   0.231      17660    0.701   0.712   0.253       8090
      PSVM-2L_100     0.318   0.358   0.228       5710    0.689   0.682   0.279       2380
      PSVM-2L_250     0.335   0.338   0.213      30260    0.694   0.694   0.232      17750
      PSVM-2L_300     0.333   0.354   0.217      20580    0.717   0.714   0.220      23570
      PSVM-R_10       0.320   0.344   0.225       3410    0.688   0.680   0.279       2820
      PSVM-R_30       0.327   0.333   0.222       2090    0.688   0.680   0.279       5020
      PSVM-R_50       0.318   0.319   0.228       8680    0.688   0.680   0.279      13430
      PSVM-R_100      0.322   0.315   0.227      21060    0.688   0.680   0.279      14440
      PSVM-R_250      0.327   0.342   0.225      24720    0.688   0.680   0.279      10490
      PSVM-R_300      0.318   0.320   0.228      18030    0.688   0.680   0.279      17260
      SVM             0.338   0.371   0.222       2190    0.688   0.681   0.279       2110
      RankingSVM      0.325   0.291   0.235      44867    0.613   0.604   0.276      44867
      SVM+            0.329   0.327   0.226     259161    0.689   0.682   0.278     438359
      HLTCOE          0.359   0.402   0.242          –    0.492   0.555   0.256          –
      UDel            0.355   0.331   0.208          –    0.597   0.591   0.213          –
      3-step RF       0.351   0.347   0.215          –    0.691   0.673   0.260          –

    "–" indicates a reference baseline, so we could not rerun the experiments to measure its learning time.
