
SCIENCE CHINA Information Sciences, Volume 63, Issue 1: 112103 (2020). https://doi.org/10.1007/s11432-019-2633-y

On some aspects of minimum redundancy maximum relevance feature selection

  • Received: Apr 25, 2019
  • Accepted: Aug 21, 2019
  • Published: Dec 24, 2019

Abstract

Feature selection is an important challenge in many areas of machine learning because it plays a crucial role in the interpretation of machine-driven decisions. There are various approaches to the feature selection problem, and methods based on information theory form an important group. Among these, minimum redundancy maximum relevance (mRMR) feature selection is undoubtedly the most popular, with widespread application. In this paper, we prove, in contrast to an existing finding, that mRMR is not equivalent to the Max-Dependency criterion for first-order incremental feature selection. We present another form of equivalence, leading to a generalization of mRMR feature selection. Additionally, we compare several feature selection methods based on mRMR, Max-Dependency, and feature ranking, employing different measures of dependency. The results on high-dimensional real-world datasets show that distance correlation is a suitable measure for dependency-based feature selection methods. The results also indicate that the Max-Dependency incremental algorithm combined with distance correlation is a promising feature selection approach.
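As context for the criteria compared above: first-order incremental mRMR adds, at each step, the feature that maximizes relevance $I(X_j;Y)$ minus the mean redundancy $\frac{1}{|S|}\sum_{s \in S} I(X_j;X_s)$ with respect to the already selected set $S$. The following is a minimal illustrative sketch for discrete features, not the authors' code; the helper names `mutual_information` and `mrmr` are our own.

```python
import numpy as np

def mutual_information(x, y):
    """I(X;Y) in bits for two discrete 1-D arrays."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for vx in np.unique(x):
        px = np.mean(x == vx)
        for vy in np.unique(y):
            pxy = np.mean((x == vx) & (y == vy))
            py = np.mean(y == vy)
            if pxy > 0:
                mi += pxy * np.log2(pxy / (px * py))
    return mi

def mrmr(X, y, k):
    """Greedy first-order incremental mRMR: at each step pick the
    feature maximizing relevance I(X_j;Y) minus the mean mutual
    information with the already selected features."""
    n_features = X.shape[1]
    relevance = [mutual_information(X[:, j], y) for j in range(n_features)]
    selected = []
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = (np.mean([mutual_information(X[:, j], X[:, s])
                                   for s in selected])
                          if selected else 0.0)
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

For example, if one column is an exact duplicate of an already selected feature, the redundancy term cancels its relevance and mRMR moves on to a complementary feature instead.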


Acknowledgment

This work was supported by the Slovak Research and Development Agency (Grant No. APVV-16-0211).


References

[1] Li Y, Li T, Liu H. Recent advances in feature selection and its applications. Knowl Inf Syst, 2017, 53: 551-577

[2] Li J D, Liu H. Challenges of feature selection for big data analytics. IEEE Intell Syst, 2017, 32: 9-15

[3] Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A. Recent advances and emerging challenges of feature selection in the context of big data. Knowledge-Based Syst, 2015, 86: 33-45

[4] Ang J C, Mirzal A, Haron H. Supervised, unsupervised, and semi-supervised feature selection: a review on gene selection. IEEE/ACM Trans Comput Biol Bioinf, 2016, 13: 971-989

[5] Li J D, Cheng K W, Wang S H. Feature selection: a data perspective. ACM Comput Surv, 2018, 50: 1-45

[6] Battiti R. Using mutual information for selecting features in supervised neural net learning. IEEE Trans Neural Netw, 1994, 5: 537-550

[7] Kwak N, Choi C-H. Input feature selection for classification problems. IEEE Trans Neural Netw, 2002, 13: 143-159

[8] Cai R C, Hao Z F, Yang X W. An efficient gene selection algorithm based on mutual information. Neurocomputing, 2009, 72: 991-999

[9] Fleuret F. Fast binary feature selection with conditional mutual information. J Mach Learn Res, 2004, 5: 1531-1555

[10] Cheng H R, Qin Z G, Feng C S. Conditional mutual information-based feature selection analyzing for synergy and redundancy. ETRI J, 2011, 33: 210-218

[11] Yang H H, Moody J. Data visualization and feature selection: new algorithms for nongaussian data. In: Proceedings of the 12th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 1999. 687-693

[12] Vergara J R, Estévez P A. A review of feature selection methods based on mutual information. Neural Comput Applic, 2014, 24: 175-186

[13] Brown G, Pocock A, Zhao M J, et al. Conditional likelihood maximisation: a unifying framework for information theoretic feature selection. J Mach Learn Res, 2012, 13: 27-66

[14] Peng H C, Long F H, Ding C. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Machine Intell, 2005, 27: 1226-1238

[15] Ding C, Peng H. Minimum redundancy feature selection from microarray gene expression data. J Bioinform Comput Biol, 2005, 3: 185-205

[16] Corredor G, Wang X, Zhou Y. Spatial architecture and arrangement of tumor-infiltrating lymphocytes for predicting likelihood of recurrence in early-stage non-small cell lung cancer. Clin Cancer Res, 2019, 25: 1526-1534

[17] Toyoda A, Ogawa T, Haseyama M. Favorite video estimation based on multiview feature integration via KMvLFDA. IEEE Access, 2018, 6: 63833-63842

[18] Berrendero J R, Cuevas A, Torrecilla J L. The mRMR variable selection method: a comparative study for functional data. J Statistical Computation Simul, 2016, 86: 891-907

[19] Guyon I, Elisseeff A. An introduction to variable and feature selection. J Mach Learn Res, 2003, 3: 1157-1182

[20] Golub T R, Slonim D K, Tamayo P. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 1999, 286: 531-537

[21] Gordon G J, Jensen R V, Hsiao L-L, et al. Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res, 2002, 62: 4963-4967

[22] Alon U, Barkai N, Notterman D A. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA, 1999, 96: 6745-6750

[23] Tian E, Zhan F, Walker R. The role of the Wnt-signaling antagonist DKK1 in the development of osteolytic lesions in multiple myeloma. N Engl J Med, 2003, 349: 2483-2494

[24] Burczynski M E, Peterson R L, Twine N C. Molecular classification of Crohn's disease and ulcerative colitis patients using transcriptional profiles in peripheral blood mononuclear cells. J Mol Diagnostics, 2006, 8: 51-61

[25] Pomeroy S L, Tamayo P, Gaasenbeek M. Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature, 2002, 415: 436-442

[26] Singh Y N, Singh S K, Ray A K. Bioelectrical signals as emerging biometrics: issues and challenges. ISRN Signal Processing, 2012, 2012: 1-13

[27] Chowdary D, Lathrop J, Skelton J. Prognostic gene expression signatures can be measured in tissues collected in RNAlater preservative. J Mol Diagnostics, 2006, 8: 31-39

[28] Székely G J, Rizzo M L, Bakirov N K. Measuring and testing dependence by correlation of distances. Ann Statist, 2007, 35: 2769-2794

[29] Robnik-Šikonja M, Kononenko I. Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning, 2003, 53: 23-69

  • Figure 1

    (Color online) The average $F_1$ score of four classifiers (NB, SVC, RF, kNN) on eight high-dimensional datasets. Comparison of multiple FS approaches. (a) Alon; (b) Tian; (c) Gordon; (d) Golub; (e) Burczynski; (f) Pomeroy; (g) Singh; (h) Chowdary.

  • Figure 2

    (Color online) The highest $F_1$ score of four classifiers (NB, SVC, RF, kNN) on eight high-dimensional datasets. Comparison of multiple FS approaches. (a) Alon; (b) Tian; (c) Gordon; (d) Golub; (e) Burczynski; (f) Pomeroy; (g) Singh; (h) Chowdary.

  • Table 1   Dataset to compare mRMR and Max-Dependency algorithms
    Sample  $X_1$  $X_2$  $X_3$  $Y$      Sample  $X_1$  $X_2$  $X_3$  $Y$
    1       0      0      0      0        9       0      0      1      0
    2       0      0      0      0        10      0      0      1      0
    3       0      1      0      1        11      0      1      1      1
    4       0      1      0      1        12      0      1      1      1
    5       1      0      0      1        13      1      0      1      1
    6       1      0      0      1        14      1      0      1      1
    7       1      1      0      0        15      1      1      1      0
    8       2      1      0      0        16      2      1      1      0
  • Table 2   Contingency tables for the counterexample to the theorem
    $(X_1, X_2)$  $Y=0$  $Y=1$  $\Sigma$      $(X_1, X_3)$  $Y=0$  $Y=1$  $\Sigma$
    (0, 0)        4      0      4             (0, 0)        2      2      4
    (1, 0)        0      4      4             (1, 0)        1      2      3
    (2, 0)        0      0      0             (2, 0)        1      0      1
    (0, 1)        0      4      4             (0, 1)        2      2      4
    (1, 1)        2      0      2             (1, 1)        1      2      3
    (2, 1)        2      0      2             (2, 1)        1      0      1
    $\Sigma$      8      8      16            $\Sigma$      8      8      16
  • Table 3   Characteristics of datasets used in this study
    Dataset (Source) Number of samples Number of features Number of Class 0 Number of Class 1
    Alon [22] 62 2000 40 22
    Tian [23] 173 12625 36 137
    Gordon [21] 181 12533 94 87
    Golub [20] 72 7129 47 25
    Burczynski [24] 127 22283 85 42
    Pomeroy [25] 60 7128 39 21
    Singh [26] 102 12600 52 50
    Chowdary [27] 104 22283 62 42
  • Table 4   Contingency table for Gordon dataset
    (v12113, v3333, v3249) $Y=0$ $Y=1$ $\sum$
    (0,1,0) 17 0 17
    (0,1,2) 4 0 4
    (0,2,0) 9 0 9
    (0,2,2) 0 7 7
    (1,2,0) 0 1 1
    (1,2,1) 0 5 5
    (1,2,2) 0 11 11
    (2,1,2) 1 0 1
    (2,2,0) 0 1 1
    (2,2,1) 0 23 23
    (2,2,2) 0 102 102
    $\sum$ 31 150 181
  • Table 5   Summary of results of prediction performance for different FS methods
    Method Win/Tie/Loss Mean rank Standardized mean Standardized median
    mRMR/MI 183/55/114 5.4 0.44 0.41
    mRMR/chi2 174/44/134 5.9 0.33 0.41
    mRMR/Pear 249/51/52 3.4 0.7 0.73
    mRMR/Spea 222/46/84 4.3 0.59 0.59
    mRMR/dCor 234/53/65 3.9 0.63 0.63
    MD/MI 33/9/310 10.8 $-$1.54 $-$1.53
    MD/chi2 19/12/321 11.2 $-$1.86 $-$1.97
    MD/dCor 249/28/75 3.8 0.64 0.81
    Rank/MI 146/28/178 7.0 0.11 0.26
    Rank/chi2 130/38/184 7.4 0.04 0.17
    Rank/dCor 134/38/180 7.2 0.12 0.25
    ReliefF 123/30/199 7.7 $-$0.21 0.01
  • Table 6   Generalization of mRMR. Mean rank of maximum and average $F_1$ score versus parameter $\lambda$
    Parameter $\lambda$ 0.00 0.25 0.50 0.75 1.00 1.25 1.50 1.75 2.00
    $F_1$ score MAX 7.25 5.81 5.44 4.19 4.38 4.69 3.63 4.06 5.56
    $F_1$ score AVG 6.88 5.38 4.50 3.81 4.06 3.94 4.69 5.13 6.63
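The counterexample encoded in Tables 1 and 2 can be checked numerically. The snippet below is our own illustrative code (not the authors'): it rebuilds the Table 1 dataset and computes the mutual-information quantities showing that $X_2$ and $X_3$ are individually irrelevant, that the mRMR relevance-minus-redundancy score prefers the useless $X_3$ after $X_1$ has been selected, and that the Max-Dependency joint criterion $I((X_1,X_2);Y)$ attains the full bit of $H(Y)$.

```python
from collections import Counter
from math import log2

def mi(xs, ys):
    """I(X;Y) in bits for sequences of hashable values.
    Passing tuples (e.g. via zip of two columns) gives a joint variable."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / (px[a] / n * py[b] / n))
               for (a, b), c in pxy.items())

# Table 1 dataset: samples 1-8 have X3 = 0; samples 9-16 repeat them with X3 = 1.
X1 = [0, 0, 0, 0, 1, 1, 1, 2] * 2
X2 = [0, 0, 1, 1, 0, 0, 1, 1] * 2
X3 = [0] * 8 + [1] * 8
Y  = [0, 0, 1, 1, 1, 1, 0, 0] * 2

# Individual relevance: only X1 carries information about Y.
assert mi(X2, Y) == 0.0 and mi(X3, Y) == 0.0 and mi(X1, Y) > 0.0

# After selecting X1, mRMR scores each candidate by I(X;Y) - I(X;X1):
score_X2 = mi(X2, Y) - mi(X2, X1)    # negative: X2 is penalized as redundant
score_X3 = mi(X3, Y) - mi(X3, X1)    # zero: mRMR therefore adds the useless X3

# Max-Dependency instead compares the joint dependencies I((X1, X); Y):
joint_X2 = mi(list(zip(X1, X2)), Y)  # 1.0 bit: (X1, X2) determines Y (Table 2)
joint_X3 = mi(list(zip(X1, X3)), Y)  # strictly less than 1 bit
```

Since the mRMR score of $X_3$ (zero) exceeds that of $X_2$ (negative) while the joint dependency of $(X_1,X_2)$ exceeds that of $(X_1,X_3)$, the two incremental criteria select different second features on this dataset.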

Copyright 2020 Science China Press Co., Ltd.