
SCIENTIA SINICA Informationis, Volume 50, Issue 5: 743-765 (2020) https://doi.org/10.1360/N112018-00302

An effective gas sensor array optimization method based on dynamic feature importance

  • Received: Nov 12, 2018
  • Accepted: Mar 21, 2019
  • Published: Apr 27, 2020

Abstract

Gas sensor array optimization is a key problem in the field of electronic noses and can be viewed as a special feature selection problem. In this paper, we propose a novel measure of sensor (or feature) importance, named dynamic feature importance (DFI), which combines feature importance with feature correlation. Based on this measure, we propose SAO_DFI, an effective sensor array optimization algorithm for electronic noses. Using data collected in two different gas environments, we apply SAO_DFI to analyze the effects of repeated sensors, sensor (or feature) importance, sensor (or feature) correlation, and sensor characteristic parameters. The optimization results demonstrate the effectiveness, robustness, and interpretability of the algorithm.


Funded by

National Natural Science Foundation of China (61174007)

Yantai Science and Technology Development Program (2016ZH053, 2017ZH063)


  • Figure 1

    (Color online) Schematic diagram of human olfaction system and electronic nose system

  • Figure 2

    (Color online) Test system diagram

  • Figure 3

    (Color online) Steady responses of sensor array in environment I. (a) 1%, 4%, 6%, and 8% $\rm CO_2$; (b) 10, 30, 50, 80 ppm $\rm SO_2$; (c) 100$\sim$400 ppm $\rm SO_2$; (d) 30 ppm $\rm SO_2$ + 1%, 4%, 6%, and 8% $\rm CO_2$; (e) 50 ppm $\rm SO_2$ + 1%, 4%, 6%, and 8% $\rm CO_2$

  • Figure 4

    (Color online) Steady responses of sensor array in environment II. (a) 500$\sim$2000 ppm $\rm CH_4$; (b) 50$\sim$200 ppm CO; (c) 50 ppm CO + 500$\sim$2000 ppm $\rm CH_4$; (d) 100 ppm CO + 500$\sim$2000 ppm $\rm CH_4$; (e) 150 ppm CO + 500$\sim$2000 ppm $\rm CH_4$

  • Figure 5

    (Color online) Distribution of steady state response values in environment I. Feature value of (a) TGS2600; (b) TGS2600_1; (c) TGS2610; (d) TGS2610_1; (e) TGS2611; (f) TGS2603; (g) TGS2603_1; (h) 4SO2-2000

  • Figure 6

    (Color online) Pearson correlation coefficient between steady state response values of sensors in environment I

  • Figure 7

    (Color online) Gini importance of steady state response values of sensors in environment I

  • Figure 8

    Maximal information coefficient between steady state response values of sensors in environment I

  • Figure 9

    (Color online) 3D array recognition rate with different algorithms (including 4SO2-2000). (a) Results of the BP algorithm are shown above the blank diagonal area and results of the SVM algorithm below it; (b) results of the RF algorithm are shown above the blank diagonal area and results of the KNN algorithm below it

  • Figure 10

    (Color online) 2D array recognition rate with different conditions (including 4SO2-2000)

  • Figure 11

    (Color online) Gini importance of steady state response values and derivative values of sensors in environment I

  • Figure 12

    (Color online) Maximal information coefficient between two characteristic parameters of sensors in environment I

  • Figure 13

    (Color online) Distribution of steady state response values in environment II. Feature value of (a) TGS2602; (b) TGS2600; (c) TGS2610; (d) TGS2611; (e) TGS2603; (f) TGS2620

  • Figure 14

    (Color online) Gini importance of steady state response values of sensors in environment II

  • Figure 15

    (Color online) Maximal information coefficient between steady state response values of sensors in environment II

  • Figure 16

    (Color online) 2D array recognition rate with different algorithms. (a) Results of the BP algorithm are shown above the blank diagonal area and results of the SVM algorithm below it; (b) results of the RF algorithm are shown above the blank diagonal area and results of the KNN algorithm below it

  • Algorithm 1

    The sensor array optimization algorithm based on dynamic feature importance (SAO_DFI)

    Require: Raw data of the initial array $A$, variance threshold $v_{\rm th}$, type of FI, type of FFC, and sensor feature parameters;

    Output: The optimized array $A^*$.

    Phase 1:

    for $s_i$ in $A$ do
        $vs \Leftarrow 0$;
        for $g_j$ in $G$ do
            Calculate the variance of the response curve of $s_i$ to $g_j$ and record it as $v_{ij}$;
            $vs \Leftarrow vs + v_{ij}$;
        end for
        if $vs > v_{\rm th}$ then
            Add the sensor $s_i$ to the array $A'$;
        end if
    end for
    Select the data corresponding to the sensors in array $A'$ from the raw data and extract the feature parameters to form a new dataset $D_{n\times (m+1)}$;

    Phase 2:

    Initialize $A_{\rm sub} \Leftarrow \emptyset$;
    Calculate the accuracy of the current array by training on the dataset $D$, denoted as ${\rm Acc}_{\rm th}$;
    Calculate the FI of each feature;
    Calculate the FFC matrix;
    Select the feature with the largest FI and add it to $A_{\rm sub}$;
    for $i$ in $1:m-1$ do
        Calculate the accuracy of the array $A_{\rm sub}$ by training on the subset $D[A_{\rm sub}]$, denoted as ${\rm Acc}_{\rm sub}$;
        if ${\rm Acc}_{\rm sub} \ge {\rm Acc}_{\rm th}$ then
            break;
        else
            Update the DFI of the candidate features according to Eq. (16);
            Select the feature with the largest DFI and add it to $A_{\rm sub}$;
        end if
    end for
    Denote the optimized array as $A^*$, consisting of the sensors that correspond to the features in $A_{\rm sub}$, and delete the repeated sensors in $A^*$.

  • Table 1   Types and parameters of gas sensors
    Type            | Sensor    | Main response gases                                   | Range           | Accuracy/Sensitivity
    Metal oxide     | TGS2600   | $\rm H_2S$, $\rm C_2H_5OH$, CO, etc.                  | 1$\sim$30 ppm   | 0.3$\sim$0.6 ppm
    Metal oxide     | TGS2602   | VOC, $\rm NH_3$, $\rm H_2S$, etc.                     | 1$\sim$30 ppm   | 0.08$\sim$0.5 ppm
    Metal oxide     | TGS2603   | Trimethylamine, methyl mercaptan, $\rm H_2S$, etc.    | 1$\sim$10 ppm   | $<0.5$ ppm
    Metal oxide     | TGS2610   | $\rm C_3H_8$, $\rm C_4H_{10}$, etc.                   | 1%$\sim$25%     | 0.56$\pm$0.06 ppm
    Metal oxide     | TGS2611   | $\rm CH_4$, natural gas, etc.                         | 1%$\sim$25%     | 0.6$\pm$0.06 ppm
    Metal oxide     | TGS2620   | $\rm C_2H_5OH$, organic solvents, etc.                | 50$\sim$500 ppm | 0.3$\sim$0.5 ppm
    Electrochemical | 4SO2-2000 | $\rm SO_2$                                            | 0$\sim$2000 ppm | $0.02\pm0.08~\mu$A/ppm
  • Table 2   Test gas composition and concentration settings in environment I
    Gas             | $\rm CO_2$ (%) | $\rm SO_2$ (ppm)              | Inject synthetic air (min) | Inject target gases (min) | Batch
    $\rm CO_2$      | 1, 4, 6, 8     |                               | 10                         | 6                         | 5
    $\rm SO_2$ low  |                | 10, 30, 50, 80                | 10                         | 6                         | 5
    $\rm SO_2$ high |                | 100$\sim$400, in steps of 100 | 10                         | 6                         | 5
    Mixture         | 1, 4, 6, 8     | 30, 50                        | 10                         | 6                         | 5
    The $\rm CO_2$ and $\rm SO_2$ columns give the target gas configuration; the two injection columns give the time control.
  • Table 3   Test gas composition and concentration settings in environment II
    Gas        | $\rm CH_4$ (ppm)               | CO (ppm)                    | Inject synthetic air (min) | Inject target gases (min) | Batch
    $\rm CH_4$ | 500$\sim$2000, in steps of 500 |                             | 20                         | 8                         | 5
    CO         |                                | 50$\sim$200, in steps of 50 | 20                         | 8                         | 5
    Mixture    | 500$\sim$2000, in steps of 500 | 50$\sim$150, in steps of 50 | 20                         | 8                         | 5
    The $\rm CH_4$ and CO columns give the target gas configuration; the two injection columns give the time control.
  • Table 4   Feature importance of sensors in environment I
    Sensor    | SVMI  | BPI   | KNNI  | RFI   | GI
    TGS2600   | 0.602 | 0.556 | 0.684 | 0.602 | 0.1618
    TGS2610   | 0.746 | 0.782 | 0.654 | 0.632 | 0.1559
    TGS2611   | 0.774 | 0.786 | 0.722 | 0.726 | 0.1017
    TGS2603   | 0.550 | 0.768 | 0.620 | 0.546 | 0.1501
    4SO2-2000 | 0.872 | 0.886 | 0.862 | 0.826 | 0.4305
  • Table 5   Optimization process of SAO_DFI algorithm with different settings
    FFC | Epoch  | SVMI      | BPI       | KNNI      | RFI       | GI
    PCC | First  | 4SO2-2000 | 4SO2-2000 | 4SO2-2000 | 4SO2-2000 | 4SO2-2000
    PCC | Second | TGS2610   | TGS2610   | TGS2610   | TGS2610   | TGS2610
    PCC | Third  | TGS2603   | TGS2603   | TGS2603   | TGS2603   | TGS2603
    MIC | First  | 4SO2-2000 | 4SO2-2000 | 4SO2-2000 | 4SO2-2000 | 4SO2-2000
    MIC | Second | TGS2603   | TGS2603   | TGS2603   | TGS2603   | TGS2603
    MIC | Third  | TGS2610
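
The two phases of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: Eq. (16) is not reproduced on this page, so the DFI update below is a stand-in that scales each candidate's FI by one minus its largest absolute Pearson correlation with the already selected features; a leave-one-out 1-nearest-neighbour classifier stands in for the SVM, BP, RF, and KNN models; Phase 1 uses the population variance; and all function names, sensor names, FI values, and data are hypothetical toy inputs.

```python
from statistics import pvariance

def variance_filter(responses, v_th):
    """Phase 1: keep sensors whose summed response variance exceeds v_th.
    responses maps sensor -> {gas: response curve}. Population variance is an assumption."""
    return [s for s, curves in responses.items()
            if sum(pvariance(c) for c in curves.values()) > v_th]

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant feature."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0

def loo_1nn_accuracy(columns, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (stand-in model)."""
    n = len(labels)
    correct = 0
    for i in range(n):
        nearest = min((j for j in range(n) if j != i),
                      key=lambda j: sum((c[i] - c[j]) ** 2 for c in columns))
        correct += labels[nearest] == labels[i]
    return correct / n

def sao_dfi_phase2(data, labels, fi, sensor_of):
    """Phase 2: greedy selection driven by a dynamic feature importance.
    The DFI formula here is an assumed placeholder, not Eq. (16) of the paper."""
    names = list(data)
    acc_th = loo_1nn_accuracy([data[f] for f in names], labels)
    ffc = {(f, g): abs(pearson(data[f], data[g])) for f in names for g in names}
    selected = [max(names, key=fi.get)]          # start from the largest FI
    for _ in range(len(names) - 1):
        acc_sub = loo_1nn_accuracy([data[f] for f in selected], labels)
        if acc_sub >= acc_th:
            break
        candidates = [f for f in names if f not in selected]
        dfi = {f: fi[f] * (1.0 - max(ffc[f, s] for s in selected))
               for f in candidates}
        selected.append(max(candidates, key=dfi.get))
    # Map features back to sensors and drop repeated sensors, keeping order.
    return list(dict.fromkeys(sensor_of[f] for f in selected))

# Toy data (hypothetical): s1_1 duplicates s1, s2 is complementary, s3 is noise.
DATA = {
    "s1":   [0, 1, 0, 1, 10, 11, 10, 11],
    "s1_1": [0, 1, 0, 1, 10, 11, 10, 11],
    "s2":   [0, 1, 10, 11, 0, 1, 10, 11],
    "s3":   [3, 0, 5, 1, 4, 2, 0, 5],
}
LABELS = [0, 0, 1, 1, 2, 2, 3, 3]
FI = {"s1": 0.9, "s1_1": 0.85, "s2": 0.8, "s3": 0.1}   # made-up importances
SENSOR_OF = {"s1": "s1", "s1_1": "s1", "s2": "s2", "s3": "s3"}

if __name__ == "__main__":
    print(sao_dfi_phase2(DATA, LABELS, FI, SENSOR_OF))
```

On this toy data the repeated feature `s1_1` has a higher FI than `s2`, but its perfect correlation with the already selected `s1` drives its DFI to zero, so `s2` is chosen second and the search stops once the two-feature subset matches the accuracy of the full array. This mirrors the role that the correlation term plays in the selection order reported in Table 5, under the assumed update rule.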

Copyright 2020 China Science Publishing & Media Ltd. All rights reserved.
