
SCIENTIA SINICA Informationis, Volume 51, Issue 2: 279 (2021). https://doi.org/10.1360/SSI-2019-0175

Deep linear discriminant analysis hashing

  • Received: Aug 21, 2019
  • Accepted: Sep 18, 2019
  • Published: Feb 1, 2021


Funded by

National Key Research and Development Program of the Ministry of Science and Technology (2018YFB1107400)

National Natural Science Foundation of China (61871470, 61772427)


Supplement

Appendix

Proofs

Theorem 2

Maximizing the linear discriminant analysis objective on the deep features is equivalent to minimizing a regularized least-squares linear regression error, i.e., \begin{equation} \begin{array}{l} \mathop {\max }\limits_f {\rm Tr}( {{{\left( {{S_t} + \mu I} \right)}^{-1}}{S_b}} ) \Leftrightarrow \mathop {\min }\limits_{f,W,b} \| {{X^{\rm T}}W + 1{b^{\rm T}} - \widetilde Y} \|_F^2 + \mu \left\| W \right\|_F^2, \end{array} \tag{19}\end{equation} where $\widetilde Y = Y{( {{Y^{\rm T}}Y} )^{-1/2}}$.
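Before the proof, it may help to see why $\widetilde Y = Y(Y^{\rm T}Y)^{-1/2}$ is the natural regression target: $Y^{\rm T}Y$ is diagonal with the class sizes on its diagonal, and with the centering matrix $H = I - \frac{1}{n}1\cdot 1^{\rm T}$, the product $XH\widetilde Y\widetilde Y^{\rm T}HX^{\rm T}$ reproduces the usual between-class scatter $S_b = \sum_k n_k(\mu_k - \mu)(\mu_k - \mu)^{\rm T}$. The following minimal numpy sketch checks this identity on synthetic data (not from the paper; all dimensions and variable names are illustrative):

```python
import numpy as np

# Synthetic stand-ins: d-dim deep features for n samples in c classes.
d, n, c = 5, 40, 3
rng = np.random.default_rng(0)
X = rng.normal(size=(d, n))              # one column per sample
labels = np.arange(n) % c                # every class non-empty
Y = np.eye(c)[labels]                    # n x c one-hot indicator matrix

# Y^T Y = diag(n_1, ..., n_c), so (Y^T Y)^{-1/2} just rescales columns.
Yt = Y / np.sqrt(Y.sum(axis=0))          # \widetilde Y = Y (Y^T Y)^{-1/2}
H = np.eye(n) - np.ones((n, n)) / n      # centering matrix

# Between-class scatter via the scaled indicator matrix ...
Sb_reg = X @ H @ Yt @ Yt.T @ H @ X.T

# ... versus the textbook definition sum_k n_k (mu_k - mu)(mu_k - mu)^T.
mu = X.mean(axis=1, keepdims=True)
Sb = np.zeros((d, d))
for k in range(c):
    mu_k = X[:, labels == k].mean(axis=1, keepdims=True)
    Sb += (labels == k).sum() * (mu_k - mu) @ (mu_k - mu).T
print(np.allclose(Sb_reg, Sb))           # True
```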

Proof. Consider the regularized least-squares objective \begin{equation} g\left( {W,b} \right) = \| {{X^{\rm T}}W + 1{b^{\rm T}} - \widetilde Y} \|_F^2 + \mu \left\| W \right\|_F^2. \tag{20}\end{equation}

Setting the partial derivative of (20) with respect to $b$ to zero, \begin{equation} \frac{{\partial g\left( {W,b} \right)}}{{\partial b}} = {\left( {{X^{\rm T}}W + 1{b^{\rm T}} - \widetilde Y} \right)^{\rm T}} \cdot 1 = 0, \tag{21}\end{equation} gives \begin{equation} b = \frac{1}{n}\left( {{{\widetilde Y}^{\rm T}} - {W^{\rm T}}X} \right) \cdot 1. \tag{22}\end{equation} Substituting this $b$ into (20), and writing $H = I - \frac{1}{n}1 \cdot 1^{\rm T}$ for the centering matrix, (20) becomes \begin{equation} \begin{array}{l} g\left( W \right) = \left\| {{X^{\rm T}}W - \dfrac{1}{n}1 \cdot {1^{\rm T}}( {{X^{\rm T}}W - \widetilde Y} ) - \widetilde Y} \right\|_F^2 + \mu \left\| W \right\|_F^2 = \| {H{X^{\rm T}}W - H\widetilde Y} \|_F^2 + \mu \left\| W \right\|_F^2. \end{array} \tag{23}\end{equation} Similarly, setting the partial derivative of (23) with respect to $W$ to zero, \begin{equation} \frac{{\partial g\left( W \right)}}{{\partial W}} = 2XH\left( {H{X^{\rm T}}W - H\widetilde Y} \right) + 2\mu W=0, \tag{24}\end{equation} and using $H^{\rm T}H = H$, we obtain \begin{equation} W = {\left( {XH{X^{\rm T}} + \mu I} \right)^{ - 1}}XH\widetilde Y. \tag{25}\end{equation}
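As a quick sanity check, the closed forms (22) and (25) can be verified to make both gradients (21) and (24) vanish. A minimal numpy sketch with synthetic data (the dimensions, the regularizer mu_reg, and all names are illustrative, not from the paper):

```python
import numpy as np

d, n, c, mu_reg = 5, 40, 3, 0.1
rng = np.random.default_rng(1)
X = rng.normal(size=(d, n))
labels = np.arange(n) % c
Y = np.eye(c)[labels]
Yt = Y / np.sqrt(Y.sum(axis=0))          # \widetilde Y
one = np.ones((n, 1))
H = np.eye(n) - one @ one.T / n          # centering matrix

# Closed forms: Eq. (25) for W, then Eq. (22) for b.
W = np.linalg.solve(X @ H @ X.T + mu_reg * np.eye(d), X @ H @ Yt)
b = (Yt.T - W.T @ X) @ one / n

# Both stationarity conditions hold at (W, b).
grad_b = (X.T @ W + one @ b.T - Yt).T @ one                    # Eq. (21)
grad_W = 2 * X @ H @ (H @ X.T @ W - H @ Yt) + 2 * mu_reg * W   # Eq. (24)
print(np.allclose(grad_b, 0), np.allclose(grad_W, 0))          # True True
```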

Substituting (25) into (23) yields \begin{align}\| {H{X^{\rm T}}W - H\widetilde Y} \|_F^2 + \mu \left\| W \right\|_F^2 & = {\rm Tr}[ {{W^{\rm T}}XH{X^{\rm T}}W - {W^{\rm T}}XH\widetilde Y - {{\widetilde Y}^{\rm T}}H{X^{\rm T}}W + {{\widetilde Y}^{\rm T}}H\widetilde Y} ] + \mu {\rm Tr}( {{W^{\rm T}}W} ) \\ & = {\rm Tr}[ {{W^{\rm T}}( {XH{X^{\rm T}} + \mu I} )W - {W^{\rm T}}XH\widetilde Y - {{\widetilde Y}^{\rm T}}H{X^{\rm T}}W} ] + {\rm Tr}( {{{\widetilde Y}^{\rm T}}H\widetilde Y} ) \\ & = {\rm Tr}( {{{\widetilde Y}^{\rm T}}H\widetilde Y} ) - {\rm Tr}[ {{{\widetilde Y}^{\rm T}}H{X^{\rm T}}W} ] \\ & = {\rm Tr}( {{{\widetilde Y}^{\rm T}}H\widetilde Y} ) - {\rm Tr}[ {{{( {XH{X^{\rm T}} + \mu I} )}^{ - 1}}XH\widetilde Y {{\widetilde Y}^{\rm T}}H{X^{\rm T}}} ], \tag{26} \end{align} where the third equality uses $( {XH{X^{\rm T}} + \mu I} )W = XH\widetilde Y$ from (24). The term ${\rm Tr}( {{\widetilde Y}^{\rm T}H\widetilde Y} )$ is a constant that depends only on the labels, and $S_t = XH{X^{\rm T}}$ and $S_b = XH\widetilde Y{\widetilde Y}^{\rm T}H{X^{\rm T}}$ are exactly the total and between-class scatter matrices of the deep features. Hence the original minimization problem is transformed into the linear discriminant analysis maximization problem \begin{equation} \mathop {\max }\limits_f {\rm Tr}( {{{\left( {{S_t} + \mu I} \right)}^{-1}}{S_b}} ). \tag{27}\end{equation}
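Numerically, the end-to-end identity can be confirmed by evaluating both sides of (26) at the closed-form minimizer: the attained minimum of (20) equals ${\rm Tr}(\widetilde Y^{\rm T}H\widetilde Y) - {\rm Tr}((S_t + \mu I)^{-1}S_b)$, so minimizing the regression loss over $(W, b)$ moves in lockstep with maximizing the trace in (27). A minimal numpy check on synthetic data (names illustrative, not from the paper):

```python
import numpy as np

d, n, c, mu_reg = 5, 40, 3, 0.1
rng = np.random.default_rng(2)
X = rng.normal(size=(d, n))
labels = np.arange(n) % c
Y = np.eye(c)[labels]
Yt = Y / np.sqrt(Y.sum(axis=0))          # \widetilde Y
one = np.ones((n, 1))
H = np.eye(n) - one @ one.T / n          # centering matrix

# Minimum of the regression objective (20) at the closed-form (W, b).
W = np.linalg.solve(X @ H @ X.T + mu_reg * np.eye(d), X @ H @ Yt)
b = (Yt.T - W.T @ X) @ one / n
g_min = (np.linalg.norm(X.T @ W + one @ b.T - Yt, 'fro') ** 2
         + mu_reg * np.linalg.norm(W, 'fro') ** 2)

# Right-hand side of (26), with S_t = X H X^T and S_b = X H Yt Yt^T H X^T.
St = X @ H @ X.T
Sb = X @ H @ Yt @ Yt.T @ H @ X.T
rhs = (np.trace(Yt.T @ H @ Yt)
       - np.trace(np.linalg.solve(St + mu_reg * np.eye(d), Sb)))
print(np.isclose(g_min, rhs))            # True
```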


References

[1] Kafai M, Eshghi K, Bhanu B. Discrete cosine transform locality-sensitive hashes for face retrieval. IEEE Trans Multimedia, 2014, 16: 1090-1103

[2] Zhang R M, Lin L, Zhang R. Bit-scalable deep hashing with regularized similarity learning for image retrieval and person re-identification. IEEE Trans Image Process, 2015, 24: 4766-4779

[3] Li X L, Hu D, Nie F P. Deep binary reconstruction for cross-modal hashing. In: Proceedings of the ACM Multimedia Conference, 2017. 1398--1406. doi:10.1145/3123266.3123355

[4] Andoni A, Indyk P. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun ACM, 2008, 51: 117

[5] Kulis B, Grauman K. Kernelized locality-sensitive hashing. IEEE Trans Pattern Anal Mach Intell, 2012, 34: 1092-1104

[6] Datar M, Immorlica N, Indyk P, et al. Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of the 20th Annual Symposium on Computational Geometry, 2004. 253--262

[7] Liu W, Mu C, Kumar S, et al. Discrete graph hashing. In: Proceedings of Advances in Neural Information Processing Systems, 2014. 3419--3427

[8] Gong Y, Lazebnik S, Gordo A. Iterative quantization: a Procrustean approach to learning binary codes for large-scale image retrieval. IEEE Trans Pattern Anal Mach Intell, 2013, 35: 2916-2929

[9] Kong W, Li W. Isotropic hashing. In: Proceedings of Advances in Neural Information Processing Systems, 2012. 1646--1654

[10] Kulis B, Darrell T. Learning to hash with binary reconstructive embeddings. In: Proceedings of Advances in Neural Information Processing Systems, 2009. 1042--1050

[11] Carreira-Perpinán M, Raziperchikolaei R. Hashing with binary autoencoders. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2015. 557--566

[12] Weiss Y, Torralba A, Fergus R. Spectral hashing. In: Proceedings of Advances in Neural Information Processing Systems, 2009. 1753--1760

[13] Li X L, Hu D, Nie F P. Large graph hashing with spectral rotation. In: Proceedings of AAAI Conference on Artificial Intelligence, 2017. 2203--2209

[14] Shen F M, Shen C H, Liu W, et al. Supervised discrete hashing. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2015. 557--566

[15] Koutaki G, Shirai K, Ambai M. Fast supervised discrete hashing and its analysis. 2016. arXiv preprint

[16] Chopra S, Hadsell R, LeCun Y. Learning a similarity metric discriminatively, with application to face verification. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2005. 539--546

[17] Li W J, Wang S, Kang W C. Feature learning based deep supervised hashing with pairwise labels. 2015. arXiv preprint

[18] Zhu H, Long M S, Wang J M, et al. Deep hashing network for efficient similarity retrieval. In: Proceedings of AAAI Conference on Artificial Intelligence, 2016. 2415--2421

[19] Yang H F, Lin K, Chen C S. Supervised learning of semantics-preserving hash via deep convolutional neural networks. IEEE Trans Pattern Anal Mach Intell, 2018, 40: 437-451

[20] Li Q, Sun Z N, He R, et al. Deep supervised discrete hashing. In: Proceedings of Advances in Neural Information Processing Systems, 2017. 2479--2488

[21] Yao T, Long F C, Mei T, et al. Deep semantic-preserving and ranking-based hashing for image retrieval. In: Proceedings of International Joint Conference on Artificial Intelligence, 2016. 3931--3937

[22] Mika S, Ratsch G, Weston J, et al. Fisher discriminant analysis with kernels. In: Proceedings of IEEE Signal Processing Society Workshop, 1999. 41--48

[23] Xiong H Y, Cheng W, Hu W Q, et al. AWDA: an adaptive Wishart discriminant analysis. In: Proceedings of IEEE International Conference on Data Mining, 2017. 525--534

[24] Strecha C, Bronstein A M, Bronstein M M. LDAHash: improved matching with smaller descriptors. IEEE Trans Pattern Anal Mach Intell, 2012, 34: 66-78

[25] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521: 436-444

[26] Wang Q, Qin Z Q, Nie F P, et al. Convolutional 2D LDA for nonlinear dimensionality reduction. In: Proceedings of International Joint Conference on Artificial Intelligence, 2017. 2929--2935

[27] Dorfer M, Kelz R, Widmer G. Deep linear discriminant analysis. 2015. arXiv preprint

[28] Jia Y Q, Nie F P, Zhang C S. Trace ratio problem revisited. IEEE Trans Neural Netw, 2009, 20: 729-735

[29] Friedman J H. Regularized discriminant analysis. J Am Statistical Association, 1989, 84: 165-175

[30] Lu J, Plataniotis K N, Venetsanopoulos A N. Regularization studies of linear discriminant analysis in small sample size scenarios with application to face recognition. Pattern Recognition Lett, 2005, 26: 181-191

[31] Xiong H Y, Cheng W, Fu Y J, et al. De-biasing covariance-regularized discriminant analysis. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, 2018. 2889--2897

[32] Ye J P. Least squares linear discriminant analysis. In: Proceedings of International Conference on Machine Learning, 2007. 1087--1093

[33] Sun L, Ceran B, Ye J P. A scalable two-stage approach for a class of dimensionality reduction techniques. In: Proceedings of ACM International Conference on Knowledge Discovery and Data Mining, 2010. 313--322

[34] Lee K, Kim J. On the equivalence of linear discriminant analysis and least squares. In: Proceedings of AAAI Conference on Artificial Intelligence, 2015. 2736--2742

[35] LeCun Y, Bottou L, Bengio Y. Gradient-based learning applied to document recognition. Proc IEEE, 1998, 86: 2278-2324

[36] Krizhevsky A, Hinton G. Learning multiple layers of features from tiny images. Technical Report, University of Toronto, 2009

[37] Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2009. 248--255

[38] Wang X, Shi Y, Kitani K. Deep supervised hashing with triplet labels. In: Proceedings of Asian Conference on Computer Vision, 2016. 70--84

[39] Cao Z J, Long M S, Wang J M, et al. HashNet: deep learning to hash by continuation. In: Proceedings of IEEE International Conference on Computer Vision, 2017. 5608--5617

[40] van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res, 2008, 9: 2579--2605

[41] Salakhutdinov R, Hinton G. Semantic hashing. Int J Approximate Reasoning, 2009, 50: 969-978

  • Figure 1

    (Color online) An illustration of the classification objective (a) and the LDA projection (b). The data projected by LDA show greater distances between class centers than those produced by the classification objective.

  • Figure 2

    (Color online) The hash lookup performance of different hashing methods on MNIST ((a), (b)) and ImageNet ((c), (d)) datasets with varying code lengths

  • Figure 3

    The hash lookup performance and t-SNE visualization on the CIFAR-10 dataset with varying code lengths. (a) Hashing on CIFAR-10 (Recall); (b) hashing on CIFAR-10 (F-measure); (c) DTSH@32 bits on CIFAR-10; (d) DLDAH@32 bits on CIFAR-10

  • Table 1   Comparison results of different hashing methods in MAP and Precision@top100 on MNIST. A leading underscore marks a value underlined in the source, i.e. the second-best result; "–" marks a cell not given in the source.

    MAP

    | Method  | 8 bits  | 24 bits | 32 bits | 64 bits | 128 bits |
    |---------|---------|---------|---------|---------|----------|
    | LSH     | 0.1561  | 0.2089  | 0.2031  | 0.3004  | 0.3359   |
    | SH      | 0.2929  | 0.2699  | 0.2606  | 0.2476  | 0.2496   |
    | ITQ     | 0.4012  | 0.4193  | 0.4372  | 0.4452  | 0.4639   |
    | LDAH    | 0.4486  | 0.2260  | 0.2036  | 0.1685  | 0.1497   |
    | SDH     | 0.8506  | 0.9148  | 0.9228  | 0.9309  | 0.9391   |
    | FSDH    | 0.9165  | 0.9215  | 0.9165  | 0.9215  | –        |
    | DHN     | 0.9560  | 0.9803  | 0.9783  | 0.9804  | 0.9857   |
    | HashNet | 0.8908  | 0.9817  | 0.9809  | 0.9803  | 0.9842   |
    | DTSH    | 0.9360  | 0.9807  | 0.9838  | 0.9880  | 0.9882   |
    | DSDH    | _0.9749 | _0.9842 | _0.9887 | _0.9902 | _0.9910  |
    | DLDAH   | 0.9792  | 0.9887  | 0.9913  | 0.9935  | 0.9919   |

    Precision@top100

    | Method  | 8 bits  | 24 bits | 32 bits | 64 bits | 128 bits |
    |---------|---------|---------|---------|---------|----------|
    | LSH     | 0.1798  | 0.4143  | 0.4840  | 0.6910  | 0.7915   |
    | SH      | 0.4382  | 0.6942  | 0.7285  | 0.7666  | 0.7938   |
    | ITQ     | 0.5671  | 0.7805  | 0.8190  | 0.8571  | 0.8824   |
    | LDAH    | 0.5917  | 0.5488  | 0.5053  | 0.4278  | 0.4107   |
    | SDH     | 0.4853  | 0.7603  | 0.8061  | 0.8539  | 0.9080   |
    | FSDH    | 0.7113  | 0.7372  | 0.7113  | 0.7372  | –        |
    | DHN     | 0.7932  | 0.9623  | 0.9662  | 0.9754  | 0.9850   |
    | HashNet | 0.6873  | _0.9689 | _0.9736 | 0.9791  | _0.9852  |
    | DTSH    | 0.8615  | 0.9014  | 0.9224  | 0.9568  | 0.9808   |
    | DSDH    | _0.9082 | 0.9309  | 0.9454  | _0.9810 | 0.9834   |
    | DLDAH   | 0.9314  | 0.9800  | 0.9826  | 0.9867  | 0.9860   |
  • Table 2   Comparison results of different hashing methods in MAP and Precision@top100 on CIFAR-10. A leading underscore marks a value underlined in the source, i.e. the second-best result; "–" marks a cell not given in the source.

    MAP

    | Method  | 8 bits  | 24 bits | 32 bits | 64 bits | 128 bits |
    |---------|---------|---------|---------|---------|----------|
    | LSH     | 0.1374  | 0.1454  | 0.1693  | 0.1643  | 0.2062   |
    | SH      | 0.1920  | 0.1739  | 0.1700  | 0.1690  | 0.1696   |
    | ITQ     | 0.2308  | 0.2494  | 0.2540  | 0.2750  | 0.2892   |
    | LDAH    | 0.1677  | 0.1298  | 0.1243  | 0.1162  | 0.1120   |
    | SDH     | 0.4881  | 0.5663  | 0.5807  | 0.5925  | 0.6067   |
    | FSDH    | 0.5552  | 0.5616  | 0.5616  | 0.5616  | –        |
    | DHN     | 0.5750  | 0.6799  | 0.7004  | 0.7027  | 0.7078   |
    | HashNet | 0.5813  | 0.6973  | 0.7128  | 0.6809  | 0.7074   |
    | DTSH    | 0.6214  | 0.7411  | 0.7613  | 0.7743  | 0.7213   |
    | DSDH    | _0.6774 | _0.7542 | _0.7623 | _0.7905 | _0.7502  |
    | DLDAH   | 0.7006  | 0.7697  | 0.7724  | 0.7964  | 0.8002   |

    Precision@top100

    | Method  | 8 bits  | 24 bits | 32 bits | 64 bits | 128 bits |
    |---------|---------|---------|---------|---------|----------|
    | LSH     | 0.1201  | 0.2114  | 0.2702  | 0.3076  | 0.4018   |
    | SH      | 0.2300  | 0.3571  | 0.3699  | 0.4021  | 0.4392   |
    | ITQ     | 0.2649  | 0.4161  | 0.4385  | 0.5031  | 0.5349   |
    | LDAH    | 0.1926  | 0.1937  | 0.1789  | 0.1695  | 0.1608   |
    | SDH     | 0.2461  | 0.3212  | 0.3466  | 0.3867  | 0.5327   |
    | FSDH    | 0.3060  | 0.3072  | 0.3072  | 0.3072  | –        |
    | DHN     | 0.2341  | 0.6813  | _0.7330 | 0.7552  | 0.7785   |
    | HashNet | 0.2482  | _0.6909 | 0.7322  | _0.7695 | _0.7873  |
    | DTSH    | _0.4076 | 0.6772  | 0.6953  | 0.6792  | 0.4573   |
    | DSDH    | 0.4044  | 0.6643  | 0.6870  | 0.6849  | 0.7197   |
    | DLDAH   | 0.4632  | 0.7167  | 0.7580  | 0.7740  | 0.8252   |
  • Table 3   Comparison results of different hashing methods in MAP and Precision@top100 on ImageNet. A leading underscore marks a value underlined in the source, i.e. the second-best result.

    MAP

    | Method  | 8 bits  | 16 bits | 32 bits | 48 bits |
    |---------|---------|---------|---------|---------|
    | LSH     | 0.0211  | 0.0260  | 0.0475  | 0.0815  |
    | SH      | 0.0815  | 0.1118  | 0.1578  | 0.1869  |
    | ITQ     | 0.1163  | 0.1883  | 0.2723  | 0.3152  |
    | LDAH    | 0.0659  | 0.1122  | 0.1946  | 0.2540  |
    | SDH     | 0.1840  | 0.3172  | 0.4088  | 0.4514  |
    | DHN     | 0.2388  | 0.3650  | 0.4475  | 0.4884  |
    | HashNet | _0.2439 | _0.3819 | 0.4756  | 0.5262  |
    | DTSH    | 0.2118  | 0.3497  | 0.4433  | 0.5035  |
    | DSDH    | 0.2313  | 0.3720  | _0.4810 | _0.5326 |
    | DLDAH   | 0.2930  | 0.3953  | 0.5208  | 0.5624  |

    Precision@top100

    | Method  | 8 bits  | 16 bits | 32 bits | 48 bits |
    |---------|---------|---------|---------|---------|
    | LSH     | 0.0289  | 0.0623  | 0.1371  | 0.2293  |
    | SH      | 0.0728  | 0.2090  | 0.3321  | 0.3974  |
    | ITQ     | 0.0876  | 0.2767  | 0.4375  | 0.4993  |
    | LDAH    | 0.0591  | 0.1767  | 0.3406  | 0.4352  |
    | SDH     | 0.1223  | 0.4280  | 0.5353  | 0.5717  |
    | DHN     | 0.1062  | 0.4355  | 0.5693  | 0.5996  |
    | HashNet | 0.1042  | _0.4538 | _0.5767 | _0.6158 |
    | DTSH    | 0.1287  | 0.3817  | 0.5230  | 0.5548  |
    | DSDH    | _0.1362 | 0.3917  | 0.5238  | 0.5818  |
    | DLDAH   | 0.1592  | 0.4548  | 0.5964  | 0.6635  |