SCIENTIA SINICA Informationis, Volume 49, Issue 2: 204-215 (2019) https://doi.org/10.1360/N112018-00249

Ethnic painting analysis based on deep learning

  • Received: Sep 10, 2018
  • Accepted: Oct 29, 2018
  • Published: Feb 18, 2019

Besides conveying rich semantic information, images (especially artistic images) directly express people's emotions and can influence the emotions of viewers. Because emotional reactions to the same visual stimulus differ among individuals, the emotions contained in images must be properly understood. Taking ethnic painting images as the research object, this study analyzes how the hue, brightness, saturation, and contrast of the paintings influence the classification results of convolutional neural networks. The analysis is performed with a fine-tuning method. When experimentally evaluated on a Twitter image dataset, the absolute accuracy of our method was 3.4% higher than that of a previous state-of-the-art method. Finally, we propose a pre-training strategy on a related task, which significantly improves the emotion classification of ethnic paintings, and we verify the learned models through visualization experiments.
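The color properties the study varies (brightness, saturation, contrast; hue additionally requires an RGB-to-HSV conversion) can be sketched as simple pixel-level transforms. A minimal NumPy sketch follows; the perturbation factors are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def color_jitter(img, brightness=1.0, contrast=1.0, saturation=1.0):
    """Perturb an HxWx3 float image in [0, 1].

    Illustrative versions of three of the color properties studied in
    the paper (hue is omitted: it needs an HSV round-trip).
    """
    img = img.astype(np.float64)
    # Brightness: uniform scaling of all channels.
    img = img * brightness
    # Contrast: scale the deviation from the per-image mean intensity.
    mean = img.mean()
    img = mean + (img - mean) * contrast
    # Saturation: interpolate between the grayscale image and the original.
    gray = img.mean(axis=2, keepdims=True)
    img = gray + (img - gray) * saturation
    return np.clip(img, 0.0, 1.0)

rng = np.random.default_rng(0)
x = rng.random((8, 8, 3))
bright = color_jitter(x, brightness=1.2)
flat = color_jitter(x, contrast=0.0)  # zero contrast collapses to the mean
```

Applying such transforms to copies of each training image is one common way to realize the oversampling modes compared in Tables 2 and 3.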

[1] Picard R W. Affective Computing. London: MIT Press, 1997.

[2] Pang B, Lee L. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2008, 2: 1-135

[3] Yang Y H, Chen H H. Machine recognition of music emotion: a review. ACM Trans Intell Syst Technol, 2012, 3: 1-30

[4] Wang W N, He Q H. A survey on emotional semantic image retrieval. In: Proceedings of IEEE International Conference on Image Processing, San Diego, 2008. 117--120.

[5] Joshi D, Datta R, Fedorovskaya E. Aesthetics and emotions in images. IEEE Signal Process Mag, 2011, 28: 94-115

[6] Wang S, Ji Q. Video affective content analysis: a survey of state-of-the-art methods. IEEE Trans Affective Comput, 2015, 6: 410-430

[7] Lee J, Park E J. Fuzzy similarity-based emotional classification of color images. IEEE Trans Multimedia, 2011, 13: 1031-1039

[8] Lu X, Suryanarayan P, Adams R B, et al. On shape and the computability of emotions. In: Proceedings of ACM International Conference on Multimedia, 2012. 229--238.

[9] Machajdik J, Hanbury A. Affective image classification using features inspired by psychology and art theory. In: Proceedings of ACM International Conference on Multimedia, Firenze, 2010. 83--92.

[10] Solli M, Lenz R. Color based bags-of-emotions. In: Proceedings of International Conference on Computer Analysis of Images and Patterns, Münster, 2009. 573--580.

[11] Zhao S C, Gao Y, Jiang X L, et al. Exploring principles-of-art features for image emotion recognition. In: Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, 2014. 47--56.

[12] Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell, 2013, 35: 1798-1828

[13] You Q Z, Luo J B, Jin H L, et al. Robust image sentiment analysis using progressively trained and domain transferred deep networks. 2015. arXiv preprint

[14] Campos V, Jou B, Giró-i-Nieto X. From pixels to sentiment: fine-tuning CNNs for visual sentiment prediction. Image and Vision Computing, 2017, 65: 15-22

[15] Chang L, Chen Y F, Li F X, et al. Affective image classification using multi-scale emotion factorization features. In: Proceedings of International Conference on Virtual Reality and Visualization (ICVRV), 2016. 170--174.

[16] Rao T R, Xu M, Liu H Y, et al. Multi-scale blocks based image emotion classification using multiple instance learning. In: Proceedings of IEEE International Conference on Image Processing (ICIP), 2016. 634--638.

[17] Chen M, Zhang L, Allebach J P. Learning deep features for image emotion classification. In: Proceedings of IEEE International Conference on Image Processing, 2015. 4491--4495.

[18] You Q Z, Luo J B, Jin H L, et al. Building a large scale dataset for image emotion recognition: the fine print and the benchmark. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence, 2016. 308--314.

[19] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems, 2012. 1097--1105.

[20] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. 2014. arXiv preprint

[21] Szegedy C, Liu W, Jia Y Q, et al. Going deeper with convolutions. 2014. arXiv:1409.4842

[22] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2014. 580--587.

[23] Liu W, Anguelov D, Erhan D, et al. SSD: single shot MultiBox detector. In: Proceedings of European Conference on Computer Vision, 2015. 21--37.

[24] Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection. 2016. arXiv preprint

[25] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. 3431--3440.

[26] Chatfield K, Simonyan K, Vedaldi A, et al. Return of the devil in the details: delving deep into convolutional nets. 2014. arXiv preprint

[27] Radenović F, Tolias G, Chum O. CNN image retrieval learns from BoW: unsupervised fine-tuning with hard examples. In: Proceedings of European Conference on Computer Vision, 2016.

[28] Tajbakhsh N, Shin J Y, Gurudu S R. Convolutional neural networks for medical image analysis: full training or fine tuning? IEEE Trans Med Imag, 2016, 35: 1299-1312

[29] Jung H, Lee S, Yim J, et al. Joint fine-tuning in deep neural networks for facial expression recognition. In: Proceedings of IEEE International Conference on Computer Vision, 2015. 2983--2991.

[30] Zhao B B. Design and implementation of an emotion annotation system for Yunnan ethnic paintings based on visual semantics. Dissertation. Yunnan University, 2017.

[31] Ye X M, Chen H. Color words and culture. Jiangxi Social Sciences, 2006, 3: 171--173.

  • Figure 1

    (Color online) Examples of the oversampling modes. (a) Original image; (b) random cropping; (c) brightness changed; (d) color changed

  • Figure 2

    (Color online) Pre-training strategy on a related task. The fine-tuned VGG16 model is first trained on the Twitter image dataset and then trained on the ethnic painting dataset

  • Figure 3

    (Color online) The last three fully connected layers of the CNN are replaced with three convolutional layers: Conv14 contains 4096 channels with kernel size 7$\times$7; Conv15 contains 4096 channels with kernel size 1$\times$1; Conv16 contains 2 channels with kernel size 1$\times$1. The last convolutional layer predicts the two emotion classes

  • Figure 4

    (Color online) Examples from the ethnic painting image dataset; the first row shows positive emotions and the second row shows negative emotions

  • Figure 5

    (Color online) Partial results obtained by the VGG16-based FCN. The first row shows the original images, the second row shows the generated prediction maps, and the last row shows the actual labels. Green represents positive predictions and red represents negative predictions; the stronger the color, the higher the prediction probability of the CNN

  • Table 1   Performance of the fine-tuned VGG model on the ethnic painting image dataset
    Model | The ethnic painting image dataset
    Fine-tuning VGG16 (without oversampling) | 0.701±0.020
    Fine-tuning VGG16 (with oversampling) | 0.723±0.013
  • Table 2   Performance of several oversampling methods on the ethnic painting image dataset
    Oversampling mode | The ethnic painting image dataset
    Baseline | 0.702±0.020
    Cutting + Flipping | 0.709±0.014
    Brightness | 0.714±0.029
    Hue | 0.701±0.037
    Saturation | 0.711±0.011
    Contrast | 0.693±0.041
  • Table 3   Performance of several oversampling methods on the Twitter image dataset
    Oversampling mode | 5-agree
    Baseline | 0.865±0.020
    Brightness | 0.874±0.013
    Hue | 0.848±0.031
    Saturation | 0.867±0.017
    Contrast | 0.870±0.025
  • Table 4   Average classification performance and standard deviation on the Twitter image dataset
    Model | 3-agree | 4-agree | 5-agree
    Baseline PCNN from [13] | 0.687 | 0.714 | 0.783
    Paper [14] (without oversampling) | -- | -- | 0.839±0.029
    Paper [14] (with oversampling) | -- | -- | 0.844±0.026
    Ours (without oversampling) | 0.762±0.032 | 0.814±0.028 | 0.858±0.029
    Ours (with oversampling) | 0.784±0.021 | 0.834±0.017 | 0.878±0.013
  • Table 5   Performance of the pre-training strategy on the related task
    Model | Without oversampling | With oversampling
    Fine-tuning MXNet | 0.701±0.020 | 0.723±0.013
    PS CNN | 0.736±0.017 | 0.753±0.013
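The fully-connected-to-convolutional replacement described in Figure 3 rests on a weight-reshape equivalence: over a 7$\times$7$\times$512 feature map, a dense layer and a 7$\times$7 convolution evaluated at a single position compute the same thing, and on larger inputs the reshaped kernels slide spatially to produce a prediction map like the one in Figure 5. A minimal NumPy sketch of the equivalence follows; the shapes match the figure, but the random weights are placeholders, not the trained VGG16 parameters.

```python
import numpy as np

rng = np.random.default_rng(42)

# A VGG16 pool5-style feature map: 512 channels of spatial size 7x7.
feat = rng.random((512, 7, 7))

# Weights of the first fully connected layer: 4096 x (512*7*7).
w_fc = rng.random((4096, 512 * 7 * 7)) * 0.01

# Dense path: flatten the feature map, then matrix-multiply.
dense_out = w_fc @ feat.reshape(-1)

# Convolutional path ("Conv14" in Figure 3): reshape each row of the
# FC weight matrix into a 512x7x7 kernel and correlate it with the map
# at the single valid position.
w_conv = w_fc.reshape(4096, 512, 7, 7)
conv_out = np.einsum('ochw,chw->o', w_conv, feat)
```

The two outputs agree, which is why the converted network can reuse the fine-tuned FC weights unchanged; the subsequent 1$\times$1 layers (Conv15, Conv16) follow the same pattern with a 1$\times$1 kernel.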

Copyright 2020 CHINA SCIENCE PUBLISHING & MEDIA LTD. All rights reserved.
