SCIENCE CHINA Information Sciences, Volume 60, Issue 9: 092103 (2017) https://doi.org/10.1007/s11432-015-0905-6

A Gaussian copula regression model for movie box-office revenues prediction

  • Received Apr 11, 2016
  • Accepted Aug 15, 2016
  • Published Apr 25, 2017


In this article, we revisit the task of movie box-office revenue prediction using multi-type features. Box-office revenues are affected by numerous factors. Previous work with discriminative models assumes that these factors are independent and identically distributed. The correlations between the factors are rarely considered, which limits the performance of discriminative models on this task. To address these problems, we investigate a novel Gaussian copula regression model. With this model, we do not need to make any prior assumptions about the marginal distributions of the features. In particular, we perform a cumulative probability estimation on each of the smoothed features. The estimation learns the marginal distributions and maps all features into a uniform vector space. Subsequently, we bridge the marginal distributions with a copula function to create their joint distribution, and learn the dependency structure between them. Moreover, we propose a computationally efficient approximate algorithm for inferring the response variable. Experimental results on two movie datasets from the Chinese and U.S. markets show that our approach outperforms strong discriminative regression baselines.
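The cumulative probability estimation step can be illustrated with a minimal sketch. It assumes a Gaussian-kernel and a box-kernel smoothed CDF estimator (both kernels are named in Algorithm 1), a hand-picked bandwidth `h`, and made-up feature values; the function names and the clipping constant `eps` are hypothetical and not taken from the paper.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the kernel and the inverse CDF


def gaussian_kernel_cdf(train, x, h):
    """Smoothed CDF estimate at x: average of Gaussian CDFs centred on the training points."""
    return sum(STD_NORMAL.cdf((x - xi) / h) for xi in train) / len(train)


def box_kernel_cdf(train, x, h):
    """Smoothed CDF estimate at x using a box (uniform) kernel of half-width h."""
    def ramp(t):
        # CDF of a uniform density on [-1, 1], clipped to [0, 1]
        return min(1.0, max(0.0, (t + 1.0) / 2.0))
    return sum(ramp((x - xi) / h) for xi in train) / len(train)


def to_normal_score(u, eps=1e-6):
    """Map a probability in (0, 1) to a standard-normal score; eps avoids infinite scores."""
    u = min(1.0 - eps, max(eps, u))
    return STD_NORMAL.inv_cdf(u)


# Made-up, already normalised values of a single feature (assumption for illustration).
feature = [0.1, 0.4, 0.5, 0.9, 1.3]
u = [gaussian_kernel_cdf(feature, x, h=0.3) for x in feature]  # values in the unit interval
z = [to_normal_score(v) for v in u]                            # inputs for a Gaussian copula
```

Whatever the original marginal distribution of the feature, `u` lies in the unit interval and preserves the ordering of the raw values, which is why no parametric assumption about the marginal is needed in this sketch.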


This work was supported by National Basic Research Program of China (Grant No. 2014CB340503), and National Natural Science Foundation of China (Grant Nos. 71532004, 61133012, 61472107).

  • Figure 1

    The framework of our proposed method.

  • Figure 2

    Performance of our approach on Dataset S2 combining different features. NS: number of screens; PR: post rate; PIR: purchase intention rate.


    Algorithm 1 Gaussian copula regression algorithm

    Require: Training data: $(X_{\rm meta}^{\rm tr},X_{\rm text}^{\rm tr},y^{\rm tr})$; Testing data: $(X_{\rm meta}^{\rm te},X_{\rm text}^{\rm te},y^{\rm te})$; Output: the predicted value $y$;

    % Normalize the data

    normalize$(X_{\rm meta}^{\rm tr},X_{\rm meta}^{\rm te})$;

    normalize$(X_{\rm text}^{\rm tr},X_{\rm text}^{\rm te})$;

    normalize$(y^{\rm tr},y^{\rm te})$;

    % Kernel-based CDF estimation

    $U_{\rm meta}^{\rm tr} = {\rm GaussianKernel}(X_{\rm meta}^{\rm tr},X_{\rm meta}^{\rm tr})$;

    $U_{\rm text}^{\rm tr} = {\rm BoxKernel}(X_{\rm text}^{\rm tr},X_{\rm text}^{\rm tr})$;

    $U_{\rm meta}^{\rm te} = {\rm GaussianKernel}(X_{\rm meta}^{\rm tr},X_{\rm meta}^{\rm te})$;

    $U_{\rm text}^{\rm te} = {\rm BoxKernel}(X_{\rm text}^{\rm tr},X_{\rm text}^{\rm te})$;

    $U_{y}^{\rm tr} = {\rm GaussianKernel}(y^{\rm tr},y^{\rm tr})$;

    $U_{y}^{\rm te} = {\rm GaussianKernel}(y^{\rm tr},y^{\rm te})$;

    $Z^{\rm tr} = {\rm GaussianInverseCDF}(U_{\rm meta}^{\rm tr},U_{\rm text}^{\rm tr},U_{y}^{\rm tr})$;

    $\Sigma = {\rm MLE}(Z^{\rm tr})$;

    % Approximate inference

    for $i = 1 \rightarrow m$ testing examples

    ${\rm max\_{density} } = 0$;

    ${\rm probability} = 0$;

    for $k=0.01 \rightarrow 1$

    ${\rm dens = Density}(k) * {\rm CopulaDensity}(U_{\rm meta}^{\rm te},U_{\rm text}^{\rm te},k)$;

    if ${\rm dens} \geq {\rm max\_{density} }$ then

    ${\rm max\_{density} = dens}$;

    ${\rm probability} = k$;

    end if

    end for

    $y = {\rm InverseCDF}(y^{\rm tr},{\rm probability});$

    end for
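The approximate inference loop of Algorithm 1 can be sketched for the simplest case of one feature and one response. This is an illustration under stated assumptions, not the authors' implementation: `Density(k)` is read here as the kernel-estimated marginal density of the response at its `k`-quantile; the copula parameter is the sample correlation of the normal scores, standing in for the MLE step; the grid stops at 0.99 because the Gaussian inverse CDF is infinite at 1; and all function names, the bandwidth `h`, and the data are hypothetical.

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def kernel_cdf(train, x, h):
    """Gaussian-kernel smoothed CDF estimate at x."""
    return sum(N01.cdf((x - t) / h) for t in train) / len(train)


def kernel_pdf(train, x, h):
    """Gaussian-kernel density estimate at x."""
    return sum(N01.pdf((x - t) / h) for t in train) / (len(train) * h)


def kernel_quantile(train, p, h, iters=60):
    """Invert kernel_cdf by bisection; the bracket is wide enough for p in [0.01, 0.99]."""
    lo, hi = min(train) - 5 * h, max(train) + 5 * h
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kernel_cdf(train, mid, h) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clip(u, eps=1e-6):
    return min(1 - eps, max(eps, u))


def fit_rho(x_train, y_train, h):
    """Sample correlation of the normal scores (a simple stand-in for the MLE step)."""
    zx = [N01.inv_cdf(clip(kernel_cdf(x_train, v, h))) for v in x_train]
    zy = [N01.inv_cdf(clip(kernel_cdf(y_train, v, h))) for v in y_train]
    mx, my = sum(zx) / len(zx), sum(zy) / len(zy)
    cov = sum((a - mx) * (b - my) for a, b in zip(zx, zy))
    var = math.sqrt(sum((a - mx) ** 2 for a in zx) * sum((b - my) ** 2 for b in zy))
    return min(0.99, max(-0.99, cov / var))  # keep 1 - rho^2 away from zero


def copula_density(u, v, rho):
    """Bivariate Gaussian copula density at (u, v)."""
    a, b = N01.inv_cdf(u), N01.inv_cdf(v)
    q = (rho * rho * (a * a + b * b) - 2 * rho * a * b) / (2 * (1 - rho * rho))
    return math.exp(-q) / math.sqrt(1 - rho * rho)


def predict(x_train, y_train, x_new, h=0.3):
    """Grid search over response quantiles k, keeping the k with the highest joint density."""
    rho = fit_rho(x_train, y_train, h)
    u_x = clip(kernel_cdf(x_train, x_new, h))
    best_k, best_dens = 0.01, 0.0
    for step in range(1, 100):  # k = 0.01, 0.02, ..., 0.99
        k = step / 100
        y_k = kernel_quantile(y_train, k, h)
        dens = kernel_pdf(y_train, y_k, h) * copula_density(u_x, k, rho)
        if dens >= best_dens:
            best_k, best_dens = k, dens
    return kernel_quantile(y_train, best_k, h)


# Made-up normalised data with a positive dependence between feature and response.
xs = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4]
ys = [0.10, 0.15, 0.50, 0.55, 0.90, 0.95, 1.30, 1.35]
low, high = predict(xs, ys, 0.1), predict(xs, ys, 1.3)
```

Because the search runs over a fixed grid of 99 quantiles, each prediction costs a constant number of density evaluations instead of an optimisation over the response range, which is one way to read the "approximate inference" label in the listing.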

Copyright 2019 Science China Press Co., Ltd. All rights reserved.