
SCIENTIA SINICA Informationis, Volume 49, Issue 1: 17-41 (2019). https://doi.org/10.1360/N112017-00157

Response prediction via integration of heterogeneous information

Article history
  • Received: Aug 21, 2017
  • Accepted: Jan 31, 2018
  • Published: Jan 2, 2019

Abstract

In recent years, tensor factorization models have been used to model complicated feature interactions involving multiple aspects, such as the user, publisher, and advertiser, for response prediction in real-time bidding. Among the numerous challenges, data sparsity and cold-start problems persistently trouble researchers, particularly in ad conversion rate prediction. These problems are difficult to solve when only one or a few types of information are considered; all types of heterogeneous information must be integrated simultaneously to address them. This paper proposes a practical solution for integrating heterogeneous information into the tensor factorization model to efficiently alleviate the data sparsity and cold-start problems. It proposes different integration strategies and implementation methods for the various types of information, depending on their properties, categories, structures, forms, and functions. The solution efficiently alleviates data sparsity and cold-start problems and enhances the prediction reliability and precision of the tensor factorization model in real-time bidding systems. Finally, the solution achieves a significant improvement in response prediction over baseline methods on the selected datasets.
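
To make the factorization idea in the abstract concrete, the following is a minimal sketch, assuming a CP-style rank-k decomposition over the user, publisher, and advertiser aspects with a sigmoid link; the matrix names (U, P, A), their sizes, and the random initialization are illustrative placeholders rather than the paper's exact model or learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes; the real datasets in Tables 3-5 are far larger.
n_users, n_pubs, n_ads, k = 1000, 200, 50, 8

# Rank-k latent factor matrices for the three aspects (random values stand in
# for parameters that would normally be learned from the response tensor).
U = rng.normal(scale=0.1, size=(n_users, k))  # user factors
P = rng.normal(scale=0.1, size=(n_pubs, k))   # publisher factors
A = rng.normal(scale=0.1, size=(n_ads, k))    # advertiser factors

def predict_ctr(u: int, p: int, a: int) -> float:
    """CP-style prediction: sum over the element-wise product of the three
    latent vectors, squashed to (0, 1) with a sigmoid."""
    score = float(np.sum(U[u] * P[p] * A[a]))
    return 1.0 / (1.0 + np.exp(-score))

print(predict_ctr(3, 7, 2))
```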


Funded by

National Natural Science Foundation of China, Young Scientists Fund (61602131)

National Natural Science Foundation of China, General Program (61572151)

National Natural Science Foundation of China, General Program (61672192)

National High Technology Research and Development Program of China (2015AA015405)



  • Figure 1

    An illustrative example of data hierarchies

  • Figure 2

    The third-order response tensor

  • Figure 3

    The taxonomy of ad responses

  • Figure 4

    (Color online) The integrating framework for heterogeneous information

  • Figure 5

    CP decomposition for ad CTR prediction

  • Figure 6

    Tucker decomposition for ad CTR prediction

  • Figure 7

    (Color online) Performance of matrix factorization integrating different-level features

  • Figure 8

    (Color online) Performance of CP integrating different-level features

  • Figure 9

    (Color online) Performance of Tucker factorization integrating different-level features

  • Figure 10

    (Color online) Performance of all models integrating heterogeneous information in quarter 1

  • Figure 11

    (Color online) Performance of all models integrating heterogeneous information in quarter 2

  • Figure 12

    (Color online) Performance of all models integrating heterogeneous information in quarter 3

  • Table 1   Characteristics of the three quarterly datasets
    Quarter Dataset Date Impression No. Click No. Click-through rate (%)
    1 Training dataset May 11 – May 17 9262861 7482 0.076
    1 Test dataset May 18 – May 20 2594386 8934 0.075
    2 Training dataset June 6 – June 12 12237229 8961 0.073
    2 Test dataset June 13 – June 15 2524630 1873 0.074
    3 Training dataset October 19 – October 27 3158171 2709 0.086
    3 Test dataset October 21 – October 28 1579086 1120 0.071
  • Algorithm 1   Triplet-wise learning algorithm

    Input: weighting coefficient $\alpha$, learning rate $\eta$, regularization coefficient $\lambda_\Theta$, training dataset $D$.
    Output: learned parameter $\Theta$.

    Initialize parameter $\Theta$;
    Construct $N^{++}$, $N^{+}$ and $N^{-}$ according to $D$;
    repeat
        Draw $x_i \in N^{++}$ uniformly at random;
        Draw $x_j \in N^{+}$ uniformly at random;
        Draw $x_k \in N^{-}$ uniformly at random;
        // Now we have a tuple $(x_i, x_j, x_k) \in D_t$;
        Calculate $\Delta y_{ij}$ and $\Delta y_{jk}$;
        for each $\theta \in \Theta$ do
            Calculate $\frac{\partial (\Delta y_{ij})}{\partial \theta}$ and $\frac{\partial (\Delta y_{jk})}{\partial \theta}$;
            Calculate the gradient $\frac{\partial L}{\partial \theta}$;
            Update $\theta$;
        end for
    until convergence;
    return $\Theta$.

    A minimal Python sketch of this training loop is given below.
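
    The sketch below runs the triplet-wise stochastic gradient loop with a plain linear scorer over feature vectors. The BPR-style log-sigmoid pairwise loss, the way $\alpha$ weights the two pairwise terms, and the names triplet_wise_sgd, N_pp, N_p, and N_n are assumptions made for illustration; the paper's own objective is defined on the tensor factorization model rather than a linear one.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def triplet_wise_sgd(N_pp, N_p, N_n, dim, alpha=0.5, eta=0.01,
                         lam=0.001, n_iter=10000):
        """Triplet-wise SGD over feature vectors.

        N_pp, N_p, N_n: arrays of feature vectors for the strongest,
        intermediate, and weakest response groups (e.g. conversions,
        clicks, non-clicks)."""
        theta = np.zeros(dim)                       # model parameters
        for _ in range(n_iter):
            x_i = N_pp[rng.integers(len(N_pp))]     # draw from N^{++}
            x_j = N_p[rng.integers(len(N_p))]       # draw from N^{+}
            x_k = N_n[rng.integers(len(N_n))]       # draw from N^{-}
            d_ij = theta @ (x_i - x_j)              # Δy_ij for a linear scorer
            d_jk = theta @ (x_j - x_k)              # Δy_jk
            # Gradient of -α·ln σ(Δy_ij) - (1-α)·ln σ(Δy_jk) + (λ/2)·‖θ‖²
            grad = (-alpha * (1 - sigmoid(d_ij)) * (x_i - x_j)
                    - (1 - alpha) * (1 - sigmoid(d_jk)) * (x_j - x_k)
                    + lam * theta)
            theta -= eta * grad
        return theta

    if __name__ == "__main__":
        d = 16
        learned = triplet_wise_sgd(rng.normal(size=(50, d)),    # N^{++}
                                   rng.normal(size=(200, d)),   # N^{+}
                                   rng.normal(size=(2000, d)),  # N^{-}
                                   dim=d, n_iter=2000)
        print(learned[:4])
    ```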

  • Table 2   The statistics of advertiser categories and ad numbers
    Advertiser Quarter Industry type Campaign Creative
    df6f61b2409f4e2f16b6873a7eb50444 1 Consumer packaged goods (CPG) 1 14
    3a7eb50444df6f61b2409f4e2f16b687 1 Chinese vertical e-commerce 1 12
    9f4e2f16b6873a7eb504df6f61b24044 1 Vertical online media 1 7
    1458 2 Chinese vertical e-commerce 1 8
    3358 2 Software 3 25
    3386 2 International e-commerce 1 19
    3427 2 Oil 2 13
    3476 2 Tire 11 11
    2259 3 Milk powder 1 22
    2261 3 Telecom 1 9
    2821 3 Footwear 1 3
    2997 3 Mobile e-commerce app install 1 23
  • Table 3   Training dataset statistics
    Quarter Advertiser Impression No. Click No. Click-through rate (%)
    1 9f4e2f16b6873a7eb504df6f61b24044 3251782 3055 0.094
    1 3a7eb50444df6f61b2409f4e2f16b687 3182633 2644 0.083
    1 df6f61b2409f4e2f16b6873a7eb50444 2828446 1303 0.046
    2 1458 3083056 2454 0.080
    2 3358 1742104 1358 0.078
    2 3386 2847802 2076 0.073
    2 3427 2593765 1926 0.074
    2 3476 1970360 1027 0.052
    3 2259 835556 280 0.034
    3 2261 687617 207 0.030
    3 2821 1322561 843 0.064
    3 2997 312437 1386 0.444
    Total 12 24658119 18559 0.075
  • Table 4   Test dataset statistics
    Quarter Advertiser Impression No. Click No. Click-through rate (%)
    1 9f4e2f16b6873a7eb504df6f61b24044 896908 850 0.095
    1 3a7eb50444df6f61b2409f4e2f16b687 918846 679 0.074
    1 df6f61b2409f4e2f16b6873a7eb50444 778632 403 0.052
    2 1458 614638 543 0.088
    2 3358 300928 339 0.113
    2 3386 542421 496 0.091
    2 3427 536795 395 0.074
    2 3476 523848 302 0.058
    3 2259 417179 131 0.031
    3 2261 343862 97 0.028
    3 2821 661964 394 0.060
    3 2997 153063 533 0.348
    Total 12 6689084 5162 0.077
  • Table 5   Dimensionality of main features for the three quarterly datasets
    Quarter Dataset Impression No. User No. Tag No. Slot No. Page No. Advertiser No. Campaign No. Creative No.
    1 Training dataset 9262861 6799908 – 124684 2082249 3 3 32
    1 Test dataset 2594386 2164525 – 58945 811585 3 3 33
    2 Training dataset 12237229 10146491 45 141515 2362123 5 18 74
    2 Test dataset 2524630 2310303 68 48458 663218 5 18 74
    3 Training dataset 3158171 2818424 69 53518 963576 4 4 57
    3 Test dataset 1579086 1490321 58 43603 552694 4 4 54
  • Table 6   Interpretation of model notation
    Notation Object feature Hierarchy and category feature Global feature Click and conversion feature
    X_0
    X_1
    X_2
    Feature-based X
    X_IHI
  • Table 7   RMSE values for all models merging the first three types of features
    RMSE LR Feature-based MF Feature-based Tucker Feature-based CP
    Quarter 1 0.0274 0.0261 0.0235 0.0275
    Quarter 2 0.0262 0.0261 0.0260 0.0262
    Quarter 3 0.0268 0.0267 0.0267 0.0266
  • Table 8   RMSE values for all models merging all features
    RMSE LR MF_IHI Tucker_IHI CP_IHI
    Quarter 1 0.0274 0.0371 0.0371 0.0371
    Quarter 2 0.0262 0.0362 0.0362 0.0363
    Quarter 3 0.0268 0.0372 0.0361 0.0362
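
    For reference, the RMSE values reported in Tables 7 and 8 correspond to the root mean squared error between observed binary responses and predicted probabilities; the snippet below is a minimal sketch, where the function name rmse and the sample numbers are illustrative only.

    ```python
    import numpy as np

    def rmse(y_true, y_pred):
        """Root mean squared error between observed responses (0/1 clicks)
        and predicted click probabilities."""
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

    # Illustrative usage with made-up values.
    print(rmse([0, 0, 1, 0], [0.02, 0.01, 0.35, 0.05]))
    ```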
