logo

SCIENTIA SINICA Informationis, Volume 49 , Issue 12 : 1586-1605(2019) https://doi.org/10.1360/SSI-2019-0119

Bayesian network-based high-dimensional crowdsourced data publication with local differential privacy

More info
  • ReceivedJun 4, 2019
  • AcceptedAug 26, 2019
  • PublishedDec 16, 2019

Abstract

High-dimensional crowdsourced data is pervasive in crowdsensing systems and with the development of IoTs it can produce rich knowledge for the society. However, it also creates serious privacy threats for crowdsourcing participants. To mitigate the privacy concerns in crowdsensing systems, local differential privacy has been derived from the de facto standard of differential privacy in order to achieve strong privacy guaranteed in distributed systems. However, directly achieving local differential privacy on high-dimensional crowdsourced data may lead not only to a prohibitive computational burden but also low data utility. Therefore, in this paper, we propose a local private high-dimensional data publication scheme for crowdsensing systems. In particular, on the participants' side, high-dimensional records are locally perturbed to protect privacy, while on the server's side, the probability distribution of original data is recovered by taking advantage of both the expectation maximization algorithm and the theory of the Bayesian network. Extensive experiments on real-world datasets demonstrated the effectiveness of the proposed scheme that can synthesize approximate datasets with local differential privacy.


Funded by

国家自然科学基金(61572398,61772410,61802298,U1811461,11690011)

国家重点研发计划(2017YFB 1010004)

中央高校基本科研业务费(xjj2018237)

中国博士后基金(2017M623177)


References

[1] Guo B, Wang Z, Yu Z. Mobile Crowd Sensing and Computing. ACM Comput Surv, 2015, 48: 1-31 CrossRef Google Scholar

[2] Yurur O, Liu C H, Sheng Z. Context-Awareness for Mobile Sensing: A Survey and Future Directions. IEEE Commun Surv Tutorials, 2016, 18: 68-93 CrossRef Google Scholar

[3] Li G, Wang J, Zheng Y. Crowdsourced Data Management: A Survey. IEEE Trans Knowl Data Eng, 2016, 28: 2296-2319 CrossRef Google Scholar

[4] Mohammed N, Chen R, Fung B, et al. Differentially private data release for data mining. In: Proceedings of ACM SIGKDD. New York: ACM, 2011. 493--501. Google Scholar

[5] Naveed M, Ayday E, Clayton E W. Privacy in the Genomic Era.. ACM Comput Surv, 2015, 48: 1-44 CrossRef PubMed Google Scholar

[6] Kohavi R, Provost F. Applications of data mining to electronic commerce. Data Min Knowl Discov, 2001, 5: 5--10. Google Scholar

[7] Clarke R. What's `privacy'. 2006. http://www.rogerclarke.com/DV/Privacy.html. Google Scholar

[8] Sweeney L, Abu A, Winn J. Identifying participants in the personal genome project by name. 2013,. arXiv Google Scholar

[9] Zhou X, Demetriou S, He D, et al. Identity, location, disease and more: Inferring your secrets from android public resources. In: Proceedings of ACM CCS. New York: ACM, 2013. 1017--1028. Google Scholar

[10] Jin H M, Su L, Ding B L, et al. Enabling privacy-preserving incentives for mobile crowd sensing systems. In: Proceedings of IEEE ICDCS, Nara, 2016. 344--353. Google Scholar

[11] Sweeney L. k-ANONYMITY: A MODEL FOR PROTECTING PRIVACY. Int J Unc Fuzz Knowl Based Syst, 2002, 10: 557-570 CrossRef Google Scholar

[12] Rongxing Lu , Xiaohui Liang , Xu Li . EPPA: An Efficient and Privacy-Preserving Aggregation Scheme for Secure Smart Grid Communications. IEEE Trans Parallel Distrib Syst, 2012, 23: 1621-1631 CrossRef Google Scholar

[13] Mármol F, Sorge C, Ugus O. Do not snoop my habits: preserving privacy in the smart grid. IEEE Commun Mag, 2012, 50: 166-172 CrossRef Google Scholar

[14] Narayanan A, Shmatikov V. Robust de-anonymization of large sparse datasets. In: Proceedings of IEEE S&P, Washington, 2008. 111--125. Google Scholar

[15] Dwork C, Roth A. The Algorithmic Foundations of Differential Privacy. FNT Theor Comput Sci, 2014, 9: 211-407 CrossRef Google Scholar

[16] Dwork C. Differential privacy. In: Proceedings of ICALP. Berlin: Springer, 2006. 1--12. Google Scholar

[17] Acs G, Castelluccia C. I have a dream (differentially private smart metering). In: International Workshop on Information Hiding. Berlin: Springer, 2011. 118--132. Google Scholar

[18] Tianqing Zhu , Ping Xiong , Gang Li . Correlated Differential Privacy: Hiding Information in Non-IID Data Set. IEEE TransInformForensic Secur, 2015, 10: 229-242 CrossRef Google Scholar

[19] McSherry F D. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In: Proceedings of ACM SIGMOD. New York: ACM, 2009. 19--30. Google Scholar

[20] Zhang J, Cormode G, Procopiuc C M, et al. Privbayes: private data release via Bayesian networks. In: Proceedings of ACM SIGMOD. New York: ACM, 2014. 1423--1434. Google Scholar

[21] Chen R, Xiao Q, Zhang Y, et al. Differentially private high-dimensional data publication via sampling-based inference. In: Proceedings of ACM SIGKDD. New York: ACM, 2015. 129--138. Google Scholar

[22] Sun L, Zhang L, Ge C. An algorithm for differential privacy streaming data publication based on matrix mechanism under exponential decay mode. Sci Sin-Inf, 2017, 47: 1493-1509 CrossRef Google Scholar

[23] Su D, Cao J, Li N. PrivPfC: differentially private data publication for classification. VLDB J, 2018, 27: 201-223 CrossRef Google Scholar

[24] Zhu T, Li G, Zhou W. Differentially Private Data Publishing and Analysis: A Survey. IEEE Trans Knowl Data Eng, 2017, 29: 1619-1638 CrossRef Google Scholar

[25] Qardaji W, Yang W N, Li N H. Priview: practical differentially private release of marginal contingency tables. In: Proceedings of ACM SIGMOD. New York: ACM, 2014. 1435--1446. Google Scholar

[26] Wang L, Wang W P, Meng D. Privacy preserving data publishing via weighted Bayesian networks, J Comput Res Develop, 2016, 53: 2343--2353. Google Scholar

[27] Zhang X J, Chen L, Jin K Z, et al. Private high-dimensional data publication with junction tree. J Comput Res Develop, 2018, 55: 2794--2809. Google Scholar

[28] Ye Q Q, Meng X F, Zhu M J, et al. Survey on local differential privacy. J Soft, 2018, 29: 159--183. Google Scholar

[29] Erlingsson Ú Korolova A, Pihur V. Rappor: randomized aggregatable privacy-preserving ordinal response. In: Proceedings of ACM CCS. New York: ACM, 2014. 1054--1067. Google Scholar

[30] Chen R, Li H R, Qin A K, et al. Private spatial data aggregation in the local setting. In: Proceedings of IEEE ICDE, Piscataway, 2016. 289--300. Google Scholar

[31] Cormode G, Kulkarni T, Srivastava D. Marginal release under local differential privacy. In: Proceedings of ACM SIGMOD. New York: ACM, 2018. 131--146. Google Scholar

[32] Wang N, Xiao X K, Yang Y, et al. Privtrie: effective frequent term discovery under local differential privacy. In: Proceedings of IEEE ICDE, Piscataway, 2018. 821--832. Google Scholar

[33] Ye Q Q, Hu H B, Meng X F, et al. Privkv: key-value data collection with local differential privacy. In: Proceedings of IEEE S&P, Piscataway, 2019. 1--15. Google Scholar

[34] Zhang Z K, Wang T H, Li N H, et al. CALM: consistent adaptive local marginal for marginal release under local differential privacy. In: Proceedings of ACM CCS. New York: ACM, 2018. 212--229. Google Scholar

[35] Xiong S J, Sarwate A D, Mandayam N B. Randomized requantization with local differential privacy. In: Proceedings of IEEE ICASSP, Shanghai, 2016. 2189--2193. Google Scholar

[36] Sarwate A D, Sankar L. A rate-disortion perspective on local differential privacy. In: Proceedings of IEEE Allerton Conference on Communication, Control, and Computing, Monticello, 2015. 903--908. Google Scholar

[37] Warner S L. Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias. J Am Statistical Association, 1965, 60: 63-69 CrossRef Google Scholar

[38] Kairouz P, Bonawitz K, Ramage D. Discrete distribution estimation under local privacy. In: Proceedings of ICML, 2016. 2436--2444. Google Scholar

[39] Kairouz P, Oh S, Viswanath P. Extremal mechanisms for local differential privacy. In: Proceedings of NIPS. Cambridge: MIT Press, 2014. 2879--2887. Google Scholar

[40] Wang S W, Huang L S, Wang P Z, et al. Mutual information optimally local private discrete distribution estimation. 2016,. arXiv Google Scholar

[41] Qin Z, Yu T, Yang Y, et al. Generating synthetic decentralized social graphs with local differential privacy. In: Proceedings of ACM CCS. New York: ACM, 2017. 425--438. Google Scholar

[42] Qin Z, Yang Y, Yu T, et al. Heavy hitter estimation over set-valued data with local differential privacy. In: Proceedings of ACM CCS. New York: ACM, 2016. 192--203. Google Scholar

[43] 王璞玉, 张海. 分布式隐私保护-Logistic回归. 中国科学: 信息科学, https://doi.org/10.1360/N112018-00214. Google Scholar

[44] Duchi J C, Jordan M I, Wainwright M J. Local privacy and statistical minimax rates. In: Proceedings of IEEE FOCS, Washington, 2013. 429--438. Google Scholar

[45] Fanti G, Pihur V, Erlingsson . Building a RAPPOR with the Unknown: Privacy-Preserving Learning of Associations and Data Dictionaries. Proc Privacy Enhancing Technologies, 2016, 2016(3): 41-61 CrossRef Google Scholar

[46] Ren X, Yu C M, Yu W. IEEE TransInformForensic Secur, 2018, 13: 2151-2166 CrossRef Google Scholar

[47] Zhang J, Cormode G, Procopiuc C M, et al. PrivBayes: private data release via Bayesian networks. ACM Trans Database Syst, 2017, 42: 25. Google Scholar

[48] Acs G, Castelluccia C, Chen R. Differentially private histogram publishing through lossy compression. In: Proceedings of IEEE ICDE, Washington, 2012. 1--10. Google Scholar

  • Table 1   Details of datasets
    Dataset Data type Dataset size Dimension Domain size (processed)
    NLTCS Binary 25174 16 ${2^{16}}$
    Adult Non-Binary 45222 15 $~\approx~{2^{26}}$
    TPC-E Non-Binary 40000 24 $~\approx~{2^{38}}$

Copyright 2020 Science China Press Co., Ltd. 《中国科学》杂志社有限责任公司 版权所有

京ICP备17057255号       京公网安备11010102003388号