logo

SCIENTIA SINICA Informationis, Volume 47, Issue 11: 1493-1509(2017) https://doi.org/10.1360/N112017-00111

An algorithm for differential privacy streaming data publication based on matrix mechanism under exponential decay mode

More info
  • ReceivedMay 17, 2017
  • AcceptedJun 13, 2017
  • PublishedNov 14, 2017

Abstract

At present, many practical applications require the continuous release of statistical streaming data, and the importance of current data is higher than historical data. The solution to this problem is to assign weights to the data and propose a differential privacy data release method under exponential decay. However, existing methods only consider a single query, and cannot effectively use the correlation between queries in the continuous statistical publishing background to further improve the accuracy of the query. In this paper, we present a differential privacy data release algorithm (DMFDA) in exponential decay mode based on a matrix mechanism, which uses the advantages of the matrix to deal with relevant queries. Firstly, we use the construction method to generate the matrix decomposition strategy to meet the real-time requirements of streaming data. Secondly, the diagonal matrix is used to adjust the structure of the constructed strategy matrix so as to improve the release accuracy. Finally, according to the substructure of the constructed strategy matrix, a fast method of solving the diagonal matrix is proposed. The experiment is designed to compare DMFDA and similar algorithms for streaming data release in exponential decay. Experimental results show that the DMFDA algorithm is effective and feasible.


Funded by

国家自然科学基金(61300026)

福建省自然科学基金(2017J01754)


References

[1] Fung B, Wang K, Chen R, et al. Privacy-preserving data publishing: a survey of recent developments. ACM Comput Surv, 2010, 42: 2623--2627. Google Scholar

[2] Dwork C. Differential privacy. In: Proceedings of the 33rd International Conference on Automata, Languages and Programming, Venice, 2006. 1--12. Google Scholar

[3] Zhou S G, Li F, Tao Y F. Privacy preservation in database applications: a survey. Chin J Comp, 2009, 32: 847-861 CrossRef Google Scholar

[4] Xiong P, Zhu T Q, Wang X F. A survey on differential privacy and applications. Chin J Comput, 2014, 37: 101--122. Google Scholar

[5] Zhang X J, Meng X F. Differential privacy in data publication and analysis. Chin J Comput, 2014, 37: 927--949. Google Scholar

[6] Dwork C, Naor M, Pitassi T, et al. Differential privacy under continual observation. In: Proceedings of the 42nd ACM Symposium on Theory of Computing, Cambridge, 2010. 715--724. Google Scholar

[7] Chan T H H, Shi E, Song D. Private and continual release of statistics. ACM Trans Inf Syst Secur, 2011, 14: 405--417. Google Scholar

[8] Cao J, Xiao Q, Ghinita G, et al. Efficient and accurate strategies for differentially-private sliding window queries. In: Proceedings of the 16th International Conference on Extending Database Technology, Genoa, 2013. 191--202. Google Scholar

[9] Zhang X J, Meng X F. Stream histogram publication method with differential privacy. J Softw, 2016, 27: 381--393. Google Scholar

[10] Bolot J, Fawaz N, Muthukrishnan S, et al. Private decayed predicate sums on streams. In: Proceedings of the 16th International Conference on Database Theory, Genoa, 2013. 284--295. Google Scholar

[11] Li C, Hay M, Rastogi V, et al. Optimizing linear counting queries under differential privacy. In: Proceedings of the 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Indianapolis, 2010. 123--134. Google Scholar

[12] Yuan G, Zhang Z, Winslett M. Low-rank mechanism. Proc VLDB Endow, 2012, 5: 1352-1363 CrossRef Google Scholar

[13] Hay M, Rastogi V, Miklau G. Boosting the accuracy of differentially private histograms through consistency. Proc VLDB Endow, 2010, 3: 1021-1032 CrossRef Google Scholar

[14] Kellaris G, Papadopoulos S, Xiao X. Differentially private event sequences over infinite streams. Proc VLDB Endow, 2014, 7: 1155-1166 CrossRef Google Scholar

  • Figure 1

    Noise accumulation

  • Figure 2

    Comparison of query distortion with different moments (Search Logs). (a) $\epsilon=1.0$; (b) $\epsilon=0.1$; (c) $\epsilon=0.01$

  • Figure 3

    Comparison of query distortion with different moments (Nettrace). (a) $\epsilon=1.0$; (b) $\epsilon=0.1$; (c) $\epsilon=0.01$

  • Figure 4

    Comparison of query distortion with different moments (WorldCup98). (a) $\epsilon=1.0$; (b) $\epsilon=0.1$; (c) $\epsilon=0.01$

  • Figure 5

    Comparison of query distortion with different decay factors (Search Logs). (a) $\epsilon=1.0$; (b) $\epsilon=0.1$; (c) $\epsilon=0.01$

  • Figure 6

    Comparison of query distortion with different decay factors (Nettrace). (a) $\epsilon=1.0$; (b) $\epsilon=0.1$; (c) $\epsilon=0.01$

  • Figure 7

    Comparison of query distortion with different decay factors (WorldCup98). (a) $\epsilon=1.0$; (b) $\epsilon=0.1$; (c) $\epsilon=0.01$

  • Figure 8

    Comparison of different algorithm efficiency

  • Figure 9

    Run time of diagonal optimization algorithm

  • Table 1   Data sets
    Data set Search Logs NetTrace WorldCup98
    Size 32768 65536 7518579
  •   

    Algorithm 1 策略矩阵构造算法BuildL

    Require:数据规模$N$,衰减因子$p$;

    Output:策略矩阵${\boldsymbol~L}$;

    初始化策略矩阵${\boldsymbol~L}$,将所有元素置为0;

    for $j=1$ to $N$

    $i=j$;

    while $i<N$ do

    更新矩阵元素$\textbf{{L}}[i][j]=p^{i-j}$;

    $i\leftarrow~i+\mathrm{lowbit}(i)$;

    end while

    end for

    返回矩阵${\boldsymbol~L}$.

  •   

    Algorithm 2 策略矩阵构造算法BuildB

    Require:数据规模$N$,衰减因子$p$;

    Output:还原矩阵${\boldsymbol~B}$;

    初始化策略矩阵${\boldsymbol~B}$,将所有元素置为0;

    for $i=1$ to $N$

    $j=i$;

    while $j>0$ do

    更新矩阵元素$\textbf{{B}}[i][j]=p^{i-j}$;

    $j\leftarrow~j-\mathrm{lowbit}(j)$;

    end while

    end for

    返回矩阵${\boldsymbol~B}$.

  •   

    Algorithm 3 指数衰减模式下差分隐私流数据发布DM

    Require:预设时刻上限$T$,衰减因子$p$;

    Output:每一时刻的发布结果$s_{t}$;

    for $t=1$ to $T$

    更新实际统计量$\Phi~_{\mathrm{lowbit}(t)}\leftarrow~D_{t}+\sum_{j=0}^{\mathrm{lowbit}(t)-1}\Phi~_{j}$;

    添加噪声$\overset{\sim~}{\Phi~_{\mathrm{lowbit}(t)}}\leftarrow~\Phi~_{\mathrm{lowbit}(t)}+\mathrm{Lap}(\frac{\Delta~\textbf{{L}}}{\epsilon~})$;

    $k\leftarrow~t,~s_{t}\leftarrow~0$;

    while $k>0$ do

    $s_{t}\leftarrow~s_{t}+(\overset{\sim~}{\Phi~_{\mathrm{lowbit}(k)}}\times~p^{i-k})$;

    $k\leftarrow~k-\mathrm{lowbit}(k)$;

    end while

    发布隐私数据$s_{t}$.

    end for

  •   

    Algorithm 4 对角阵系数求解算法getLambda

    Require:时刻上限$T$, 下标$k$, 衰减因子$p$;

    Output:对角阵系数$\lambda~_{k}$;

    初始化$\lambda~_{k}$为1, 根据式(32)计算出所需的系数$\delta~_{1}~\sim~\delta~_{\mathrm{log}_{2}(T)+1}$;

    $kt\leftarrow~k,~m\leftarrow~\mathrm{log}_{2}(T)+1,~\mathrm{div}\leftarrow~2^{m-1}$;

    while $\mathrm{div}\neq~kt$ do

    if $kt<\mathrm{div}$ then

    $\lambda~_{k}\leftarrow~\lambda~_{k}\times~\delta~_{m}$;

    else

    $kt\leftarrow~kt-\mathrm{div}$;

    end if

    $\mathrm{div}~\leftarrow~\frac{\mathrm{div}}{2},~m\leftarrow~m-1$;

    end while

    $\lambda~_{k}\leftarrow~\frac{\lambda~_{k}\times~(1-\delta~_{m})}{p}$;

    返回对角阵系数$\lambda~_{k}$.

  •   

    Algorithm 5 指数衰减模式下的基于对角矩阵优化差分隐私流数据发布算法DMFDA

    Require:时刻上限$T$, 衰减因子$p$;

    Output:每一时刻的发布结果$s_{t}$;

    for $t=1$ to $T$

    更新实际统计量$\Phi~_{\mathrm{lowbit}(t)}\leftarrow~D_{t}+\sum_{j=0}^{\mathrm{lowbit}(t)-1}\Phi~_{j}$;

    $\lambda~_{t}\leftarrow~\mathrm{getLambda}(T,t,p)$;

    添加噪声$\overset{\sim~}{\Phi~_{\mathrm{lowbit}(t)}}\leftarrow~\lambda~_{t}\times~\Phi~_{\mathrm{lowbit}(t)}+\mathrm{Lap}(\frac{1}{\epsilon~})$;

    $k\leftarrow~t,~s_{t}\leftarrow~0$;

    while $k>0$ do

    $s_{t}\leftarrow~s_{t}+\frac{(\overset{\sim~}{\Phi~_{\mathrm{lowbit}(k)}}\times~p^{i-k})}{\lambda~_{k}}$;

    $k\leftarrow~k-\mathrm{lowbit}(k)$;

    end while

    发布隐私数据$s_{t}$.

    end for

Copyright 2019 Science China Press Co., Ltd. 《中国科学》杂志社有限责任公司 版权所有

京ICP备18024590号-1