logo

SCIENTIA SINICA Informationis, Volume 46, Issue 9: 1255-1275(2016) https://doi.org/10.1360/N112016-00060

Research progress in the complexity theory and algorithms of big-data computation

More info
  • ReceivedMar 24, 2016
  • AcceptedMay 31, 2016
  • PublishedSep 9, 2016

Abstract

With the explosive growth of available data in recent years, big-data research has attracted much attention from both academic and industrial researchers, and many significant advances have been achieved. Nevertheless, the fundamental research results are far from the actual needs, a number of key issues remain unresolved, a complete theory of big-data computation needs to be established, and considerable work remains to be accomplished. This paper focuses on the theoretical aspects of big-data research, especially the research progress of the complexity theory and algorithms of big-data computation. First, big-data computation is formally defined. Then, six challenges and ten research issues of big data are discussed. Next, the results of a survey on the research progress of complexity theory and algorithms of big-data computation are given. Finally, comments on the fundamental research results and the future theoretical research issues of big data are presented and discussed.


Funded by

国家重点基础研究发展计划(61125106)


References

[1] Shoshani A. Statistical databases: characteristics, problems, and some solutions. In: Proceedings of the 8th International Conference on Very Large Data Bases, Mexico City, 1982. 208-222. Google Scholar

[2] Shoshani A, Olken F, Wong H K T. Characteristics of scientific databases. In: Proceedings of the 10th International Conference on Very Large Data Bases, Singapore, 1984. 147-160. Google Scholar

[3] Shoshani A, Wong H K T. Statistical and scientific database issues. IEEE Trans Softw Eng, 1985, 11: 1040-1047. Google Scholar

[4] Turing A M. On computable numbers, with an application to the entscheidungs problem. Proc London Math Soc, 1936, 2: 230-265. Google Scholar

[5] 李建中. 大数据计算的挑战. 见: 香山科学会议, 北京, 2012. Google Scholar

[6] 李建中. 大数据计算的基本概念与研究问题. 见: 国家基金委第89期双清论坛, 上海, 2014. Google Scholar

[7] Li J Z. Complexity, algorithms and quality of big data intensive computing. In: Proceedings of the 19th International Conference on Database Systems for Advanced Applications, Bali, 2014. 230-265. Google Scholar

[8] 李建中. 大数据计算的研究问题和部分解. 见: 第30届中国数据库学术会议, 哈尔滨, 2013. Google Scholar

[9] Kleene S C. General recursive functions of natural numbers. MATH ANN, 1936, 112: 727-742 CrossRef Google Scholar

[10] Post E L. Finite combinatory processes-formulation 1. J Symb Log, 1936, 1: 103-105 CrossRef Google Scholar

[11] Church A. The Calculi of Lambda-Conversion. Princeton: Princeton University Press, 1951. Google Scholar

[12] Kleene S C. Introduction to Metamathematics. Japan: Ishi Press, 1952. Google Scholar

[13] Hermes H. Enumerability, Decidability, Computability. Berlin: Springer, 1965. Google Scholar

[14] Minsky M L. Computation: Finite and Infinite Machines. Upper Saddle River: Prentice-Hall Inc., 1967. Google Scholar

[15] Davis M. Computability and Unsolvability. New York: McGraw-Hill, 1958. Google Scholar

[16] Shepherdson J C, Sturgis H E. Computability of recursive functions. J ACM, 1963, 10: 217-255 CrossRef Google Scholar

[17] Cook S A, Reckhow R A. Time-bounded random access machines. In: Proceedings of the 4th Annual ACM Symposium on Theory of Computing, Denver, 1972. 73-80. Google Scholar

[18] Harrison M A. Introduction to Switching and Automata Theory. New York: MacGraw-Hill, 1965. Google Scholar

[19] Savage J E. The Complexity of Computing. New York: Wiley, 1976. Google Scholar

[20] Steven F, James W. Parallelism in random access machines. In: Proceedings of the 10th Annual ACM Aymposium on Theory of Computing, San Diego, 1978. 114-118. Google Scholar

[21] van Leeuwen J. Handbook of Theoretical Computer Science (vol. A): Algorithms and Complexity. Cambridge: MIT Press, 1991. Google Scholar

[22] Karloff H, Suri S, Vassilvitskii S. A model of computation for mapreduce. In: Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms, Austin, 2010. 938-948. Google Scholar

[23] Sarma A D, Afrati F N, Salihoqlu S, et al. Upper and lower bounds on the cost of a map-reduce computation. Proc VLDB Endowment, 2013, 6: 277-288 CrossRef Google Scholar

[24] Tao Y F, Lin W Q, Xiao X K. Minimal mapreduce algorithms. In: Proceedings of ACM SIGMOD International Conference on Management of Data, New York, 2013. 529-540. Google Scholar

[25] Afrati F N, Borkar V, Carey M, et al. Map-reduce extensions and recursive queries. In: Proceedings of the 14th International Conference on Extending Database Technology, Uppsala, 2011. 1-8. Google Scholar

[26] Afrati F N, Ullman J D. Optimizing joins in a map-reduce environment. In: Proceedings of the 13th International Conference on Extending Database Technology, Lausanne, 2010. 99-110. Google Scholar

[27] Afrati F N, Ullman J D. Transitive closure and recursive datalog implemented on clusters. In: Proceedings of the 15th International Conference on Extending Database Technology, Berlin, 2012. 132-143. Google Scholar

[28] Afrati F N, Fotakis D, Ullman J D. Enumerating subgraph instances using map-reduce. In: Proceedings of IEEE 29th International Conference on Data Engineering (ICDE), Brisbane, 2013. 62-73. Google Scholar

[29] Afrati F N, Sarma A D, Menestrina D, et al. Fuzzy joins using mapreduce. In: Proceedings of IEEE 28th International Conference on Data Engineering (ICDE), Washington, 2012. 498-509. Google Scholar

[30] Afrati F N, Koutris P, Suciu D, et al. Parallel skyline queries. Theory Comput Syst, 2015, 57: 1008-1037 CrossRef Google Scholar

[31] Beame P, Koutris P, Suciu D. Communication steps for parallel query processing. In: Proceedings of the 32nd ACM Symposium on Principles of Database Systems, New York, 2013. 273-284. Google Scholar

[32] Fan W F, Geerts F, Neven F. Making queries tractable on big data with preprocessing. Proc VLDB Endowment, 2013, 6: 685-696 CrossRef Google Scholar

[33] Dean J, Ghemawat S. Mapreduce: simplified data processing on large clusters. Commun ACM, 2008, 51: 107-113. Google Scholar

[34] Wang J B, Wu S, Gao H, et al. Indexing multi-dimensional data in a cloud system. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Indianapolis, 2010. 591-602. Google Scholar

[35] Alper O, Mirek R. Processing theta-joins using mapreduce. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, Athens, 2011. 949-960. Google Scholar

[36] Kim Y, Shim K. Parallel top-k similarity join algorithms using mapreduce. In: Proceedings of IEEE 28th International Conference on Data Engineering (ICDE), Washington, 2012. 510-521. Google Scholar

[37] Lu W, Shen Y Y, Chen S, et al. Efficient processing of k nearest neighbor joins using mapreduce. Proc VLDB Endowment, 2012, 5: 1016-1027 CrossRef Google Scholar

[38] Zhang X F, Chen L, Wang M. Efficient multi-way theta-join processing using mapreduce. Proc VLDB Endowment, 2012, 5: 1184-1195 CrossRef Google Scholar

[39] Chen G, Vo H T, Wu S, et al. A framework for supporting dbms-like indexes in the cloud. Proc VLDB Endowment, 2011, 4: 702-713. Google Scholar

[40] Miliaraki I, Berberich K, Gemulla R, et al. Mind the gap: large-scale frequent sequence mining. In: Proceedings of ACM SIGMOD International Conference on Management of Data, New York, 2013. 797-808. Google Scholar

[41] Amol G, Rajasekar K, Edwin P D P, et al. Systemml: declarative machine learning on mapreduce. In: Proceedings of IEEE 27th International Conference on Data Engineering (ICDE), Hannover, 2011. 231-242. Google Scholar

[42] Zhang Z J, Shu H, Chong Z H, et al. C-cube: elastic continuous clustering in the cloud. In: Proceedings of IEEE 29th International Conference on Data Engineering (ICDE), Brisbane, 2013. 577-588. Google Scholar

[43] Pansare N, Borkar V R, Jermaine C, et al. Online aggregation for large mapreduce jobs. Proc VLDB Endowment, 2011, 4: 1135-1145. Google Scholar

[44] Grover R, Carey M J. Extending map-reduce for efficient predicate-based sampling. In: Proceedings of IEEE 28th International Conference on Data Engineering (ICDE), Washington, 2012. 486-497. Google Scholar

[45] Jestes J, Yi K, Li F F. Building wavelet histograms on large data in mapreduce. Proc VLDB Endowment, 2011, 5: 109-120 CrossRef Google Scholar

[46] Huang B T, Babu S, Yang J. Cumulon: optimizing statistical data analysis in the cloud. In: Proceedings of ACM SIGMOD International Conference on Management of Data, New York, 2013. 1-12. Google Scholar

[47] Zhang C, Ré C. Towards high-throughput gibbs sampling at scale: a study across storage managers. In: Proceedings of ACM SIGMOD International Conference on Management of Data, New York, 2013. 397-408. Google Scholar

[48] Zheng Y, Jestes J, Phillips J M, et al. Quality and efficiency for kernel density estimates in large data. In: Proceedings of ACM SIGMOD International Conference on Management of Data, New York, 2013. 433-444. Google Scholar

[49] Wong H K T, Li J Z, Olken F, et al. Bit transposition for very large scientific and statistical databases. Algorithmica, 1986, 1: 289-309 CrossRef Google Scholar

[50] Li J Z, Rotem D, Wong H K T. A new compression method with fast searching on large databases. In: Proceedings of the 13th International Conference on Very Large Data Bases, Brighton, 1987. 311-318. Google Scholar

[51] Li J Z, Harry K T W, Doron R. Batched interpolation searching on databases. In: Proceedings of the IEEE 3rd International Conference on Data Engineering (ICDE), Los Angeles, 1987. 79-97. Google Scholar

[52] Wong H K T, Li J Z, Wong H K T, et al. Transposition algorithms on very large compressed databases. In: Proceedings of the 12th International Conference on Very Large Data Bases, Kyoto, 1986. 304-311. Google Scholar

[53] Li J Z, Rotem D, Srivastava J. Aggregation algorithms for very large compressed data warehouses. In: Proceedings of the 25th International Conference on Very Large Data Bases, Edinburgh, 1999. 651-662. Google Scholar

[54] Wu W L, Gao H, Li J Z. New algorithm for computing cube on very large compressed data sets. IEEE Trans Knowl Data Eng, 2006, 18: 1667-1680 CrossRef Google Scholar

[55] Fan W F, Li J Z, Wang X, et al. Query preserving graph compression. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Scottsdale, 2012. 157-168. Google Scholar

[56] Zhang S, Li J Z, Gao H, et al. A novel approach for efficient supergraph query processing on graph databases. In: Proceedings of the 12th International Conference on Extending Database Technology, Petersburg, 2009. 204-215. Google Scholar

[57] Cheng S Y, Li J Z. Sampling based (epsilon, delta)-approximate aggregation algorithm in sensor networks. In: Proceedings of the 29th IEEE International Conference on Distributed Computing Systems (ICDCS), Montreal, 2009. 273-280. Google Scholar

[58] Li J Z, Cheng S Y. ({\(\varepsilon\)}, {\(\delta\)})-approximate aggregation algorithms in dynamic sensor networks. IEEE Trans Parall Distrib Syst, 2012, 23: 385-396 CrossRef Google Scholar

[59] Cheng S Y, Li J Z, Ren Q Q, et al. Bernoulli sampling based ({\(\varepsilon\)}, {\(\delta\)})-approximate aggregation in large-scale sensor networks. In: Proceedings of IEEE Conference on Computer Communications (INFOCOM), San Diego, 2010. 1181-1189. Google Scholar

[60] Cheng S Y, Cai Z P, Li J Z, et al. Drawing dominant dataset from big sensory data in wireless sensor networks. In: Proceedings of the IEEE Conference on Computer Communications (INFOCOM), Kowloon, 2015. 147-152. Google Scholar

[61] Liu Y, Li J Z, Gao H, et al. Enabling {\(\varepsilon\)}-approximate querying in sensor networks. Proc VLDB Endowment, 2009, 2: 169-180 CrossRef Google Scholar

[62] Gao J, Li J Z, Zhang Z G, et al. An incremental data stream clustering algorithm based on dense units detection. In: Proceedings of the 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (PAKDD), Hanoi, 2005. 420-425. Google Scholar

[63] Fan W F, Li J Z, Luo J Z, et al. Incremental graph pattern matching. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Athens, 2011. 925-936. Google Scholar

[64] Fan W F, Li J Z, Tang N, et al. Incremental detection of inconsistencies in distributed data. In: Proceedings of IEEE 28th International Conference on Data Engineering (ICDE), Washington, 2012. 318-329. Google Scholar

[65] Brodal G S, Tsakalidis K, Sioutas S, et al. Fully persistent b-trees. In: Proceedings of the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms, Kyoto, 2012. 602-614. Google Scholar

[66] Ulrich M, Norbert Z. I/O-efficient shortest path algorithms for undirected graphs with random or bounded edge lengths. ACM Trans Algorithms, 2012, 8: 22. Google Scholar

[67] Rasmussen C K, Tao Y F, Tsakalidis K, et al. I/O-efficient planar range skyline and attrition priority queues. In: Proceedings of the 32nd ACM Symposium on Principles of Database Systems, New York, 2013. 103-114. Google Scholar

[68] Huang J W, Venkatraman K, Abadi D J. Query optimization of distributed pattern matching. In: Proceedings of IEEE 30th International Conference on Data Engineering (ICDE), Chicago, 2014. 64-75. Google Scholar

[69] Deng D, Li G L, Hao S, et al. Massjoin: a mapreduce-based method for scalable string similarity joins. In: Proceedings of IEEE 30th International Conference on Data Engineering (ICDE), Chicago, 2014. 340-351. Google Scholar

[70] Minsik C, Daniel B, Rajesh B, et al. Paradis: an efficient parallel algorithm for in-place radix sort. Proc VLDB Endowment, 2015, 8: 1518-1529 CrossRef Google Scholar

[71] Shumo C, Magdalena B, Dan S. From theory to practice: efficient join query evaluation in a parallel database system. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Melbourne, 2015. 63-78. Google Scholar

[72] Jennie D, Olga P, Leilani B, et al. Skew-aware join optimization for array databases. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Melbourne, 2015. 123-135. Google Scholar

[73] Orestis P, Rajkumar S, Kenneth A R. Track join: distributed joins with minimal network traffic. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Snowbird, 2014. 1483-1494. Google Scholar

[74] Amr E, Venkatesh R, Mohamed A S, et al. Optimization of common table expressions in mpp database systems. Proc VLDB Endowment, 2015, 8: 1704-1715 CrossRef Google Scholar

[75] Han X X, Li J Z, Wang J B, et al. Tjje: an efficient algorithm for top-k join on massive data. Inf Sci, 2013, 222: 362-383 CrossRef Google Scholar

[76] Han X X, Li J Z, Yang D H, et al. Efficient skyline computation on big data. IEEE Trans Knowl Data Eng, 2013, 25: 2521-2535 CrossRef Google Scholar

[77] Khayyat Z, Lucia W, Singh M, et al. Lightning fast and space efficient inequality joins. Proc VLDB Endowment, 2015, 8: 2074-2085 CrossRef Google Scholar

[78] Fan W F, Wang X, Wu Y H. Querying big graphs within bounded resources. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Snowbird, 2014. 301-312. Google Scholar

[79] Fan W F, Geerts F, Cao Y, et al. Querying big data by accessing small data. In: Proceedings of the 34th ACM Symposium on Principles of Database Systems, Melbourne, 2015. 173-184. Google Scholar

[80] Fan W F, Geerts F, Libkin L. On scale independence for querying big data. In: Proceedings of the 33rd ACM Symposium on Principles of Database Systems, Snowbird, 2014. 51-62. Google Scholar

[81] Cao Y, Fan W F, Huai J P, et al. Making pattern queries bounded in big graphs. In: Proceedings of the IEEE 31th International Conference on Data Engineering (ICDE), Seoul, 2015. 161-172. Google Scholar

[82] Fan W F, Wang X, Wu Y H, et al. Distributed graph simulation: impossibility and possibility. Proc VLDB Endowment, 2014, 7: 1083-1094 CrossRef Google Scholar

[83] Fan W F, Wang X, Wu Y H, et al. Association rules with graph patterns. Proc VLDB Endowment, 2015, 8: 1502-1513 CrossRef Google Scholar

[84] Huang J W, Abadi D J, Ren K. Scalable sparql querying of large rdf graphs. Proc VLDB Endowment, 2011, 4: 1123-1134. Google Scholar

[85] Zhang X F, Chen L, Tong Y X, et al. Eagre: towards scalable I/O efficient sparql query evaluation on the cloud. In: Proceedings of IEEE 29th International Conference on Data Engineering (ICDE), Brisbane, 2013. 565-576. Google Scholar

[86] Zeng K, Yang J C, Wang H X, et al. A distributed graph engine for web scale rdf data. Proc VLDB Endowment, 2013, 6: 265-276 CrossRef Google Scholar

[87] Yuan P P, Liu P, Wu B W, et al. Triplebit: a fast and compact system for large scale rdf data. Proc VLDB Endowment, 2013, 6: 517-528 CrossRef Google Scholar

[88] Zheng W G, Zou L, Feng Y S, et al. Efficient simrank-based similarity join over large graphs. Proc VLDB Endowment, 2013, 6: 493-504 CrossRef Google Scholar

[89] Mondal J, Deshpande A. Managing large dynamic graphs efficiently. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Scottsdale, 2012. 145-156. Google Scholar

[90] Yang S Q, Yan X F, Zong B, et al. Towards effective partition management for large graphs. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Scottsdale, 2012. 517-528. Google Scholar

[91] Zhao P X, Aggarwal C C, Wang M. Gsketch: on query estimation in graph streams. Proc VLDB Endowment, 2011, 5: 193-204 CrossRef Google Scholar

[92] Shao Y X, Cui B, Chen L, et al. Parallel subgraph listing in a large-scale graph. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Snowbird, 2014. 625-636. Google Scholar

[93] Li Z G, Fang Y X, Liu Q, et al. Walking in the cloud: parallel simrank at scale. Proc VLDB Endowment, 2015, 9: 24-35 CrossRef Google Scholar

[94] Zhu Y Y, Yu J X, Qin L. Leveraging graph dimensions in online graph search. Proc VLDB Endowment, 2014, 8: 85-96 CrossRef Google Scholar

[95] Qi Z C, Xiao Y H, Shao B, et al. Toward a distance oracle for billion-node graphs. Proc VLDB Endowment, 2013, 7: 61-72 CrossRef Google Scholar

[96] Qin L, Yu J X, Chang L J, et al. Scalable big graph processing in mapreduce. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Snowbird, 2014. 827-838. Google Scholar

[97] Levin R, Kanza Y. Stratified-sampling over social networks using mapreduce. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Snowbird, 2014. 863-874. Google Scholar

[98] Zhu A D, Lin W Q, Wang S B, et al. Reachability queries on large dynamic graphs. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Snowbird, 2014. 1323-1334. Google Scholar

[99] Gao J, Jin R M, Zhou J S, et al. Relational approach for shortest path discovery over large graphs. Proc VLDB Endowment, 2011, 5: 358-369 CrossRef Google Scholar

[100] Sun Z, Wang H Z, Wang H X, et al. Efficient subgraph matching on billion node graphs. Proc VLDB Endowment, 2012, 5: 788-799 CrossRef Google Scholar

[101] Jin R M, Ruan N, Dey S, et al. Scarab: scaling reachability computation on large graphs. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Scottsdale, 2012. 169-180. Google Scholar

[102] Chen H Q, Ku W, Wang H X, et al. Linkprobe: probabilistic inference on large-scale social networks. In: Proceedings of IEEE 29th International Conference on Data Engineering (ICDE), Brisbane, 2013. 290-301. Google Scholar

[103] Xiang J, Guo C, Aboulnaga A. Scalable maximum clique computation using mapreduce. In: Proceedings of IEEE 29th International Conference on Data Engineering (ICDE), Brisbane, 2013. 74-85. Google Scholar

[104] Zhang Z W, Yu J X, Qin L, et al. I/O efficient: computing sccs in massive graphs. VLDB J, 2015, 24: 245-270 CrossRef Google Scholar

[105] Shun J, Tangwongsan K. Multicore triangle computations without tuning. In: Proceedings of IEEE 31st International Conference on Data Engineering (ICDE), Seoul, 2015. 149-160. Google Scholar

[106] Chitnis L, Das S A, Machanavajjhala A, et al. Finding connected components in map-reduce in logarithmic rounds. In: Proceedings of IEEE 29th International Conference on Data Engineering (ICDE), Brisbane, 2013. 50-61. Google Scholar

[107] Beedkar K, Gemulla R. Lash: large-scale sequence mining with hierarchies. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Melbourne, 2015. 491-503. Google Scholar

[108] Buehrer G, De O R L, Fuhry D, et al. Towards a parameter-free and parallel itemset mining algorithm in linearithmic time. In: Proceedings of IEEE 31st International Conference on Data Engineering (ICDE), Seoul, 2015. 1071-1082. Google Scholar

[109] Schelter S, Soto J, Markl V, et al. Efficient sample generation for scalable meta learning. In: Proceedings of IEEE 31st International Conference on Data Engineering (ICDE), Seoul, 2015. 1191-1202. Google Scholar

[110] Riondato M, Debrabant J A, Fonseca R, et al. Parma: a parallel randomized algorithm for approximate association rules mining in mapreduce. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, 2012. 85-94. Google Scholar

[111] Riondato M, Upfal E. Mining frequent itemsets through progressive sampling with rademacher averages. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, 2015. 1005-1014. Google Scholar

[112] Wang P H, Lui J C S, Towsley D. Minfer: inferring motif statistics from sampled edges. In: Proceedings of IEEE 32nd International Conference on Data Engineering (ICDE), Helsinki, 2016. 1050-1061. Google Scholar

[113] Liang Y Y, Xie B, Woodruff D, et al. Communication efficient distributed kernel principal component analysis. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, 2016, in press. Google Scholar

[114] Zhang A, Gu Q Q. Accelerated stochastic block coordinate descent with optimal sampling. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, 2016, in press. Google Scholar

[115] Yang Y, Chen J F, Zhu J. Distributing the stochastic gradient sampler for large-scale lda. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, 2016, in press. Google Scholar

[116] Kang U, Chau D H, Faloutsos C. Mining large graphs: algorithms, inference, and discoveries. In: Proceedings of IEEE 27th International Conference on Data Engineering (ICDE), Hannover, 2011. 243-254. Google Scholar

[117] Morales G D F, Gionis A, Sozio M. Social content matching in mapreduce. Proc VLDB Endowment, 2011, 4: 460-469 CrossRef Google Scholar

[118] Bahman B, Kaushik C, Dong X. Fast personalized pagerank on mapreduce. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Athens, 2011. 973-984. Google Scholar

[119] Chu L Y, Wang S H, Liu S Y, et al. Alid: scalable dominant cluster detection. Proc VLDB Endowment, 2015, 8: 826-837 CrossRef Google Scholar

[120] Lin W Q, Xiao X K, Ghinita G. Large-scale frequent subgraph mining in mapreduce. In: Proceedings of IEEE 30th International Conference on Data Engineering (ICDE), Chicago, 2014. 844-855. Google Scholar

[121] Alvanaki F, Michel S. Tracking set correlations at large scale. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Snowbird, 2014. 1507-1518. Google Scholar

[122] Li F, Ozsu M T, Chen G, et al. R-store: a scalable distributed system for supporting real-time analytics. In: Proceedings of IEEE 30th International Conference on Data Engineering (ICDE), Chicago, 2014. 40-51. Google Scholar

[123] Scholer F, Williams H E, Yiannis J, et al. Compression of inverted indexes for fast query evaluation. In: Proceedings of ACM SIGIR International Conference on Research and Development in Information Retrieval, Tampere, 2002. 222-229. Google Scholar

[124] Sihem A, Theodore J. Optimizing queries on compressed bitmaps. In: Proceedings of the 26th International Conference on Very Large Data Bases, Cairo, 2010. 329-338. Google Scholar

[125] Yan Y, Zhang J X, Huang B J, et al. Distributed outlier detection using compressive sensing. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Melbourne, 2015. 3-16. Google Scholar

[126] Chang L J, Yu J X, Qin L, et al. Efficiently computing k-edge connected components via graph decomposition. In: Proceedings of ACM SIGMOD International Conference on Management of Data, New York, 2013. 205-216. Google Scholar

[127] Song C Y, Ge T J, Chen C, et al. Event pattern matching over graph streams. Proc VLDB Endowment, 2014, 8: 413-424 CrossRef Google Scholar

[128] Shiokawa H, Fujiwara Y, Onizuka M. Scan++: efficient algorithm for finding clusters, hubs and outliers on large-scale graphs. Proc VLDB Endowment, 2015, 8: 1178-1189 CrossRef Google Scholar

[129] Liu Y, Lu J H, Yang H, et al. Towards maximum independent sets on massive graphs. Proc VLDB Endowment, 2015, 8: 2122-2133 CrossRef Google Scholar

[130] Min F, Narayanan S, Hector G M, et al. Computing iceberg queries efficiently. In: Proceedings of the 24th International Conference on Very Large Data Bases, New York, 1998. 299-310. Google Scholar

[131] Li N, Guan Z Y, Ren L J, et al. Giceberg: towards iceberg analysis in large graphs. In: Proceedings of IEEE 29th International Conference on Data Engineering (ICDE), Brisbane, 2013. 1021-1032. Google Scholar

[132] Cui W Y, Xiao Y H, Wang H X, et al. Local search of communities in large graphs. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Snowbird, 2014. 991-1002. Google Scholar

[133] Huang X, Cheng H, Qin L, et al. Querying k-truss community in large and dynamic graphs. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Snowbird, 2014. 1311-1322. Google Scholar

[134] Elgamal T, Yabandeh M, Aboulnaga A, et al. Spca: scalable principal component analysis for big data on distributed platforms. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Melbourne, 2015. 79-91. Google Scholar

[135] Yu L L, Shao Y X, Cui B. Exploiting matrix dependency for efficient distributed matrix computation. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Melbourne, 2015. 93-105. Google Scholar

[136] Ghashami M, Phillips J M, Li F F. Continuous matrix approximation on distributed data. Proc VLDB Endowment, 2014, 7: 809-820 CrossRef Google Scholar

[137] Papapetrou O, Garofalakis M. Continuous fragmented skylines over distributed streams. In: Proceedings of IEEE 30th International Conference on Data Engineering (ICDE), Chicago, 2014. 124-135. Google Scholar

[138] Cao L, Yang D, Wang Q Y, et al. Scalable distance-based outlier detection over high-volume data streams. In: Proceedings of IEEE 30th International Conference on Data Engineering (ICDE), Chicago, 2014. 76-87. Google Scholar

[139] Aggarwal C C, Yu P S. On historical diagnosis of sensor streams. In: Proceedings of IEEE 31st International Conference on Data Engineering (ICDE), Seoul, 2015. 185-194. Google Scholar

[140] Sadoghi M, Jacobsen H. Adaptive parallel compressed event matching. In: Proceedings of IEEE 30th International Conference on Data Engineering (ICDE), Chicago, 2014. 364-375. Google Scholar

[141] Reynold S X, Josh R, Matei Z, et al. Shark: {SQL} and rich analytics at scale. In: Proceedings of ACM SIGMOD International Conference on Management of Data, New York, 2013. 13-24. Google Scholar

[142] Tsytsarau M, Amer-yahia S, Palpanas T. Efficient sentiment correlation for large-scale demographics. In: Proceedings of ACM SIGMOD International Conference on Management of Data, New York, 2013. 253-264. Google Scholar

[143] Gatterbauer W, Nnemann S, Koutra D, et al. Linearized and single-pass belief propagation. Proc VLDB Endowment, 2014, 8: 581-592. Google Scholar

[144] Park Y, Min J K, Shim K. Parallel computation of skyline and reverse skyline queries using mapreduce. Proc VLDB Endowment, 2013, 6: 2002-2013 CrossRef Google Scholar

Copyright 2020 Science China Press Co., Ltd. 《中国科学》杂志社有限责任公司 版权所有

京ICP备18024590号-1       京公网安备11010102003388号