SCIENTIA SINICA Informationis, Volume 50 , Issue 5 : 637-648(2020) https://doi.org/10.1360/SSI-2019-0179

## RLO: a reinforcement learning-based method for join optimization

Bin CUI 1,*,
• AcceptedNov 6, 2019
• PublishedApr 27, 2020
Share
Rating

### Abstract

Join optimization is one of the most important research problems in database systems. Traditional join optimizers are usually proposed based on heuristics, which are expensive and often fail to generate the optimal execution plan. There are two reasons accounting for this. (1) The optimizers are based on heuristics and only explore a subset of the search space. (2) They do not use the history logs and cannot estimate the goodness of their generated plans on a specific join problem. To tackle these challenges, we propose RLO, a reinforcement learning-based optimizer for join optimization. We model the join optimization problem as a Markov decision process and use deep $Q$-learning to estimate the possible reward of a possible operation. To boost the effectiveness of RLO, we further propose a tree-based embedding method to represent the “state" and use a beam search to avoid missing the optimal plans. We implement RLO in Apache Calcite and Postgres. Extensive experiments demonstrate that: (1) Apache Calcite RLO is $10~\times$–$56~\times$ faster in finding the execution plan and 80% faster in executing the plan than the state-of-the-art heuristics. (2) Compared with the native Postgres implementation, RLO can be $14~\times$ faster in finding the execution plan and 12.9% faster in an end-to-end comparison.

### References

[1] Babcock B, Chaudhuri S. Towards a robust query optimizer: a principled and practical approach. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, 2005. 119--130. Google Scholar

[2] Krishnamurthy R, Boral H, Zaniolo C. Optimization of nonrecursive queries. In: Proceedings of VLDB, 1986. 128--137. Google Scholar

[3] Selinger P G, Astrahan M M, Chamberlin D D, et al. Access path selection in a relational database management system. In: Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data, 1979. 23--34. Google Scholar

[4] Leis V, Gubichev A, Mirchev A, et al. How good are query optimizers, really? In: Proceedings of the VLDB Endowment, 2015. 204--215. Google Scholar

[5] Waas F, Pellenkoft A. Join order selection (good enough is easy). In: Proceedings of British National Conference on Databases, 2000. 51--67. Google Scholar

[6] Hester T, Vecerik M, Pietquin O, et al. Deep q-learning from demonstrations. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018. Google Scholar

[7] Ziane M, Za?t M, Borla-Salamet P. Parallel query processing with zigzag trees. In: Proceedings of the International Journal on Very Large Data Bases, 1993. 277--302. Google Scholar

[8] Chande S V, Sinha M. Genetic optimization for the join ordering problem of database queries. In: Proceedings of 2011 Annual IEEE India Conference, 2011. 1--5. Google Scholar

[9] Kipf A, Kipf T, Radke B, et al. Learned cardinalities: estimating correlated joins with deep learning. 2018,. arXiv Google Scholar

[10] Ortiz J, Balazinska M, Gehrke J, et al. Learning state representations for query optimization with deep reinforcement learning. In: Proceedings of the Second Workshop on Data Management for End-To-End Machine Learning, 2018. 1--4. Google Scholar

[11] Marcus R, Papaemmanouil O. Deep reinforcement learning for join order enumeration. In: Proceedings of the 1st International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, 2018. 3. Google Scholar

[12] Krishnan S, Yang Z, Goldberg K, et al. Learning to optimize join queries with deep reinforcement learning. 2018,. arXiv Google Scholar

[13] Yosinski J, Clune J, Bengio Y, et al. How transferable are features in deep neural networks? In: Proceedings of Advances in Neural Information Processing Systems, 2014. 3320--3328. Google Scholar

[14] Sutton R S, Barto A G. Reinforcement learning: an introduction. Cambridge: MIT Press, 2018. 34--152. Google Scholar

[15] Begoli E, Camacho-Rodríguez J, Hyde J, et al. Apache calcite: a foundational framework for optimized query processing over heterogeneous data sources. In: Proceedings of the 2018 International Conference on Management of Data, 2018. 221--230. Google Scholar

• Figure 1

(Color online) Overview for RLO

• Figure 2

(Color online) Tree-based embedding method for states

• Figure 3

(Color online) Time cost of searching the execution plan of different optimizers on Apache Calcite

• Figure 4

(Color online) Relative execution cost of different optimizers

• 图 5

(网络版彩图) Postgres上各个优化方法的搜索时间

• 图 6

(网络版彩图) 不同优化器的执行时间

• Figure 7

(Color online) Tradeoff between optimization and running time

Citations

• #### 0

Altmetric

Copyright 2020  CHINA SCIENCE PUBLISHING & MEDIA LTD.  中国科技出版传媒股份有限公司  版权所有