SCIENCE CHINA Information Sciences, Volume 60, Issue 1: 012103(2017) https://doi.org/10.1007/s11432-015-0764-7

Evaluating the impacts of hugepage on virtual machines

  • Received Mar 25, 2016
  • Accepted May 10, 2016
  • Published Nov 29, 2016

Abstract

Modern applications often require large amounts of memory. Conventional 4 KB pages lead to large page tables and thus put heavy pressure on TLB address translation. This pressure is more pronounced in a virtualized system, which adds an extra layer of address translation. Page walks caused by TLB misses can incur significant performance overhead. One way to reduce this overhead is to use hugepages. The Linux kernel has supported transparent hugepages since version 2.6.38, providing an alternative, larger page size. In general, hugepages perform better on address translation and page table modification. This paper first analyzes the impact of hugepages on a native system and then compares their impact under different memory virtualization approaches: hardware-assisted paging (HAP), shadow paging, and para-virtualization. We observe that the current implementation of transparent hugepage is inefficient: it cannot exploit the full performance advantage of hugepages. Worse yet, its conservative strategy may conflict with existing OS functions, which can lead to performance degradation. We therefore propose a new memory allocation strategy, alignment-based hugepage (ABH), that promotes hugepage allocation. We apply ABH to different paging modes in virtualized systems. The results show that the new allocation strategy can significantly reduce TLB misses, cut the page walk cycles caused by TLB misses by up to 90%, and thus improve the performance of real-world applications.
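The arithmetic behind the abstract's motivation can be illustrated with a short sketch. It is not taken from the paper: the 8 GiB working set, the 512-entry TLB, and the function names are assumptions chosen for illustration, and the nested-walk count uses the standard worst case for two-dimensional page walks under hardware-assisted paging, with no caching of intermediate translations.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_needed(working_set_bytes, page_size_bytes):
    """Leaf page-table entries needed to map the working set (ceiling division)."""
    return -(-working_set_bytes // page_size_bytes)


def tlb_reach(entries, page_size_bytes):
    """Bytes of memory a TLB with the given number of entries can cover."""
    return entries * page_size_bytes


def nested_walk_refs(guest_levels, host_levels):
    """Worst-case memory references for one two-dimensional (nested) page walk.

    Each of the guest's page-table levels, plus the final guest-physical
    address, needs a full host walk: (g + 1) * (h + 1) - 1 references.
    """
    return (guest_levels + 1) * (host_levels + 1) - 1


# Assumed values, for illustration only.
working_set = 8 * GIB
tlb_entries = 512

print(pages_needed(working_set, 4 * KIB))         # 2097152 entries with 4 KB pages
print(pages_needed(working_set, 2 * MIB))         # 4096 entries with 2 MB pages
print(tlb_reach(tlb_entries, 4 * KIB) // MIB)     # 2 MiB covered
print(tlb_reach(tlb_entries, 2 * MIB) // MIB)     # 1024 MiB covered
print(nested_walk_refs(4, 4))                     # 24 references, 4-level tables
print(nested_walk_refs(3, 3))                     # 15 references, 2 MB pages on both sides
```

Under these assumptions, a 2 MB page size shrinks the number of leaf entries by a factor of 512 and removes one level from each dimension of the nested walk, which is why the extra translation layer in a virtual machine makes hugepages more attractive.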


Funded by

National Natural Science Foundation of China (61232008)

National Natural Science Foundation of China (61272158)

National Natural Science Foundation of China (61328201)

National Natural Science Foundation of China (61472008)

National Natural Science Foundation of China (61170055)

National High-tech R&D Program of China (863) (2012AA010905, 2015AA015305)

Research Fund for the Doctoral Program of Higher Education of China (20110001110101)

Zhenlin WANG is also supported by National Science Foundation (CSR1422342)


Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61232008, 61272158, 61328201, 61472008, 61170055), the National High-tech R&D Program of China (863) (Grant Nos. 2012AA010905, 2015AA015305), and the Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20110001110101). Zhenlin WANG is also supported by the National Science Foundation (Grant No. CSR1422342).

