
SCIENTIA SINICA Informationis, Volume 49, Issue 9: 1138-1158 (2019). https://doi.org/10.1360/N112018-00246

Migration mechanism of heterogeneous memory pages using a two-way Hash chain list

  • Received Sep 8, 2018
  • Accepted Nov 17, 2018
  • Published Sep 3, 2019

Abstract

With the rapid development of big data technologies, memory-access demands are growing tremendously, which makes the high power consumption of traditional DRAM main memory a serious problem. Meanwhile, non-volatile memory (NVM), with its large capacity and low power dissipation, is becoming increasingly mature and shows potential as an alternative main memory in heterogeneous memory systems. Based on the historical access traces of memory pages, we propose THMigrator, a page migration mechanism for heterogeneous memory systems built on a two-way Hash chain list. THMigrator migrates frequently accessed memory pages from PCM or STT-RAM to DRAM. We also evaluate the energy efficiency of the heterogeneous memory system with a proposed energy efficiency analysis model (EEAM). Experimental results show that, compared with MQMigrator, THMigrator improves computational performance by 9.3% and the average energy-efficiency ratio by 17%. Compared with CoinMigrator, THMigrator improves the average energy-efficiency ratio by 26%.
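The abstract does not spell out the internals of the two-way Hash chain list. A common way to build such a structure, and one plausible reading of Figures 2 and 3, is a hash table that maps each page number to a node of a doubly linked list kept in recency order, so that lookup, move-to-head, and removal all take O(1) time. The Python sketch below assumes exactly that; the class and method names (`HashChainList`, `set_head`, `move_to_head`, `remove`) are hypothetical and only loosely mirror the setHead, moveToHead, and HashList.remove operations of Algorithm 1.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One list node per tracked page (hypothetical layout)."""
    page: int
    value: int = 1  # access counter, starts at the first access
    # Two-way links; excluded from repr/eq to avoid walking the whole chain.
    prev: Optional[Node] = field(default=None, repr=False, compare=False)
    next: Optional[Node] = field(default=None, repr=False, compare=False)


class HashChainList:
    """Hash table over a doubly linked list ordered by recency of access."""

    def __init__(self) -> None:
        self.index: Dict[int, Node] = {}   # page number -> node, O(1) lookup
        self.head: Optional[Node] = None   # most recently accessed page
        self.tail: Optional[Node] = None   # least recently accessed page

    def __contains__(self, page: int) -> bool:
        return page in self.index

    def _unlink(self, node: Node) -> None:
        # The prev/next links let a node detach itself without a list scan.
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        node.prev = node.next = None

    def _push_head(self, node: Node) -> None:
        node.next = self.head
        if self.head is not None:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node

    def set_head(self, page: int) -> Node:
        """Insert a page that is not yet tracked at the head."""
        node = Node(page)
        self.index[page] = node
        self._push_head(node)
        return node

    def move_to_head(self, page: int) -> Node:
        """Count another access to a tracked page and move it to the head."""
        node = self.index[page]
        node.value += 1
        self._unlink(node)
        self._push_head(node)
        return node

    def remove(self, page: int) -> None:
        """Stop tracking a page (e.g. once it becomes a migration candidate)."""
        self._unlink(self.index.pop(page))

    def pages(self) -> List[int]:
        """Pages ordered from most to least recently accessed."""
        order: List[int] = []
        cur = self.head
        while cur is not None:
            order.append(cur.page)
            cur = cur.next
        return order
```

For example, inserting pages 1, 2, and 3 and then accessing page 1 again leaves the order [1, 3, 2], with page 1's counter at 2. Keeping both links in each node is what makes removal from the middle of the chain constant-time.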


Funded by

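Shanghai Pujiang Talent Program (16PJ1407600)

China Postdoctoral Science Foundation (2017M610230)

Key Program of the National Natural Science Foundation of China (61332009)

General Program of the National Natural Science Foundation of China (61775139)

Natural Science Foundation of Shanghai (15ZR1428600)

Open Project of the State Key Laboratory of Computer Architecture (CARCH201807)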


  • Figure 1

    Structure of heterogeneous memory system. (a) Flat memory architecture; (b) hierarchical memory architecture

  • Figure 2

    (Color online) Structure of two-way Hash chain list

  • Figure 3

    Structure of HashList and entry table

  • Figure 4

    Energy efficiency analysis model based on the two-way Hash chain list page migration mechanism

  • Figure 5

    (Color online) Normalized IPC for different page sizes (DRAM+PCM memory model). (a) THMigrator; (b) MQMigrator; (c) CoinMigrator

  • Figure 6

    (Color online) Normalized IPC for different page sizes (DRAM+STT-RAM memory model). (a) THMigrator; (b) MQMigrator; (c) CoinMigrator

  • Figure 7

    (Color online) Normalized instruction rate for different page sizes (DRAM+PCM memory model). (a) THMigrator; (b) MQMigrator; (c) CoinMigrator

  • Figure 8

    (Color online) Normalized instruction rate for different page sizes (DRAM+STT-RAM memory model). (a) THMigrator; (b) MQMigrator; (c) CoinMigrator

  • Figure 9

    (Color online) Normalized IPC. (a) DRAM+PCM heterogeneous memory system; (b) DRAM+STT-RAM heterogeneous memory system

  • Figure 10

    (Color online) Average instruction rate. (a) DRAM+PCM heterogeneous memory system; (b) DRAM+STT-RAM heterogeneous memory system

  • Figure 11

    (Color online) DRAM cache utilization. (a) DRAM+PCM heterogeneous memory system; (b) DRAM+STT-RAM heterogeneous memory system

  • Figure 12

    (Color online) Normalized energy efficiency. (a) DRAM+PCM heterogeneous memory system; (b) DRAM+STT-RAM heterogeneous memory system

  • Figure 13

    (Color online) Average latency of accessing memory. (a) DRAM+PCM heterogeneous memory system; (b) DRAM+STT-RAM heterogeneous memory system

  • Figure 14

    (Color online) Write-access hit rates. (a) DRAM+PCM heterogeneous memory system; (b) DRAM+STT-RAM heterogeneous memory system

  • Figure 15

    (Color online) Normalized execution time. (a) DRAM+PCM heterogeneous memory system; (b) DRAM+STT-RAM heterogeneous memory system

  • Figure 16

    (Color online) Normalized migration time with the THMigrator mechanism

  • Figure 17

    (Color online) Normalized migration time with the MQMigrator mechanism

  • Figure 18

    (Color online) Normalized migration time with the CoinMigrator mechanism

  • Table 1   DRAM, PCM and STT-RAM performance comparison
    Storage/memory | Reciprocal density | Read speed | Write speed | Read power | Write power | Endurance (write cycles)
    DRAM | $4{-}6F^{2}$ | Slow | Slow | Medium | Medium | $10^{16}$
    PCM | $4{-}12F^{2}$ | Slow | Very slow | Medium | High | $10^{8}{-}10^{9}$
    STT-RAM | $6{-}50F^{2}$ | Fast | Slow | Low | High | $4\times10^{12}$
  • Algorithm 1   Migration algorithm of two-way Hash chain list

    Input: ${\rm page}_i$, a memory page; channelNumber, identifying the memory type (DRAM or NVM);
    1: /*Initialize variables*/
    2: Initialize HashList, MigratorMap, lifeTime, Threshold, etc.;
    Primary iteration: request access to a memory page;
    3: Request(${\rm page}_i$); /*Request access to the $i$-th memory page*/
    4: if isLocationHashMap(${\rm page}_i$) then
    5:     ${\rm page}_i \rightarrow {\rm value} \Leftarrow {\rm page}_i \rightarrow {\rm value} + 1$; /*Increment the access count (value) of ${\rm page}_i$*/
    6:     moveToHead(${\rm page}_i$); /*Move ${\rm page}_i$ to the head of HashList*/
    7: else
    8:     setHead(${\rm page}_i$); /*Insert ${\rm page}_i$ at the head of HashList*/
    9: end if
    10: if ${\rm page}_i \rightarrow {\rm value} > {\rm Threshold}$ then
    11:     MigratorMap.insert(${\rm page}_i$); /*Insert ${\rm page}_i$ into MigratorMap*/
    12:     HashList.remove(${\rm page}_i$); /*Remove ${\rm page}_i$ from HashList*/
    13: end if
    14: if ${\rm LifeTime}({\rm page}_i) > {\rm currentTime}$ then
    15:     MigratorMap.remove(${\rm page}_i$); /*Remove memory pages whose life time has expired*/
    16: end if
    17: if InMigratorMap(${\rm page}_i$) && !Migratored(${\rm page}_i$) then
    18:     startMigrator(${\rm page}_i$); /*Migrate ${\rm page}_i$ from NVM to DRAM*/
    19:     MigratorMap.remove(${\rm page}_i$); /*Remove the migrated page from MigratorMap*/
    20: end if
    Output: True/False /*Whether the memory page migration succeeded or failed*/

  • Table 2   Configuration of the EEAM
    Memory | Read latency | Write latency | Read energy | Write energy | Read speed | Write speed
    DRAM | 3 $\mu$s (4 KB) | 3 $\mu$s (4 KB) | 0.8 J/GB | 1 J/GB | 1.09 GB/s | 1 GB/s
    PCM | 3 $\mu$s (4 KB) | 64 $\mu$s (4 KB) | 1.2 J/GB | 6 J/GB | 400 MB/s | 100 MB/s
  • Table 3   Environment configuration of the simulator
    Configuration parameter | Value
    CPU | 2.0 GHz, TimingSimpleCPU
    Memory | NVMainMemory
    L1 cache | 32 KB instruction cache + 32 KB data cache
    L2 cache | 256 KB instruction cache + 256 KB data cache
    Memory structure | DRAM + PCM (1:3)
    Bus | 64-bit
    Operation mode | Syscall emulation (SE) mode
  • Table 4   Configuration of the benchmarks
    Benchmark | Number of instructions
    bzip2 | 10,000,000
    gcc | 10,000,000
    leslie3d | 10,000
    mcf | 10,000
    calculix | 10,000,000
    cactusADM | 10,000
    sjeng | 10,000,000
    hmmer | 10,000,000
    milc | 10,000
    povray | 10,000,000
    soplex | 10,000,000
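The control flow of Algorithm 1 can be traced with a small executable model. This is a sketch under stated assumptions, not the authors' implementation: `collections.OrderedDict` stands in for HashList (its `move_to_end(key, last=False)` performs moveToHead), a page's value starts at 1 on its first access, the `migrate` callback is a hypothetical stand-in for startMigrator, and a candidate is treated as expired when `now - entry_time > life_time`. The expiry rule is an assumption because the listing's step 14 compares ${\rm LifeTime}({\rm page}_i)$ with currentTime without defining LifeTime.

```python
from collections import OrderedDict
from typing import Callable, Dict, Set


class THMigratorSketch:
    """Hypothetical model of Algorithm 1; names and expiry rule are assumed."""

    def __init__(self, threshold: int, life_time: int,
                 migrate: Callable[[int], bool]) -> None:
        self.threshold = threshold
        self.life_time = life_time
        self.migrate = migrate  # stand-in for startMigrator (NVM -> DRAM)
        # HashList: page -> value (access count), head of the list first.
        self.hash_list: "OrderedDict[int, int]" = OrderedDict()
        self.migrator_map: Dict[int, int] = {}  # page -> time it became a candidate
        self.migrated: Set[int] = set()

    def request(self, page: int, now: int) -> bool:
        """Process one access to `page` at time `now`; True if it was migrated."""
        # Steps 4-9: count the access and put the page at the head of HashList.
        self.hash_list[page] = self.hash_list.get(page, 0) + 1
        self.hash_list.move_to_end(page, last=False)

        # Steps 10-13: a page whose value exceeds Threshold becomes a candidate.
        if self.hash_list[page] > self.threshold:
            self.migrator_map[page] = now
            del self.hash_list[page]

        # Steps 14-16: drop an expired candidate (assumed expiry rule).
        entered = self.migrator_map.get(page)
        if entered is not None and now - entered > self.life_time:
            del self.migrator_map[page]

        # Steps 17-20: migrate a candidate that has not been migrated yet.
        if page in self.migrator_map and page not in self.migrated:
            ok = self.migrate(page)
            if ok:
                self.migrated.add(page)
            del self.migrator_map[page]
            return ok
        return False
```

With `threshold=2`, the first two requests for a page only raise its value; the third pushes the value to 3, moves the page from HashList into MigratorMap, and triggers the migration within the same request, which is the order that steps 10 to 18 of the listing imply.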
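The Table 2 figures support a back-of-the-envelope estimate of what one PCM-to-DRAM page migration costs and when it pays for itself in write energy. The sketch assumes that a migration is one 4 KB read from PCM plus one 4 KB write to DRAM, that energy scales linearly with bytes transferred, and that "GB" means $2^{30}$ bytes; it ignores DRAM refresh and static power. None of this is claimed to be the EEAM itself, and the helper names (`access_energy_j`, `migration_cost`, `break_even_writes`) are hypothetical.

```python
from typing import Tuple

GIB = 2 ** 30    # assumption: "GB" in Table 2 means 2^30 bytes
PAGE = 4 * 1024  # one 4 KB page, the granularity of Table 2's latencies

# Values transcribed from Table 2 (latency in microseconds, energy in J/GB).
TABLE2 = {
    "DRAM": {"read_us": 3, "write_us": 3, "read_j_per_gb": 0.8, "write_j_per_gb": 1.0},
    "PCM": {"read_us": 3, "write_us": 64, "read_j_per_gb": 1.2, "write_j_per_gb": 6.0},
}


def access_energy_j(mem: str, op: str, nbytes: int = PAGE) -> float:
    """Energy in joules of one read or write of nbytes, assuming linear scaling."""
    return TABLE2[mem][f"{op}_j_per_gb"] * nbytes / GIB


def migration_cost(src: str = "PCM", dst: str = "DRAM") -> Tuple[float, int]:
    """(energy in J, latency in us) of copying one page: read src, then write dst."""
    energy = access_energy_j(src, "read") + access_energy_j(dst, "write")
    latency = TABLE2[src]["read_us"] + TABLE2[dst]["write_us"]
    return energy, latency


def break_even_writes(src: str = "PCM", dst: str = "DRAM") -> float:
    """Later page writes needed before a migration saves write energy.

    Ignores DRAM refresh/static power and any eviction write-backs.
    """
    energy, _ = migration_cost(src, dst)
    saving_per_write = access_energy_j(src, "write") - access_energy_j(dst, "write")
    return energy / saving_per_write


if __name__ == "__main__":
    e, t = migration_cost()
    print(f"migration: {e * 1e6:.2f} uJ, {t} us; break-even: {break_even_writes():.2f} writes")
```

Under these assumptions one migration costs about 8.39 $\mu$J and 6 $\mu$s. Each later write served by DRAM instead of PCM saves 5 J/GB, so the 2.2 J/GB spent on the migration is recovered after $2.2/5 = 0.44$ writes, that is, within the first subsequent write to the page. The simplification matters: counting DRAM background power would raise the break-even point.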

Copyright 2019 Science China Press Co., Ltd. All rights reserved.
