In Spark, a typical in-memory big data computing framework, the overwhelming majority of memory is used for caching data. Among the cached data, inactive and suspended data account for a large portion during execution. These data remain in memory until they are evicted or accessed again, and throughout that period DRAM consumes a great deal of refresh energy just to retain these low-value data. Using NVM as an alternative can eliminate much of this energy waste. Moreover, because NVM has a smaller cell size, it offers more in-memory capacity for cached data that, in a DRAM-only setting, would otherwise have to be read from disk. However, NVM cannot completely replace DRAM, because DRAM remains superior in access latency and endurance. Hybrid DRAM/NVM memory architectures are therefore a promising solution to the memory capacity and energy consumption dilemmas of in-memory big data computing systems. With this observation, in this paper we propose a data caching framework for Spark in a hybrid DRAM/NVM memory configuration. By characterizing data access behavior with two metrics, the active factor and the active stage distance, the framework preferentially caches data with higher local I/O activity in DRAM and places data with lower activity in NVM. A data migration strategy dynamically moves cold data from DRAM to NVM to reduce static energy consumption. The results show that the proposed framework reduces energy consumption by about 73.2% and improves latency performance by up to 20.9%.
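To make the placement and migration idea concrete, here is a minimal sketch in Python. It is not the framework's implementation: the abstract does not define the active factor or the active stage distance precisely, so the definitions below are assumptions. The active factor is taken as the share of remaining stages that read a block, the active stage distance as the number of stages until the block's next access, and the `activity_score` formula, the greedy DRAM fill, and the `distance_threshold` migration rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CachedBlock:
    block_id: str
    size_mb: int
    active_factor: float        # hypothetical: share of remaining stages that read this block
    active_stage_distance: int  # hypothetical: number of stages until the block's next access


def activity_score(b: CachedBlock) -> float:
    # Hypothetical scoring: frequently used blocks that are needed soon score higher.
    return b.active_factor / (1 + b.active_stage_distance)


def place(blocks, dram_capacity_mb):
    """Greedy placement: highest-activity blocks go to DRAM until it is full; the rest go to NVM."""
    dram, nvm = [], []
    used = 0
    for b in sorted(blocks, key=activity_score, reverse=True):
        if used + b.size_mb <= dram_capacity_mb:
            dram.append(b)
            used += b.size_mb
        else:
            nvm.append(b)
    return dram, nvm


def migrate_cold(dram, nvm, distance_threshold):
    """Move blocks not needed for at least `distance_threshold` stages from DRAM to NVM."""
    still_hot = [b for b in dram if b.active_stage_distance < distance_threshold]
    cold = [b for b in dram if b.active_stage_distance >= distance_threshold]
    return still_hot, nvm + cold


# Illustrative usage with made-up block names, sizes, and metric values.
blocks = [
    CachedBlock("rdd_1_0", 40, 0.8, 0),
    CachedBlock("rdd_2_0", 40, 0.5, 1),
    CachedBlock("rdd_3_0", 40, 0.1, 5),
]
dram, nvm = place(blocks, dram_capacity_mb=80)
dram_after, nvm_after = migrate_cold(dram, nvm, distance_threshold=1)
```

With an 80 MB DRAM budget, the two highest-scoring blocks land in DRAM and the least active one goes to NVM; the migration pass then demotes `rdd_2_0`, whose next access is at least one stage away, so that DRAM holds only data needed immediately. A real policy would recompute the metrics as stages complete rather than use fixed values.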