Abstract
Big data processing has become an increasingly important field of computer applications and has attracted considerable attention from academia and industry. However, it worsens the memory wall problem in processor design, that is, the large performance gap between processor computation and memory access. Stacked memory structures offer potential benefits for future processor design, such as low latency, large capacity, and high bandwidth. Because these benefits can effectively relieve the memory wall problem, stacked memory has become a promising architectural technique. Such structures have begun to use non-volatile memory (NVM) to provide faster and larger memory, but their memory access behaviour for big data applications has not been fully studied. To better understand this memory performance, this paper analyses the NVM 3D stacked structure using simulation. Since flash memory is the most mature NVM medium, this paper uses flash memory as the NVM part of the stacked structure, which results in a processor architecture with tightly connected CPU, DRAM and flash layers. In our experiments, the test variables are channel number, capacity, page size, and read and write latency. From the evaluation results of eight programs from a big data program set, we conclude that bandwidth and capacity have a significant effect on big data applications, and that as bandwidth and capacity increase, the read/write latency of flash and the page size have less effect. We also point out some problems concerning data consistency, channel selection, read and write strategy, and data granularity selection. These analysis results are useful for further study and optimization of the NVM 3D stacked structure.
Acknowledgements
This research was partially funded by NSF grants (No. 61433019, No. 61472435, and No. 61572508), HPNSFC grant (No. 12JJ4070), and DFMEC grant (20114307120010).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Qian, C., Huang, L., Xie, P., Xiao, N., Wang, Z. (2015). A Study on Non-volatile 3D Stacked Memory for Big Data Applications. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9528. Springer, Cham. https://doi.org/10.1007/978-3-319-27119-4_8
DOI: https://doi.org/10.1007/978-3-319-27119-4_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27118-7
Online ISBN: 978-3-319-27119-4
eBook Packages: Computer Science, Computer Science (R0)