Abstract
The recent emergence of systems whose memory tiers differ in performance and capacity invites a fresh look at strategies for placing data across those tiers. This work explores a variety of cross-layer strategies for managing application data in multi-tiered memory. We propose new profiling techniques based on the automatic classification of program allocation sites, with the goal of using those classifications to guide memory tier assignments. We evaluate our approach with different profiling inputs and application strategies, and show that it outperforms other state-of-the-art management techniques.
Notes
1. This allocation-site-based strategy for optimizing accesses-per-byte is designed to obviate tracing or sampling on an object-by-object basis.
2. The primary goal of this work is to study the potential benefits of automated application guidance. While our simulation-based evaluation neglects the overhead of profiling, Sect. 4 describes how, in practice, allocation-site-based guidance can be generated (either online or offline) and applied in direct execution with negligible overhead.
3. Phase transitions may be detected online by several means, including models of instruction and data access behavior, hardware event ratios, etc.
4. For direct execution, an alternative to Pin-based instrumentation is to use LLVM-inserted wrappers (as described in Sect. 4.2.1) and to sample access rates through hardware-based counters (e.g., using the PEBS facility on modern Intel processors).
5. Other, more compact encodings of the allocation sites may also be employed; for example, a low-overhead approximate method in direct execution is to use a hash over the (call-return) last branch records (LBR) recorded by a processor's monitoring unit.
6. We had to omit zeusmp due to an incompatibility with our adopted basic block vector collection tool [23].
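The ideas in Notes 1 and 5 can be made concrete with a small sketch. It is an illustration under stated assumptions, not the authors' implementation: the names `SiteProfile`, `site_id`, and `assign_tiers` are hypothetical, the call stack is modeled as a plain list of return addresses rather than real LBR data, and the greedy fill of the fast tier is one simple policy that ranks sites by accesses per byte.

```python
from dataclasses import dataclass
import hashlib


@dataclass
class SiteProfile:
    """Aggregate profile for one allocation site (hypothetical structure)."""
    bytes_allocated: int = 0
    accesses: int = 0


def site_id(call_stack):
    """Compact key for an allocation site: a hash over its return addresses.

    Assumption: call_stack is a list of non-negative 64-bit integers standing
    in for call-return records; real LBR data would be read from hardware.
    """
    data = b"".join(addr.to_bytes(8, "little") for addr in call_stack)
    return hashlib.blake2b(data, digest_size=8).hexdigest()


def assign_tiers(profiles, fast_capacity):
    """Greedily place sites with the highest accesses-per-byte in the fast tier.

    profiles maps site id -> SiteProfile; fast_capacity is in bytes.
    Sites that do not fit in the remaining fast-tier space go to the slow tier.
    """
    ranked = sorted(
        profiles.items(),
        key=lambda kv: kv[1].accesses / max(kv[1].bytes_allocated, 1),
        reverse=True,
    )
    placement = {}
    used = 0
    for sid, prof in ranked:
        if used + prof.bytes_allocated <= fast_capacity:
            placement[sid] = "fast"
            used += prof.bytes_allocated
        else:
            placement[sid] = "slow"
    return placement
```

Because profiles are kept per site rather than per object, the table stays small even when a site allocates many objects. A more careful policy could split a site's allocations across tiers instead of placing each site wholesale.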
References
1. Intel: 3D XPoint (2016). http://www.intel.com/content/www/us/en/architecture-and-technology/3d-xpoint-unveiled-video.html
2. Mittal, S., Vetter, J.S.: A survey of techniques for architecting DRAM caches. IEEE Trans. Parallel Distrib. Syst. 27(6), 1852–1863 (2016)
3. Meswani, M., Blagodurov, S., Roberts, D., Slice, J., Ignatowski, M., Loh, G.: Heterogeneous memory architectures: a HW/SW approach for mixing die-stacked and off-package memories. In: HPCA (February 2015)
4. Li, Y., Ghose, S., Choi, J., Sun, J., Wang, H., Mutlu, O.: Utility-based hybrid memory management. In: IEEE CLUSTER (September 2017)
5. Cantalupo, C., Venkatesan, V., Hammond, J.R.: User extensible heap manager for heterogeneous memory platforms and mixed memory policies (2015). http://memkind.github.io/memkind/memkind_arch_20150318.pdf
6. Dulloor, S.R., et al.: Data tiering in heterogeneous memory systems. In: Eleventh European Conference on Computer Systems, p. 15. ACM (2016)
7. Agarwal, N., et al.: Page placement strategies for GPUs within heterogeneous memory systems. SIGPLAN Not. 50(4), 607–618 (2015)
8. Luk, C.K., et al.: Pin: building customized program analysis tools with dynamic instrumentation. SIGPLAN Not. 40(6), 190–200 (2005)
9. Evans, J.: A scalable concurrent malloc(3) implementation for FreeBSD (2006)
10. Kim, Y., Yang, W., Mutlu, O.: Ramulator: a fast and extensible DRAM simulator. IEEE Comput. Archit. Lett. 15(1), 45–49 (2016). https://doi.org/10.1109/LCA.2015.2414456
11. Giardino, M., Doshi, K., Ferri, B.H.: Soft2LM: application guided heterogeneous memory management. In: IEEE International Conference on Networking, Architecture and Storage (NAS), USA, pp. 1–10 (2016)
12. Agarwal, N., Wenisch, T.F.: Thermostat: application-transparent page management for two-tiered main memory. In: ASPLOS 2017, pp. 631–644. ACM, New York (2017)
13. Peng, I.B., Gioiosa, R., Kestor, G., Cicotti, P., Laure, E., Markidis, S.: RTHMS: a tool for data placement on hybrid memory system. In: ISMM (2017)
14. Servat, H., Peña, A.J., Llort, G., Mercadal, E., Hoppe, H., Labarta, J.: Automating the application data placement in hybrid memory systems. In: 2017 IEEE International Conference on Cluster Computing (CLUSTER) (September 2017)
15. Dashti, M., Fedorova, A., Funston, J., Gaud, F., Lachaize, R., Lepers, B., Quema, V., Roth, M.: Traffic management: a holistic approach to memory placement on NUMA systems. SIGPLAN Not. 48(4), 381–394 (2013)
16. Jantz, M.R., et al.: A framework for application guidance in virtual memory systems. In: Virtual Execution Environments. VEE 2013, pp. 155–166 (2013)
17. Jantz, M.R., et al.: Cross-layer memory management for managed language applications. In: ACM/SIGPLAN OOPSLA. ACM, New York (2015)
18. Guo, R., Liao, X., Jin, H., Yue, J., Tan, G.: NightWatch: integrating lightweight and transparent cache pollution control into dynamic memory allocation systems. In: 2015 USENIX Annual Technical Conference (USENIX ATC 15), pp. 307–318 (2015)
19. Lattner, C., Adve, V.: LLVM: a compilation framework for lifelong program analysis & transformation. In: Code Generation and Optimization (2004)
20. Sodani, A.: Knights Landing (KNL): 2nd generation Intel® Xeon Phi processor. In: 2015 IEEE Hot Chips 27 Symposium (HCS), pp. 1–24. IEEE (2015)
21. Hamerly, G., Perelman, E., Lau, J., Calder, B.: Simpoint 3.0. J. Instr. Level Parallelism 7(4), 1–28 (2005)
22. Henning, J.L.: SPEC CPU2006 benchmark descriptions. ACM SIGARCH Comput. Archit. News 34(4), 1–17 (2006)
23. Nethercote, N., Seward, J.: Valgrind: a framework for heavyweight dynamic binary instrumentation. In: PLDI (2007)
Acknowledgements
This research is supported in part by the National Science Foundation under CCF-1619140, CCF-1617954, and CNS-1464288, as well as a grant from the Software and Services Group (SSG) at Intel®.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Effler, T.C., Howard, A.P., Zhou, T., Jantz, M.R., Doshi, K.A., Kulkarni, P.A. (2018). On Automated Feedback-Driven Data Placement in Multi-tiered Memory. In: Berekovic, M., Buchty, R., Hamann, H., Koch, D., Pionteck, T. (eds) Architecture of Computing Systems – ARCS 2018. ARCS 2018. Lecture Notes in Computer Science, vol 10793. Springer, Cham. https://doi.org/10.1007/978-3-319-77610-1_14
DOI: https://doi.org/10.1007/978-3-319-77610-1_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77609-5
Online ISBN: 978-3-319-77610-1
eBook Packages: Computer Science, Computer Science (R0)