Abstract
This paper presents a new cache consistency scheme for hierarchically structured shared-memory multiprocessors. The scheme is simple, fast and efficient, and it does not require a large amount of state information to be maintained. The scheme exploits the broadcast capability of these systems, but limits the extent of the broadcasts by means of a novel filtering mechanism. As a specific example, it is shown how the proposed cache consistency scheme can be implemented on the Hector multiprocessor architecture. Using trace-driven simulations, we demonstrate that the scheme is scalable and performs well for common applications.
Cite this article
Farkas, K., Vranesic, Z. & Stumm, M. Scalable cache consistency for hierarchically structured multiprocessors. J Supercomput 8, 345–369 (1995). https://doi.org/10.1007/BF01901614
DOI: https://doi.org/10.1007/BF01901614