Abstract
This paper introduces spinning-on-coherency (SOC), a technique for virtual shared memory (VSM) that enables latency hiding of remote reads and the removal of related synchronisation points. Coherence bits are hardware tags associated with addresses that record local access permissions (such as read, write, invalid). In SOC, a user thread spins on the particular coherence bits associated with an address until the new data value is asynchronously propagated and the address becomes valid. Data propagation occurs when another node issues an update after having written the new value. Performance improvements are demonstrated for two codes representing the core communication found in Shallow (a well-known numerical weather prediction benchmark) and CG (from the NAS Parallel Benchmarks). These are run on a 30-node prototype distributed-memory architecture (EDS) with invalidation-based, sequentially consistent VSM. SOC is also applicable to other consistency models and directory schemes, whether in hardware or software, and complements other VSM optimisations. Currently such optimisation is performed by the programmer, but there is much scope for automating this process within a compiler.
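The spin-then-read pattern described in the abstract can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not the authors' implementation: in the paper the coherence bits are hardware tags on the EDS machine, whereas here they are modelled as an ordinary Python attribute, and the names `Tag`, `LocalCopy`, `issue_update`, `spin_read` and `demo` are hypothetical. The invalidation protocol and inter-node messaging are not modelled.

```python
import threading
import time
from enum import Enum


class Tag(Enum):
    """Local access permission held in the coherence bits (illustrative names)."""
    INVALID = 0
    READ = 1
    WRITE = 2


class LocalCopy:
    """One node's copy of a shared address together with its coherence tag."""

    def __init__(self):
        self.value = None
        self.tag = Tag.INVALID


def issue_update(copy, new_value):
    """Writer side: propagate a freshly written value to a remote copy.

    The value is stored before the tag is changed, so a reader that
    observes a valid tag also observes the new data.
    """
    copy.value = new_value
    copy.tag = Tag.READ


def spin_read(copy, poll_interval=0.0):
    """Reader side: spin on the coherence tag until the copy becomes valid.

    Returns the value and the number of polls that found the tag invalid.
    """
    spins = 0
    while copy.tag is Tag.INVALID:
        spins += 1
        time.sleep(poll_interval)  # yield so the writer thread can run
    return copy.value, spins


def demo():
    """Run one writer thread and one spinning reader; return the value read."""
    copy = LocalCopy()

    def writer():
        time.sleep(0.01)          # stands in for computing the new value
        issue_update(copy, 42)    # asynchronous propagation to the reader

    thread = threading.Thread(target=writer)
    thread.start()
    value, _ = spin_read(copy)    # no separate barrier or lock is needed
    thread.join()
    return value
```

In this sketch the reader needs no explicit synchronisation point with the writer: the change of the tag from invalid to readable is itself the signal that the new value has arrived.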
This work was funded by the U.K. Meteorological Office and the ESPRIT SODA project.
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nisbet, A.P., Ford, R.W. (1996). Spinning-on-coherency: A new VSM optimisation for write-invalidate. In: Liddell, H., Colbrook, A., Hertzberger, B., Sloot, P. (eds) High-Performance Computing and Networking. HPCN-Europe 1996. Lecture Notes in Computer Science, vol 1067. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61142-8_628
DOI: https://doi.org/10.1007/3-540-61142-8_628
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61142-4
Online ISBN: 978-3-540-49955-8
eBook Packages: Springer Book Archive