Abstract
Lightweight threads have become a common abstraction in the field of programming languages and operating systems. This paper examines the performance implications of locality information usage in thread scheduling algorithms for scalable shared-memory multiprocessors. The elements of a distributed scheduler using all available locality information as well as experimental measurements are presented.
Most shared-memory multiprocessors use multiple stages of caches to hide latency. Data structures and policies of a scheduling architecture have to reflect the various levels of the memory hierarchy in order to achieve high data locality. Per-processor data structures avoid lock contention and help to reduce memory traffic. While CPU utilization of processes still determines scheduling decisions of contemporary schedulers, we propose novel scheduling policies based on cache miss rates and information about synchronization. All data gathered at runtime are transformed into affinity values inside a metric space, so that threads migrate near to their (sub)optimal operation points defined by location and timing of execution. The distribution of data structures and the usage of locality information characterizes the proposed memory-conscious scheduling architecture. A prototype implementation shows that a locality-conscious scheduler outperforms centralized and distributed approaches ignoring locality information.
Preview
Unable to display preview. Download preview PDF.
References
T. Anderson; E. Lazowska; H. Levy: The Performance Implication of Thread Management Alternatives for Shared-Memory Multiprocessors, ACM Trans. on Computers, 38(12), Dec. 1989, pp. 1631–1644
T. E. Anderson; et al.: Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism. ACM Transactions on Computer Systems, 10(1), Feb. 1992, pp. 53–79
J. Barton, N. Bitar-SGI Corp., A Scalable Multi-Discipline, Multiple-Processor Scheduling Framework for IRIX, In Proc. of IPPS'95 Workshop of Job Scheduling Strategies for Parallel Processing, LNCS 949, Apr. 1995
F. Bellosa: Implementierung adaptiver Verfahren auf komplexen Geometrien mit leichtgewichtigen Prozessen, University Erlangen-Nürnberg, IMMD IV, TR-I4-94-07, Oct. 1994
F.Bellosa: Memory-Conscious Scheduling and Processor Allocation on NUMA Architectures, University Erlangen-Nürnberg, IMMD IV, TR-I4-95-06, May 1995
Convex: Exemplar SPP1000/1200 Architecture, Convex Press, May 1995
J. Driscoll; H. Gabow; R. Shrairman; R. Tarjan: Relaxed heaps: An alternative to Fibonacci heaps with applications to parallel computation, Communications of the ACM, 32:1343–1354, 1988
D. Keppel: Tools and Techniques for Building Fast Portable Threads Packages, University of Washington, TR UWCSE 93-05-06, May 1993
D. Knuth: The Art of Computer Programming, Vol. 3: Sorting and Searching, Addison-Wesley, Mass., 1973
C. Koppe: Sleeping Threads: A Kernel Mechanism for Support of Efficient User-Level Threads, In Proc. of International Conference of Parallel and Distributed Computing and Systems PDCS'95, Washington D.C., Oct. 95
T. J. LeBlanc; et al.: First-Class User-Level Threads, Operating Systems Review, 25(5), Oct. 1991, pp. 110–121
S. Leffler: The Design and Implementation of the 4.3BSD UNIX Operating System, Addison-Wesley, 1990
B. Lim; A. Agarwal: Waiting Algorithms for Synchronizations in Large-Scale Multiprocessors., ACM Transactions on Computer Systems, 11(1), Aug. 1993, pp. 253–297
Ulrich Rüde: On the multilevel adaptive iterative method, SIAM Journal on Scientific and Statistical Computing, Vol. 15, 1994
P. Sabalvarro, W. Weihl: Demand-based Coscheduling of Parallel Jobs on Multiprogrammed Multiprocessors, In Proc. of IPPS'95 Workshop of Job Scheduling Strategies for Parallel Processing, LNCS 949, April 1995
S. Squillante; E. D. Lazowska: Using Processor Cache Affinity in Shared-Memory Multiprocessor, Scheduling. IEEE Transactions on Parallel and Distributed Systems, 4(2),Feb. 1993, pp. 131–143
M. Steckermeier: Using Locality Information in User-Level Scheduling, University Erlangen-Nürnberg, IMMD IV, TR-I4-95-14, Dec. 1995
D. Thiebaut, H. Stone: Footprints in the Cache, ACM Transactions on Computer Systems, 5(4), Nov. 1987, pp. 305–329
M. Tokoro: Computational Field Model: Toward a New Computation Model/Methodology for Open Distributed Environment, Sony Computer Science Laboratory Inc., SCSL-TR-90-006, June 1990
J. Torellas, A. Tucker, A. Gupta: Evaluating the Performance of Cache-Affinity Scheduling in Shared-Memory Multiprocessors, Journal of Parallel and Distributed Computing, Vol. 24, 1995, pp. 139–151
A. Tucker: Efficient Scheduling on Multiprogrammed Shared-Memory Multiprocessors, Phd Thesis, Department of Computer Science, Stanford University, CS-TN-94-4, Dec. 1993
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bellosa, F. (1996). Locality-information-based scheduling in shared-memory multiprocessors. In: Feitelson, D.G., Rudolph, L. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 1996. Lecture Notes in Computer Science, vol 1162. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0022298
Download citation
DOI: https://doi.org/10.1007/BFb0022298
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61864-5
Online ISBN: 978-3-540-70710-3
eBook Packages: Springer Book Archive