Abstract
This paper describes a novel alternative to trace scheduling and other global scheduling techniques that attempt to boost instruction level parallelism by moving operations beyond basic block boundaries. We quantify the relative benefits of moving operations from one basic block to another with respect to critical path length, register pressure, and avoiding interlocks from long-latency operations. The benefits are encoded as flow capacities in a network, and a mincut algorithm is used to select the set of operations to move. Unlike other approaches, our method is applied before register allocation and scheduling. Our experiments on a superscalar processor show that significant speedup can be obtained for both integer and floating-point benchmarks using this method, even in the presence of an excellent software pipeliner.
Chapter PDF
Bibliography
Bernstein, D., and Rodeh, M., Global Instruction Scheduling for Superscalar Machines, Proceedings of the SIGPLAN '91 Conference on Programming Language Design and Implementation, pp. 241–255, 1991.
Ball, T., and Laras, J.R., Branch Predication For Free, Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation, pp. 300–313, 1993.
Charlesworth, A.E. An Approach to Scientific Array Processing: The Architectural Design of the AP-120B/FPS-164. IEEE Computer 14 (9), pp. 18–27, 1981.
Dehnert, J.C., and Towle, R.A., Compiling for the Cydra 5, The Journal of Supercomputing 7 (1/2), pp. 181–227, 1993.
Kemal Ebcioglu and Alexandra Nicolau, A Global Resource-Constrained Parallelization Technique. Proceedings of the 3-rd International Conference on Supercomputing, pp. 154–163, 1989.
Edmonds, J., and Karp, R.M., Theoretical Improvements in Algorithmic Efficiency for Network Flow Problems, J. Assoc. Computer Machinery 19, pp. 248–264, 1972.
Ellis, J., Bulldog: A Compiler for VLIW Architectures. MIT Press, Cambridge, Massachusetts, 1986.
Ford, L.R., and Fulkerson, D.R., Flows in Networks. Princeton University Press, Princeton, New Jersey, 1962.
Fisher, J.A., Trace Scheduling: A Technique for Global Microcode Compaction. IEEE Transactions on Computers C-30 (7), pp. 478–490, 1981.
Hwu, W.W., Mahlke, Chen, Chang, Warter, Bringmann, Ouellete, Hank, Kiyohara, Haab, Holm, and Lavery, The Superblock: An Effective Technique for VLIW and Superscalar Compilation, The Journal of Supercomputing 1 (1/2), pp. 182–229, May 1993.
Peter Yan-Tek Hsu, Design the TFP Microprocessor, IEEE MICRO, April 1994, pp. 23–33.
Suneel Jain, Circular Scheduling: A New Technique to Perform Software Pipelining, Proceedings of the SIGPLAN '91 Conference on Programming Language Design and Implementation, pp. 219–228, 1991.
Lowney, P.G., Freudenberger, S.M., Karzes, T.J., Lichtenstein, W.D., Nix, R.P., O'Donnell, J.J., and Ruttenberg, J.C., The Multiflow Trace Scheduling Compiler, The Journal of Supercomputing 7 (1/2), pp. 51–142, May 1993.
Virginia Mary Lo, Heuristic Algorithms for Task Assignment in Distributed Systems, IEEE Trans. on Computers, vol. 37, no. 11, pp. 1384–1397, Nov 1988.
Toshio Nakatani and Kemal Ebcioglu, Making Compaction-Based Parallelization Affordable, IEEE Trans. on Parallel and Dist. Syst. 4(9), pp. 1014–1029, 1993.
Alexandra Nicolau, A Fine-Grain Parallelizing Compiler, Tech. Report No. 86-792, Cornell University, 1986.
Rau, B.R., and Fisher, J.A., Instruction-Level Parallel Processing: History, Overview, and Perspective, The Journal of Supercomputing 7 (1/2), pp. 9–50.
Rau, B.R., and Glaeser, C.D., Some Scheduling Techniques and an Easily Schedulable Horizontal Architecture for High-performance Scientific Computing, Proceedings — MICRO-14, October 1981, pp. 183–198.
Ruttenberg, J., Gao, G., Stoutchinin, A., and Lichtenstein, W., Software Pipelining Showdown: Optimal vs. Heuristic Methods in a Production Compiler, to appear in Proceedings of the SIGPLAN '96 Conference on Programming Language Design and Implementation, May 1996.
Harold S. Stone, Multiprocessor Scheduling with the Aid of Network Flow Algorithms, IEEE Trans. on Software Engineering, vol. SE-3, no. 1, Jan 1977.
Touzeau, R.F., A Fortran Compiler for the FPS-164 Scientific Computer. In Conference Proceedings — SIGPLAN '84 Symposium on Compiler Construction (Montreal, Canada, June 20), pp. 48–57, 1984.
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lo, R., Chan, S., Dehnert, J., Towle, R. (1996). Aggregate operation movement: A min-cut approach to global code motion. In: Bougé, L., Fraigniaud, P., Mignotte, A., Robert, Y. (eds) Euro-Par'96 Parallel Processing. Euro-Par 1996. Lecture Notes in Computer Science, vol 1124. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0024780
Download citation
DOI: https://doi.org/10.1007/BFb0024780
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61627-6
Online ISBN: 978-3-540-70636-6
eBook Packages: Springer Book Archive