Abstract
Profile-based optimization can be used for instruction scheduling, loop scheduling, data preloading, function inlining, and instruction cache performance enhancement. However, these techniques have not been embraced by software vendors because programs instrumented for profiling run significantly slower, an awkward compile-run-recompile sequence is required, and a test input suite must be collected and validated for each program. This paper introduces hardware-based profiling, which uses traditional branch handling hardware to generate profile information in real time. Techniques are presented for both one-level and two-level branch hardware organizations. The approach produces high accuracy with a small slowdown in execution (0.4%–4.6%). This allows a program to be profiled while it is in use, eliminating the need for a test input suite. Because contemporary processors depend increasingly on compiler support, hardware-based profiling is important for high-performance systems.
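The core idea, reusing branch-prediction hardware as a profiler, can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the paper's design: it assumes a direct-mapped, one-level branch target buffer (BTB) with 2-bit saturating counters, and it adds hypothetical per-entry `taken`/`not_taken` fields whose contents are folded into a software profile when an entry is evicted or when `sample()` is called (standing in for a periodic read of the hardware). All class, method, and field names are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BTBEntry:
    tag: int            # address of the branch occupying this entry
    counter: int        # 2-bit saturating predictor state (0..3)
    taken: int = 0      # hypothetical profiling field: taken count while resident
    not_taken: int = 0  # hypothetical profiling field: not-taken count while resident


class ProfilingBTB:
    """Illustrative direct-mapped one-level BTB that also counts branch outcomes."""

    def __init__(self, num_entries):
        self.num_entries = num_entries
        self.entries = [None] * num_entries  # each slot holds a BTBEntry or None
        # Software-side profile: branch address -> [taken, not_taken]
        self.profile = defaultdict(lambda: [0, 0])

    def _flush(self, entry):
        # Move an entry's outcome counts into the profile, then reset them.
        counts = self.profile[entry.tag]
        counts[0] += entry.taken
        counts[1] += entry.not_taken
        entry.taken = entry.not_taken = 0

    def predict(self, pc):
        # Predict taken only on a BTB hit whose counter is in a "taken" state.
        entry = self.entries[pc % self.num_entries]
        return entry is not None and entry.tag == pc and entry.counter >= 2

    def update(self, pc, taken):
        idx = pc % self.num_entries
        entry = self.entries[idx]
        if entry is None or entry.tag != pc:
            if entry is not None:
                self._flush(entry)  # save the evicted branch's counts first
            entry = BTBEntry(tag=pc, counter=2 if taken else 1)
            self.entries[idx] = entry
        elif taken:
            entry.counter = min(3, entry.counter + 1)
        else:
            entry.counter = max(0, entry.counter - 1)
        if taken:
            entry.taken += 1
        else:
            entry.not_taken += 1

    def sample(self):
        # Read out all resident counters and return the accumulated profile.
        for entry in self.entries:
            if entry is not None:
                self._flush(entry)
        return {pc: tuple(c) for pc, c in self.profile.items()}


# Usage: a branch at address 0x10 is taken three times, then falls through once.
btb = ProfilingBTB(num_entries=4)
for outcome in [True, True, True, False]:
    btb.update(0x10, outcome)
profile = btb.sample()  # {16: (3, 1)}
```

Flushing counts on eviction is one assumed policy for keeping a branch's history when two branches conflict in the same entry; a hardware design could instead drop or sample those counts, trading some accuracy for lower cost.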
Cite this article
Conte, T.M., Patel, B.A., Menezes, K.N. et al. Hardware-Based Profiling: An Effective Technique for Profile-Driven Optimization. Int J Parallel Prog 24, 187–206 (1996). https://doi.org/10.1007/BF03356747