Hardware-Based Profiling: An Effective Technique for Profile-Driven Optimization

International Journal of Parallel Programming

Abstract

Profile-based optimization can be used for instruction scheduling, loop scheduling, data preloading, function in-lining, and instruction cache performance enhancement. However, these techniques have not been embraced by software vendors because programs instrumented for profiling run significantly slower, an awkward compile-run-recompile sequence is required, and a test input suite must be collected and validated for each program. This paper introduces hardware-based profiling that uses traditional branch handling hardware to generate profile information in real time. Techniques are presented for both one-level and two-level branch hardware organizations. The approach produces high accuracy with small slowdown in execution (0.4%–4.6%). This allows a program to be profiled while it is used, eliminating the need for a test input suite. With contemporary processors driven increasingly by compiler support, hardware-based profiling is important for high-performance systems.
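The idea of reusing branch handling hardware as a profile source can be illustrated with a small software model. The sketch below is an illustration under stated assumptions, not the mechanism evaluated in the paper: it assumes a direct-mapped branch buffer with 2-bit saturating counters, and it adds hypothetical per-entry `taken` and `total` counters that are flushed into a profile table when an entry is evicted or when the buffer is dumped. The class name `ProfilingBranchBuffer` and its methods are invented for this example.

```python
from collections import defaultdict


class ProfilingBranchBuffer:
    """Toy direct-mapped branch buffer that also accumulates profile counts.

    Assumption: each entry carries extra taken/total counters alongside the
    usual 2-bit prediction counter. This is illustrative only.
    """

    def __init__(self, num_entries=4):
        self.num_entries = num_entries
        self.entries = [None] * num_entries
        # Accumulated profile: branch address -> [taken count, total count]
        self.profile = defaultdict(lambda: [0, 0])

    def _flush(self, entry):
        # Move an entry's counts into the accumulated profile.
        self.profile[entry["pc"]][0] += entry["taken"]
        self.profile[entry["pc"]][1] += entry["total"]

    def record(self, pc, taken):
        """Record one executed branch; return the prediction made for it."""
        index = pc % self.num_entries
        entry = self.entries[index]
        if entry is None or entry["pc"] != pc:
            if entry is not None:
                self._flush(entry)  # a conflicting branch is being evicted
            entry = {"pc": pc, "counter": 2, "taken": 0, "total": 0}
            self.entries[index] = entry
        predicted_taken = entry["counter"] >= 2
        # Update the 2-bit saturating counter used for prediction.
        if taken:
            entry["counter"] = min(3, entry["counter"] + 1)
        else:
            entry["counter"] = max(0, entry["counter"] - 1)
        # Update the hypothetical profile counters.
        entry["taken"] += int(taken)
        entry["total"] += 1
        return predicted_taken

    def dump(self):
        """Flush all live entries and return the taken ratio per branch."""
        for entry in self.entries:
            if entry is not None:
                self._flush(entry)
                entry["taken"] = entry["total"] = 0
        return {pc: t / n for pc, (t, n) in self.profile.items() if n}
```

In this model a program runs normally while `record` is called for each branch, and a periodic `dump` yields per-branch taken ratios that a compiler could consume in place of counts gathered from an instrumented run.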



Cite this article

Conte, T.M., Patel, B.A., Menezes, K.N. et al. Hardware-Based Profiling: An Effective Technique for Profile-Driven Optimization. Int J Parallel Prog 24, 187–206 (1996). https://doi.org/10.1007/BF03356747
