Abstract
The growing number of processing cores in a single CPU demands more parallelism from sequential programs. Yet over the past decades little work has succeeded in automatically extracting enough parallelism, which casts a shadow over both many-core architectures and automatic parallelization research. At the same time, little work has actually tried to understand the nature, or amount, of the parallelism potentially available in programs. In this paper we analyze, at runtime, the dynamic data dependencies among the superblocks of sequential programs. We design a meta re-arrange buffer to measure and exploit the available parallelism: superblocks are dynamically analyzed, reordered and dispatched to run in parallel on an ideal many-core processor, while data dependencies and program correctness are preserved. In our experiments we observed that, with superblock reordering, the potential speedup ranged from 1.08 to 89.60. These results show that the potential parallelism of ordinary programs is still far from fully exploited by existing technologies, which makes automatic parallelization a promising research direction for many-core architectures.
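The scheme outlined above, dispatching a superblock as soon as every superblock it depends on has completed, can be illustrated with a small dependency-scheduling model. The sketch below is not the authors' meta re-arrange buffer or their published tool; it is a minimal, hypothetical Python model in which each superblock carries its read/write sets and a dynamic cost, the ideal many-core machine finishes a block as soon as its dependencies finish, and the potential speedup is the total sequential work divided by the length of the longest dependency chain.

```python
# Minimal sketch (assumed model, not the authors' implementation) of
# dependency-driven superblock reordering on an ideal many-core machine.
from dataclasses import dataclass, field

@dataclass
class Superblock:
    cost: int                                  # dynamic instruction count of the block
    reads: set = field(default_factory=set)    # locations (memory/registers) read
    writes: set = field(default_factory=set)   # locations written

def potential_speedup(trace):
    """trace: superblocks in the original sequential execution order."""
    finish = []        # finish time of each block on the ideal machine
    last_write = {}    # location -> index of the latest writer seen so far
    last_reads = {}    # location -> indices of readers since that write
    for i, sb in enumerate(trace):
        # RAW/WAW: wait for the latest earlier writer of anything we touch.
        deps = {last_write[x] for x in sb.reads | sb.writes if x in last_write}
        # WAR: a write must also wait for earlier readers of that location.
        deps |= {r for x in sb.writes for r in last_reads.get(x, ())}
        start = max((finish[d] for d in deps), default=0)
        finish.append(start + sb.cost)
        for x in sb.writes:
            last_write[x] = i
            last_reads[x] = set()
        for x in sb.reads:
            last_reads.setdefault(x, set()).add(i)
    sequential = sum(sb.cost for sb in trace)
    parallel = max(finish) if finish else 0
    return sequential / parallel if parallel else 1.0

# Example: two consumers of the same producer can run in parallel.
t = [Superblock(10, writes={"a"}),
     Superblock(10, reads={"a"}, writes={"b"}),
     Superblock(10, reads={"a"}, writes={"c"})]
print(potential_speedup(t))   # 30 sequential cycles vs. 20 parallel -> 1.5
```

The example trace is hypothetical; it only shows how respecting read-after-write, write-after-write and write-after-read dependencies while reordering superblocks yields the kind of speedup bound the paper measures.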
Acknowledgments
This research is supported by the National Natural Science Foundation of China under Grant No. 61070001, the Natural Science Foundation of Zhejiang Province under Grant No. LQ12F02017, the Special Funds for Key Programs of China under Grant Nos. 2011ZX0302-004-002 and 2012ZX01031001-003, the Key Science Foundation of Zhejiang Province under Grant No. 2010C11048, the Open Fund of the Mobile Network Application Technology Key Laboratory of Zhejiang Province, and the Innovation Group of New Generation of Mobile Internet Software and Services of Zhejiang Province.
About this article
Cite this article
Ye, J., Yan, H., Hou, H. et al. Potential thread-level-parallelism exploration with superblock reordering. Computing 96, 545–564 (2014). https://doi.org/10.1007/s00607-014-0387-8
Keywords
- Parallel computing
- Automatic parallelization
- Speculative multi-threading
- Many-core
- Parallelism measurement