Abstract
Although "SIMD within a register" parallel architectures have existed for almost 10 years, automatic optimizations for them are not yet well developed. Because most optimizations for SIMD architectures are transplanted from traditional vectorization techniques, many special features of SIMD architectures, such as packed operations, have not been thoroughly considered. Since operands are tightly packed within a register, there is no spare space to indicate overflow. To maintain the accuracy of automatically SIMDized programs, the operands must be unpacked to leave enough room for interim overflow, which introduces considerable overhead. Furthermore, the instructions that handle interim overflows can sometimes prevent other optimizations. In this paper, we propose a new technique, OCSA (overflow controlled SIMD arithmetic), to reduce the negative effects of interim overflow handling and to eliminate the interference of interim overflows. We have applied our algorithm to the Berkeley multimedia benchmarks. The experimental results show that the OCSA algorithm can significantly improve the performance of ADPCM-Decoder (110%), MESA-Reflect (113%) and DJVU-Encoder (106%).
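The interim-overflow problem described in the abstract can be illustrated with a small sketch in portable C. It is not the OCSA algorithm, which the abstract does not specify. It only contrasts a packed 8-bit computation, where the interim sum has no room for a ninth bit, with the conventional unpack-to-16-bit workaround. The function names and the choice of a per-lane average are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: add four 8-bit lanes packed in a 32-bit word.
 * Each lane wraps modulo 256 and no carry crosses a lane boundary,
 * which mimics a packed (non-saturating) SIMD add. */
static uint32_t packed_add_u8(uint32_t a, uint32_t b) {
    /* Add the low 7 bits of every lane; the carry lands in bit 7. */
    uint32_t low7 = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    /* Fold in the top bit of each lane without carrying out of it. */
    return low7 ^ ((a ^ b) & 0x80808080u);
}

/* Per-lane (a + b) / 2 computed entirely in packed 8-bit lanes.
 * The interim sum a + b may need 9 bits, so the result is wrong
 * whenever a lane's sum reaches 256. */
static uint32_t avg_packed_naive(uint32_t a, uint32_t b) {
    uint32_t sum = packed_add_u8(a, b);
    return (sum >> 1) & 0x7F7F7F7Fu; /* per-lane logical shift right */
}

/* Same average, but each operand is first unpacked into 16-bit lanes
 * so the interim sum has spare bits. It is correct, but it needs
 * roughly twice the arithmetic plus unpack and repack work. */
static uint32_t avg_unpacked(uint32_t a, uint32_t b) {
    uint32_t a_even = a & 0x00FF00FFu, a_odd = (a >> 8) & 0x00FF00FFu;
    uint32_t b_even = b & 0x00FF00FFu, b_odd = (b >> 8) & 0x00FF00FFu;
    uint32_t even = ((a_even + b_even) >> 1) & 0x00FF00FFu;
    uint32_t odd  = ((a_odd  + b_odd)  >> 1) & 0x00FF00FFu;
    return even | (odd << 8); /* repack into four 8-bit lanes */
}
```

With every lane of `a` set to 200 and every lane of `b` set to 100, `avg_unpacked` yields 150 per lane, while `avg_packed_naive` yields 22, because the interim sum 300 wraps to 44 before the shift. The additional masking, shifting and repacking in `avg_unpacked` is the kind of overhead the abstract attributes to interim overflow handling.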
Supported by the National Natural Science Foundation of China under Grant No. 60273046; Shanghai Science and Technology Committee of China Key Project Funding (02JC14013).
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhu, J., Zhang, H., Shi, H., Zang, B., Zhu, C. (2005). Overflow Controlled SIMD Arithmetic. In: Eigenmann, R., Li, Z., Midkiff, S.P. (eds) Languages and Compilers for High Performance Computing. LCPC 2004. Lecture Notes in Computer Science, vol 3602. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11532378_30
DOI: https://doi.org/10.1007/11532378_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28009-5
Online ISBN: 978-3-540-31813-2
eBook Packages: Computer Science, Computer Science (R0)