
SCIENTIA SINICA Informationis, Volume 47, Issue 3: 310-325(2017) https://doi.org/10.1360/N112016-00146

Identifying superword level parallelism with directed graph reachability

  • Received: Jun 9, 2016
  • Accepted: Jul 19, 2016
  • Published: Jan 13, 2017

Abstract

SLP (superword level parallelism) is an efficient way to exploit the parallelism between statements within a basic block on SIMD (single instruction, multiple data) architectures, and it has been implemented in almost all mainstream vectorizing compilers. However, its vectorization ability is limited by the conservativeness of the parallelism identification process. To solve this problem, this paper proposes an EDG (extended dependence graph) approach to identifying SLP. First, we extend the ADG (array dependence graph) and the SDG (statement dependence graph) to construct the EDG, which contains both the dependences between each pair of array references and those between each pair of statements. Each statement is represented as an SCC (strongly connected component), and all of its array references are placed in that SCC. We then eliminate the redundant dependence edges between SCCs from the EDG. The dependence between each pair of statements, and hence whether the pair can be SLP-vectorized, is determined by analyzing the reachability between the nodes of the corresponding SCCs. We implement this approach in the Open64-5.0 compiler, improving its ability to identify SLP. Evaluation on the gcc-vect benchmarks shows that the optimized Open64-5.0 compiler identifies more SLP-vectorizable loops than GCC 4.9, and that the number of loops it vectorizes is comparable to that of ICC 14.0. For most practical applications, the code it generates performs better than the state of the art.
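The reachability idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`build_edg`, `reachable`, `can_pack`), the node labels, and the three-statement basic block are all hypothetical, and the redundant-edge elimination step is omitted. The sketch assumes that each statement and its array references are linked in both directions so that they form one strongly connected group, that dependences are directed edges between array references, and that two statements are candidates for the same SLP pack only if neither can reach the other.

```python
from collections import defaultdict, deque


def build_edg(statements, ref_deps):
    """Build a toy dependence graph (hypothetical stand-in for the EDG).

    statements: {statement name: [array reference node names]}
    ref_deps:   [(source reference, sink reference)] directed dependences
    """
    graph = defaultdict(set)
    for stmt, refs in statements.items():
        for ref in refs:
            # Edges in both directions keep a statement and its
            # array references in one strongly connected group.
            graph[stmt].add(ref)
            graph[ref].add(stmt)
    for src, dst in ref_deps:
        graph[src].add(dst)
    return graph


def reachable(graph, start):
    """Return all nodes reachable from `start` by following at least one edge."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


def can_pack(graph, s, t):
    """Treat s and t as packable only if neither statement reaches the other."""
    return t not in reachable(graph, s) and s not in reachable(graph, t)


# Hypothetical basic block:
#   S1: a[i]   = b[i]   + 1
#   S2: a[i+1] = b[i+1] + 1
#   S3: c[i]   = a[i]   * 2
statements = {
    "S1": ["S1:a[i]", "S1:b[i]"],
    "S2": ["S2:a[i+1]", "S2:b[i+1]"],
    "S3": ["S3:c[i]", "S3:a[i]"],
}
# S3 reads the a[i] written by S1 (a flow dependence).
ref_deps = [("S1:a[i]", "S3:a[i]")]

graph = build_edg(statements, ref_deps)
print(can_pack(graph, "S1", "S2"))
print(can_pack(graph, "S1", "S3"))
```

Because the check uses reachability rather than direct edges, an indirect dependence that passes through a third statement would also block packing in this sketch.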


Funded by

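National Science and Technology Major Project "Core Electronic Devices, High-end Generic Chips and Basic Software" (2009ZX01036-001-001-2)

Open Project of the State Key Laboratory of Mathematical Engineering and Advanced Computing (2013A11)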

