Abstract—Fine-grain methods for parallelizing the H.264 decoder offer good latency and low memory usage. However, they cannot match the scalability of coarse-grain approaches, even when a well-designed entropy decoder is assumed to feed the increasing number of cores working in parallel. We introduce a GOP (Group of Pictures) level approach because of its high scalability, and we discuss solutions to its well-known memory issues. Our design eliminates the need for the GOP start-code scanner used in earlier methods, which lets all cores work on the decoding task. Our experiments show that memory initialization operations can substantially degrade the scalability of parallel applications, and that the multi-core cache architecture is a critical factor in obtaining the desired speedup. We observed a speedup of 7.63 with 8 processors having separate caches, and a speedup of 13.35 with 16 processors when each cache is shared by 2 processors.
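The abstract does not say how the cores locate GOP boundaries once the dedicated start-code scanner is removed. One plausible scheme, sketched here purely as an assumption and not necessarily the authors' method, is to split the byte stream into equal ranges, let each worker search its own range for GOP starts, and have it decode every GOP that begins inside its range, reading past the range end when the GOP continues. The sketch assumes a GOP starts at an Annex B start code (`00 00 01`) followed by an SPS NAL unit (`nal_unit_type` 7); the names `parallel_decode`, `worker`, and `decode_gop` are hypothetical, and `decode_gop` only counts NAL units as a stand-in for real decoding.

```python
from concurrent.futures import ThreadPoolExecutor

START_CODE = b"\x00\x00\x01"  # Annex B start-code prefix
NAL_SPS = 7                   # nal_unit_type of a sequence parameter set


def is_gop_start(stream: bytes, pos: int) -> bool:
    # Assumed GOP boundary rule: a start code whose NAL unit is an SPS.
    return (stream.startswith(START_CODE, pos)
            and pos + 3 < len(stream)
            and (stream[pos + 3] & 0x1F) == NAL_SPS)


def next_gop_start(stream: bytes, pos: int) -> int:
    # Scan forward from pos for the next GOP boundary; len(stream) if none.
    p = stream.find(START_CODE, pos)
    while p != -1:
        if is_gop_start(stream, p):
            return p
        p = stream.find(START_CODE, p + 1)
    return len(stream)


def decode_gop(gop: bytes) -> int:
    # Hypothetical placeholder for a real GOP decoder: count NAL units.
    return gop.count(START_CODE)


def worker(stream: bytes, lo: int, hi: int) -> list[tuple[int, int]]:
    # Decode every GOP whose start offset lies in [lo, hi). A GOP may end
    # beyond hi, so the worker reads past its range but never claims a
    # GOP that starts outside it; each GOP is decoded exactly once.
    results = []
    start = next_gop_start(stream, lo)
    while start < hi:
        end = next_gop_start(stream, start + 1)
        results.append((start, decode_gop(stream[start:end])))
        start = end
    return results


def parallel_decode(stream: bytes, n_workers: int) -> list[tuple[int, int]]:
    # Partition the stream into n_workers contiguous byte ranges.
    n = len(stream)
    bounds = [(i * n // n_workers, (i + 1) * n // n_workers)
              for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda b: worker(stream, *b), bounds))
    # pool.map preserves input order, so GOPs come back in stream order.
    return [gop for part in parts for gop in part]
```

Because the ranges partition the stream, every GOP start falls in exactly one worker's range, so no core sits idle scanning for boundaries on behalf of the others. Python threads illustrate the structure only; under the GIL they give no real speedup, so a practical decoder would use native threads or processes.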
Index Terms—video compression, H.264 decoder, parallel processing, high-performance computing, image processing.
The authors are with Electronics Engineering, College of Electrical Engineering and Computer Science, National Taiwan University, Taiwan, R.O.C.
Cite: Ahmet Gürhanlı, Charlie Chung-Ping Chen, and Shih-Hao Hung, "Coarse Grain Parallelization of H.264 Video Decoder and Memory Bottleneck in Multi-Core Architectures," International Journal of Computer Theory and Engineering, vol. 3, no. 3, pp. 375-381, 2011.