Analysis of Coding Tools in HEVC Test Model (HM 1.0) – Inter Prediction
2010-12-07 H.265/HEVC
[Update on 2011-02-15] The 12-tap DCT-based interpolation filter (high efficiency configuration) and the 6-tap directional interpolation filter (low complexity configuration) for 1/4 luma samples will be replaced by an 8-tap DCT-based interpolation filter for both HE and LC (JCTVC-D344) in the upcoming HM2.0. In addition, the bilinear interpolation filter for 1/8 chroma samples will be replaced by a 4-tap DCT-based interpolation filter (JCTVC-D347) in HM2.0.
Tools adopted in HM0.9 include a 12-tap DCT-based interpolation filter (high efficiency configuration) and a 6-tap directional interpolation filter (low complexity configuration) for 1/4 luma samples, a bilinear interpolation filter for 1/8 chroma samples (both HE and LC), advanced motion vector prediction, bi-directional rounding control, and bi-directional prediction for temporal level 0.
Motion compensation is the key factor in highly efficient video compression, and fractional-pel accuracy requires interpolation of the reference frame. In H.264/AVC, a fixed 6-tap Wiener filter is first used for half-pel interpolation, and then a bilinear combination of integer and half-pel values provides ¼-pel interpolation. Rather than an adaptive interpolation filter, HM adopts a fixed 12-tap DCT-based interpolation filter, replacing the combination of Wiener and bilinear filters with a set of interpolation filters, one for each fractional position. More specifically, a single filtering procedure yields the interpolated pixel at any fractional accuracy, instead of the combination of 6-tap and bilinear filtering procedures used in H.264/AVC. The motion compensation process is thus simpler from an implementation point of view, and the complexity of quarter-pel interpolation can also be reduced. Figure 1 shows the 12-tap DCT-based interpolation filter for luma (HE).

Figure 1. 12-tap DCT-based interpolation filter for luma (HE)
The DCT-based interpolation filtering process uses the horizontally neighboring integer pixels I(x,0), where x=-5…6, to interpolate the horizontal fractional pixels a, b and c, using the following equations,
a = \sum_{x=-5}^{6} I(x,y)*f(x, 1/4)           (1)
b = \sum_{x=-5}^{6} I(x,y)*f(x, 2/4)           (2)
c = \sum_{x=-5}^{6} I(x,y)*f(x, 3/4)           (3)
where y=0.
The vertically neighboring integer pixels I(0,y), where y=-5…6, are used to interpolate the vertical fractional pixels d(0), h(0), and n(0), using the following equations,
d(x) = \sum_{y=-5}^{6} I(x,y)*f(y, 1/4)           (4)
h(x) = \sum_{y=-5}^{6} I(x,y)*f(y, 2/4)           (5)
n(x) = \sum_{y=-5}^{6} I(x,y)*f(y, 3/4)           (6)
where x=0.
To interpolate the other fractional pixels, e, f, g, i, j, k, p, q, and r, the corresponding auxiliary vertical fractional pixels d(x), h(x) and n(x), where x=-5…6, must first be interpolated using Eqs. (4)-(6), respectively. For example, to interpolate the fractional pixels e, f, and g, the values d(x), where x=-5…6, are first interpolated using Eq. (4). Then e, f, and g are interpolated using the following equations,
e = \sum_{x=-5}^{6} d(x)*f(x, 1/4)           (7)
f = \sum_{x=-5}^{6} d(x)*f(x, 2/4)           (8)
g = \sum_{x=-5}^{6} d(x)*f(x, 3/4)           (9)
The same procedure is applied to the other fractional pixels, i, j, k, p, q, and r, using the following equations,
i = \sum_{x=-5}^{6} h(x)*f(x, 1/4)           (10)
j = \sum_{x=-5}^{6} h(x)*f(x, 2/4)           (11)
k = \sum_{x=-5}^{6} h(x)*f(x, 3/4)           (12)
p = \sum_{x=-5}^{6} n(x)*f(x, 1/4)           (13)
q = \sum_{x=-5}^{6} n(x)*f(x, 2/4)           (14)
r = \sum_{x=-5}^{6} n(x)*f(x, 3/4)           (15)
In Eqs. (1)-(15) above, f(x, 1/4), f(x, 2/4) and f(x, 3/4) denote the 12-tap DCT-based interpolation filter coefficients at fractional positions 1/4, 2/4, and 3/4, respectively, as listed in Table 1.
Table 1. Filter coefficients for 12-tap DCT-based interpolation filter in HE configuration
| Position | 12-tap filter coefficients |
| --- | --- |
| ¼ | {-1, 5, -12, 20, -40, 229, 76, -32, 16, -8, 4, -1}/256 |
| ½ | {-1, 8, -16, 24, -48, 161, 161, -48, 24, -16, 8, -1}/256 |
| ¾ | {-1, 4, -8, 16, -32, 76, 229, -40, 20, -12, 5, -1}/256 |
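As a rough illustration of Eqs. (1)-(15), the following Python sketch applies the Table 1 coefficients to a row of integer samples and, for the two-dimensional positions, runs a vertical pass followed by a horizontal pass. The function names, the rounding offsets, the single final rounding after both passes, and the clipping to 8 bits are assumptions made for this example; they are not taken from the HM source code.

```python
# 12-tap DCT-based interpolation filter taps from Table 1, scaled by 256.
# Tap k (k = 0..11) multiplies the integer sample at offset k - 5, i.e. -5..6.
TAPS = {
    1: [-1, 5, -12, 20, -40, 229, 76, -32, 16, -8, 4, -1],    # position 1/4
    2: [-1, 8, -16, 24, -48, 161, 161, -48, 24, -16, 8, -1],  # position 2/4
    3: [-1, 4, -8, 16, -32, 76, 229, -40, 20, -12, 5, -1],    # position 3/4
}


def filter_1d(samples, frac):
    """Weighted sum over 12 samples at offsets -5..6, still scaled by 256."""
    assert len(samples) == 12
    return sum(s * t for s, t in zip(samples, TAPS[frac]))


def clip8(value):
    """Clip to the 8-bit sample range (assumed output format)."""
    return max(0, min(255, value))


def interp_horizontal(row, x0, frac):
    """Positions a, b, c between row[x0] and row[x0 + 1], Eqs. (1)-(3).

    The +128 rounding offset before the shift by 8 is an assumption.
    """
    acc = filter_1d(row[x0 - 5:x0 + 7], frac)
    return clip8((acc + 128) >> 8)


def interp_2d(block, x0, y0, frac_x, frac_y):
    """Positions e..r: vertical pass (Eqs. (4)-(6)), then horizontal pass.

    Intermediate column results are kept unrounded and the result is
    rounded once at the end; this is an illustrative choice.
    """
    columns = []
    for x in range(x0 - 5, x0 + 7):
        column = [block[y][x] for y in range(y0 - 5, y0 + 7)]
        columns.append(filter_1d(column, frac_y))   # scaled by 256
    acc = filter_1d(columns, frac_x)                # scaled by 256 * 256
    return clip8((acc + (1 << 15)) >> 16)
```

Under these assumptions, `interp_2d(block, x0, y0, 1, 1)` corresponds to position e: d(x) is computed with the 1/4 filter vertically, and the result is then filtered with the 1/4 filter horizontally, as in Eq. (7).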
The set of interpolation filters for the low complexity configuration is referred to as the Directional Interpolation Filter (DIF) and is used for all 15 quarter-pixel positions, as shown in Figure 2. All interpolated sample values can be calculated using 16-bit arithmetic. The resulting interpolated values are scaled, clipped and stored using 8 bits/sample. Figure 2 shows the support pixels for each quarter-pixel position in different colours: the blue integer pixels support the three horizontal fractional pixels, a, b and c; the light-blue integer pixels support the three vertical fractional pixels, d, h and n; the deep-yellow integer pixels support the two down-right fractional pixels, e and r; the light-yellow integer pixels support the two down-left fractional pixels, g and p; and the purple integer pixels support the central fractional pixel j.

Figure 2. 6-tap directional interpolation filter for luma (LC)
For each of the three horizontal fractional positions, a, b and c, and the three vertical fractional positions, d, h and n, which are aligned with full pixel positions, a single 6-tap filter is used. The DIF filter coefficients are {3, -15, 111, 37, -10, 2}/128 for the ¼ position (mirrored for the ¾ position) and {3, -17, 78, 78, -17, 3}/128 for the ½ position.
a = (3H-15I+111J+37K-10L+2M+64)>>7
b = (3H-17I+78J+78K-17L+3M+64)>>7
c = (2H-10I+37J+111K-15L+3M+64)>>7
d = (3B-15E+111J+37O-10S+2W+64)>>7
h = (3B-17E+78J+78O-17S+3W+64)>>7
n = (2B-10E+37J+111O-15S+3W+64)>>7
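The six equations above share one pattern: a 6-tap weighted sum, a rounding offset of 64, and a right shift by 7. A minimal Python sketch of that pattern follows; the function names are hypothetical, and the final clip to 8 bits follows the statement that DIF results are clipped and stored at 8 bits/sample.

```python
QUARTER = (3, -15, 111, 37, -10, 2)   # 1/4 position, taps sum to 128
HALF = (3, -17, 78, 78, -17, 3)       # 1/2 position, taps sum to 128
THREE_QUARTER = QUARTER[::-1]         # 3/4 position mirrors the 1/4 taps


def clip8(value):
    """Clip to the 8-bit sample range."""
    return max(0, min(255, value))


def dif_6tap(samples, taps):
    """Compute (sum(sample * tap) + 64) >> 7 over six integer samples."""
    assert len(samples) == 6
    acc = sum(s * t for s, t in zip(samples, taps))
    return clip8((acc + 64) >> 7)


def dif_line(samples):
    """Return the 1/4, 1/2 and 3/4 positions along one line of six samples.

    Pass (H, I, J, K, L, M) to get (a, b, c),
    or (B, E, J, O, S, W) to get (d, h, n).
    """
    return tuple(dif_6tap(samples, t) for t in (QUARTER, HALF, THREE_QUARTER))
```

For a step edge with H = I = J = 0 and K = L = M = 128, `dif_line` returns `(29, 64, 99)`, which shows how the three positions move progressively from the left level towards the right one.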
For the four innermost quarter-pixel positions, e, g, p, and r, 6-tap filters at +45 degree and -45 degree angles are used: e and r are filtered along the down-right diagonal, and g and p along the down-left diagonal.
e = (3A-15D+111J+37P-10U+2X+64)>>7
g = (3C-15G+111K+37O-10R+2V+64)>>7
p = (2C-10G+37K+111O-15R+3V+64)>>7
r = (2A-10D+37J+111P-15U+3X+64)>>7
For the other four innermost quarter-pixel positions, f, i, k, and q, a combination of the two 6-tap diagonal filters at +45 degree and -45 degree angles is used, which is equivalent to a 12-tap filter.
f = (e+g+1)>>1 = ((3A-15D+111J+37P-10U+2X)+(3C-15G+111K+37O-10R+2V)+128)>>8
i = (e+p+1)>>1 = ((3A-15D+111J+37P-10U+2X)+(2C-10G+37K+111O-15R+3V)+128)>>8
k = (g+r+1)>>1 = ((3C-15G+111K+37O-10R+2V)+(2A-10D+37J+111P-15U+3X)+128)>>8
q = (p+r+1)>>1 = ((2C-10G+37K+111O-15R+3V)+(2A-10D+37J+111P-15U+3X)+128)>>8
The exception is the central position, j, where a 12-tap non-separable filter is used. The filter coefficients of DIF are {0, 5, 5, 0; 5, 22, 22, 5; 5, 22, 22, 5; 0, 5, 5, 0}/128 for the central position j.
j = ((5E+5F)+(5I+22J+22K+5L)+(5N+22O+22P+5Q)+(5S+5T)+64)>>7
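The diagonal, combined and central positions can be sketched in Python as follows. Figure 2 is not reproduced here, so the pixel coordinates are inferred from the equations above, assuming the letters A to X label the support pixels in raster order around J (rows A B C / D E F G / H I J K L M / N O P Q / R S T U / V W X). The function names are hypothetical, and f, i, k and q use the single-rounding form on the right-hand side of their equations.

```python
DIAG = (3, -15, 111, 37, -10, 2)   # 1/4 taps; reversed for the 3/4 position

# (dx, dy) offsets relative to J, inferred from the equations (assumption).
DOWN_RIGHT = [(-2, -2), (-1, -1), (0, 0), (1, 1), (2, 2), (3, 3)]  # A D J P U X
DOWN_LEFT = [(3, -2), (2, -1), (1, 0), (0, 1), (-1, 2), (-2, 3)]   # C G K O R V

# Non-separable filter for j over rows/columns -1..2 around J; 12 non-zero taps.
CENTER = [[0, 5, 5, 0],
          [5, 22, 22, 5],
          [5, 22, 22, 5],
          [0, 5, 5, 0]]


def clip8(value):
    """Clip to the 8-bit sample range."""
    return max(0, min(255, value))


def line_sum(img, x, y, offsets, taps):
    """Unrounded weighted sum along one diagonal through the support pixels."""
    return sum(t * img[y + dy][x + dx] for (dx, dy), t in zip(offsets, taps))


def dif_inner(img, x, y):
    """Return e, g, p, r, f, i, k, q, j for the integer sample J = img[y][x]."""
    se = line_sum(img, x, y, DOWN_RIGHT, DIAG)
    sr = line_sum(img, x, y, DOWN_RIGHT, DIAG[::-1])
    sg = line_sum(img, x, y, DOWN_LEFT, DIAG)
    sp = line_sum(img, x, y, DOWN_LEFT, DIAG[::-1])
    sj = sum(CENTER[r][c] * img[y - 1 + r][x - 1 + c]
             for r in range(4) for c in range(4))
    out = {"e": (se + 64) >> 7, "g": (sg + 64) >> 7,
           "p": (sp + 64) >> 7, "r": (sr + 64) >> 7,
           "j": (sj + 64) >> 7}
    # Combined positions: add two unrounded diagonal sums, then round once.
    out["f"] = (se + sg + 128) >> 8
    out["i"] = (se + sp + 128) >> 8
    out["k"] = (sg + sr + 128) >> 8
    out["q"] = (sp + sr + 128) >> 8
    return {name: clip8(v) for name, v in out.items()}
```

`dif_inner` needs two rows and columns of margin before J and three after it, so for a 6x6 test image J sits at `img[2][2]`.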
[To be continued...]
# 2010-12-22 Wednesday 2:20 pm
Hi, do you know where I can get the output file of the Guangzhou meeting? I cannot find it on the JCT-VC website. Thank you very much.
# 2010-12-23 Thursday 9:37 am
The output documents start from JCTVC-C400 to JCTVC-C405. However, most of them are not uploaded yet.
# 2010-12-24 Friday 11:32 pm
Hi, I have another question: do you know the document number of the 2nd JCT-VC meeting output file? Thanks.
# 2010-12-27 Monday 11:57 am
The output documents of 2nd JCT-VC meeting start from JCTVC-B200 to JCTVC-B205, and JCTVC-B300-B312 are Tool Experiments (TE).
# 2011-12-19 Monday 11:13 pm
Hi, how can I write a C program to implement the DCT in HEVC?