Witness the development of H.265

Mode-Dependent Directional Transform (MDDT) in JM/KTA

2009-09-22 Research

Intra prediction in H.264/AVC is spatial-domain directional prediction: different intra prediction modes represent different prediction directions, such as horizontal, vertical, and diagonal. An intra-coded MB can be partitioned into 4×4, 8×8, or 16×16 intra prediction blocks. Intra 4×4 and intra 8×8 prediction each have nine modes, and intra 16×16 prediction has four, giving a total of 22 (9+9+4) intra prediction modes in H.264/AVC. The residue usually retains high energy along the prediction direction, because edges are harder to predict than smooth areas.

Mode-dependent directional transform (MDDT) was proposed to compact the energy of the residue produced by intra prediction. It consists of a set of pre-defined separable transforms, each of which is efficient at compacting energy along one of the prediction directions and is therefore tailored to one of the intra modes. The MDDT type is coupled with the selected intra prediction mode, so it does not need to be signaled explicitly.
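The coupling between intra mode and transform can be pictured as a lookup table shared by encoder and decoder. The sketch below is a minimal illustration under stated assumptions: the names `MDDT_INDEX`, `select_pair` and `separable_transform` are hypothetical and not taken from the KTA source, and because the trained MDDT matrices are not reproduced in this post, the H.264 4×4 core transform matrix stands in as a placeholder for every 4×4 mode. It shows that the transform pair follows from the intra mode alone, and that applying a separable pair means computing Y = V·X·Hᵀ.

```python
# Minimal sketch of mode-dependent transform selection (hypothetical names, not KTA code).

# H.264 4x4 forward core transform matrix, used here ONLY as a placeholder:
# the real MDDT matrices are trained and differ for each intra mode.
CORE4 = [[1, 1, 1, 1],
         [2, 1, -1, -2],
         [1, -1, -1, 1],
         [1, -2, 2, -1]]

# One transform index per (intra block size, prediction mode): 9 + 9 + 4 = 22.
MDDT_INDEX = {}
for size, num_modes in ((4, 9), (8, 9), (16, 4)):
    for mode in range(num_modes):
        MDDT_INDEX[(size, mode)] = len(MDDT_INDEX)


def matmul(a, b):
    """Plain matrix product a * b for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def separable_transform(v, h, x):
    """Y = V * X * H^T: V transforms the columns (vertical), H the rows (horizontal)."""
    return matmul(matmul(v, x), transpose(h))


def select_pair(pairs, size, mode):
    """Encoder and decoder derive the same pair from the intra mode: nothing extra is sent."""
    return pairs[MDDT_INDEX[(size, mode)]]


# Placeholder table: every intra 4x4 mode maps to (CORE4, CORE4).
pairs = {MDDT_INDEX[(4, m)]: (CORE4, CORE4) for m in range(9)}
v, h = select_pair(pairs, 4, 4)          # intra 4x4, mode 4 (diagonal down right)
flat = [[1] * 4 for _ in range(4)]       # a flat residue block
y = separable_transform(v, h, flat)      # all energy lands in y[0][0], which equals 16
```

In a real codec each table entry would hold a distinct trained pair, and the 8×8 entries would hold 8×8 matrices.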

Inter prediction errors also contain directional information, but MDDT cannot be used for them unless the edge directions are explicitly detected and transmitted. The side information this introduces is significant and offsets much of the coding gain. Hence, MDDT is applied only to intra-coded MBs.

Twenty-two separable transforms are pre-defined for the 22 intra prediction modes; each consists of two transform matrices, one for the horizontal and one for the vertical transform. The memory needed to store all the MDDT matrices is about 1.5Kb. The transform matrices are trained on a large set of video sequences, all of which are intra coded. All blocks are classified into 22 categories according to their intra prediction modes. For each category, the horizontal and vertical correlation matrices of the prediction errors are computed, and their eigenvectors are used to construct the horizontal and vertical transform matrices, respectively. This derivation is similar to that of the KLT, but MDDT is not optimal: it is separable, and it is designed from general statistics that may not match the local statistics of a particular video sequence. Furthermore, the basis vectors of MDDT contain only integers; they are scaled and rounded versions of the eigenvectors and are not orthogonal to each other. The risks of non-orthogonal transforms were discussed in an earlier post (here).
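The training procedure described above can be sketched for a single category of blocks. This is a minimal pure-Python illustration under stated assumptions, not the KTA training tool: the function names are hypothetical, the eigenvectors are computed with the cyclic Jacobi method (any symmetric eigensolver would do), basis vectors are ordered by decreasing eigenvalue, the training blocks are made-up toy data, and the integer scale factor of 64 is an assumption since the post does not state the scaling used in KTA.

```python
import math


def jacobi_eigh(a, sweeps=50, tol=1e-18):
    """Eigenvalues and eigenvectors (columns of v) of a real symmetric matrix, cyclic Jacobi."""
    n = len(a)
    a = [[float(e) for e in row] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the rotated a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A * P and V <- V * P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
                for k in range(n):  # A <- P^T * A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)], v


def correlations(blocks):
    """Vertical (E[X X^T]) and horizontal (E[X^T X]) correlation matrices of N x N residues."""
    n = len(blocks[0])
    rv = [[0.0] * n for _ in range(n)]
    rh = [[0.0] * n for _ in range(n)]
    for x in blocks:
        for i in range(n):
            for j in range(n):
                rv[i][j] += sum(x[i][k] * x[j][k] for k in range(n))  # vertical positions i, j
                rh[i][j] += sum(x[k][i] * x[k][j] for k in range(n))  # horizontal positions i, j
    m = float(len(blocks) * n)
    return [[e / m for e in r] for r in rv], [[e / m for e in r] for r in rh]


def klt_basis(r):
    """Rows are eigenvectors of r, sorted by decreasing eigenvalue."""
    vals, vecs = jacobi_eigh(r)
    order = sorted(range(len(vals)), key=lambda i: -vals[i])
    return [[vecs[k][i] for k in range(len(vals))] for i in order], [vals[i] for i in order]


def to_int(m, scale):
    """Scale and round a real matrix to integers (scale is an assumed value)."""
    return [[round(scale * e) for e in row] for row in m]


def derive_mddt(blocks, scale=64):
    """Float and integer vertical/horizontal matrices for one intra-mode category."""
    rv, rh = correlations(blocks)
    v, _ = klt_basis(rv)
    h, _ = klt_basis(rh)
    return v, h, to_int(v, scale), to_int(h, scale)


# Toy "training set" of 4x4 residues with diagonal structure (made-up data).
training = [[[(3 - abs(i - j)) * s + ((i + 2 * j) % 3) - 1 for j in range(4)] for i in range(4)]
            for s in (1, 2, 3, -1, -2)]
v, h, v_int, h_int = derive_mddt(training)
```

The floating-point matrices are orthonormal by construction; only the final scaling and rounding step can break that property. Eigenvector signs are arbitrary, so different solvers may return basis vectors with flipped signs.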

It is well known that a separable transform handles horizontal and vertical edges efficiently, because its basis images contain only horizontal and vertical edges, forming checkerboard-like patterns. MDDT, although it is also a separable transform, is used to compact energy along arbitrary directions, which seems contradictory. The basis images of MDDT for the different intra prediction modes were therefore examined. Although these basis images also have checkerboard patterns, the positions of their zero-crossings differ from those of the DCT or ICT. Figs. 1 and 2 show the basis images for mode 4 (diagonal down right) of intra 8×8 and intra 4×4 prediction, respectively. In the basis image at the second row and second column, which is typical, the two squares along the diagonal-down-right direction are larger than the other two. Such differences may be what makes MDDT more efficient than the DCT or ICT at handling oblique edges. Another observation is that intra prediction modes with similar directions, such as (diagonal down right, vertical right, horizontal down) and (diagonal down left, vertical left, horizontal up), have similar sets of MDDT basis images. A complete set of MDDT basis images can be downloaded here.
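A basis image is the outer product of one vertical and one horizontal basis vector: for an orthonormal pair with Y = V·X·Hᵀ, the residue is rebuilt as X = Vᵀ·Y·H, so coefficient Y[i][j] contributes Y[i][j] times the image formed by row i of V and row j of H. The sketch below uses the orthonormal 4-point DCT as a stand-in, because the trained MDDT matrices are not reproduced in this post; the helper names are hypothetical. It builds the basis image at the second row and second column and locates the zero-crossing of the underlying basis vector. For the DCT the zero-crossing sits exactly in the middle, which yields four equal squares; the observation above is that in MDDT it is shifted.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th basis vector."""
    m = []
    for k in range(n):
        a = math.sqrt((1.0 if k == 0 else 2.0) / n)
        m.append([a * math.cos(math.pi * (2 * x + 1) * k / (2 * n)) for x in range(n)])
    return m


def basis_image(v, h, i, j):
    """Basis image for coefficient (i, j): outer product of v[i] (vertical) and h[j] (horizontal)."""
    return [[v[i][r] * h[j][c] for c in range(len(h[j]))] for r in range(len(v[i]))]


def zero_crossing(vec):
    """First index k where the sign changes between vec[k] and vec[k + 1], else None."""
    for k in range(len(vec) - 1):
        if (vec[k] > 0) != (vec[k + 1] > 0):
            return k
    return None


d = dct_matrix(4)
img = basis_image(d, d, 1, 1)  # second row, second column
signs = [["+" if e > 0 else "-" for e in row] for row in img]
# Four 2x2 squares: "+" on the diagonal-down-right squares, "-" on the other two.
```

Plugging trained MDDT matrices into `basis_image` instead of `dct_matrix(4)` would reproduce images like those in the figures, with `zero_crossing` reporting the shifted crossing positions.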


Fig. 1 Basis images of MDDT for mode 4 (diagonal down right) of intra 8×8 prediction


Fig. 2 Basis images of MDDT for mode 4 (diagonal down right) of intra 4×4 prediction

MDDT has been adopted in KTA. The relevant documents include proposals (AF15, AG11, AH20, AJ24, AI36) and a conference paper [1].

[1] Y. Ye and M. Karczewicz, "Improved H.264 intra coding based on bi-directional intra prediction, directional transform, and adaptive coefficient scanning," in Proc. IEEE Int. Conf. Image Processing (ICIP 2008), San Diego, USA, Oct. 2008.



hurumi

Although the memory needed to store the MDDT matrices is only about 1.5kb, that holds only for a software-only, matrix-multiplication design. In a hardware implementation, each transform requires its own logic, which means that in practice there is no way to implement MDDT in hardware, since it requires the area of 44 transforms.

Even worse, in software-only or DSP implementations, a transform is typically implemented not as a matrix multiplication but as a butterfly structure to reduce complexity. This means the 44 MDDT transforms would each require a separately optimized function.

Jie Dong

What's more, MDDT has no fast algorithms, such as a butterfly structure. Nor can the transform matrices be decomposed into lifting structures, as they are non-orthogonal.
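To make the complexity argument concrete, the sketch below (illustrative only, not JM or KTA code) computes the 1-D H.264 4-point forward core transform twice: once as a generic matrix multiplication, which is all a trained integer matrix such as an MDDT matrix allows, and once with the well-known butterfly, which exploits the symmetries of the ICT matrix to use 8 additions/subtractions and 2 shifts instead of 16 multiplications and 12 additions. Both produce identical outputs.

```python
# H.264 4x4 forward core transform matrix (ICT).
CORE4 = [[1, 1, 1, 1],
         [2, 1, -1, -2],
         [1, -1, -1, 1],
         [1, -2, 2, -1]]


def transform_matmul(m, x):
    """Generic 1-D transform y = M x: 16 multiplications and 12 additions for 4 points."""
    return [sum(m[i][k] * x[k] for k in range(len(x))) for i in range(len(m))]


def transform_butterfly(x):
    """Same ICT via a butterfly: 8 additions/subtractions and 2 shifts, no multiplications."""
    s0, s1 = x[0] + x[3], x[1] + x[2]  # even (symmetric) part
    d0, d1 = x[0] - x[3], x[1] - x[2]  # odd (antisymmetric) part
    return [s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1)]


print(transform_butterfly([1, 2, 3, 4]))  # [10, -7, 0, -1]
```

A matrix learned from training data generally lacks these even/odd symmetries, so only the `transform_matmul` path applies to it.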

Sandip Ray

It is mentioned that "The transform matrices are derived based on a large set of video sequences, which are all intra-coded." Is it the eigendecomposition of the averaged data, or the average of the eigendecompositions of each individual data set? Does the non-orthogonality arise from rounding to integers?

Jie Dong

>>Is it eigen decomposition of the average data or average of eigen decomposition of each and every data?
I am not sure. I guess it is the former, because the autocorrelation matrix, whose eigenvectors form the KLT matrix, reflects the average statistics of a random field, e.g., a video source.

>>Does the non-orthogonality arise because of rounding to integers?
I think so. A correlation matrix is symmetric, so its eigenvectors can always be chosen to be mutually orthogonal, but that orthogonality is not guaranteed to survive scaling and rounding.
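A small example of this effect, using a toy 3×3 orthonormal matrix with rational entries rather than an actual MDDT matrix: every pair of rows is exactly orthogonal, yet after scaling by 8 and rounding (an arbitrary scale chosen only for illustration) the integer rows have nonzero inner products.

```python
from fractions import Fraction

# Toy orthonormal matrix (NOT an MDDT matrix): rows are mutually orthogonal unit vectors.
Q = [[Fraction(1, 3), Fraction(2, 3), Fraction(2, 3)],
     [Fraction(2, 3), Fraction(1, 3), Fraction(-2, 3)],
     [Fraction(2, 3), Fraction(-2, 3), Fraction(1, 3)]]


def gram(m):
    """Matrix of inner products between the rows of m."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in m] for r1 in m]


def scale_and_round(m, scale):
    """Integer approximation of m, as done when building integer transform matrices."""
    return [[round(scale * e) for e in row] for row in m]


exact = gram(Q)                # identity matrix: exactly orthonormal
q_int = scale_and_round(Q, 8)  # [[3, 5, 5], [5, 3, -5], [5, -5, 3]]
approx = gram(q_int)           # off-diagonal entries are +5 or -5, not 0
```

Exact `Fraction` arithmetic is used so that the orthogonality of `Q` holds without floating-point error, which isolates rounding as the only cause of the nonzero off-diagonal terms.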

Sandip Ray

Why aren't they using closed-form sinusoidal or non-sinusoidal expressions for the directional transforms, rather than a training set, to find the basis vectors? With a closed-form expression, I think fast algorithms could easily be implemented using periodicity, symmetry, and anti-symmetry properties.

Fatih

Can anyone tell me where I can download the HD test sequences used here, such as BigShips, City, Crew, Night, and ShuttleStart?

Yu Liu

The sequences for the VCEG common test conditions are available at ftp://vceg-seq.ateme.net/reference/, but the FTP server is password protected. According to VCEG-AJ10r1, you should contact the Rapporteur (Gary Sullivan from Microsoft, I guess) to get the password.

Zhou Jin

Has anyone tested the KTA software and tried to view the .264 file it produces? I tried several versions, but none of them produced a .264 file that could be viewed with players such as StreamEYE. Does it need a specific player?

Yu Liu

The KTA software includes many additional coding tools that are not compatible with H.264 syntax, so you can't use an H.264 video player to view a KTA bitstream. If you want the KTA software to produce an H.264-compatible bitstream, disable the KTA macro definitions in "defines.h", i.e., those from "ADAPTIVE_FILTER" to "SIMPLIFIED_RDPIC_DECISION".

Zhou Jin

>>The KTA software includes a lot of additional coding tools that are not compatible with H.264 syntax, so you can't use H.264 video player to view the KTA bitstream.
Thank you. So basically, if I want to evaluate the stream, I have to do it after decoding?

Yu Liu

Yes, if you just want to evaluate the subjective quality of reconstructed video, you have to use the KTA decoder to parse the KTA bitstream.

hohai

Could you please tell me how to obtain the basis images of MDDT? I have looked at the MDDCT and found that something is different. I want to understand the principle of the algorithm.

Edward

I wonder whether any "directional" transforms, such as the contourlet or the wavelet-based contourlet, are used in the latest reference software for H.264/AVC or H.265?

Thank you very much!

Jie Dong

No.

Rebecca

Hi, I wonder how the decision between the 8×8 DCT and the 4×4 DCT is made for an inter-predicted MB with no sub-partitions smaller than 8×8 in H.264/AVC. Could you please give me some hints? In fact, I can't understand why JM simply computes SATD8x8 and SATD4x4 for an 8×8 block and then picks the best transform mode.
PS: the function is int GetBestTransformP8x8(Macroblock *currMB). Thanks very much!
