The idct_8x8_12q4 algorithm performs an IEEE-1180 compliant IDCT, complete with rounding and saturation to signed 9-bit quantities. The input coefficients are assumed to be signed 16-bit DCT coefficients in 12Q4 format.
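As an illustration of the number formats named above, the following minimal sketch assumes that 12Q4 means a signed 16-bit value with 4 fractional bits (raw value = coefficient * 16) and that "signed 9-bit" means the range [-256, 255]. The helper names `coeff_to_12q4` and `sat_s9` are hypothetical and are not part of the library.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 12Q4 = 12 integer bits (including sign) + 4 fractional bits. */
#define Q4_FRAC_BITS 4

/* Hypothetical helper: convert an integer DCT coefficient to 12Q4. */
static int16_t coeff_to_12q4(int coeff)
{
    /* Only integers in [-2048, 2047] fit in 12 integer bits. */
    assert(coeff >= -2048 && coeff <= 2047);
    return (int16_t)(coeff * (1 << Q4_FRAC_BITS));
}

/* Hypothetical helper: clamp a value to the signed 9-bit range. */
static int16_t sat_s9(int v)
{
    if (v < -256) return -256;
    if (v > 255)  return 255;
    return (int16_t)v;
}
```

For example, an integer coefficient of 3 would be stored as 48 under this assumption, and an intermediate result of 300 would saturate to 255.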
The idct_8x8_12q4 routine accepts a list of 8x8 DCT coefficient blocks and performs an IDCT on each. The array should be aligned to a 64-bit boundary and laid out as "idct_data[num_idcts][8][8]". Input data must be in 12Q4 format. The routine operates entirely in place and requires no additional storage for intermediate results.
Note: The routine operates correctly even when "num_idcts" is zero. In that case, early-exit code returns after 13 cycles (including 6 cycles of function-call overhead).
Parameters:
idct_data
Pointer to 8x8 IDCT coefficient blocks
num_idcts
Number of IDCT blocks
Algorithm:
All levels of looping are collapsed into single loops which are pipelined. The outer loop focuses on 8-pt IDCTs, whereas the inner loop controls the column pointer to handle jumps between IDCT blocks. (The column-pointer adjustment is handled by a four-phase rotating "fixup" constant which takes the place of the original inner loop.)
For performance, portions of the outer-loop code have been interscheduled with the prologs and epilogs of both loops. Finally, cosine term registers are reused between the horizontal and vertical loops, so they do not need to be reinitialized.
To save code size, prolog and epilog collapsing has been performed to the extent that performance is not affected. The remaining prolog and epilog code has been interscheduled with code outside the loops to improve performance.
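The idea of collapsing the block loop and the column loop into one loop with a pointer adjustment can be sketched in plain C. This is only an illustration of the general technique, not the library's scheduled code: it uses a simple conditional in place of the four-phase rotating "fixup" constant, and the function name `walk_columns` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Visit the top of every column of every 8x8 block with one flat loop.
 * out[i] receives the element offset (in 16-bit units) of the column
 * visited on iteration i. The caller must provide num_idcts * 8 entries.
 */
static void walk_columns(size_t num_idcts, size_t *out)
{
    size_t col = 0;                  /* offset of the current column top */
    size_t total = num_idcts * 8;    /* 8 columns per block */

    for (size_t i = 0; i < total; i++) {
        out[i] = col;
        /*
         * Normally step one element to the next column. After the last
         * column of a block, add 56 more elements to skip the remaining
         * 7 rows and land on column 0 of the next block.
         */
        col += ((i & 7) == 7) ? 57 : 1;
    }
}
```

With two blocks, the offsets run 0 through 7 for the first block and 64 through 71 for the second.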
Additional section-specific optimization notes are provided below.
Assumptions:
The "idct_data" array is laid out as idct_data[num_idcts][8][8]
The "idct_data" array must be 64-bit aligned
This is a LITTLE ENDIAN implementation
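The layout and alignment assumptions above can be sketched as follows. This is a minimal host-side illustration using C11 `_Alignas`; on TI toolchains the alignment would more typically be requested with `#pragma DATA_ALIGN`. The block count `MAX_BLOCKS` and the helpers `is_64bit_aligned` and `coeff_offset` are hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCKS 4   /* hypothetical number of blocks for this sketch */

/* Coefficient storage laid out as idct_data[num_idcts][8][8], 64-bit aligned. */
static _Alignas(8) int16_t idct_data[MAX_BLOCKS][8][8];

/* Returns 1 if p sits on a 64-bit (8-byte) boundary, 0 otherwise. */
static int is_64bit_aligned(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* Flat element offset of coefficient (row, col) within block blk. */
static size_t coeff_offset(size_t blk, size_t row, size_t col)
{
    assert(row < 8 && col < 8);
    return blk * 64 + row * 8 + col;
}
```

Each block occupies 64 coefficients (128 bytes), so every block in the array starts on a 64-bit boundary whenever the first one does.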
Implementation Notes:
This code is fully interruptible and fully reentrant
This code is compatible with C66x processors (though not optimized)
No bank conflicts occur
The code may speculatively read up to 128 bytes before the start or beyond the end of the IDCT array. The speculatively read data is ignored
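If the coefficient array could end up at the edge of a valid memory region, one way to keep the speculative reads inside owned memory is to allocate guard space on both sides. The sketch below assumes a host with C11 `aligned_alloc`; the type `padded_blocks` and the function `alloc_padded_blocks` are hypothetical and not part of the library.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define GUARD_BYTES 128                            /* speculative-read margin */
#define BLOCK_BYTES (8 * 8 * sizeof(int16_t))      /* one 8x8 block = 128 bytes */

typedef struct {
    void    *base;   /* pointer to pass to free() */
    int16_t *data;   /* pointer to pass to the IDCT routine */
} padded_blocks;

/* Allocate num_idcts blocks with 128 guard bytes before and after. */
static padded_blocks alloc_padded_blocks(size_t num_idcts)
{
    padded_blocks pb = { NULL, NULL };
    /* The total is a multiple of 128, so it is also a multiple of 8. */
    size_t total = num_idcts * BLOCK_BYTES + 2 * GUARD_BYTES;

    pb.base = aligned_alloc(8, total);
    if (pb.base != NULL) {
        /* Skipping 128 bytes preserves the 64-bit alignment of base. */
        pb.data = (int16_t *)((unsigned char *)pb.base + GUARD_BYTES);
    }
    return pb;
}
```

The caller would pass `pb.data` as the coefficient pointer and release the buffer with `free(pb.base)`. Because the speculatively read data is ignored, the guard regions do not need to be initialized.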
Benchmarks:
See IMGLIB_Test_Report.html for cycle and memory information.