IMG_wave_horz


Detailed Description


Functions

void IMG_wave_horz (short *iptr, short *qmf, short *filter, short *optr, int ish_x_dim)


Function Documentation

void IMG_wave_horz ( short *  iptr,
short *  qmf,
short *  filter,
short *  optr,
int  ish_x_dim 
)

Description:
This kernel performs a 1D periodic orthogonal wavelet decomposition. It also performs the row decomposition in a 2D wavelet transform. An input signal x[n] is first low-pass and high-pass filtered and decimated by two. This results in a reference signal r1[n], which is the decimated output obtained by dropping the odd samples of the low-pass filtered output, and a detail signal d[n], obtained by dropping the odd samples of the high-pass filtered output. A circular convolution algorithm is implemented, and hence the wavelet transform is periodic. The reference signal and the detail signal are each half the size of the original signal. The reference signal may then be iterated again to perform another scale of multi-resolution analysis.
Parameters:
iptr Input row of data
qmf QMF filter bank for low-pass
filter Mirror QMF filter bank for high-pass
optr Output row of detail/reference decimated outputs
ish_x_dim Width of the input row
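The decomposition described above can be sketched in plain C as follows. This is a minimal, unoptimized illustration and not the library's implementation: the names wave_horz_ref_sketch and circ_fir_q15, the Q15 rounding and shift, the alignment of each filter window (starting at input index 2k for output k), the coefficient ordering, and the output layout (reference half first, detail half second) are all assumptions made for the example.

```c
/* Assumed constants: 8-tap filters, Q15 fixed point with rounding. */
enum { WAVE_TAPS = 8, Q15_SHIFT = 15, Q15_ROUND = 1 << 14 };

/* One output sample of a circular (periodic) FIR filter at position pos.
 * Indices past the end of the row wrap back to the start.
 * Assumes arithmetic right shift for negative sums (true on mainstream
 * compilers) and coefficients small enough that the int sum cannot overflow. */
static int circ_fir_q15(const short *x, const short *h, int n, int pos)
{
    int sum = Q15_ROUND;
    for (int j = 0; j < WAVE_TAPS; j++)
        sum += x[(pos + j) % n] * h[j];
    return sum >> Q15_SHIFT;
}

/* Hypothetical reference routine: filter, then keep only even positions
 * (that is, drop the odd samples). ish_x_dim is assumed to be even. */
void wave_horz_ref_sketch(const short *iptr, const short *qmf,
                          const short *filter, short *optr, int ish_x_dim)
{
    int half = ish_x_dim / 2;
    for (int k = 0; k < half; k++) {
        optr[k]        = (short)circ_fir_q15(iptr, qmf,    ish_x_dim, 2 * k);
        optr[half + k] = (short)circ_fir_q15(iptr, filter, ish_x_dim, 2 * k);
    }
}
```

Under these assumptions, a second scale of analysis would be obtained by passing the reference half of the output as the new input row, with half the width and a separate output buffer.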
Algorithm:
The main idea behind the optimized C code is to issue one set of reads to the x array and to perform the low-pass and high-pass filtering operations together, so as to maximize the number of multiplies. The last 6 elements of the low-pass filter and the first 6 elements of the high-pass filter use the same input. This is used to appropriately change the output pointer to the low-pass filter after 6 iterations. However, for the first six iterations pointer wrap-around can occur, and hence this creates a dependency. Pre-reading those 6 values outside the array prevents the checks that introduce this dependency. In addition, the input data is read as word-wide quantities, and the low-pass and high-pass filter coefficients are stored in registers, allowing for the input loop to be completely unrolled. Thus the intrinsic C code has only one loop. A predication register is used to reset the low-pass output pointer after three iterations. The merging of the loops in this fashion allows for the maximum number of multiplies with the minimum number of reads.
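The two central ideas above, sharing one set of input reads between both filters and keeping the wrap-around handling out of the multiply loop, can be sketched in portable C. This is a simplified illustration under stated assumptions, not the intrinsic code: the name wave_horz_merged_sketch is hypothetical, both filters are assumed to be aligned on the same 8-sample window (the kernel described above offsets them), and the Q15 scaling and output layout are assumptions as well.

```c
/* Assumed constants: 8-tap filters, Q15 fixed point with rounding. */
enum { WAVE_TAPS = 8, Q15_SHIFT = 15, Q15_ROUND = 1 << 14 };

/* Hypothetical merged loop: each 8-sample window is read once and feeds
 * both the low-pass and the high-pass accumulators. n is assumed even.
 * Assumes arithmetic right shift for negative sums. */
void wave_horz_merged_sketch(const short *x, const short *lo,
                             const short *hi, short *out, int n)
{
    int half = n / 2;
    for (int k = 0; k < half; k++) {
        short win[WAVE_TAPS];
        int base = 2 * k;

        if (base + WAVE_TAPS <= n) {
            /* Common case: the window lies inside the row, no wrap checks. */
            for (int j = 0; j < WAVE_TAPS; j++)
                win[j] = x[base + j];
        } else {
            /* Window crosses the end of the row: gather with wrap-around. */
            for (int j = 0; j < WAVE_TAPS; j++)
                win[j] = x[(base + j) % n];
        }

        /* One pass over the window drives both filters. */
        int lsum = Q15_ROUND;
        int hsum = Q15_ROUND;
        for (int j = 0; j < WAVE_TAPS; j++) {
            lsum += win[j] * lo[j];
            hsum += win[j] * hi[j];
        }
        out[k]        = (short)(lsum >> Q15_SHIFT);  /* reference sample */
        out[half + k] = (short)(hsum >> Q15_SHIFT);  /* detail sample */
    }
}
```

Gathering the window before the multiply loop means that loop has a fixed trip count and no index checks, which is the property that lets a compiler unroll it fully.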
Assumptions:
  • This kernel places no restrictions on the alignment of its input
Implementation Notes:
  • This kernel uses the Daubechies D4 filter bank for analysis, with 4 vanishing moments. Hence the length of the analyzing low-pass and high-pass filters is 8
  • The optimized kernel should not have any bank conflicts
  • This code is compatible with C66x processors
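As an illustration of the 8-tap filter bank mentioned in the notes above, the following sketch builds a pair of 16-bit coefficient arrays. The low-pass values are the textbook 8-tap Daubechies scaling coefficients (four vanishing moments); the Q15 scaling, the alternating-flip construction of the high-pass filter, and the names make_filter_bank_sketch and to_q15 are assumptions for the example. This page does not state the coefficient ordering or sign convention the kernel expects, so the arrays may need to be reordered before use.

```c
enum { WAVE_TAPS = 8 };

/* Textbook 8-tap Daubechies scaling (low-pass) coefficients,
 * normalized so that they sum to sqrt(2). */
static const double db_lowpass[WAVE_TAPS] = {
     0.2303778133,  0.7148465706,  0.6308807679, -0.0279837694,
    -0.1870348117,  0.0308413818,  0.0328830117, -0.0105974018
};

/* Round a coefficient to the nearest Q15 integer (assumed scaling). */
static short to_q15(double v)
{
    return (short)(v * 32768.0 + (v >= 0.0 ? 0.5 : -0.5));
}

/* Hypothetical helper: fill lo[] with the Q15 low-pass filter and hi[]
 * with its mirror, hi[j] = (-1)^j * lo[7 - j] (alternating flip). */
void make_filter_bank_sketch(short *lo, short *hi)
{
    for (int j = 0; j < WAVE_TAPS; j++)
        lo[j] = to_q15(db_lowpass[j]);

    for (int j = 0; j < WAVE_TAPS; j++) {
        short mirrored = lo[WAVE_TAPS - 1 - j];
        hi[j] = (short)((j & 1) ? -mirrored : mirrored);
    }
}
```

With this construction the two filters are orthogonal term by term: each product lo[j] * hi[j] cancels against the product at index 7 - j.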
Benchmarks:
See IMGLIB_Test_Report.html for cycle and memory information.


Copyright 2012, Texas Instruments Incorporated