IMG_thr_gt2thr_8


Detailed Description


Functions

void IMG_thr_gt2thr_8 (const unsigned char *in_data, unsigned char *restrict out_data, short cols, short rows, unsigned char threshold)


Function Documentation

void IMG_thr_gt2thr_8 ( const unsigned char *  in_data,
unsigned char *restrict  out_data,
short  cols,
short  rows,
unsigned char  threshold 
)

Description:
This kernel performs a thresholding operation on the input image "in_data" with dimensions as specified by the input arguments "cols" and "rows". The thresholded pixels are written to the output image pointed to by "out_data". The input and output images are of the same dimensions.
Pixels with values equal to or below the threshold pass through unmodified. Pixels with values greater than the threshold are clipped to the threshold value in the output image.
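The behaviour described above can be sketched in plain C. This is a minimal illustration, not the IMGLIB source: the name thr_gt2thr_8_ref is hypothetical, and the early return for non-positive dimensions is an added assumption that the original text does not mention.

```c
#include <stddef.h>

/* Hypothetical reference routine (not the IMGLIB kernel itself):
 * pixels above `threshold` are clipped to `threshold`, all other
 * pixels are copied unchanged. */
void thr_gt2thr_8_ref(const unsigned char *in_data, unsigned char *out_data,
                      short cols, short rows, unsigned char threshold)
{
    size_t n;
    size_t i;

    /* Assumption: treat non-positive dimensions as an empty image. */
    if (cols <= 0 || rows <= 0)
        return;

    n = (size_t)cols * (size_t)rows;   /* total pixel count */

    for (i = 0; i < n; i++) {
        unsigned char p = in_data[i];
        /* Equivalent to out = min(in, threshold). */
        out_data[i] = (p > threshold) ? threshold : p;
    }
}
```

With a threshold of 100, the inputs 0, 100, 101 and 255 map to 0, 100, 100 and 100 respectively.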
Parameters:
in_data Input image pointer
out_data Output image pointer
cols Number of columns in the image
rows Number of rows in the image
threshold The (unsigned 8-bit) threshold
Algorithm:
The thresholding function is illustrated in the transfer function diagram below:

                 255_|                                                   
                     |                                                   
                     |                                                   
            O        |                                                   
            U        |                                                   
            T    th _|. . . . . _________                                
            P        |        /.                                         
            U        |      /  .                                         
            T        |    /    .                                         
                     |  /      .                                         
                   0_|/________.__________                               
                     |         |        |                                
                     0        th       255                               
                                                                         
                             INPUT                                       
   
The main thresholding loop is manually unrolled four times. The compiler is instructed to unroll by an additional factor of four, yielding a total unroll factor of 16.
A packed-data processing technique is used to process four pixels in parallel. The _amem4_const() intrinsic loads four pixels, designated p0 through p3. These pixels are packed into an unsigned integer variable p3p2p1p0 as illustrated below:
                                                                     
                         31  24   16    8    0                        
                         +----+----+----+----+                       
                p3p2p1p0 | p3 | p2 | p1 | p0 |                       
                         +----+----+----+----+                       
   
(Note that this illustration assumes a little-endian memory configuration.) This packed word is compared against a packed copy of the threshold, which holds four copies of the threshold value, one per byte:
                                                                   
                         31  24   16    8    0                       
                         +----+----+----+----+                      
                thththth | th | th | th | th |                      
                         +----+----+----+----+
   
The comparison uses the _minu4() intrinsic. This instruction selects the smaller unsigned value in each byte position (p3p2p1p0 versus thththth) for all four pixels in parallel. As a result, input values above the threshold are clipped to the threshold value, while values at or below it pass through unchanged.
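The packed comparison can be illustrated with a portable sketch. It does not use the TI intrinsics: minu4_emul is a hypothetical stand-in written here to mimic a per-byte unsigned minimum, memcpy stands in for the aligned four-byte load and store, and pack_threshold and thr_packed_sketch are likewise illustrative names. No unrolling is shown.

```c
#include <stdint.h>
#include <string.h>

/* Replicate an 8-bit threshold into all four bytes of a 32-bit word. */
static uint32_t pack_threshold(uint8_t th)
{
    return (uint32_t)th * 0x01010101u;
}

/* Hypothetical portable stand-in for a per-byte unsigned minimum:
 * each of the four byte lanes of the result holds the smaller of the
 * corresponding lanes of a and b. */
static uint32_t minu4_emul(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    int lane;

    for (lane = 0; lane < 4; lane++) {
        uint32_t x = (a >> (8 * lane)) & 0xFFu;
        uint32_t y = (b >> (8 * lane)) & 0xFFu;
        r |= (x < y ? x : y) << (8 * lane);
    }
    return r;
}

/* Threshold four pixels per iteration. Assumption: n is a multiple of 4. */
static void thr_packed_sketch(const uint8_t *in, uint8_t *out,
                              size_t n, uint8_t th)
{
    uint32_t thththth = pack_threshold(th);
    size_t i;

    for (i = 0; i < n; i += 4) {
        uint32_t p3p2p1p0;
        memcpy(&p3p2p1p0, in + i, 4);               /* load four pixels  */
        p3p2p1p0 = minu4_emul(p3p2p1p0, thththth);  /* clip all four     */
        memcpy(out + i, &p3p2p1p0, 4);              /* store four pixels */
    }
}
```

Because every byte lane is treated identically, the result does not depend on which lane a given pixel lands in, so this sketch behaves the same on little-endian and big-endian hosts.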
The natural C implementation has no restrictions. The optimized intrinsic C code has restrictions as noted in Assumptions below.
Assumptions:
  • The input array and output array should not overlap
  • Both input and output arrays must be 64-bit aligned
  • The number of image pixels (cols x rows) must be a multiple of 16
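A caller may want to verify these assumptions before invoking the optimized kernel. The helper below is a hedged sketch: thr_args_ok is a hypothetical name, not part of IMGLIB, and the rejection of non-positive dimensions is an added assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when the documented assumptions hold
 * for the given arguments, 0 otherwise. */
int thr_args_ok(const unsigned char *in_data, const unsigned char *out_data,
                short cols, short rows)
{
    uintptr_t in_addr = (uintptr_t)in_data;
    uintptr_t out_addr = (uintptr_t)out_data;
    size_t n;

    /* Assumption: an image must have positive dimensions. */
    if (cols <= 0 || rows <= 0)
        return 0;

    n = (size_t)cols * (size_t)rows;

    /* Pixel count must be a multiple of 16. */
    if (n % 16 != 0)
        return 0;

    /* Both arrays must be 64-bit (8-byte) aligned. */
    if (in_addr % 8 != 0 || out_addr % 8 != 0)
        return 0;

    /* The ranges [in, in + n) and [out, out + n) must be disjoint. */
    if (in_addr < out_addr + n && out_addr < in_addr + n)
        return 0;

    return 1;
}
```

If the check fails, a caller could fall back to a plain C loop, since the natural C implementation carries no such restrictions.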
Implementation Notes:
  • This code is fully interruptible
  • This code is compatible with C66x processors
  • This code is endian neutral
Benchmarks:
See IMGLIB_Test_Report.html for cycle and memory information.


Copyright 2012, Texas Instruments Incorporated