IMG_thr_gt2max_8


Detailed Description


Functions

void IMG_thr_gt2max_8 (const unsigned char *in_data, unsigned char *restrict out_data, short cols, short rows, unsigned char threshold)


Function Documentation

void IMG_thr_gt2max_8 (const unsigned char *in_data,
                       unsigned char *restrict out_data,
                       short cols,
                       short rows,
                       unsigned char threshold)

Description:
This kernel performs a thresholding operation on the input image "in_data" with dimensions as specified by the input arguments "cols" and "rows". The thresholded pixels are written to the output image pointed to by "out_data". The input and output images are of the same dimensions.
Pixels with values equal to or below the threshold are passed through unmodified. Pixels with values greater than the threshold are set to the maximum unsigned 8-bit value (255) in the output image.
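For reference, the rule above can be sketched in natural C as follows. This is a minimal sketch, not the IMGLIB source: the function name thr_gt2max_ref is hypothetical (chosen so it does not clash with the library symbol), and it simply walks all cols x rows pixels with no alignment or size restrictions.

```c
#include <assert.h>
#include <stddef.h>

/* Natural-C sketch: pixels above the threshold become 255, all others are
 * copied unchanged. thr_gt2max_ref is a hypothetical name, not an IMGLIB
 * symbol. No alignment or size restrictions apply. */
void thr_gt2max_ref(const unsigned char *in_data,
                    unsigned char *out_data,
                    short cols,
                    short rows,
                    unsigned char threshold)
{
    size_t pixels, i;

    assert(cols >= 0 && rows >= 0);
    pixels = (size_t)cols * (size_t)rows;

    for (i = 0; i < pixels; i++) {
        /* Greater than the threshold: saturate to 255; otherwise pass through. */
        out_data[i] = (in_data[i] > threshold) ? 255 : in_data[i];
    }
}
```

Note that a pixel exactly equal to the threshold is passed through, not saturated.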
Parameters:
in_data Input image pointer
out_data Output image pointer
cols Number of columns in the image
rows Number of rows in the image
threshold The (unsigned 8-bit) threshold
Algorithm:
The thresholding function is illustrated in the transfer function diagram below:

                 255_|          _________                              
                     |         |                                       
                     |         |                                       
            O        |         |                                       
            U        |         |                                       
            T    th _|. . . . .|                                       
            P        |        /.                                       
            U        |      /  .                                       
            T        |    /    .                                       
                     |  /      .                                       
                   0_|/________.__________                             
                     |         |        |                              
                     0        th       255                             
                                                                     
                             INPUT                                     
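The same transfer function can be written as a piecewise formula. Here x denotes the input pixel value, th the threshold, and y the output pixel value (x and y are symbols introduced only for this formula):

```latex
y = \begin{cases}
  x,   & 0 \le x \le th \\
  255, & th < x \le 255
\end{cases}
```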
   
The main thresholding loop is manually unrolled four times. The compiler is instructed to unroll by an additional factor of four, yielding a total unroll factor of 16.
A packed-data processing technique is used to process four pixels in parallel. The _amem4_const() intrinsic loads four pixels, designated p0 through p3, with a single aligned 32-bit access. These pixels are packed into an unsigned integer variable p3p2p1p0 as illustrated below:
                                                                     
                         31  24   16    8    0                        
                         +----+----+----+----+                       
                p3p2p1p0 | p3 | p2 | p1 | p0 |                       
                         +----+----+----+----+                       
   
(This illustration assumes a little-endian memory configuration.) We compare this packed word to a packed copy of the threshold, which contains four copies of the threshold value, one per byte:
                                                                   
                         31  24   16    8    0
                         +----+----+----+----+                      
                thththth | th | th | th | th |                      
                         +----+----+----+----+
   
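We compare the two packed words using the intrinsic _cmpgtu4(), which performs four unsigned byte-wise greater-than comparisons and returns one result bit per byte. These result bits are then expanded to byte masks using _xpnd4(). The result is a four-byte mask (x3210) which contains 0xFF in each byte whose pixel is greater than the threshold, and 0x00 in each byte whose pixel is less than or equal to the threshold.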
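To illustrate what the two intrinsics compute, the following portable C functions emulate them on a host compiler. The names cmpgtu4_emul and xpnd4_emul are hypothetical stand-ins written from the behavior described above (one result bit per byte from the compare, one 0x00 or 0xFF byte per bit from the expand); they are not TI APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates _cmpgtu4(a, b): bit i of the result is 1 when unsigned byte i
 * of a is greater than unsigned byte i of b. Bits 31:4 are zero. */
uint32_t cmpgtu4_emul(uint32_t a, uint32_t b)
{
    uint32_t bits = 0;
    int i;

    for (i = 0; i < 4; i++) {
        uint32_t a_byte = (a >> (8 * i)) & 0xFFu;
        uint32_t b_byte = (b >> (8 * i)) & 0xFFu;
        if (a_byte > b_byte)
            bits |= (uint32_t)1 << i;
    }
    assert(bits <= 0xFu);
    return bits;
}

/* Emulates _xpnd4(bits): byte i of the result is 0xFF when bit i of the
 * input is 1, and 0x00 otherwise. Only bits 3:0 are examined. */
uint32_t xpnd4_emul(uint32_t bits)
{
    uint32_t mask = 0;
    int i;

    for (i = 0; i < 4; i++) {
        if (bits & ((uint32_t)1 << i))
            mask |= (uint32_t)0xFF << (8 * i);
    }
    return mask;
}
```

For example, comparing pixels p3..p0 = 0xC8, 0x65, 0x64, 0x00 against a threshold of 0x64 sets bits 3 and 2 (0xC), and expanding those bits gives the mask 0xFFFF0000. The pixel equal to the threshold (0x64) is not masked.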
To complete the thresholding process, we compute the bitwise OR of the original pixel values and the mask. This forces values above the threshold to 0xFF and leaves the other values unmodified. The four pixels are then written with a single _amem4().
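Putting the load, compare, expand, OR, and store steps together, a portable sketch of the four-pixels-per-iteration loop might look like the following. It is an illustration under stated assumptions, not the optimized IMGLIB kernel: memcpy() stands in for _amem4_const()/_amem4(), a small per-byte loop stands in for _cmpgtu4() plus _xpnd4(), and thr_word and thr_gt2max_packed are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One packed step: bytes of p3p2p1p0 above the matching byte of thththth
 * become 0xFF. The mask-building loop stands in for _cmpgtu4() + _xpnd4(). */
static uint32_t thr_word(uint32_t p3p2p1p0, uint32_t thththth)
{
    uint32_t mask = 0;
    int i;

    for (i = 0; i < 4; i++) {
        uint32_t p  = (p3p2p1p0 >> (8 * i)) & 0xFFu;
        uint32_t th = (thththth >> (8 * i)) & 0xFFu;
        if (p > th)
            mask |= (uint32_t)0xFF << (8 * i);
    }
    /* Bitwise OR: masked bytes become 0xFF, the others are unchanged. */
    return p3p2p1p0 | mask;
}

/* Four pixels per iteration; memcpy() stands in for _amem4_const()/_amem4().
 * This sketch only requires the pixel count to be a multiple of 4. */
void thr_gt2max_packed(const unsigned char *in_data, unsigned char *out_data,
                       int pixels, unsigned char threshold)
{
    uint32_t thththth = (uint32_t)threshold * UINT32_C(0x01010101);
    int i;

    assert(pixels >= 0 && pixels % 4 == 0);
    for (i = 0; i < pixels; i += 4) {
        uint32_t word;

        memcpy(&word, &in_data[i], sizeof word);
        word = thr_word(word, thththth);
        memcpy(&out_data[i], &word, sizeof word);
    }
}
```

Because every byte lane is compared against the same threshold byte, the result does not depend on the order in which memcpy() places the pixels in the word.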
In this version of the code, we rely on the compiler to unroll the loop 4x (as noted above) and to convert the _amem4_const() and _amem4() calls into _amemd8_const()/_amemd8() calls as part of its automatic optimizations.
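As a rough picture only (the compiler's actual output is not shown in this document), a double-word step can be viewed as one 8-byte load split into two 32-bit halves, each thresholded four lanes at a time, followed by one 8-byte store. In this sketch, memcpy() of a uint64_t stands in for _amemd8_const()/_amemd8(), and thr4 and thr8 are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Four-lane step: bytes of w above the matching threshold byte become 0xFF. */
static uint32_t thr4(uint32_t w, uint32_t thththth)
{
    uint32_t mask = 0;
    int i;

    for (i = 0; i < 4; i++) {
        if (((w >> (8 * i)) & 0xFFu) > ((thththth >> (8 * i)) & 0xFFu))
            mask |= (uint32_t)0xFF << (8 * i);
    }
    return w | mask;
}

/* Eight pixels per step: one 64-bit load standing in for _amemd8_const(),
 * two 32-bit halves thresholded separately, and one 64-bit store standing
 * in for _amemd8(). */
void thr8(const unsigned char *in8, unsigned char *out8, unsigned char threshold)
{
    uint32_t thththth = (uint32_t)threshold * UINT32_C(0x01010101);
    uint64_t dw;
    uint32_t lo, hi;

    assert(in8 != NULL && out8 != NULL);
    memcpy(&dw, in8, sizeof dw);
    lo = thr4((uint32_t)dw, thththth);
    hi = thr4((uint32_t)(dw >> 32), thththth);
    dw = ((uint64_t)hi << 32) | lo;
    memcpy(out8, &dw, sizeof dw);
}
```

Splitting and recombining the 64-bit word is lossless, so each pixel is still handled by exactly one byte lane.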
The natural C implementation has no restrictions. The optimized intrinsic C code has restrictions as noted in Assumptions below.
Assumptions:
  • The input array and output array should not overlap
  • Both input and output arrays must be 64-bit aligned
  • The number of image pixels (cols x rows) must be a multiple of 16
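When calling the optimized kernel, it may help to validate these assumptions up front. The following helper is a hypothetical sketch (thr_gt2max_args_ok is not an IMGLIB function): it checks 8-byte alignment of both buffers, that cols x rows is a multiple of 16, and that the input and output byte ranges do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the arguments meet the Assumptions above, 0 otherwise.
 * thr_gt2max_args_ok is a hypothetical helper, not part of IMGLIB. */
int thr_gt2max_args_ok(const unsigned char *in_data,
                       const unsigned char *out_data,
                       short cols, short rows)
{
    uintptr_t in_addr, out_addr, n;

    assert(in_data != NULL && out_data != NULL);
    if (cols < 0 || rows < 0)
        return 0;

    in_addr  = (uintptr_t)in_data;
    out_addr = (uintptr_t)out_data;
    n = (uintptr_t)cols * (uintptr_t)rows;   /* number of pixels (bytes) */

    /* Both arrays must be 64-bit (8-byte) aligned. */
    if (in_addr % 8 != 0 || out_addr % 8 != 0)
        return 0;

    /* cols x rows must be a multiple of 16. */
    if (n % 16 != 0)
        return 0;

    /* [in, in + n) and [out, out + n) must not overlap. */
    if (n != 0 && in_addr < out_addr + n && out_addr < in_addr + n)
        return 0;

    return 1;
}
```

If the helper returns 0, the natural C implementation, which has no such restrictions, is the safer choice.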
Implementation Notes:
  • This code is fully interruptible
  • This code is compatible with C66x processors
  • This code is endian neutral
Benchmarks:
See IMGLIB_Test_Report.html for cycle and memory information.


Copyright 2012, Texas Instruments Incorporated