IMG_histogram_16


Detailed Description


Functions

void IMG_histogram_16 (unsigned short *restrict image, int n, int accumulate, short *restrict t_hist, short *restrict hist, int img_bits)


Function Documentation

void IMG_histogram_16 ( unsigned short *restrict  image,
int  n,
int  accumulate,
short *restrict  t_hist,
short *restrict  hist,
int  img_bits 
)

Description:
This code calculates the histogram of an image array "image" containing "n" pixels, with "img_bits" valid data bits per pixel. It returns a histogram with 2^img_bits bins, one for each of the possible pixel values based on the precision.
The routine can either add to or subtract from an existing histogram, through the "accumulate" control. The implementation requires temporary storage for four scratch histograms, to reduce bank conflicts.
The lengths of the "hist" and "t_hist" arrays depend on the pixel precision as specified by "img_bits". The length of "hist" is 2^img_bits and that of "t_hist" is 2^(img_bits+2).
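As a quick illustration of the sizing rule above, the following sketch computes both buffer lengths from "img_bits". The helper names hist_len and t_hist_len are hypothetical and are not part of IMGLIB.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers (not part of IMGLIB). Lengths are element counts,
   i.e. numbers of shorts, not bytes. */

/* Number of bins in "hist": 2^img_bits. */
static size_t hist_len(int img_bits)
{
    assert(img_bits >= 1 && img_bits <= 16);
    return (size_t)1 << img_bits;
}

/* Number of elements in "t_hist": four scratch histograms,
   i.e. 4 * 2^img_bits = 2^(img_bits+2). */
static size_t t_hist_len(int img_bits)
{
    return (size_t)4 * hist_len(img_bits);
}
```

For 10-bit pixels this gives 1024 elements for "hist" and 4096 elements for "t_hist".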
Parameters:
image Input image pointer containing "n" unsigned 16-bit pixels, each with "img_bits" valid data bits
n Size of image in pixels
accumulate Control to add to or subtract from the running histogram. Defined only for the values 1 (ADD) and -1 (SUBTRACT)
t_hist Scratch buffer for temporary histogram storage (SIZE: 2^(img_bits+2))
hist Running histogram bins (SIZE: 2^img_bits)
img_bits Number of valid data bits per pixel
Algorithm:
This code operates on four interleaved histogram bins. The loop is divided into two halves: the "even" half operates on the even-numbered double-words from the input image, and the "odd" half operates on the odd-numbered double-words. Each half processes four pixels at a time, and both halves operate on the same four sets of histogram bins. This introduces a memory dependency on the histogram bins which ordinarily would degrade performance. To break the memory dependencies, the two halves forward their results to each other via the register file, bypassing memory.
Exact memory access ordering obviates the need to predicate stores within the loop.
The algorithm is ordered as follows:
  1. Load from histogram for even half
  2. Store odd_bin to histogram for odd half (previous iteration)
  3. IF data_even == previous data_odd THEN increment even_bin by 2 ELSE increment even_bin by 1, forward to odd
  4. Load from histogram for odd half (current iteration)
  5. Store even_bin to histogram for even half
  6. IF data_odd == previous data_even THEN increment odd_bin by 2 ELSE increment odd_bin by 1, forward to even
  7. Repeat from 1
With this particular ordering, forwarding is necessary between the even and odd halves when pixels in adjacent halves fall in the same bin. The store is never predicated: it occurs speculatively, because a stored value that misses an update is overwritten by the next store, which contains the extra forwarded increment.
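The ordering above can be modelled in scalar C. The sketch below is a simplified illustration, not the library's code: it assumes one pixel per half per iteration and a single histogram (the real loop handles four pixels per half across four scratch histograms), and the name hist_forwarding_model is hypothetical. The have_odd flag only guards the first iteration, where no odd result is pending yet.

```c
#include <assert.h>

/* Hypothetical scalar model of the load/store ordering with forwarding.
   Assumes n is even and every pixel value is a valid index into hist. */
static void hist_forwarding_model(const unsigned short *image, int n,
                                  short *hist)
{
    int have_odd = 0;            /* is an odd-half store pending?        */
    unsigned short prev_odd = 0; /* pixel value of the pending odd store */
    short odd_bin = 0;           /* pending odd-half bin value           */
    int i;

    assert(n % 2 == 0);

    for (i = 0; i < n; i += 2) {
        unsigned short data_even = image[i];
        unsigned short data_odd  = image[i + 1];
        short even_bin;

        /* 1. Load for the even half (stale if the pending odd store
              targets the same bin). */
        even_bin = hist[data_even];
        /* 2. Store the odd result from the previous iteration. */
        if (have_odd)
            hist[prev_odd] = odd_bin;
        /* 3. Increment; +2 accounts for the odd update that the load missed. */
        even_bin += (have_odd && data_even == prev_odd) ? 2 : 1;
        /* 4. Load for the odd half (stale if the even store below
              targets the same bin). */
        odd_bin = hist[data_odd];
        /* 5. Store the even result. */
        hist[data_even] = even_bin;
        /* 6. Increment; +2 accounts for the even update that the load missed. */
        odd_bin += (data_odd == data_even) ? 2 : 1;

        prev_odd = data_odd;
        have_odd = 1;
    }

    /* Epilogue: flush the last pending odd store. */
    if (have_odd)
        hist[prev_odd] = odd_bin;
}
```

In this model a store may briefly hold a value that lacks the other half's update, but the next store to the same bin always carries the corrected count, so the final histogram matches a straightforward per-pixel count.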
The four scratch histograms are interleaved, with each bin spaced four halfwords apart and each histogram starting in a different memory bank. This allows the four histogram accesses to proceed in any order without bank conflicts. The diagram below illustrates this (addresses are halfword offsets):
         0       1       2       3       4       5       6   ...        
     | hst 0 | hst 1 | hst 2 | hst 3 | hst 0 | hst 1 | ...   ...        
     | bin 0 | bin 0 | bin 0 | bin 0 | bin 1 | bin 1 | ...   ...        
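The interleaved layout in the diagram can be sketched in plain C as follows. This is an illustrative reference under stated assumptions, not the IMGLIB source: the name histogram_16_ref is hypothetical, the assignment of pixel i to scratch histogram (i % 4) is assumed, and the final merge step (summing the four scratch bins and applying "accumulate") is inferred from the parameter descriptions. The sketch leaves "t_hist" untouched after the merge, so the caller is assumed to zero it before each call.

```c
#include <assert.h>

/* Illustrative reference only; histogram_16_ref is a hypothetical name. */
static void histogram_16_ref(const unsigned short *image, int n,
                             int accumulate, short *t_hist, short *hist,
                             int img_bits)
{
    int bins = 1 << img_bits;
    int i, k;

    assert(accumulate == 1 || accumulate == -1);

    /* Assumed mapping: pixel i updates scratch histogram (i % 4).
       Bin b of histogram k lives at halfword offset 4*b + k. */
    for (i = 0; i < n; i++) {
        assert(image[i] < bins);
        t_hist[4 * image[i] + (i & 3)]++;
    }

    /* Merge the four scratch histograms into the running histogram,
       adding or subtracting according to "accumulate". */
    for (i = 0; i < bins; i++) {
        int sum = 0;
        for (k = 0; k < 4; k++)
            sum += t_hist[4 * i + k];
        hist[i] = (short)(hist[i] + accumulate * sum);
    }
}
```

Because consecutive pixels update different scratch histograms in this sketch, runs of identical pixel values never touch the same halfword twice in a row.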
   
Notes:
The natural C implementation has no restrictions. The optimized intrinsic C code has restrictions as noted in Assumptions below.
Assumptions:
  • The image and output histogram arrays do not overlap
  • The temporary array, t_hist, is initialized to zero for the first call to the routine (for "accumulate")
  • All arrays (image and histogram) are 64-bit aligned
  • The number of pixels is a non-zero multiple of 8
  • The number of bits per pixel, "img_bits", is between 1 and 16 (inclusive)
  • The maximum count per bin is 32767
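A caller may want to check the statically verifiable assumptions before invoking the optimized routine. The following is a minimal sketch; the helper names is_aligned_64, ranges_overlap and histogram_16_args_ok are hypothetical and not part of IMGLIB. The per-bin maximum of 32767 depends on the image data and is not checked here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not part of IMGLIB. */

/* True if p is aligned to a 64-bit (8-byte) boundary. */
static int is_aligned_64(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* True if the byte ranges [a, a+a_bytes) and [b, b+b_bytes) intersect. */
static int ranges_overlap(const void *a, size_t a_bytes,
                          const void *b, size_t b_bytes)
{
    uintptr_t a0 = (uintptr_t)a;
    uintptr_t b0 = (uintptr_t)b;
    return a0 < b0 + b_bytes && b0 < a0 + a_bytes;
}

/* Returns 1 if the arguments satisfy the listed assumptions, else 0. */
static int histogram_16_args_ok(const unsigned short *image, int n,
                                int accumulate, const short *t_hist,
                                const short *hist, int img_bits)
{
    if (img_bits < 1 || img_bits > 16)
        return 0;                        /* bits per pixel in 1..16      */
    if (n <= 0 || (n % 8) != 0)
        return 0;                        /* non-zero multiple of 8       */
    if (accumulate != 1 && accumulate != -1)
        return 0;                        /* only ADD / SUBTRACT defined  */
    if (!is_aligned_64(image) || !is_aligned_64(t_hist) ||
        !is_aligned_64(hist))
        return 0;                        /* 64-bit alignment             */
    if (ranges_overlap(image, (size_t)n * sizeof *image,
                       hist, ((size_t)1 << img_bits) * sizeof *hist))
        return 0;                        /* image and hist must not overlap */
    return 1;
}
```

Such a check would typically be compiled into debug builds only, since it adds work outside the optimized loop.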
Implementation Notes:
  • This code is fully interruptible
  • This code is compatible with C64x+ processors
  • No bank conflicts should occur in this code
Benchmarks:
See IMGLIB_Test_Report.html for cycle and memory information.


Copyright 2012, Texas Instruments Incorporated