Thursday, January 16, 2020

8*8 block matrix in JPEG image compression?


In the standard JPEG format for an image, the discrete cosine transform (DCT) is used. But instead of applying the transform to the whole image, we first divide the image into 8x8 blocks and apply the transform to each of them. Then, during quantization, we remove the small coefficients at higher frequencies. These steps are explained in detail here.
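
The block-transform-quantize pipeline described above can be sketched roughly as follows. This is only an illustration, not the JPEG reference procedure: the single uniform `step` is an assumption (real JPEG uses a per-frequency table), and the function names and the ramp test image are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix: row k is the k-th cosine basis vector."""
    k = np.arange(n).reshape(-1, 1)
    m = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def blockwise_dct_roundtrip(image, block=8, step=16.0):
    """Transform each block, quantize with one uniform step, reconstruct."""
    h, w = image.shape
    if h % block or w % block:
        raise ValueError("this sketch assumes dimensions divisible by the block size")
    c = dct_matrix(block)
    out = np.empty((h, w), dtype=float)
    for i in range(0, h, block):
        for j in range(0, w, block):
            tile = image[i:i + block, j:j + block].astype(float) - 128.0  # level shift
            coeffs = c @ tile @ c.T            # separable 2-D DCT of the block
            levels = np.round(coeffs / step)   # small coefficients round to 0
            out[i:i + block, j:j + block] = c.T @ (levels * step) @ c + 128.0
    return out

# Hypothetical test image: a smooth horizontal ramp, 16 x 16 pixels.
image = np.tile(np.linspace(0, 255, 16), (16, 1))
recon = blockwise_dct_roundtrip(image)
rms_error = np.sqrt(np.mean((recon - image) ** 2))
```

Because the transform is orthonormal, the RMS reconstruction error cannot exceed half the quantization step.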


But as we know, an N-sample DCT gives only N frequency coefficients. If we apply the DCT to the whole image, we will be able to get much higher frequency coefficients.


Can someone here highlight any disadvantage of applying DCT over the whole image at a time?



Answer



Lossy JPEG compression does not merely remove small coefficients at higher frequencies. It encodes them with a precision that depends on a (relatively crude) visual perception model; most notably, horizontal and vertical frequencies are not quantized with the same precision. And as in many compression formats, it essentially assumes that the data is locally stationary.
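
To make the horizontal/vertical asymmetry concrete, here is a small sketch using the example luminance quantization table given in Annex K of the JPEG standard (ITU-T T.81). That table is only informative and encoders are free to use others. The helper `quantize` is a hypothetical name, and I am assuming the usual convention that rows index vertical frequency and columns index horizontal frequency.

```python
import numpy as np

# Example luminance table from Annex K of ITU-T T.81 (informative, not mandatory).
# Assumed layout: rows = vertical frequency, columns = horizontal frequency.
Q_LUMA = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quantize(coeffs, table=Q_LUMA):
    """Divide each DCT coefficient by its own step and round to an integer level."""
    return np.round(coeffs / table).astype(int)

# The table is not symmetric, so the two orientations are treated differently.
asymmetric = not np.array_equal(Q_LUMA, Q_LUMA.T)

# Same coefficient magnitude at the first horizontal and first vertical AC position.
coeffs = np.zeros((8, 8))
coeffs[0, 1] = 17.0
coeffs[1, 0] = 17.0
levels = quantize(coeffs)
```

Here the two equal coefficients end up at different quantization levels, simply because their step sizes (11 and 12) differ.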



If you apply the DCT over the whole image and quantize the DCT coefficients, this quantization will affect the whole image. Imagine an image with a background checkerboard pattern and a small zebra in the foreground. With whole-image DCT compression, the zebra is likely to lose its stripes, because their energy is negligible with respect to that of the checkerboard. Moreover, as JPEG applies the DCT to the chrominance channels as well, with down-sampling, color coefficient quantization is likely to produce false colors at places where they do not belong. With block sizes larger than about $8 \times 8$, several meaningful image gradients are more likely to fall within the same block.
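
A toy version of the zebra argument, under simplifying assumptions: the checkerboard background is left out (the image is flat except for one faint striped patch), and a single uniform `step` stands in for a quantizer sized for the dominant content. The sizes, amplitude and names are all hypothetical. The point is only that a whole-image DCT spreads the energy of a small local pattern over many tiny coefficients, whereas a block DCT keeps it concentrated.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix: row k is the k-th cosine basis vector."""
    k = np.arange(n).reshape(-1, 1)
    m = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def dct2(x):
    """Separable 2-D DCT of a rectangular array."""
    rows, cols = x.shape
    return dct_matrix(rows) @ x @ dct_matrix(cols).T

size, block, step, amplitude = 64, 8, 16.0, 3.0

# Faint vertical stripes confined to one 8 x 8 block; the rest is flat.
image = np.zeros((size, size))
stripes = amplitude * (-1.0) ** np.arange(block)
image[24:32, 24:32] = stripes  # broadcast over the 8 rows of the patch

# Quantize the whole-image transform, and the transform of the striped block alone.
whole_levels = np.round(dct2(image) / step)
block_levels = np.round(dct2(image[24:32, 24:32]) / step)

whole_survivors = np.count_nonzero(whole_levels)  # stripes vanish entirely
block_survivors = np.count_nonzero(block_levels)  # at least one coefficient remains
```

With these numbers every whole-image coefficient falls below half the step and rounds to zero, while the block transform keeps a nonzero high-frequency coefficient for the stripes.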


Meanwhile, one can get slightly better results with $16\times 16$ blocks. That is for the perceptual part.


There is a computational part too. Small non-overlapping blocks are easier to process and require less memory, which is still an expensive part of electronic devices. Early JPEG 2000 implementations, which applied wavelets to the whole image, failed to gain adoption despite better results, partly because of their memory footprint. Now, with the advent of GPUs, processing blocks is becoming quite attractive. Also, a power-of-two block size ($8=2^3$) and clever tricks make the block DCT very efficient. Modern coding standards often use several different block sizes ($2\times 4$, $4\times 4$...).
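
As a back-of-the-envelope illustration of the cost gap, the following counts multiply-adds for a naive separable 2-D DCT done as two dense matrix products. This is an assumption made for simplicity: fast DCT algorithms narrow the gap considerably, so read the ratio as an upper bound. The $512 \times 512$ image size and the function name are hypothetical.

```python
def naive_separable_macs(n, block=None):
    """Multiply-adds for an n x n image, transformed whole or in block x block tiles,
    assuming each 2-D DCT is computed as two dense b x b matrix products."""
    b = n if block is None else block
    tiles = (n // b) ** 2
    return tiles * 2 * b ** 3

n = 512
whole = naive_separable_macs(n)           # one 512 x 512 transform
tiled = naive_separable_macs(n, block=8)  # 4096 independent 8 x 8 transforms
ratio = whole // tiled

# Samples that must be held at once to compute one transform.
working_set_whole = n * n
working_set_block = 8 * 8
```

Under this naive count the whole-image transform costs 64 times more arithmetic, and it needs the full image in working memory instead of 64 samples at a time.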


Last, but not least, a whole-image DCT is not modular, as you need DCT implementations for each image size. Although this is possible in software, chip makers are unlikely to like the hardware part.


So, applying DCT over the whole image:



  1. is detrimental to non-stationary images, and wastes some local orientation information,

  2. can be costly in terms of memory, and hardware,

  3. is not quite modular nor computationally efficient.



That being said, very stationary images would be better compressed with a whole-size DCT.
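
A deliberately extreme sketch of the stationary case: an image whose every row is a single whole-image DCT basis vector. The sizes, the basis index `k` and the threshold are hypothetical choices. The whole-image DCT then needs one coefficient, while the block DCT needs at least one per block.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix: row k is the k-th cosine basis vector."""
    k = np.arange(n).reshape(-1, 1)
    m = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

size, block, k = 64, 8, 5
c_whole = dct_matrix(size)
c_block = dct_matrix(block)

# Perfectly stationary image: every row is whole-image basis vector number k.
image = np.tile(c_whole[k], (size, 1))

# Whole-image transform: count coefficients above a small numerical threshold.
whole = c_whole @ image @ c_whole.T
n_whole = np.count_nonzero(np.abs(whole) > 1e-9)

# Block transform: count significant coefficients over all 8 x 8 tiles.
n_block = 0
for i in range(0, size, block):
    for j in range(0, size, block):
        coeffs = c_block @ image[i:i + block, j:j + block] @ c_block.T
        n_block += np.count_nonzero(np.abs(coeffs) > 1e-9)
```

Real images are rarely this regular, which is why the block approach usually wins in practice.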

