Friday, May 3, 2019

image processing - 2D Haar Transform Computational Complexity


I want to learn the computational complexity of the 2D Discrete Haar (Wavelet) Transform (DHWT). The number of operations (divisions, summations etc.) is the main focus. I think there are many different ideas on that. There are even papers claiming that we do not need any multiplications for it. I am confused. Any help? Thanks.


Edit: I should have added the orthogonality requirement for the Haar transform.




Answer



Roughly $N_{\mathrm{op}} = 8\times 4^N/3$ additions/subtractions for the non-normalized transform, or $N_{\mathrm{op}} = 16\times 4^N/3$ operations for the orthonormal one, on a $2^N\times 2^N$ image. Let us first focus on the claim that "we do not need any multiplications for it," and suppose you would like to avoid them. Finally, you can save a little more if you abandon orthonormality, or even linearity.


The most basic implementation of the Haar wavelet takes $p_0$ and $p_1$ and computes $d = p_0-p_1$ and $a=p_0+p_1$. Applying that horizontally and vertically on a block of four pixels requires 8 additions/subtractions ($\pm$). So on a $2^N \times 2^N$ image, which contains $2^{N-1} \times 2^{N-1}$ such blocks, you need $8\times 2^{N-1} \times 2^{N-1}$ $\pm$ for the first level. On the next level, you iterate on only the low-pass $2^{N-1} \times 2^{N-1}$ image, adding $8\times 2^{N-2} \times 2^{N-2}$ $\pm$ more. Down to $J$ levels, the geometric sum gives roughly $$N_{\mathrm{op},\pm} = 8\times 4^N(1-4^{-J})/3 \quad \pm\ (\mathrm{plus}\;\mathrm{or}\;\mathrm{minus})$$ overall. This transform is not orthonormal, and the growing amplitude of the $\pm$ results may overflow limited-precision arithmetic.
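One level of this non-normalized transform can be sketched with NumPy slicing; the separable form below uses exactly 8 $\pm$ per $2\times 2$ block. The function name and subband naming are illustrative, not from any standard library:

```python
import numpy as np

def haar2d_level(img):
    """One level of the non-normalized 2D Haar transform.

    img: 2D array with even side lengths.
    Returns the four subbands (a, h, v, d): approximation,
    horizontal, vertical, and diagonal details.
    Only additions/subtractions are used: 8 per 2x2 block.
    """
    # Horizontal pass: per pixel pair, sum and difference (4 ops per block)
    lo = img[:, 0::2] + img[:, 1::2]
    hi = img[:, 0::2] - img[:, 1::2]
    # Vertical pass on each half (4 more ops per block)
    a = lo[0::2, :] + lo[1::2, :]   # low-pass (sum of the four pixels)
    v = lo[0::2, :] - lo[1::2, :]   # vertical detail
    h = hi[0::2, :] + hi[1::2, :]   # horizontal detail
    d = hi[0::2, :] - hi[1::2, :]   # diagonal detail
    return a, h, v, d
```

Iterating `haar2d_level` on the returned `a` subband gives the $J$-level decomposition counted above.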


Another implementation is sometimes termed the S+P transform (Sequential + Prediction, or Said + Pearlman), or the lifting Haar wavelet; it uses a different normalization on the low-pass coefficient. Take again $p_0$ and $p_1$, and compute $d = p_0-p_1$ and $a=p_1+d/2$. At first sight you add a division, but it is one binary shift per two $\pm$, so it does not cost much. What you gain is that the computations can be performed in place: $p_0 \leftarrow p_0-p_1$, then $p_1 \leftarrow p_1+p_0/2$. This can be useful in applications where memory is costly.
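The in-place update can be sketched on a 1D integer array; with floor division standing in for the binary shift, the step is exactly invertible on integers (the function name is mine, not a library call):

```python
import numpy as np

def haar_lifting_inplace(x):
    """One in-place Haar lifting step on a 1D integer array of even length.

    After the call, even slots hold the detail d = p0 - p1 and
    odd slots hold the low-pass a = p1 + floor(d/2), i.e. roughly
    the average (p0 + p1) / 2. The // 2 is a binary shift in hardware.
    """
    x[0::2] -= x[1::2]           # p0 <- p0 - p1  (detail)
    x[1::2] += x[0::2] // 2      # p1 <- p1 + d/2 (low-pass)
    return x

def haar_lifting_inverse(x):
    """Undo haar_lifting_inplace exactly, also in place."""
    x[1::2] -= x[0::2] // 2      # recover p1
    x[0::2] += x[1::2]           # recover p0
    return x
```

Because each step only reads the slot it is about to overwrite plus its neighbor, no auxiliary buffer is needed, which is the memory advantage mentioned above.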


Neither of them uses multiplications.


If you want orthonormality, through divisions by $\sqrt{2}$, then you add as many multiplications/divisions as additions/subtractions. The reason: you now compute $d = (p_0-p_1)/\sqrt{2}$ and $a=(p_0+p_1)/\sqrt{2}$, so for each add or subtract you have one more multiply/divide, which doubles the count to $16\times 4^N/3$.
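A minimal sketch of the orthonormal pair, precomputing $1/\sqrt{2}$ so the division becomes a multiplication; the energy check illustrates what orthonormality buys:

```python
import numpy as np

INV_SQRT2 = 1.0 / np.sqrt(2.0)  # precomputed so each op is a multiply

def haar_orthonormal_pair(p0, p1):
    """Orthonormal Haar butterfly: one multiply per add/subtract.

    The pair (a, d) preserves energy: a**2 + d**2 == p0**2 + p1**2,
    which is the orthonormality property the question asks about.
    """
    d = (p0 - p1) * INV_SQRT2
    a = (p0 + p1) * INV_SQRT2
    return a, d
```

Counting operations, each butterfly now costs 2 $\pm$ plus 2 multiplies instead of 2 $\pm$ alone, hence the doubled total.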


Some other tricks may save you a few operations, but only a few in comparison with the numbers above.


Finally, all of this should be balanced against hardware considerations: acceleration, parallelization, CPU/GPU tricks, and SIMD instructions can improve the above figures in practice.

