The source code of iqdb contains a 2D Haar transform implementation. The author claims to have implemented it according to the paper "Fast Multiresolution Image Querying", which is freely available here.
This is the relevant text from the paper:
A standard two-dimensional Haar wavelet decomposition of an image is very simple to code. It involves a one-dimensional decomposition on each row of the image, followed by a one-dimensional decomposition on each column of the result.
The following pseudocode performs this one-dimensional decomposition on an array A of h elements, with h a power of two:
proc DecomposeArray (A : array[0..h-1] of color):
A <- A / sqrt(h)
while h > 1 do:
h <- h/2
for i <- 0 to h-1 do:
A'[i] <- (A[2i] + A[2i + 1]) / sqrt(2)
A'[h+i] <- (A[2i] - A[2i + 1]) / sqrt(2)
end for
A <- A'
end while
end proc
In the pseudocode above, the entries of A are assumed to be 3dimensional color components, each in the range [0,1]. The various arithmetic operations are performed on the separate color components individually.
An entire r x r image T can thus be decomposed as follows:
proc DecomposeImage(T : array[0..r-1, 0..r-1] of color):
for row <- 1 to r do:
DecomposeArray(T[row, 0..r-1])
end for
for col <- 1 to r do:
DecomposeArray(T[0..r-1, col])
end for
end proc
(end quote)
Implementing it this way does not produce results that match the example images in the majority of the articles I have found on the internet covering this topic, including the Wikipedia article.
Note: The image is divided into 4 large squares, and (only) the top left square is further divided into 4 squares.
However, I have also found counterexamples (i.e. examples that follow the scheme used in the paper above), e.g. here.
The question is whether to loop over all rows and columns, and doing the fully recursive transform in an inner loop for each row or column, - OR - doing one pass of recursion in the outermost loop, and within each single pass only processing the remaining rows and columns.
I have implemented both approaches to demonstrate the difference visually: https://bplu4t2f.github.io/wavelet_toy/
In the approach that wikipedia uses (which I call "pass major" because the pass of recursion is the outermost loop), the emerging pattern shows that each pass divides the image into 4 squares, and only the upper left square is modified in the next pass.
In the approach that iqdb uses (which I call "pass minor"), the emerging pattern shows that only the bottom right of the 4 divisions remains unchanged during subsequent passes.
The pass minor approach feels incorrect to me, because, when looking at it intuitively, it reprocesses parts of the image that have already been transformed during each pass, effectively applying a primitive edge detection scheme recursively on previosly detected edges. It doesn't seem to make much sense to me.
Which of these approaches is correctly being referred to as 2D Haar wavelet decomposition? To both approaches have a name?
Answer
[Beginning of the story] Remember the discrete wavelet curse: in 1D, with 2-scale or dyadic wavelets, you cannot have finite support, realness, orthogonality and linear phase (symmetry/antisymmetry) at the same time, except for the Haar wavelet, which lacks of regularity and overlap. You have to lift one constraint to have the other fulfilled. For instance :
- if you lift symmetry, you get Daubechies wavelets
- if you lift the dyadic, 2-scale, you get FIR $M$-band filter banks, etc.
When extending discrete wavelets to 2D, many options appear. Genuine two-dimensional dyadic wavelets exist (e.g. Design of Regular Nonseparable Bi-dimensional Wavelets Using Gröbner Basis Techniques, 1998), but their lack for separability make them cumbersome (and not often used in practice).
So, most people stick to the large background of known 1D dyadic wavelet designss and apply them on rows an columns of cartesian sampled data, like images. Thus is clearly a lack of imagination, and more genuine 2D, oriented, geometrical wavelets are possible, but the SE margin is too small to wrote about it.
Meanwhile, sticking to 1D wavelet designs, the order of operations on rows and columns matter.
Two main schemes are classics, but depending on the literature, they are more or less known, and more or less used, and often under different names, so:
- process all rows then all columns (or reverse): this scheme seems better known in numerical analysis and partial differential equations. It can be found under many names: separable, standard, S-form, rectangular, anisotropic, tensor, hyperbolic, separated.
- process rows and columns alternatively: this scheme seems better known in image/video processing. It can be found under many names: non-separable wavelet transform, non-standard form, NS-form, square wavelet transform, isotropic wavelet transform, Mallat decomposition, isotropic wavelet transform, combined wavelet transform.
Some papers are given here, and it is detailed in "Chapter 3. Oriented and geometrical multiscale representations" of the 2011 review paper on 2D wavelets: A panorama on Multiscale Geometric Representations, intertwining spatial, directional and frequency selectivity.
What is the best? I sure do not know, I I am still working on this. But it's geeting late, and I am attending these days a conference in honor of Alexandre Grossmann and Yves Meyer on wavelets in Paris, and I shall wake up early.
[EDIT: Added references to related questions]
Let me backlet... rrr
No comments:
Post a Comment