Saturday, January 26, 2019

image processing - What Is the Difference between Difference of Gaussian, Laplace of Gaussian, and Mexican Hat Wavelet?


There are three techniques used in CV that seem very similar to each other, but with subtle differences:



  • Laplacian of Gaussian: $\nabla^2\left[g(x,y,t)\ast f(x,y)\right]$

  • Difference of Gaussians: $ \left[g_1(x,y,t)\ast f(x,y)\right] - \left[g_2(x,y,t)\ast f(x,y)\right]$

  • Convolution with Ricker wavelet: $\textrm{Ricker}(x,y,t)\ast f(x,y)$



As I understand it currently: DoG is an approximation of LoG. Both are used in blob detection, and both perform essentially as band-pass filters. Convolution with a Mexican Hat/Ricker wavelet seems to achieve very much the same effect.


I've applied all three techniques to a pulse signal (with requisite scaling to get the magnitudes similar) and the results are pretty darn close. In fact, LoG and Ricker look nearly identical. The only real difference I noticed is with DoG, I had 2 free parameters to tune ($\sigma_1$ and $\sigma_2$) vs 1 for LoG and Ricker. I also found the wavelet was the easiest/fastest, as it could be done with a single convolution (done via multiplication in Fourier space with the FT of a kernel) vs 2 for DoG, and a convolution plus a Laplacian for LoG.


[Figure: results of Ricker wavelet convolution, Laplacian of Gaussian, and Difference of Gaussians applied to the test pulse]
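Something along the following lines reproduces this kind of comparison (the pulse width, $\sigma$, and the 1.6 ratio between the DoG sigmas are arbitrary illustrative choices, not the exact values I used):

```python
import numpy as np
from scipy import ndimage

# A 1D rectangular pulse as the test signal.
f = np.zeros(256)
f[100:140] = 1.0

sigma = 4.0  # scale parameter (arbitrary choice)

# Laplacian of Gaussian: Gaussian smoothing plus second derivative in one filter.
log_out = ndimage.gaussian_laplace(f, sigma=sigma)

# Difference of Gaussians: sigma2 = 1.6 * sigma1 is a common choice.
dog_out = (ndimage.gaussian_filter1d(f, sigma)
           - ndimage.gaussian_filter1d(f, 1.6 * sigma))

# Ricker / Mexican hat wavelet, written out explicitly (it is proportional to
# the negated second derivative of a Gaussian with the same sigma).
t = np.arange(-4 * int(sigma), 4 * int(sigma) + 1)
ricker = (1 - (t / sigma) ** 2) * np.exp(-t ** 2 / (2 * sigma ** 2))
ricker_out = np.convolve(f, ricker, mode='same')

# Signs and amplitudes differ between the three (the Ricker wavelet is a negated,
# rescaled LoG), so each output needs normalization before comparison.
```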



  • What are the comparative advantages/disadvantages of each technique?

  • Are there different use-cases where one outshines the other?


I also have the intuitive thought that on discrete samples, LoG and Ricker degenerate to the same operation, since $\nabla^2$ can be implemented as the kernel $$\begin{bmatrix}-1 & 2 & -1\end{bmatrix}\quad\text{in 1D,}\quad\text{or}\quad\begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{bmatrix}\quad\text{for 2D images.}$$


Applying that operation to a Gaussian gives rise to the Ricker/Mexican hat wavelet. Furthermore, since LoG and DoG are related to the heat diffusion equation, I reckon I could get both to match with enough parameter fiddling.
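A quick numerical check of that intuition ($\sigma$ and the sample grid are arbitrary choices):

```python
import numpy as np

sigma = 3.0
t = np.arange(-15, 16).astype(float)
g = np.exp(-t**2 / (2 * sigma**2))            # sampled (unnormalized) Gaussian

# The 1D kernel from above, applied to the sampled Gaussian.
lap_g = np.convolve(g, [-1.0, 2.0, -1.0], mode='same')

# The Ricker / Mexican hat wavelet at the same scale.
ricker = (1 - t**2 / sigma**2) * g

# Away from the array borders, lap_g is approximately ricker / sigma**2:
# the same shape up to a constant factor and discretization error.
```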



(I'm still getting my feet wet with this stuff, so feel free to correct/clarify any of this!)



Answer



Laplace of Gaussian


The Laplace of Gaussian (LoG) of image $f$ can be written as


$$ \nabla^2 (f * g) = f * \nabla^2 g $$


with $g$ the Gaussian kernel and $*$ the convolution. That is, the Laplace of the image smoothed by a Gaussian kernel is identical to the image convolved with the Laplace of the Gaussian kernel. This convolution can be further expanded, in the 2D case, as


$$ f * \nabla^2 g = f * \left(\frac{\partial^2}{\partial x^2}g+\frac{\partial^2}{\partial y^2}g\right) = f * \frac{\partial^2}{\partial x^2}g + f * \frac{\partial^2}{\partial y^2}g $$


Thus, it is possible to compute it as the addition of two convolutions of the input image with second derivatives of the Gaussian kernel (in 3D this is 3 convolutions, etc.). This is interesting because the Gaussian kernel is separable, as are its derivatives. That is,


$$ f(x,y) * g(x,y) = f(x,y) * \left( g(x) * g(y) \right) = \left( f(x,y) * g(x) \right) * g(y) $$


meaning that instead of a 2D convolution, we can compute the same thing using two 1D convolutions. This saves a lot of computations. For the smallest thinkable Gaussian kernel you'd have 5 samples along each dimension. A 2D convolution requires 25 multiplications and additions, two 1D convolutions require 10. The larger the kernel, or the more dimensions in the image, the more significant these computational savings are.
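For example, building the 2D kernel explicitly and comparing it against two 1D passes (image size and $\sigma$ are arbitrary):

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
f = rng.random((128, 128))
sigma = 3.0

# 1D Gaussian kernel (truncated at 4 sigma), and its 2D outer-product version.
r = int(4 * sigma)
t = np.arange(-r, r + 1)
g1 = np.exp(-t**2 / (2 * sigma**2))
g1 /= g1.sum()
g2 = np.outer(g1, g1)                         # the full 2D Gaussian kernel

# One (2r+1) x (2r+1) 2D convolution ...
full_2d = ndimage.convolve(f, g2, mode='nearest')

# ... versus two (2r+1)-sample 1D convolutions, one per axis.
separable = ndimage.convolve1d(
    ndimage.convolve1d(f, g1, axis=0, mode='nearest'), g1, axis=1, mode='nearest')

# Same result, far fewer multiplications per pixel.
assert np.allclose(full_2d, separable)
```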



Thus, the LoG can be computed using four 1D convolutions. The LoG kernel itself, though, is not separable.
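A sketch of those four 1D convolutions using SciPy's Gaussian derivative filters (test image and $\sigma$ are arbitrary; `order=2` applies the second derivative of the Gaussian along the given axis):

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(1)
f = rng.random((128, 128))
sigma = 2.0

# d^2/dx^2 term: second-derivative-of-Gaussian along axis 1, plain Gaussian along axis 0.
fxx = ndimage.gaussian_filter1d(
    ndimage.gaussian_filter1d(f, sigma, axis=1, order=2), sigma, axis=0)

# d^2/dy^2 term: second-derivative-of-Gaussian along axis 0, plain Gaussian along axis 1.
fyy = ndimage.gaussian_filter1d(
    ndimage.gaussian_filter1d(f, sigma, axis=0, order=2), sigma, axis=1)

# Sum of the two gives the LoG: four 1D convolutions in total.
log_separable = fxx + fyy

# Matches ndimage's built-in LoG, which uses the same Gaussian derivative filters.
assert np.allclose(log_separable, ndimage.gaussian_laplace(f, sigma))
```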


There is an approximation where the image is first convolved with a Gaussian kernel and $\nabla^2$ is then implemented using finite differences, leading to the 3x3 kernel with -4 in the middle and 1 at its four edge-connected neighbors.
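That approximation, next to the exact Gaussian-derivative version, might look like this (the finite-difference stencil gets closer to the true LoG as $\sigma$ grows relative to the pixel grid):

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(2)
f = rng.random((128, 128))
sigma = 3.0

# Smooth with a Gaussian, then apply the 3x3 finite-difference Laplacian
# (-4 in the middle, 1 at the four edge-connected neighbors).
approx = ndimage.laplace(ndimage.gaussian_filter(f, sigma))

# Exact LoG using Gaussian derivative filters, for comparison.
exact = ndimage.gaussian_laplace(f, sigma)

# The two agree up to the discretization error of the finite-difference stencil.
```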


The Ricker wavelet (Mexican hat operator) is identical to the LoG, up to scaling and normalization.


Difference of Gaussians


The difference of Gaussians (DoG) of image $f$ can be written as


$$ f * g_{(1)} - f * g_{(2)} = f * (g_{(1)} - g_{(2)}) $$


So, just as with the LoG, the DoG can be seen as a single non-separable 2D convolution or as the sum (difference, in this case) of two separable convolutions. Seen this way, there appears to be no computational advantage to using the DoG over the LoG. However, the DoG is a tunable band-pass filter; the LoG is not tunable in the same way, and should be seen as the derivative operator it is. The DoG also appears naturally in the scale-space setting, where the image is filtered at many scales (Gaussians with different sigmas); the difference between subsequent scales is a DoG.
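In code, the two views (difference of two separable Gaussian filterings versus one convolution with the non-separable DoG kernel) might look like this; the sigmas and the 33x33 kernel crop are arbitrary illustrative choices:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(3)
f = rng.random((128, 128))
sigma1, sigma2 = 2.0, 3.2

# As the difference of two separable Gaussian filterings ...
dog_a = ndimage.gaussian_filter(f, sigma1) - ndimage.gaussian_filter(f, sigma2)

# ... or as a single convolution with the (non-separable) DoG kernel,
# built here by filtering a unit impulse.
impulse = np.zeros((33, 33))
impulse[16, 16] = 1.0
dog_kernel = (ndimage.gaussian_filter(impulse, sigma1)
              - ndimage.gaussian_filter(impulse, sigma2))
dog_b = ndimage.convolve(f, dog_kernel, mode='reflect')

# dog_a and dog_b match to high accuracy; the crop is large enough to hold the
# truncated Gaussians for these sigmas, with small differences possible at the
# image borders from boundary handling.
```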


There is an approximation to the DoG kernel that is separable, reducing computational cost by half, though that approximation is not isotropic, leading to rotational dependence of the filter.


I once showed (for myself) the equivalence of the LoG and DoG, for a DoG where the difference in sigma between the two Gaussian kernels is infinitesimally small (up to scaling). I don't have records of this, but it was not difficult to show.
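One way to check this numerically: by the diffusion equation, $\partial g/\partial\sigma = \sigma\,\nabla^2 g$, so for a small $\mathrm{d}\sigma$ the difference $g_{\sigma+\mathrm{d}\sigma} - g_{\sigma}$ approaches $\sigma\,\mathrm{d}\sigma\,\nabla^2 g_{\sigma}$. A quick sketch (grid and parameter values are arbitrary):

```python
import numpy as np

sigma, dsigma = 3.0, 1e-3
x, y = np.meshgrid(np.arange(-20, 21), np.arange(-20, 21))
r2 = x**2 + y**2

def gauss2d(s):
    # Normalized 2D Gaussian with scale s.
    return np.exp(-r2 / (2 * s**2)) / (2 * np.pi * s**2)

# Difference of two Gaussians with nearly identical sigma ...
dog = gauss2d(sigma + dsigma) - gauss2d(sigma)

# ... versus the Laplacian of the Gaussian, scaled by sigma * dsigma
# (using the closed form: laplacian(G) = (r^2/sigma^4 - 2/sigma^2) * G).
log = (r2 / sigma**4 - 2 / sigma**2) * gauss2d(sigma)
scaled_log = sigma * dsigma * log

# dog and scaled_log agree to first order in dsigma.
```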


Other forms of computing these filters



Laurent's answer mentions recursive filtering, and the OP mentions computation in the Fourier domain. These concepts apply to both the LoG and the DoG.


The Gaussian and its derivatives can be computed using a causal and anti-causal IIR filter. So all 1D convolutions mentioned above can be applied in constant time w.r.t. the sigma. Note that this is only efficient for larger sigmas.


Likewise, any convolution can be computed in the Fourier domain, so both the DoG and LoG 2D kernels can be transformed to the Fourier domain (or rather computed there) and applied by multiplication.
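For example, a rough sketch of applying a LoG kernel in the Fourier domain (note this implies periodic boundary conditions, whereas the spatial-domain filters above reflect at the borders):

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(4)
f = rng.random((128, 128))
sigma = 2.0

# Build the LoG kernel by filtering a centered unit impulse of the same size as the image.
impulse = np.zeros_like(f)
impulse[64, 64] = 1.0
log_kernel = ndimage.gaussian_laplace(impulse, sigma)

# Multiply in the Fourier domain; ifftshift moves the kernel's center to (0, 0)
# so the output is not shifted. This is a circular (periodic) convolution.
result_fft = np.real(np.fft.ifft2(
    np.fft.fft2(f) * np.fft.fft2(np.fft.ifftshift(log_kernel))))

# Direct spatial-domain result for comparison; it differs near the borders
# because the FFT version wraps around while gaussian_laplace reflects.
result_spatial = ndimage.gaussian_laplace(f, sigma)
```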


In conclusion


There are no significant differences in the computational complexity of these two approaches. I have yet to find a good reason to approximate the LoG using the DoG.

