How do window size and sample rate influence FFT pitch estimation?

Wednesday, August 28, 2019

I am trying to create a pitch-detection program that extracts the frequencies of peaks in a power spectrum obtained from an FFT (fftpack). I extract the peak frequencies using Quinn's First Estimator to interpolate between bin numbers. This scheme seems to work well under certain conditions. For example, using a rectangular window function with a window size of 1024 samples and a sample rate of 16000 Hz, my algorithm correctly identifies the frequency of a pure A440 tone as 440.06 Hz, with a second partial frequency of 880.1 Hz. However, under other conditions, it produces inaccurate results. If I change the sample rate (e.g. to 8000 Hz) or the window size (e.g. to 2048 samples), it still correctly identifies the first partial as 440 Hz, but the second partial comes out somewhere around 892 Hz. The problem becomes even worse for inharmonic tones like those produced by a guitar or piano.
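For reference, here is a minimal sketch of Quinn's first estimator as it is commonly stated. It is not the asker's code: the function name `quinn_first_estimate` is hypothetical, NumPy's `rfft` stands in for fftpack, and only the single strongest peak is refined. Note that the estimator works on the complex FFT bins rather than on the power spectrum, and its derivation assumes a rectangular window with one sinusoid dominating the bins around the peak.

```python
import numpy as np

def quinn_first_estimate(frame, sample_rate):
    """Estimate the strongest sinusoid's frequency (Hz) in an unwindowed frame."""
    n = len(frame)
    spectrum = np.fft.rfft(frame)
    # Coarse peak: largest-magnitude bin, skipping the first and last bins
    # so that both neighbours exist.
    k = int(np.argmax(np.abs(spectrum[1:-1]))) + 1
    # Real parts of the neighbour-to-peak ratios of the complex spectrum.
    alpha_minus = (spectrum[k - 1] / spectrum[k]).real
    alpha_plus = (spectrum[k + 1] / spectrum[k]).real
    delta_minus = alpha_minus / (1.0 - alpha_minus)
    delta_plus = -alpha_plus / (1.0 - alpha_plus)
    # Quinn's rule: use the upper-side estimate only if both are positive.
    if delta_minus > 0 and delta_plus > 0:
        delta = delta_plus
    else:
        delta = delta_minus
    return (k + delta) * sample_rate / n

# Usage: a 440 Hz sine, 1024 samples at 16000 Hz, as in the question.
t = np.arange(1024) / 16000.0
print(quinn_first_estimate(np.sin(2 * np.pi * 440.0 * t), 16000))
```

Because the estimator relies on the ratios between neighbouring bins, anything that changes those bins (another partial nearby, or leakage from the negative-frequency image of a real signal) shifts the result.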

My general question is: In what way do the sample rate, window size, and window function affect frequency estimation of FFT peaks? My assumption was that simply increasing the resolution of the spectrum would increase the accuracy of peak frequency estimation, but this is clearly not my experience (zero padding also does not help). I am also assuming that the choice of window function will not have much effect because spectral leakage should not change the peak location (though, now that I think about it, spectral leakage could potentially influence the interpolated frequency estimate if the magnitudes of bins adjacent to the peak are artificially increased by leakage from other peaks...).
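To make the resolution question concrete, the short sketch below works out the bin spacing (sample rate divided by window size) and the window duration for the three configurations mentioned above. The helper names are hypothetical and the script only does arithmetic; it does not reproduce the reported estimates.

```python
def bin_spacing_hz(sample_rate, window_size):
    """Frequency distance between adjacent FFT bins, in Hz."""
    return sample_rate / window_size

def fractional_bin(frequency_hz, sample_rate, window_size):
    """Position of a frequency on the (non-integer) bin axis."""
    return frequency_hz / bin_spacing_hz(sample_rate, window_size)

for sample_rate, window_size in [(16000, 1024), (8000, 1024), (16000, 2048)]:
    spacing = bin_spacing_hz(sample_rate, window_size)
    duration_ms = 1000.0 * window_size / sample_rate
    print(
        f"fs={sample_rate} Hz, N={window_size}: "
        f"{spacing:.4f} Hz/bin, {duration_ms:.0f} ms window, "
        f"440 Hz at bin {fractional_bin(440.0, sample_rate, window_size):.2f}, "
        f"880 Hz at bin {fractional_bin(880.0, sample_rate, window_size):.2f}"
    )
```

Halving the sample rate and doubling the window size give the same bin spacing, because both double the duration of signal being analyzed. Zero padding, by contrast, only interpolates the spectrum of the same stretch of signal, so it does not add frequency resolution.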

Any thoughts?

Answer

Use a Gaussian window: the Fourier transform of a Gaussian is a Gaussian.

Log-scale the spectrum to emphasize peaks and turn the Gaussian peaks into parabolic peaks.

Use parabolic interpolation to find the true peaks.
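The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the code from the gist linked further down: the function name `estimate_peak_frequency` and the window width (`sigma_fraction`, a standard deviation of one eighth of the frame) are choices made here, the natural log is used instead of dB (any log scale turns a Gaussian into a parabola), and only the single strongest peak is refined.

```python
import numpy as np

def estimate_peak_frequency(signal, sample_rate, sigma_fraction=0.125):
    """Gaussian window -> log-magnitude spectrum -> parabolic interpolation."""
    n = len(signal)
    # Step 1: Gaussian window centred on the frame. sigma_fraction is an
    # assumed value; the tails are truncated at the frame edges.
    t = np.arange(n) - (n - 1) / 2.0
    sigma = sigma_fraction * n
    window = np.exp(-0.5 * (t / sigma) ** 2)
    # Step 2: log-magnitude spectrum (tiny offset avoids log(0)).
    magnitude = np.abs(np.fft.rfft(signal * window))
    log_mag = np.log(magnitude + 1e-20)
    # Coarse peak, skipping the end bins so both neighbours exist.
    k = int(np.argmax(log_mag[1:-1])) + 1
    # Step 3: fit a parabola through the peak bin and its two neighbours.
    a, b, c = log_mag[k - 1], log_mag[k], log_mag[k + 1]
    offset = 0.5 * (a - c) / (a - 2.0 * b + c)
    return (k + offset) * sample_rate / n

# Usage: a 440 Hz sine, 1024 samples at 16000 Hz.
t = np.arange(1024) / 16000.0
print(estimate_peak_frequency(np.sin(2 * np.pi * 440.0 * t), 16000))
```

A wider window in time (larger `sigma_fraction`) narrows the spectral peak but leaves larger truncated tails, so the value is a trade-off rather than a recommendation.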

As the source linked below puts it (the section and figure numbers refer to that text): "Note that, as mentioned in §D.1, the Gaussian window transform magnitude is precisely a parabola on a dB scale. As a result, quadratic spectral peak interpolation is exact under the Gaussian window. Of course, we must somehow remove the infinitely long tails of the Gaussian window in practice, but this does not cause much deviation from a parabola, as shown in Fig.3.30."
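The reason the log spectrum is a parabola can be written out in a few lines. The notation here is chosen for illustration: $\sigma$ is the window's standard deviation in time, $\alpha$, $\beta$, $\gamma$ are the log magnitudes at bins $k-1$, $k$, $k+1$, $p$ is the fractional bin offset, $f_s$ is the sample rate, and $N$ is the FFT length. The first line uses the continuous-time transform of an untruncated Gaussian, so it is an idealization of the sampled, truncated case.

```latex
w(t) = e^{-t^2/(2\sigma^2)}
\;\Longleftrightarrow\;
W(\omega) = \sigma\sqrt{2\pi}\, e^{-\sigma^2\omega^2/2},
\qquad
\ln |W(\omega)| = \ln\!\left(\sigma\sqrt{2\pi}\right) - \tfrac{1}{2}\sigma^2\omega^2 .

% Vertex of the parabola through the three log magnitudes
% \alpha, \beta, \gamma at bins k-1, k, k+1:
p = \frac{1}{2}\,\frac{\alpha - \gamma}{\alpha - 2\beta + \gamma},
\qquad
\hat{f} = (k + p)\,\frac{f_s}{N}.
```

Since $\ln|W(\omega)|$ is exactly quadratic in $\omega$, a three-point parabolic fit recovers the peak location without approximation error in the ideal case; the remaining error comes from truncating the window and from other spectral components overlapping the peak.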

https://ccrma.stanford.edu/~jos/sasp/Quadratic_Interpolation_Spectral_Peaks.html

I estimate 1000.000004 Hz for a 1000 Hz waveform this way: https://gist.github.com/255291#file_parabolic.py

If you're having trouble, plot the spectrum and use your eyes to see why it's not working.
