Wednesday, August 30, 2017

frequency spectrum - Why so many methods of computing PSD?


Welch's method has been my go-to algorithm for computing the power spectral density (PSD) of evenly sampled time series. I noticed that there are many other methods for computing PSD. For example, in Matlab I see:



  • PSD using Burg method

  • PSD using covariance method

  • PSD using periodogram

  • PSD using modified covariance method

  • PSD using multitaper method (MTM)

  • PSD using Welch's method


  • PSD using Yule-Walker AR method

  • Spectrogram using short-time Fourier transform

  • Spectral estimation


What are the advantages of these various methods? As a practical question, when would I want to use something other than Welch's method?



Answer



I have no familiarity with the Multitaper method. That said, you've asked quite a question. In pursuit of my MSEE degree, I took an entire course that covered PSD estimation. The course covered all of what you listed (with the exception of the Multitaper method), and also subspace methods. Even this only covers some of the main ideas, and there are many methods stemming from these concepts.


For starters, power spectral density estimation methods fall into two main categories: non-parametric and parametric.


Non-parametric methods are used when little is known about the signal ahead of time. They typically have less computational complexity than parametric models. Methods in this group are further divided into two categories: periodograms and correlograms. Periodograms are also sometimes referred to as direct methods, as they result in a direct transformation of the data. These include the sample spectrum, Bartlett's method, Welch's method, and the Daniell periodogram. Correlograms are sometimes referred to as indirect methods, as they exploit the Wiener-Khinchin theorem; these methods are based on taking the Fourier transform of some estimate of the autocorrelation sequence. Because the higher-order lags have high variance (only a small number of data samples contribute to those correlations), the autocorrelation estimate is windowed. The Blackman-Tukey method generalizes the correlogram methods.
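
To make the direct/indirect distinction concrete, here is a minimal sketch in Python with NumPy/SciPy (rather than Matlab), contrasting a periodogram and a Welch estimate with a Blackman-Tukey-style correlogram. The test signal, lag truncation point, and Bartlett lag window are illustrative choices on my part, not prescriptions:

    import numpy as np
    from scipy import signal

    # Illustrative test signal: two sinusoids in white noise.
    rng = np.random.default_rng(0)
    fs = 1000.0
    n = 1024
    t = np.arange(n) / fs
    x = (np.sin(2 * np.pi * 100 * t)
         + 0.5 * np.sin(2 * np.pi * 230 * t)
         + rng.standard_normal(n))

    # Direct (periodogram) estimate: transform the data itself.
    f_per, p_per = signal.periodogram(x, fs=fs)

    # Welch's method for comparison: segmenting, windowing, and
    # averaging trade resolution for reduced variance.
    f_w, p_w = signal.welch(x, fs=fs, nperseg=256)

    # Indirect (Blackman-Tukey-style) estimate: Fourier transform of a
    # windowed estimate of the autocorrelation sequence.  Higher-order
    # lags are tapered away because few samples contribute to them.
    max_lag = 128                                  # illustrative choice
    r = np.correlate(x, x, mode="full") / n        # biased ACF estimate
    r = r[n - 1 - max_lag : n + max_lag]           # lags -max_lag..max_lag
    lag_win = np.bartlett(2 * max_lag + 1)
    p_bt = np.abs(np.fft.rfft(r * lag_win)) / fs
    f_bt = np.fft.rfftfreq(2 * max_lag + 1, d=1.0 / fs)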


Parametric methods typically assume some sort of signal model prior to calculation of the power spectral density estimate, so some knowledge of the signal must be available ahead of time. There are two main parametric method categories: autoregressive methods and subspace methods.



Autoregressive methods assume that the signal can be modeled as the output of an autoregressive (all-pole IIR) filter driven by a white noise sequence. All of these methods therefore attempt to solve for the IIR coefficients, from which the resulting power spectral density is easily calculated. The model order (or number of taps), however, must be determined. If the model order is too small, the spectrum will be highly smoothed and lack resolution; if it is too high, spurious peaks from the excess poles begin to appear. If the signal is well modeled by an AR process of order p, then the output of a filter of order >= p driven by the signal will be white noise. There are hundreds of metrics for model order selection. Note that these methods are excellent for high-to-moderate SNR, narrowband signals. The former is because the model breaks down in significant noise, where the signal is better modeled as an ARMA process; the latter is due to the impulsive nature of the spectrum produced by the poles in the Fourier transform of the resulting model. AR methods are based on linear prediction, which is what's used to extrapolate the signal outside of its known values. As a result, they do not suffer from sidelobes and require no windowing.
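
SciPy has no ready-made Yule-Walker PSD routine, so here is a hand-rolled sketch of the AR idea under the assumptions above: estimate the autocorrelation sequence, solve the Toeplitz Yule-Walker equations for the filter coefficients, and read the PSD off the all-pole model. The order argument is exactly the model-order problem just described:

    import numpy as np
    from scipy.linalg import solve_toeplitz

    def ar_psd_yule_walker(x, order, nfft=1024, fs=1.0):
        """AR spectral estimate via the Yule-Walker equations.

        Fits an all-pole (IIR) model driven by white noise, then
        evaluates P(f) = sigma2 / |A(f)|^2 on a frequency grid.
        """
        x = np.asarray(x, dtype=float)
        x = x - x.mean()
        n = len(x)
        # Biased autocorrelation estimates for lags 0..order.
        r = np.array([np.dot(x[: n - k], x[k:]) / n
                      for k in range(order + 1)])
        # Yule-Walker normal equations R a = -r; solve_toeplitz exploits
        # the Toeplitz structure with a Levinson-style O(order^2) solver.
        a = solve_toeplitz((r[:-1], r[:-1]), -r[1:])
        sigma2 = r[0] + np.dot(a, r[1:])   # white-noise drive variance
        a_full = np.concatenate(([1.0], a))
        psd = sigma2 / (fs * np.abs(np.fft.rfft(a_full, nfft)) ** 2)
        freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
        return freqs, psd

Run on the two-sinusoid test signal above, a modest order should give sharp, sidelobe-free peaks; pushing the order much higher should start to show the spurious extra peaks just mentioned.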


Subspace methods decompose the signal into a signal subspace and noise subspace. Exploiting orthogonality between the two subspaces allows a pseudospectrum to be formed where large peaks at narrowband components can appear. These methods work very well in low SNR environments, but are computationally very expensive. They can be grouped into two categories: noise subspace methods and signal subspace methods.


Both categories can be utilized in one of two ways: eigenvalue decomposition of the autocorrelation matrix or singular value decomposition of the data matrix.


Noise subspace methods attempt to solve for one or more of the noise subspace eigenvectors. The orthogonality between the noise subspace and the signal subspace then produces zeros in the denominator of the resulting spectrum estimates, yielding large values, or spikes, at the true signal components. The number of discrete sinusoids, i.e. the rank of the signal subspace, must be determined/estimated or known ahead of time.


Signal subspace methods attempt to discard the noise subspace prior to spectral estimation, improving the SNR. A reduced rank autocorrelation matrix is formed with only the eigenvectors determined to belong to the signal subspace (again, a model order problem), and the reduced rank matrix is used in any one of the other methods.
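
As a concrete illustration of the noise-subspace idea, here is a minimal MUSIC-style pseudospectrum sketch. The snapshot length m and the assumed signal-subspace rank p are up-front guesses of mine (the model-order/rank problem again), and this is a textbook-style formulation rather than any particular toolbox's implementation:

    import numpy as np

    def music_pseudospectrum(x, p, m=32, nfft=1024):
        """Noise-subspace (MUSIC-style) pseudospectrum.

        p : assumed rank of the signal subspace (number of complex
            exponentials; a real sinusoid contributes two).
        m : size of the estimated autocorrelation matrix.
        """
        x = np.asarray(x, dtype=float)
        n = len(x)
        # Estimate the m x m autocorrelation matrix from snapshots.
        snaps = np.array([x[i : i + m] for i in range(n - m + 1)])
        R = snaps.T @ snaps / snaps.shape[0]
        # Eigendecomposition: the eigenvectors of the m - p smallest
        # eigenvalues span the noise subspace (eigh sorts ascending).
        _, vecs = np.linalg.eigh(R)
        noise = vecs[:, : m - p]
        freqs = np.linspace(0.0, 0.5, nfft)       # cycles/sample
        pseudo = np.empty(nfft)
        for i, f in enumerate(freqs):
            e = np.exp(-2j * np.pi * f * np.arange(m))  # steering vector
            # Orthogonality drives this quadratic form toward zero at
            # true signal frequencies, producing large peaks.
            pseudo[i] = 1.0 / np.real(e.conj() @ (noise @ (noise.T @ e)))
        return freqs, pseudo

Note the "pseudo" in pseudospectrum: as discussed at the end of this answer, the peak heights are only relative, and power is not conserved.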


Now, I'll try to quickly cover your list:




  • PSD using Burg method: The Burg method leverages the Levinson recursion slightly differently than the Yule-Walker method, in that it estimates the reflection coefficients by minimizing the average of the forward and backward linear prediction error. The resulting reflection coefficient is a harmonic mean of the forward and backward partial correlation coefficients. Like all autoregressive methods, it produces very high resolution estimates, because it uses linear prediction to extrapolate the signal outside of its known data record; this effectively removes all sidelobe phenomena. It is superior to the YW method for short data records, and also removes the tradeoff between the biased and unbiased autocorrelation estimates, as the weighting factors divide out (a sketch of the recursion appears after this list). One disadvantage is that it can exhibit spectral line splitting. In addition, it suffers from the same problems all AR methods have: low-to-moderate SNR severely degrades the performance, as the signal is then no longer properly modeled by an AR process, but rather an ARMA process. ARMA methods are rarely used, as they generally result in a nonlinear set of equations with respect to the moving average parameters.





  • PSD using covariance method: The covariance method is a special case of the least-squares method, in which the windowed portion of the linear prediction errors is discarded. It has superior performance to the Burg method, but, unlike the YW method, the matrix to be inverted is not Hermitian Toeplitz in general, but rather the product of two Toeplitz matrices, so the Levinson recursion cannot be used to solve for the coefficients. In addition, the filter generated by this method is not guaranteed to be stable. For spectral estimation, however, that is a good thing, resulting in very large peaks for sinusoidal content.




  • PSD using periodogram: This is one of the worst estimators. It is a special case of Welch's method with a single segment, rectangular or triangular windowing (depending on whether the biased or unbiased autocorrelation estimate is used), and no overlap. However, it's one of the "cheapest" methods computationally speaking. The resulting variance is quite high and, notably, does not decrease as the record length grows.




  • PSD using modified covariance method: This improves on both the covariance method and the Burg method. Whereas the Burg method only minimizes the average forward/backward linear prediction error with respect to the reflection coefficient, the MC method minimizes it with respect to ALL of the AR coefficients. It also does not suffer from spectral line splitting, and provides much less distortion than the previously listed methods. In addition, while it does not guarantee a stable IIR filter, its lattice filter realization is stable. It is more computationally demanding than the other two methods as well.





  • PSD using Welch's method: Welch's method improves upon the periodogram by addressing the lack of the ensemble averaging that is present in the true PSD definition. It generalizes Bartlett's method by using overlap and windowing to provide more PSD "samples" for the pseudo-ensemble average. It can be a cheap, effective method depending on the application. If you have a situation with closely spaced sinusoids, AR methods may be better suited; however, Welch's method does not require estimating the model order the way AR methods do, so if little is known about your spectrum a priori, it can be an excellent starting point.




  • PSD using Yule-Walker AR method: This is a special case of the least squares method where the complete error residuals are utilized. This results in diminished performance compared to the covariance methods, but may be efficiently solved using the Levinson recursion. It's also known as the autocorrelation method.




  • Spectrogram using short-time Fourier transform: Now you're crossing into a different domain. This is used for time-varying spectra, that is, signals whose spectra change with time. This opens up a whole other can of worms, and there are just as many methods for time-frequency analysis as you have listed here. The STFT is certainly the cheapest, which is why it's so frequently used.




  • Spectral estimation: This is not a method, but a blanket term for the rest of your post. Sometimes the Periodogram is referred to as the "sample spectrum" or the "Schuster Periodogram", the former of which may be what you're referring to.
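
As promised in the Burg bullet above, here is a compact sketch of the recursion, under the same hedges as before (Python rather than Matlab, and a textbook-style formulation rather than any particular toolbox's). Note how each reflection coefficient is computed directly from the forward/backward errors, with no windowed autocorrelation estimate in sight:

    import numpy as np

    def burg_ar(x, order):
        """Burg's method: AR coefficients via a Levinson-style
        recursion, choosing each reflection coefficient to minimize
        the average forward/backward prediction-error power."""
        x = np.asarray(x, dtype=float)
        f = x.copy()                 # forward prediction errors
        b = x.copy()                 # backward prediction errors
        a = np.zeros(0)              # AR coefficients a_1..a_m
        err = np.dot(x, x) / len(x)  # stage-0 error power
        for _ in range(order):
            fp, bp = f[1:], b[:-1]   # align errors for the next lag
            # Harmonic-mean reflection coefficient; the biased/unbiased
            # ACF weighting factors divide out, as noted above.
            k = -2.0 * np.dot(fp, bp) / (np.dot(fp, fp) + np.dot(bp, bp))
            f, b = fp + k * bp, bp + k * fp              # order update
            a = np.concatenate((a + k * a[::-1], [k]))   # Levinson step
            err *= 1.0 - k * k
        return a, err

    # The PSD then follows exactly as for any AR model:
    #   P(f) = err / |1 + sum_k a_k exp(-2j*pi*f*k)|^2
    # e.g. psd = err / np.abs(np.fft.rfft(np.r_[1.0, a], 1024)) ** 2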





If you're interested, you may also look into subspace methods such as MUSIC (sketched above) and Pisarenko harmonic decomposition. These decompose the signal into signal and noise subspaces, and exploit the orthogonality between the noise subspace and the signal subspace eigenvectors to produce a pseudospectrum. Much like the AR methods, you may not get a "true" PSD estimate, in that power is most likely not conserved, and the amplitudes between spectral components are only relative. However, it all depends on your application.


Cheers

