The spectrogram is generally defined as the squared magnitude of the FFT. However, in many implementations, people seem to use just the magnitude, without squaring it.
Moreover, an audio signal is by convention scaled between -1 and 1. This scaling often requires an extra step in implementations, in Python for example, and it is not always done.
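For illustration, here is a minimal sketch of that extra normalisation step, assuming a 16-bit PCM WAV file read with SciPy (the file name is hypothetical):

```python
import numpy as np
from scipy.io import wavfile

# scipy.io.wavfile.read returns raw int16 samples for a 16-bit PCM WAV
rate, samples = wavfile.read("example.wav")  # hypothetical file name

# normalise to the conventional [-1, 1] range by dividing by the
# full-scale value of the integer sample type
samples = samples.astype(np.float64) / np.iinfo(np.int16).max
```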
Finally, what are the best practices for computing an audio spectrogram?

- squared magnitude of the FFT, or magnitude of the FFT?
- integer audio values, or values scaled to the range -1 to 1?
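To make the two conventions concrete, here is a minimal sketch using SciPy's STFT, assuming `samples` and `rate` come from the normalisation step above (the window parameters are arbitrary example values):

```python
import numpy as np
from scipy.signal import stft

# short-time Fourier transform; nperseg/noverlap are arbitrary example values
f, t, Z = stft(samples, fs=rate, nperseg=1024, noverlap=512)

mag_spec = np.abs(Z)       # "magnitude" convention: |STFT|
pow_spec = np.abs(Z) ** 2  # "power" convention:     |STFT|^2
```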
EDIT
As the comments point out, these questions have no consequences if the aim is only to plot an image of the spectrogram.
However, I would like to use the spectrogram matrix as an input for sound analysis and recognition. In this case the computation process matters, and I find it curious that implementations differ so often.
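One concrete observation: in the log (dB) domain typically used for display, the two conventions differ only by a constant factor of 2, since 10·log10(|X|^2) = 20·log10(|X|), which is why the plotted image looks the same up to the colour scale. Whether that factor matters for recognition depends on the downstream processing. A small sketch, continuing from the arrays above:

```python
eps = 1e-10  # floor to avoid taking the log of zero

mag_db = 20 * np.log10(np.maximum(mag_spec, eps))
pow_db = 10 * np.log10(np.maximum(pow_spec, eps ** 2))

# the two arrays are identical up to floating-point error, so features
# derived from them differ only by a global scale factor
print(np.allclose(mag_db, pow_db))  # True
```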