I am a long-time software engineer but have practically zero experience with signal/audio processing. I am interested in learning about signal processing via a use case we have for one of our audio components. NOTE: this is just a learning exercise for me; it is not a priority that the end result be useful.
We have a component (A) which produces audio (via a speaker). We have another component (B) which records that audio (via a simple microphone).
What I would like to do is use B to record A's audio. Then I would like, if it is even possible, to compare the two streams. The goal would be to remove the audio that was present in stream A from the recording, leaving the ambient conditions that existed during the recording. I understand full removal is not possible.
I realize that there are phase and magnitude issues. I also realize that it isn't just a simple matter of "subtracting" B from A. That said, my assumption is you can subtract A from A.
I would like to understand how to approach the problem. Again, this is a learning experience for me (there are no deadlines); I am more than willing to start from the beginning.
Any advice/suggestions would be much appreciated.
Answer
This setup resembles a system identification problem: $A$ is the input of an LTI system whose transfer function you want to estimate, $B$ is its output, and the "ambient sound" is the additive noise. The LTI assumption is reasonable provided your converters/amplifiers/transducers are of decent quality.
So the steps would be:
- Use a system identification technique to find the FIR filter $\hat{h}$ that minimizes the mean-square error between $\hat{h} \star A$ and $B$. A simple method, which might not be the most suitable here, is to divide, in the frequency domain, the cross-spectrum of $A$ and $B$ (the Fourier transform of their cross-correlation) by the power spectrum of $A$ (the Fourier transform of its autocorrelation), then transform the result back to get the impulse response. Explanation here. The limitations are that it will not work well on long recordings taken as a whole (it is better to compute your estimates on shorter segments and average them), and that music is not the best "probe" signal to send into a system to estimate its response.
- You can now use $\hat{h} \star A$ as an estimate of the original signal $A$ as "heard" by the microphone, and subtract it from $B$ to retrieve the ambient sound.
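The two steps above can be sketched in Python with NumPy. This is only a minimal illustration under stated assumptions, not the code used for the experiment described in this answer: the function names `estimate_fir` and `remove_known_signal`, the segment length, and the regularisation constant `reg` are all hypothetical choices, and the demo uses white noise as the probe signal rather than music. It assumes the impulse response is much shorter than one segment.

```python
import numpy as np


def estimate_fir(a, b, n_taps, seg_len=4096, reg=1e-8):
    """Estimate FIR taps h so that np.convolve(a, h) approximates b.

    The cross-spectrum conj(A) * B and the power spectrum |A|^2 are summed
    over non-overlapping segments, divided, and transformed back to time.
    Assumes n_taps is much smaller than seg_len (hypothetical default).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_seg = min(len(a), len(b)) // seg_len
    if n_seg == 0:
        raise ValueError("signals are shorter than one segment")
    s_ab = np.zeros(seg_len // 2 + 1, dtype=complex)
    s_aa = np.zeros(seg_len // 2 + 1)
    for k in range(n_seg):
        seg = slice(k * seg_len, (k + 1) * seg_len)
        fa = np.fft.rfft(a[seg])
        fb = np.fft.rfft(b[seg])
        s_ab += np.conj(fa) * fb      # accumulated cross-spectrum
        s_aa += np.abs(fa) ** 2       # accumulated power spectrum of A
    # A small regularisation term keeps bins where A has almost no energy
    # from blowing up the division.
    h_freq = s_ab / (s_aa + reg * s_aa.max() + np.finfo(float).tiny)
    return np.fft.irfft(h_freq, n=seg_len)[:n_taps]


def remove_known_signal(a, b, h):
    """Subtract the filtered reference (h convolved with a) from b."""
    n = min(len(a), len(b))
    heard = np.convolve(np.asarray(a, dtype=float)[:n], h)[:n]
    return np.asarray(b, dtype=float)[:n] - heard


if __name__ == "__main__":
    # Synthetic demo: all values below are made up for illustration.
    rng = np.random.default_rng(0)
    n = 65536
    a = rng.standard_normal(n)                      # reference sent to the speaker
    h_true = np.array([0.0, 0.8, 0.4, -0.2, 0.1])   # pretend speaker + room response
    ambient = 0.1 * rng.standard_normal(n)          # pretend ambient sound
    b = np.convolve(a, h_true)[:n] + ambient        # what the microphone records

    h_est = estimate_fir(a, b, n_taps=8)
    residual = remove_known_signal(a, b, h_est)
    print("largest tap error:", np.max(np.abs(h_est[:5] - h_true)))
    print("power of (residual - ambient):", np.mean((residual - ambient) ** 2))
```

Summing the spectra over segments before dividing is one way of doing the "shorter segments and average them" suggestion. A real room response is far longer than the 5 taps used here, so `seg_len` and `n_taps` would need to be much larger in practice.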
I gave this a shot using a music clip (A): I applied a reverb and a slight amp model to simulate a speaker in a room, mixed in a cat audio sample to get (B), estimated an impulse response from the (A, B) pair, and subtracted the filtered A from B. This shows some results, but a better FIR estimation technique might help here! (Note that I truncated the estimated IR to its first 5000 samples to speed up computations.)
Note that there are algorithms for doing this adaptively, such as LMS (least mean squares). This might be more suitable for your problem if $A$ and $B$ are processed in real time rather than offline. Such algorithms form the basis of echo cancellation systems used in telecommunications.
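As a rough illustration of the adaptive approach, here is a minimal sketch of normalized LMS (NLMS), a common variant of LMS in which the step size is scaled by the energy of the recent reference samples. The function name `nlms_cancel` and the defaults for `n_taps`, `mu`, and `eps` are assumptions made for this example, not tuned values.

```python
import numpy as np


def nlms_cancel(a, b, n_taps=32, mu=0.2, eps=1e-6):
    """Adaptively remove the filtered reference a from the recording b.

    Returns (residual, w): residual is b minus the running estimate of
    a as heard by the microphone, w holds the final FIR taps.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = min(len(a), len(b))
    w = np.zeros(n_taps)        # current FIR estimate
    buf = np.zeros(n_taps)      # recent reference samples, newest first
    residual = np.zeros(n)
    for i in range(n):
        buf = np.roll(buf, 1)
        buf[0] = a[i]
        predicted = w @ buf                       # estimate of A in the recording
        err = b[i] - predicted                    # what is left: ambient + misfit
        w += mu * err * buf / (buf @ buf + eps)   # normalized LMS update
        residual[i] = err
    return residual, w


if __name__ == "__main__":
    # Synthetic demo with made-up values, as a sanity check only.
    rng = np.random.default_rng(0)
    n = 20000
    a = rng.standard_normal(n)
    h_true = np.array([0.0, 0.8, 0.4, -0.2, 0.1])
    ambient = 0.05 * rng.standard_normal(n)
    b = np.convolve(a, h_true)[:n] + ambient
    residual, w = nlms_cancel(a, b, n_taps=8)
    print("largest tap error:", np.max(np.abs(w[:5] - h_true)))
```

Because the filter is updated sample by sample, the first part of `residual` still contains a lot of $A$ while the taps converge. A larger `mu` converges faster but leaves a noisier estimate once it has settled.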