As seen in the diagram below, I have a reference audio clip and a test audio clip. I want to find the part of the test clip where the reference audio can be heard. Once that is found, I want to crop the test file from the point where the match begins to the point where the match ends. How can I do this matching and cropping of audio signals using Python? Some code would be helpful.
The diagram below (from wiki) is for illustration purposes only.
My audio clips: reference audio: africaChirp; test audio: Africa.
I want to find the time in seconds at which africaChirp appears in Africa. africaChirp was cropped from the source file, Africa, found here.
I have also included my code to show my progress:
import numpy as np
from scipy.io import wavfile

rate, test = wavfile.read('africa.wav')
# Work in float64 so that products of int16 samples do not overflow
test = test.astype(np.float64)
if test.ndim > 1:
    test = test.mean(axis=1)  # mix down to mono in case the file is stereo

# Indices (0-based, end-exclusive)
testStart = 6000
testEnd = 200000
refStart = testStart + 11000
refEnd = refStart + 2000
testSig = test[testStart:testEnd]
refSig = test[refStart:refEnd]
refStartRel = refStart - testStart  # expected match position inside testSig
refEndRel = refEnd - testStart
no_samples_test = testSig.shape[0]
no_samples_ref = refSig.shape[0]
xcorr = np.zeros(no_samples_test - no_samples_ref + 1)
xcorr_norr = np.zeros(no_samples_test - no_samples_ref + 1)
refSig_norm = np.linalg.norm(refSig)
for i in range(xcorr.shape[0]):
    testSig_samples = testSig[i:i + no_samples_ref]
    xcorr[i] = np.sum(testSig_samples * refSig)
    linalgnorm = np.linalg.norm(testSig_samples)
    if linalgnorm > 0:  # skip silent windows to avoid dividing by zero
        xcorr_norr[i] = xcorr[i] / (refSig_norm * linalgnorm)
# argmax returns the index of the peak; np.max would only return its value
xcorr_max_id = np.argmax(np.abs(xcorr_norr))
print(xcorr_max_id, xcorr_max_id / rate)  # sample offset and time in seconds
Answer
If you have a reference signal that you want to find inside a different signal, then your model matches the Matched Filter almost perfectly (up to the environment in which the signal to be found sits).
So basically you need to do cross correlation between the Test Signal and the Reference Signal.
Find the point of maximum correlation and create a cropping zone around it according to the length of the reference signal.
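The steps above can be sketched in Python roughly as follows. This is only a minimal sketch and not the MATLAB code from this answer: the helper name find_and_crop is hypothetical, both signals are assumed to be mono 1-D arrays at the same sample rate, and the check uses synthetic noise rather than the actual audio files.

```python
import numpy as np


def find_and_crop(test_sig, ref_sig):
    """Locate ref_sig inside test_sig by plain cross correlation and crop it.

    Hypothetical helper; assumes mono 1-D arrays with the same sample rate.
    """
    test_sig = np.asarray(test_sig, dtype=np.float64)
    ref_sig = np.asarray(ref_sig, dtype=np.float64)
    # mode="valid" keeps only the lags where ref_sig fully overlaps test_sig,
    # so output index k corresponds to test_sig[k:k + len(ref_sig)].
    xcorr = np.correlate(test_sig, ref_sig, mode="valid")
    start = int(np.argmax(xcorr))
    end = start + len(ref_sig)
    return start, end, test_sig[start:end]


# Synthetic check: bury a known snippet in low-level noise.
rng = np.random.default_rng(0)
ref = rng.standard_normal(2000)
test = 0.1 * rng.standard_normal(14000)
test[5000:7000] += ref
start, end, cropped = find_and_crop(test, ref)
print(start, end)
```

Dividing start and end by the sample rate returned by wavfile.read converts the sample offsets into times in seconds. Note that this plain version assumes the volume is roughly uniform across the test signal.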
Update
I downloaded the files and wrote MATLAB code to find a reference signal within a signal.
I generated an equivalent test case by cropping 14,000 samples from the Song File - Toto - Africa as the Test Signal and from them cropped 2,000 samples as the Reference Signal.
The Matched Filter in the case above must be tweaked to normalize for the different volume levels in the signal.
Hence the correct way to do so is a Matched Filter normalized by the norms of the sections being cross correlated (up to mean removal, this is the correlation used in statistics).
The result is as follows:
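Written out, the normalized score at each candidate offset looks like the expression below. The symbols are my own notation, not taken from the MATLAB code: x is the test signal, r is the reference signal of length N, and x_k is the length-N section of x starting at sample k.

```latex
\rho[k] = \frac{\sum_{n=0}^{N-1} x[n+k]\, r[n]}{\lVert r \rVert_2 \, \lVert x_k \rVert_2},
\qquad x_k = \bigl(x[k], x[k+1], \dots, x[k+N-1]\bigr)
```

By the Cauchy-Schwarz inequality, this score lies between -1 and 1, and it equals 1 only when the section x_k is a positive multiple of r, which is why scaling the volume of a section does not change its score.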
As can be seen, the algorithm detects the exact starting index of the reference signal.
Note that classic cross correlation yields the wrong answer, as it is tricked by the higher volume levels elsewhere in the test signal.
Once the normalization takes care of the different volume levels, the maximum correlation occurs exactly where it should.
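A small synthetic experiment illustrates this effect in Python. It is a sketch under stated assumptions, not a port of the MATLAB code: the helper name normalized_xcorr is hypothetical, the signals are random noise rather than the song, and the second half of the test signal is made 50 times louder on purpose.

```python
import numpy as np


def normalized_xcorr(test_sig, ref_sig):
    """Cross correlation with every window normalized by its L2 norm.

    Hypothetical helper; assumes mono 1-D arrays.
    """
    test_sig = np.asarray(test_sig, dtype=np.float64)
    ref_sig = np.asarray(ref_sig, dtype=np.float64)
    n = len(ref_sig)
    xcorr = np.correlate(test_sig, ref_sig, mode="valid")
    # Sum of squares of every length-n window, via a cumulative sum.
    csum = np.concatenate(([0.0], np.cumsum(test_sig ** 2)))
    window_energy = np.maximum(csum[n:] - csum[:-n], 0.0)
    denom = np.linalg.norm(ref_sig) * np.sqrt(window_energy)
    out = np.zeros_like(xcorr)
    # Leave silent (zero-energy) windows at 0 instead of dividing by zero.
    np.divide(xcorr, denom, out=out, where=denom > 0)
    return out


rng = np.random.default_rng(0)
test = rng.standard_normal(6000)
test[3000:] *= 50.0              # much louder second half
ref = test[1000:1500].copy()     # reference taken from the quiet part

plain = np.correlate(test, ref, mode="valid")
plain_idx = int(np.argmax(plain))                      # pulled toward the loud part
ncc_idx = int(np.argmax(normalized_xcorr(test, ref)))  # true offset, 1000
print(plain_idx, ncc_idx)
```

The cumulative sum is just one way to obtain the per-window norms without an explicit Python loop; a direct loop over windows computes the same values.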
The MATLAB code is available at my Signal Processing StackExchange Question 50003 - GitHub Repository.