Friday, August 16, 2019

signal analysis - How to detect and remove the ringback tone, IVR voice, etc. from the beginning of an audio call recording


In an audio recording (say, a telephone conversation between two people), how would I programmatically detect and remove the dial tone at the beginning of a call using Python? Example: sample audio call. As you can see, the first 15 seconds or so is just a dial tone, like tring-tring-tring-tring.


Are there any audio analysis libraries in Python that could help me achieve this?


If this is not the right forum, kindly point me to the right place.




Answer



You can use librosa and scikit-learn to create a machine learning classifier. It would work roughly like this:


Training



  1. Get training signals of (A) just phone ringing, and (B) no phone ringing, e.g. ordinary conversation.

  2. Segment the training signals with a frame size of ~50-500 milliseconds.

  3. Extract features from each frame, e.g. MFCCs.


  4. Train a scikit-learn classifier, e.g.


    classifier.fit(X, y)


    where X is an ndarray of feature vectors (one row per frame), and y holds the target labels, e.g. "ring" (1) and "no ring" (0).




Prediction


classifier.predict(X)

where X is an ndarray of feature vectors extracted in the same way from a test signal.


The last frame that returns a positive "ring" label marks where to truncate the signal: discard everything up to the end of that frame and keep the rest.
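A minimal sketch of the truncation step, assuming the per-frame labels (for example, the output of classifier.predict) correspond to consecutive, non-overlapping frames of frame_len samples each. The function name trim_after_last_ring and that framing convention are assumptions made for this example.

```python
def trim_after_last_ring(samples, labels, frame_len):
    """Drop everything up to the end of the last frame labelled 1 ("ring").

    samples   : sequence of audio samples (list or ndarray)
    labels    : per-frame labels, 1 = "ring", 0 = "no ring"
    frame_len : number of samples per (non-overlapping) frame
    """
    last_ring = -1
    for i, label in enumerate(labels):
        if label == 1:
            last_ring = i
    if last_ring < 0:
        return samples  # no ring detected, nothing to remove
    cut = (last_ring + 1) * frame_len
    return samples[cut:]


# Toy example: 5 frames of 2 samples each, last "ring" in frame 3.
print(trim_after_last_ring(list(range(10)), [1, 1, 0, 1, 0], 2))  # → [8, 9]
```

Note that a single misclassified "ring" frame late in the call would also trim real conversation, so smoothing the labels (for instance, requiring several consecutive "ring" frames) may be worth considering.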


