signal analysis - How to detect and remove ringback-tone and ivrs voice etc from the beginning of an audio call recording

Friday, August 16, 2019

In an audio recording (say, a telephone conversation between two people), how would I programmatically detect and remove the dial tone at the beginning of the call using Python? For example, the first 15 seconds or so of the recording is just a ringing tone (tring-tring-tring-tring).

Are there any audio analysis libraries in Python that could help me achieve this?

If this is not the right forum, kindly point me to the right place.

Answer

You can use librosa and scikit-learn to create a machine learning classifier. It would work roughly like this:

Training

Get training signals of (A) just phone ringing, and (B) no phone ringing, e.g. ordinary conversation.

Segment the training signals with a frame size of ~50-500 milliseconds.

Extract features from each frame, e.g. MFCCs.

Train a scikit-learn classifier, e.g.
```
classifier.fit(X, y)
```
where X is an ndarray of feature vectors (one row per frame), and y holds the target labels, e.g. "ring" (1) and "no ring" (0).
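A minimal sketch of the training step might look like the following. The random forest is just one possible choice of scikit-learn classifier, and the feature matrices here are random placeholders standing in for real per-frame MFCC vectors, so the accuracy says nothing about real recordings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder features (assumption): two separated clusters standing in
# for 13-dimensional MFCC vectors of "ring" and "no ring" frames.
X_ring = rng.normal(loc=2.0, scale=1.0, size=(200, 13))
X_speech = rng.normal(loc=-2.0, scale=1.0, size=(200, 13))

X = np.vstack([X_ring, X_speech])
y = np.concatenate([np.ones(200, dtype=int), np.zeros(200, dtype=int)])

# Hold out some frames to check that the classifier generalizes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

classifier = RandomForestClassifier(n_estimators=100, random_state=0)
classifier.fit(X_train, y_train)

accuracy = classifier.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

With real data, it is probably worth splitting by recording rather than by frame, so that frames from the same call do not end up in both the training and the held-out set.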

Prediction

```
classifier.predict(X)
```

where X is an ndarray of feature vectors extracted in the same way from a test signal.

The last frame classified as "ring" marks the truncation point: discard everything up to the end of that frame and keep the rest of the signal.
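The truncation itself could be sketched as below. The function name `truncate_after_last_ring` is hypothetical, and the code assumes that frame `i` covers samples `i * hop_length` to `i * hop_length + frame_length`; if your feature extractor centers its frames differently, the offset needs adjusting.

```python
import numpy as np


def truncate_after_last_ring(y, labels, hop_length, frame_length):
    """Drop everything up to the end of the last frame labeled 1 ("ring").

    Assumes frame i spans samples [i * hop_length, i * hop_length + frame_length).
    Returns the signal unchanged if no frame is labeled as ringing.
    """
    ring_frames = np.flatnonzero(np.asarray(labels) == 1)
    if ring_frames.size == 0:
        return y
    last = ring_frames[-1]
    cut = min(len(y), last * hop_length + frame_length)
    return y[cut:]


# Toy example: 100 "samples" and six per-frame predictions.
signal = np.arange(100)
labels = [1, 1, 0, 1, 0, 0]
trimmed = truncate_after_last_ring(signal, labels, hop_length=10, frame_length=20)
print(len(trimmed), trimmed[0])
```

Because a single false "ring" prediction late in the call would cut away real conversation, it may be safer to look for the last ring frame only within the first part of the recording.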
