In an audio recording (say, a telephone conversation between two people), how would I programmatically detect and remove the dial tone at the beginning of a call using Python? For example, the first 15 seconds or so of the call is just a dial tone, like tring-tring-tring-tring.
Are there any audio analysis libraries in Python that could help me achieve this?
If this is not the right forum, kindly point me to the right place.
Answer
You can use librosa and scikit-learn to create a machine learning classifier. It would work roughly like this:
Training
- Get training signals of (A) just phone ringing, and (B) no phone ringing, e.g. ordinary conversation.
- Segment the training signals with a frame size of ~50-500 milliseconds.
- Extract features from each frame, e.g. MFCCs.
- Train a scikit-learn classifier, e.g. classifier.fit(X, y), where X is an ndarray of feature vectors and y holds the target labels, e.g. "ring" (1) and "no ring" (0).
Prediction
Call classifier.predict(X), where X is an ndarray of feature vectors extracted in the same way from a test signal.
The last frame that gets a positive "ring" label marks where to truncate the signal: discard everything up to the end of that frame and keep the rest.
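A minimal sketch of the truncation step might look like the following. It assumes frame i covers samples i * hop_length up to (i + 1) * hop_length; if the feature extractor centers its frames, the cut point shifts by about half a frame. The function name truncate_after_last_ring is hypothetical, and the hand-written labels array stands in for the output of classifier.predict(X).

```python
import numpy as np


def truncate_after_last_ring(signal, labels, hop_length):
    """Drop everything up to the end of the last frame labeled 1 ("ring")."""
    labels = np.asarray(labels)
    ring_frames = np.flatnonzero(labels == 1)
    if ring_frames.size == 0:
        return signal  # no ring detected: leave the signal untouched
    # Assumption: frame i spans samples [i * hop_length, (i + 1) * hop_length).
    cut = (ring_frames[-1] + 1) * hop_length
    return signal[cut:]


labels = np.array([1, 1, 0, 1, 0])  # stand-in for classifier.predict(X)
signal = np.arange(10)              # toy signal: 5 frames of 2 samples each
trimmed = truncate_after_last_ring(signal, labels, hop_length=2)
```

Because the cut is placed after the last positive frame, a single false "ring" late in the conversation would remove real speech. Restricting the search to the first part of the call, or smoothing the labels first, are possible safeguards.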