So, I've just learned that the human voice is not a single sine wave; it's a sum of many sine waves, each with a different frequency.
According to Wikipedia,
The voice consists of sound made by a human being using the vocal folds for talking, singing, laughing, crying, screaming, etc. Its frequency ranges from about 60 to 7000 Hz.
So if the human voice is a composite signal, it contains various frequencies in the range from 60 Hz to 7 kHz.
Suppose a group of people is singing the same song together, and each person has their own set of voice frequencies.
For example,
If person A has the following frequencies: 100Hz, 250Hz, 6kHz, 10Hz, 87Hz, 52Hz, 2kHz, ...
and person B has the following: 217Hz, 11Hz, 12Hz, 2323Hz, 839Hz, 4kHz, 100Hz, 10Hz, ...
then there must be many frequencies that both person A and person B share. In the example above, 100Hz and 10Hz are common to both.
I was watching a TV show named "Fringe" where they filter out one particular man's voice from an audio file in which other people's voices are present too.
So how exactly do they filter one person's voice out of the voices of hundreds of people if so many frequencies are common among all of them? Does it have something to do with the amplitudes of each person's frequencies?
Answer
If the signal is recorded using just one microphone, you can use methods such as spectral subtraction. This method is better suited to "constant" (stationary) noise, like the noise from a fan or an idling engine, than to competing voices. Other single-microphone methods rely on statistics and perceptual models of speech. If the signal is recorded with several microphones, you can use blind source separation to separate the (speech) signals. As it stands today, you won't get perfect results. The end result is always a trade-off between "noise" and the clarity of the speech signal of interest: the more "noise" you suppress, the more you degrade the signal of interest.
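To make the single-microphone case concrete, here is a minimal sketch of magnitude spectral subtraction using NumPy. It is an illustration under simplifying assumptions, not a production denoiser: a 440 Hz tone stands in for the speech, the "constant" noise is synthetic white noise, a separate noise-only recording is assumed to be available for estimating the noise spectrum, and frames are non-overlapping (real implementations usually use overlapping windows). The names `frame_signal`, `spectral_subtract`, `frame_len`, and `floor` are made up for this example.

```python
import numpy as np


def frame_signal(x, frame_len):
    """Split x into non-overlapping frames, dropping any leftover samples."""
    n_frames = len(x) // frame_len
    return x[: n_frames * frame_len].reshape(n_frames, frame_len)


def spectral_subtract(noisy, noise_only, frame_len=256, floor=0.05):
    """Subtract an average noise magnitude spectrum from each frame of `noisy`.

    `noise_only` is assumed to contain only the stationary background noise.
    `floor` keeps a small fraction of the original magnitude so that bins
    never go negative after subtraction.
    """
    # Average magnitude spectrum of the noise, one value per frequency bin.
    noise_frames = frame_signal(noise_only, frame_len)
    noise_mag = np.abs(np.fft.rfft(noise_frames, axis=1)).mean(axis=0)

    # Spectrum of each noisy frame, split into magnitude and phase.
    spectra = np.fft.rfft(frame_signal(noisy, frame_len), axis=1)
    mag = np.abs(spectra)
    phase = np.angle(spectra)

    # Subtract the noise estimate; the noisy phase is reused unchanged.
    cleaned_mag = np.maximum(mag - noise_mag, floor * mag)
    cleaned = np.fft.irfft(cleaned_mag * np.exp(1j * phase), n=frame_len, axis=1)
    return cleaned.reshape(-1)


# Synthetic demo: one second of a tone buried in stationary white noise.
rng = np.random.default_rng(0)
fs = 8000                                  # sample rate in Hz (assumed)
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)         # stand-in for the voice of interest
noisy = tone + 0.3 * rng.standard_normal(fs)
noise_only = 0.3 * rng.standard_normal(fs) # separate noise-only recording

cleaned = spectral_subtract(noisy, noise_only)
n = len(cleaned)                           # a few trailing samples are dropped
mse_before = np.mean((noisy[:n] - tone[:n]) ** 2)
mse_after = np.mean((cleaned - tone[:n]) ** 2)
print(f"MSE before: {mse_before:.4f}, after: {mse_after:.4f}")
```

Raising the amount subtracted (or lowering `floor`) removes more noise but also eats into the tone itself, which is the trade-off described above. Because the method assumes the noise spectrum stays roughly constant, it does not by itself pull one voice out of a crowd of other voices.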