This question may be more appropriate on Music Stack Exchange, but since sound is a branch of physics, and this is more a technical question about sound waves than a musical one, I asked it here.
How can you change the speed of audio samples without changing the pitch? I have heard songs that are sped up without the pitch changing. I also know that simply compressing an audio sample in time changes the frequency, and thus the pitch, of the sound waves our brains perceive. So how is this done without changing pitch?
Answer
This question (about "time-scaling" audio) is closely related to pitch shifting, which is time-scaling combined with resampling. But changing the speed without changing the pitch is time-scaling alone, so no resampling is involved (contrary to what thomas has suggested).
There are frequency-domain methods (the phase vocoder and sinusoidal modeling) that can change the speed of an orchestral mix (or some other broadband, non-periodic sound) without glitches, but the stretched result can sound a little "phasey".
If the source is monophonic, such as a solo line or a single note, string, or voice, a less expensive time-domain approach suffices and can sound very good. A single note or tone is quasiperiodic, which means that, in the vicinity of some time $t_0$,
$$ x(t) \approx x(t - \tau(t_0)) \qquad \text{for } t \approx t_0 $$
Here $\tau(t_0) > 0$ is the period of the quasiperiodic waveform in the neighborhood of $t_0$. It is estimated by finding the value of $\tau$ that minimizes something like
$$ Q_x(\tau, t_0) = \int \Big|x(t) - x(t-\tau)\Big|^2 w(t-t_0) \, dt $$
where the window $w(t-t_0) \ge 0$ is centered around the time $t_0$. If several candidate values of $\tau$ make $Q_x(\tau, t_0)$ small, it is usually best to pick the smallest such $\tau$.
Now imagine copying your note $x(t)$ and offsetting (delaying) the copy in time by the amount $\tau(t_0)$. That delayed copy is $x(t - \tau(t_0))$. You then have two identical waveforms except for the offset of $\tau(t_0)$, but around the time $t_0$ the two waveforms will look almost the same, because the offset is exactly one period (based on the estimate of the period around $t_0$).
Then you can cross-fade (using a smoothstep function $S(t)$) from the original to the offset copy. The cross-fade will not have any nasty cancellation or "destructive interference", because the waveforms are lined up. The result is the same note, but one period longer (longer by $\tau(t_0)$ seconds):
$$ y(t) = \Big(1-S\big(\tfrac{t-t_0}{\tau(t_0)}\big) \Big) \, x(t) \ + \ S\big(\tfrac{t-t_0}{\tau(t_0)}\big) \, x(t - \tau(t_0)) $$
(A single splice won't stretch the sound enough to notice, but if you repeat this operation many times per second, you can stretch a patch of sound by a factor of two or more.)
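Putting the pieces together, here is a minimal NumPy sketch of one splice: a cross-fade, using a cubic smoothstep, from $x(t)$ to the copy delayed by one period, following the $y(t)$ formula above. The function and parameter names are my own, and everything is measured in samples rather than seconds.

```python
import numpy as np

def smoothstep(u):
    """Cubic smoothstep S(u): 0 for u <= 0, 1 for u >= 1, smooth between."""
    u = np.clip(u, 0.0, 1.0)
    return u * u * (3.0 - 2.0 * u)

def splice_one_period(x, t0, tau):
    """Lengthen x by one period: cross-fade over [t0, t0 + tau) from
    x(t) to the delayed copy x(t - tau). The output has len(x) + tau
    samples. tau should come from a period estimate near t0.
    (These names are illustrative, not from the answer.)"""
    y = np.empty(len(x) + tau)
    y[:t0] = x[:t0]                         # before the splice: original
    t = np.arange(t0, t0 + tau)
    s = smoothstep((t - t0) / tau)          # S((t - t0) / tau)
    y[t0:t0 + tau] = (1 - s) * x[t] + s * x[t - tau]   # the y(t) formula
    y[t0 + tau:] = x[t0:]                   # after: delayed copy x(t - tau)
    return y
```

Each call inserts $\tau$ samples; calling it repeatedly at different splice points stretches the sound by any desired factor. To speed the sound up instead, cross-fade to a copy advanced (rather than delayed) by one period, removing $\tau$ samples per splice.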