Wednesday, February 21, 2018

wavelet - Audio time stretching, without pitch shifting


This might be a Sound Design question, or a StackOverflow question since I am attempting to do this with Java.


I would like to play back a sound at the same pitch, but stretched out in time.


My attempt has been the following strategy:


I break the sound into many overlapping granules of 2,000 sample points at 44,100 frames per second (I have also tried other fixed lengths). For example, if the first granule runs from sample 0 to 2000, the second might start at sample 250 and run to 2250, the third from 500 to 2500, and so on.
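The segmentation step above can be sketched as follows. This is my own minimal illustration, not the author's actual code; the granule length (2000) and hop (250) are the example values from the text, and the class and method names are made up for the sketch:

```java
// Hypothetical sketch of the granule segmentation described above:
// fixed-length granules, each starting 'hop' samples after the last.
public class Granules {

    static float[][] makeGranules(float[] signal, int granuleLen, int hop) {
        // Number of full granules that fit in the signal.
        int count = (signal.length - granuleLen) / hop + 1;
        float[][] granules = new float[count][granuleLen];
        for (int g = 0; g < count; g++) {
            // Copy granuleLen samples starting at g * hop.
            System.arraycopy(signal, g * hop, granules[g], 0, granuleLen);
        }
        return granules;
    }
}
```

With the example numbers, a 2,500-sample signal yields three granules starting at samples 0, 250, and 500.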



The question is how to string these together in a way that will create a minimum of artifacts. I'm new to granular synthesis. It seems that a lot of writing about working with granules discusses their smooth entry and exit as a function of the envelope. But with what I want to do, I've been thinking in terms of cross-fading, and thus, coming up with an envelope where the attack is a mirror image of the decay. Is this faulty?


For cross-fades, I've tried the following functions, described here in terms of a LUT that is sized to match the number of overlap samples:




  • linear:


    $$\text{lut}[i] = \frac{(n-i)}{n}$$




  • a cosine function:





$$\text{lut}[i] = \cos \left( \frac{\pi}{2} \times \frac{i}{n} \right)$$




  • an equation that eases in and out (provided by a friend):


    $$\text{lut}[i] = 3\left( \frac{i}{n} \right)^2 - 2\left( \frac{i}{n} \right)^3$$




  • squares:





$$\text{lut}[i] = \frac{i \times i}{n \times n}$$



  • square root:


$$\text{lut}[i] = \sqrt{ \frac{(n - i)}{n} }$$


Once the LUT is made, the decay factor for the first granule is obtained by iterating from front to back, and the attack factor of the second granule by iterating from back to front. I've tried different LUT sizes for the different functions.
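The LUT construction and the front-to-back/back-to-front cross-fade just described can be sketched like this, using the cosine function from the list above. This is an illustrative reconstruction (my own class and method names), not the author's code:

```java
// Sketch of the cross-fade approach described above.
public class CrossFade {

    // lut[i] = cos((PI/2) * i / n): near 1.0 at i = 0, near 0.0 at i = n - 1.
    static float[] cosineLut(int n) {
        float[] lut = new float[n];
        for (int i = 0; i < n; i++) {
            lut[i] = (float) Math.cos((Math.PI / 2) * i / n);
        }
        return lut;
    }

    // Mixes the last n samples of granule 'a' with the first n samples of
    // granule 'b': the LUT is walked front-to-back for a's decay and
    // back-to-front for b's attack, so the two envelopes mirror each other.
    static float[] crossFade(float[] a, float[] b, float[] lut) {
        int n = lut.length;
        float[] out = new float[n];
        for (int i = 0; i < n; i++) {
            float decay  = lut[i];          // front to back for granule A
            float attack = lut[n - 1 - i];  // back to front for granule B
            out[i] = a[a.length - n + i] * decay + b[i] * attack;
        }
        return out;
    }
}
```

Note that with this cosine LUT the two envelopes sum to slightly more than 1 in the middle of the overlap; an equal-power variant would square-sum to 1 instead, which may be part of why different functions behave differently at different overlap lengths.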


All of these are flawed, in terms of the resulting sound, even for longer overlaps, where the attack and decay are almost the entire granule length.


Of these, when dealing with "macro" granules in another context (lengths of 1/2 second with cross-fades of 1/5 of a second), I found that the "cos" function above did the best job of maintaining a consistent volume. But for small granules of about 0.05 seconds, for this time-stretching task, it seems the considerations must be rather different.


I know I've heard pretty nice time stretching effects before. Is there an entirely different approach that has to be used? Can an improvement be accomplished by using a different envelope shape?





Progress report, 4/11/17: In order to better understand the range of solutions offered, I am reading the Wikipedia article Audio time-scale/pitch modification, and an article cited there, A Review of Time-Scale Modification of Musical Signals.




Progress, 4/16/17


I now have a working implementation of an OLA (overlap-add) algorithm using a Hamming window. This was tricky to code and debug! A big help in understanding came from the concepts in the Driedger/Müller article cited above, in particular the notions of "analysis granule" and "synthesis granule" and their respective "hops", as a way to keep things straight.
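A minimal OLA time-stretch along these lines can be sketched as below. This is my own illustration of the technique, assuming an input at least one granule long; the names are hypothetical. Granules of length n are read from the input every analysisHop samples, Hamming-windowed, and added into the output every synthesisHop samples, so the stretch factor is synthesisHop / analysisHop:

```java
// Sketch of OLA time stretching with a Hamming window, following the
// analysis-granule / synthesis-granule distinction: read granules at one
// hop size, write them at another, and normalize by the summed windows.
public class Ola {

    static float[] hamming(int n) {
        float[] w = new float[n];
        for (int i = 0; i < n; i++) {
            w[i] = (float) (0.54 - 0.46 * Math.cos(2 * Math.PI * i / (n - 1)));
        }
        return w;
    }

    // Assumes in.length >= n.
    static float[] stretch(float[] in, int n, int analysisHop, int synthesisHop) {
        float[] win = hamming(n);
        int frames = (in.length - n) / analysisHop + 1;
        float[] out  = new float[(frames - 1) * synthesisHop + n];
        float[] norm = new float[out.length];  // running sum of window values
        for (int f = 0; f < frames; f++) {
            int inPos  = f * analysisHop;   // where the analysis granule is read
            int outPos = f * synthesisHop;  // where the synthesis granule lands
            for (int i = 0; i < n; i++) {
                out[outPos + i]  += in[inPos + i] * win[i];
                norm[outPos + i] += win[i];
            }
        }
        // Divide by the summed windows so overlapping regions keep unity gain.
        for (int i = 0; i < out.length; i++) {
            if (norm[i] > 1e-6f) out[i] /= norm[i];
        }
        return out;
    }
}
```

With analysisHop = 250 and synthesisHop = 500, for example, the output is roughly twice the input length at the same pitch.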


A test that helped with debugging was to modify my original naive attempt into something that should theoretically produce the same result as the OLA method: in my original cross-fade approach, I used the Hamming window to control the cross-fading, and made the entire signal either fading in or fading out, so that each segment was equivalent to a single granule in the OLA method.


Eventually, I got both methods to produce the same outcome. However, there is a strong pitch component that occurs with stretching and correlates with the amount of stretch (the longer the stretch, the lower the tone). I will link an audio file of examples as soon as I can get to it. It may indicate something awry in my coding. [The artifacts are actually not so bad for a music-based example; the pitchiness seems to arise when there is some noise in the original. I haven't gotten to the bottom of this yet. But the OLA is definitely an improvement over my first attempts.]


I don't think I'll have the time to go deeper into the more advanced suggestions until after the concert, and so, I apologize for not designating an accepted answer. I want to actually try to implement the answer before selecting it.


Meanwhile, I will probably use Audacity's Paulstretch or its variable-stretch tool to make multiple fixed-length takes, rather than attempt to do the time-stretching procedurally during playback.



