Thursday, March 8, 2018

fft - Coloring a waveform with spectral centroid or by other means


Update 8


Unhappy about having to upload tracks to the service, and after looking at the new release of RekordBox 3, I've decided to take another look at an offline approach with finer resolution :D


It sounds promising, though it's still in a very alpha state:


Johnick - Good Time




Note that there's no logarithmic scale or palette tuning, just a raw mapping from frequency to HSL.


The idea: the waveform renderer now has a color provider that is queried for a color at a specific position. The one shown above computes the zero-crossing rate over the 1024 samples following that position.
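The post doesn't show that provider's code, but here is a minimal sketch of the idea, assuming float samples, the 1024-sample window mentioned above, and the same ColorHSL helper as in Update 6 (the method name GetColorAt is made up for illustration):

public static int GetColorAt(float[] samples, int position)
{
    // Zero-crossing rate of the window following the queried position,
    // mapped straight to an HSL hue (no log scale, no palette tuning).
    const int window = 1024;
    int end = Math.Min(position + window, samples.Length);

    int crossings = 0;
    for (int i = position + 1; i < end; i++)
        if ((samples[i - 1] >= 0f) != (samples[i] >= 0f))
            crossings++;

    // A rate of 1.0 would mean a sign change at every sample.
    double zcr = (double)crossings / window;

    return new ColorHSL(zcr, 1.0d, 0.5d).ToInt32();
}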


Obviously there's still a lot to do before this becomes robust, but it seems like a good path...



From RekordBox 3:




Update 7


The final form I'll adopt is much like in Update 3 (it's been slightly Photoshopped to get smooth transitions between colors).


The conclusion is that I was close months ago but dismissed that result, thinking it was bad X)


Update 6


I recently unearthed the project, so I thought I'd post an update here :D



Song: Chic - Good Times 2001 (Stonebridge Club mix)




It's way better IMO; beats have a constant color, etc. It's not optimized, though.


How?


Still using the EchoNest analysis described at http://developer.echonest.com/docs/v4/_static/AnalyzeDocumentation.pdf (page 6).


For each segment:


public static int GetSegmentColorFromTimbre(Segment[] segments, Segment segment)
{
    var timbres = segment.Timbre;

    // Timbre coefficient 0 is the segment's average loudness
    // (see page 6 of the EchoNest analyze documentation).
    var avgLoudness = timbres[0];
    var avgLoudnesses = segments.Select(s => s.Timbre[0]).ToArray();
    double avgLoudnessNormalized = Normalize(avgLoudness, avgLoudnesses);

    // Timbre coefficient 1 is the segment's brightness.
    var brightness = timbres[1];
    var brightnesses = segments.Select(s => s.Timbre[1]).ToArray();
    double brightnessNormalized = Normalize(brightness, brightnesses);

    // Map brightness to hue and loudness to lightness, at full saturation.
    ColorHSL hsl = new ColorHSL(brightnessNormalized, 1.0d, avgLoudnessNormalized);
    return hsl.ToInt32();
}

public static double Normalize(double value, double[] values)
{
    // Min-max normalization against the whole song's segments.
    var min = values.Min();
    var max = values.Max();
    return (value - min) / (max - min);
}
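For completeness, a hypothetical call site, assuming segments has already been parsed from the analysis JSON:

var colors = segments.Select(s => GetSegmentColorFromTimbre(segments, s)).ToArray();

Note that this recomputes the min/max over all segments on every call, which is part of why it's not optimized; the normalization bounds could be computed once per song.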

Obviously there's a lot more code needed before you get here (uploading to the service, parsing the JSON, etc.), but that's not the point of this site, so I'm only posting the parts relevant to the result above.


So I'm using the first 2 timbre coefficients of the analysis result; there is certainly more that can be done with them, but I still need to experiment. If I ever find something cooler than the above, I'll come back and update here.


As always, any hints on the topic are more than welcome!


Update 5


A gradient using the harmonic series:




The color smoothing is ratio-sensitive (otherwise it looks bad) and will need some adjustment.



Update 4


Rewrote the coloring to happen at the source and smoothed the colors using an alpha-beta filter with alpha = 0.08 and beta = 0.02.
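The filter itself isn't shown in the post; a minimal sketch with those values, assuming one filter instance per smoothed channel and a fixed time step of one waveform column:

public sealed class AlphaBetaFilter
{
    private readonly double _alpha;
    private readonly double _beta;
    private double _estimate;
    private double _velocity;

    public AlphaBetaFilter(double alpha = 0.08d, double beta = 0.02d)
    {
        _alpha = alpha;
        _beta = beta;
    }

    public double Update(double measurement)
    {
        // Predict with the current velocity, then correct both the
        // estimate and the velocity using the measurement residual.
        _estimate += _velocity;
        double residual = measurement - _estimate;
        _estimate += _alpha * residual;
        _velocity += _beta * residual;
        return _estimate;
    }
}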




It looks a little better when zoomed out.




The next step is to get a great color palette!


Update 3




Yellow represents the mids.


Not so great when zoomed out, yet.



(the palette needs some serious work)


Update 2


A preliminary test using the second 'timbre' coefficient, following the hint from pichenettes.


Update 1


A preliminary test using an analysis result from the EchoNest service. Note that it's not aligned very well (my fault), but it's much more coherent than my initial FFT approach below.




For people interested in using this great API, start here: http://developer.echonest.com/docs/v4/track.html#profile


Also, don't get confused by these waveforms, as they represent 3 different songs.


Initial question


So far, this is the result I get using a 256-sample FFT and calculating each chunk's spectral centroid.
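For reference, the per-chunk computation boils down to a magnitude-weighted mean frequency. A minimal sketch, assuming magnitudes holds the first fftSize / 2 magnitude bins of one chunk's FFT:

public static double SpectralCentroid(double[] magnitudes, double sampleRate, int fftSize)
{
    double weightedSum = 0.0d;
    double totalMagnitude = 0.0d;

    for (int k = 0; k < magnitudes.Length; k++)
    {
        // Center frequency of bin k in Hz.
        double frequency = k * sampleRate / fftSize;
        weightedSum += frequency * magnitudes[k];
        totalMagnitude += magnitudes[k];
    }

    // Silent chunks have no meaningful centroid; return 0 Hz for them.
    return totalMagnitude > 0.0d ? weightedSum / totalMagnitude : 0.0d;
}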



Raw result of the calculations:

Some smoothing applied (the shape looks much better with it):

The waveform produced:

Ideally, this is how it should look (taken from the Serato DJ software):


Do you know what technique/algorithm I could use to split the audio when the average frequency changes over time (like the Serato example above)?



Answer



You can first try the following (no segmentation):



  • Process the signal in small chunks (say 10ms to 50ms in duration) - if necessary with a 50% overlap between them.

  • Compute the spectral centroid on each chunk.


  • Apply a non-linear function to the spectral centroid value to get a uniform distribution of the palette colors used. A logarithm is a good start. Another option is to first compute the distribution of the centroid values over the entire file, and then color according to the percentiles of this distribution (its CDF) - a sketch of this adaptive mapping is given after this list. The adaptive approach guarantees that each color in the palette is used in equal proportion. The drawback is that, if it is used, what is plotted in blue in one file won't sound similar to what is plotted in blue in another file!
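A minimal sketch of that adaptive mapping, assuming all per-chunk centroid values have already been computed: sort them once, then replace each value by its normalized rank (its empirical CDF value), which maps uniformly onto the palette:

public static double[] ToPercentiles(double[] centroids)
{
    // Sorted copy of the distribution over the whole file.
    var sorted = (double[])centroids.Clone();
    Array.Sort(sorted);

    var result = new double[centroids.Length];
    for (int i = 0; i < centroids.Length; i++)
    {
        // Rank of this value within the sorted distribution.
        int rank = Array.BinarySearch(sorted, centroids[i]);
        if (rank < 0)
            rank = ~rank;

        // Normalize the rank to [0, 1] for the palette lookup.
        result[i] = sorted.Length > 1 ? (double)rank / (sorted.Length - 1) : 0.0d;
    }
    return result;
}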


It is not clear from the picture whether Serato does this, or indeed goes one step further and attempts to segment the signal - it would not be surprising if it did, since knowing the instants at which notes occur in a music signal can help with synchronizing tracks! The steps would be:



  • Compute a Short-Term Fourier Transform (spectrogram) of the signal - using the FFT on short overlapping segments (start with an FFT size of 1024 or 2048 for 44.1 kHz audio).

  • Compute an onset-detection function. I recommend you look at this paper - the proposed approach is very effective, and even a C++ implementation takes less than 10 lines. You can find an implementation in Yaafe, as ComplexDomainOnsetDetection (a rough sketch is also given after this list).

  • Detect peaks in the onset-detection function to get the position of note onsets.

  • Compute the spectral centroid over all the time segments delimited by the detected onsets (don't forget to window and/or zero-pad!)

  • And don't forget the non-linear map! Note that the gradient effect that appears between notes on the Serato waveform might be artificially generated.
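A rough sketch of the complex-domain onset-detection function, assuming the STFT frames are already available as System.Numerics.Complex arrays: each bin is predicted from the previous frame's magnitude and phase slope, and the detection function sums the deviations from those predictions:

public static double[] ComplexDomainOnsetDetection(Complex[][] stft)
{
    int frameCount = stft.Length;
    int binCount = stft[0].Length;
    var detection = new double[frameCount];

    for (int n = 2; n < frameCount; n++)
    {
        double sum = 0.0d;
        for (int k = 0; k < binCount; k++)
        {
            // Predicted bin: previous magnitude, phase advanced by the
            // previous frame-to-frame phase difference.
            double magnitude = stft[n - 1][k].Magnitude;
            double phase = 2.0d * stft[n - 1][k].Phase - stft[n - 2][k].Phase;
            var predicted = Complex.FromPolarCoordinates(magnitude, phase);

            // Both amplitude and phase deviations contribute to the sum.
            sum += (stft[n][k] - predicted).Magnitude;
        }
        detection[n] = sum;
    }
    return detection;
}

Peaks in the returned array (step 3 above) then give the note onset positions.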



Once you get this working, a "low-hanging fruit" would be to compute a few other features on each segment (moments, a handful of MFCCs...), run k-means on these feature vectors, and decide on the color of each segment using its cluster index. See section II of Ravelli's paper.
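As a rough illustration of that last idea (not Ravelli's exact method), a plain k-means over per-segment feature vectors; the returned cluster index would then select the segment's palette color:

public static int[] KMeansLabels(double[][] features, int k, int iterations = 50)
{
    var rng = new Random(0);
    int dimensions = features[0].Length;
    var labels = new int[features.Length];

    // Initialize centroids from randomly chosen feature vectors.
    var centroids = features.OrderBy(_ => rng.Next())
                            .Take(k)
                            .Select(f => (double[])f.Clone())
                            .ToArray();

    for (int iteration = 0; iteration < iterations; iteration++)
    {
        // Assignment step: label each vector with its nearest centroid
        // (squared Euclidean distance).
        for (int i = 0; i < features.Length; i++)
        {
            int best = 0;
            double bestDistance = double.MaxValue;
            for (int c = 0; c < centroids.Length; c++)
            {
                double distance = 0.0d;
                for (int j = 0; j < dimensions; j++)
                {
                    double diff = features[i][j] - centroids[c][j];
                    distance += diff * diff;
                }
                if (distance < bestDistance)
                {
                    bestDistance = distance;
                    best = c;
                }
            }
            labels[i] = best;
        }

        // Update step: move each centroid to the mean of its members.
        for (int c = 0; c < centroids.Length; c++)
        {
            var members = features.Where((_, i) => labels[i] == c).ToArray();
            if (members.Length == 0)
                continue; // leave empty clusters where they are

            for (int j = 0; j < dimensions; j++)
                centroids[c][j] = members.Average(f => f[j]);
        }
    }
    return labels;
}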

