Update 8
Unhappy with having to upload tracks to the service, and looking at the new release of RekordBox 3, I've decided to take another look at an offline approach with a finer resolution :D
It sounds promising, though it's still in a very alpha state:
Johnick - Good Time
Note that there's no logarithmic scale or palette tuning, just a raw mapping from frequency to HSL.
The idea: the waveform renderer now has a color provider which is queried for a color at a specific position. The one you see above computes the zero-crossing rate over the 1024 samples next to that position.
Obviously there's still a lot to do before getting something robust, but it seems like a good path...
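As a rough illustration, here is a minimal sketch of that color-provider idea, assuming a hypothetical IColorProvider interface queried by the renderer; the names are illustrative (ColorHSL is the helper type used in the Update 6 code further down), not the project's actual API.

using System;

public interface IColorProvider
{
    // Returns the color to draw at the given sample position.
    int GetColor(int position);
}

public sealed class ZeroCrossingRateColorProvider : IColorProvider
{
    private readonly float[] _samples;

    public ZeroCrossingRateColorProvider(float[] samples)
    {
        _samples = samples;
    }

    public int GetColor(int position)
    {
        // Count sign changes over the 1024 samples next to the position.
        int end = Math.Min(position + 1024, _samples.Length);
        int crossings = 0;
        for (int i = position + 1; i < end; i++)
        {
            if ((_samples[i - 1] >= 0f) != (_samples[i] >= 0f))
                crossings++;
        }

        // Raw mapping from the rate to a hue: no log scale and no
        // palette tuning, exactly as noted above.
        double rate = crossings / 1024.0;
        return new ColorHSL(rate, 1.0d, 0.5d).ToInt32();
    }
}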
From RekordBox 3:
Update 7
The final form that I'll adopt, much like in Update 3
(it's been a little Photoshopped to get smooth transitions between colors)
The conclusion is that I was close months ago but dismissed that result, thinking it was bad X)
Update 6
I unearthed the project recently, so I thought I'd post an update here :D
Song: Chic - Good Times 2001 (Stonebridge Club mix)
It's way better IMO: beats have a constant color, etc. It's not optimized, though.
How?
Still with http://developer.echonest.com/docs/v4/_static/AnalyzeDocumentation.pdf (page 6)
For each segment:
using System.Linq;

public static int GetSegmentColorFromTimbre(Segment[] segments, Segment segment)
{
    var timbres = segment.Timbre;

    // Timbre coefficient 0 is the segment's average loudness (see page 6
    // of the EchoNest analyze documentation linked above).
    var avgLoudness = timbres[0];
    var avgLoudnesses = segments.Select(s => s.Timbre[0]).ToArray();
    double avgLoudnessNormalized = Normalize(avgLoudness, avgLoudnesses);

    // Timbre coefficient 1 is the segment's brightness.
    var brightness = timbres[1];
    var brightnesses = segments.Select(s => s.Timbre[1]).ToArray();
    double brightnessNormalized = Normalize(brightness, brightnesses);

    // Brightness drives the hue, loudness drives the lightness.
    ColorHSL hsl = new ColorHSL(brightnessNormalized, 1.0d, avgLoudnessNormalized);
    return hsl.ToInt32();
}

// Rescales a value to [0, 1] against the min/max over all segments.
public static double Normalize(double value, double[] values)
{
    var min = values.Min();
    var max = values.Max();
    return (value - min) / (max - min);
}
Obviously there's a lot more code needed before you get here (uploading to the service, parsing the JSON, etc.), but that's not the point of this site, so I'm just posting the parts relevant to the result above.
So far I'm only using the first two coefficients of the analysis result; there is certainly more to be done with the rest, but I still need to experiment. If I ever find something cooler than the above, I'll come back and update here.
As always, any hints on the topic are more than welcome!
Update 5
Some gradients using the harmonic series
The color smoothing is ratio-sensitive (otherwise it looks bad) and will need some adjustment.
Update 4
Rewrote the coloring to happen at the source and smoothed the colors using an alpha-beta filter with gains of 0.08 and 0.02 (a sketch of the filter follows below).
A little better when zoomed out
The next step is to get a great color palette!
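For reference, a minimal sketch of the alpha-beta smoothing mentioned above, applied to one value at a time (e.g. per color channel); the AlphaBetaFilter class is hypothetical, not the project's actual code.

public sealed class AlphaBetaFilter
{
    private readonly double _alpha; // position correction gain (0.08 above)
    private readonly double _beta;  // velocity correction gain (0.02 above)
    private double _estimate;       // current smoothed value
    private double _velocity;       // current rate of change

    public AlphaBetaFilter(double alpha, double beta, double initial)
    {
        _alpha = alpha;
        _beta = beta;
        _estimate = initial;
    }

    public double Update(double measurement)
    {
        // Predict the next value, then correct the prediction and the
        // velocity by a fraction of the residual.
        double predicted = _estimate + _velocity;
        double residual = measurement - predicted;
        _estimate = predicted + _alpha * residual;
        _velocity += _beta * residual;
        return _estimate;
    }
}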
Update 3
Yellows represent the mids
Not so great when zoomed out, yet.
(the palette needs some serious work)
Update 2
Preliminary test using the second 'timbre' coefficient, following the hint from pichenettes
Update 1
A preliminary test using an analysis result from the EchoNest service. Note that it's not aligned very well (my fault), but it's much more coherent than the approach above.
For people interested in using this great API, start here: http://developer.echonest.com/docs/v4/track.html#profile
Also, don't get confused by these waveforms: they represent 3 different songs.
Initial question
So far this is the result I get using a 256-sample FFT and calculating each chunk's spectral centroid (a sketch of the computation follows below).
Raw result of the calculations
Some smoothing applied (form looks much better with it)
Waveform produced
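For reference, a minimal sketch of the per-chunk spectral centroid computation described above, assuming magnitudes holds the FFT magnitudes for bins 0..N/2 of one 256-sample chunk; the function name is illustrative.

public static double SpectralCentroid(double[] magnitudes, double sampleRate, int fftSize)
{
    double weightedSum = 0.0;
    double magnitudeSum = 0.0;
    for (int bin = 0; bin < magnitudes.Length; bin++)
    {
        double frequency = bin * sampleRate / fftSize; // bin center frequency in Hz
        weightedSum += frequency * magnitudes[bin];
        magnitudeSum += magnitudes[bin];
    }
    // The centroid is the magnitude-weighted mean frequency of the chunk.
    return magnitudeSum > 0.0 ? weightedSum / magnitudeSum : 0.0;
}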
Ideally, this is how it should look (taken from the Serato DJ software)
Do you know what technique/algorithm I could use to split the audio when the average frequency changes over time? (like in the image above)
Answer
You can first try the following (no segmentation):
- Process the signal in small chunks (say 10ms to 50ms in duration) - if necessary with a 50% overlap between them.
- Compute the spectral centroid on each chunk.
- Apply a non-linear function to the spectral centroid value to get a uniform distribution over the palette colors. Logarithm is a good start. Another option is to first compute the distribution of the centroid value over the entire file, and then color according to percentiles of this distribution (CDF) - see the sketch after this list. This adaptive approach guarantees that each color in the palette will be used in equal proportion. The drawback is that what is plotted in blue in one file won't sound similar to what is plotted in blue in another file if this adaptive approach is used!
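A minimal sketch of that adaptive percentile (CDF) mapping, assuming centroids holds the per-chunk spectral centroids of the whole file; the function name is illustrative:

using System;

public static double[] MapToPercentiles(double[] centroids)
{
    // Rank each centroid against the sorted distribution: the result in
    // [0, 1] is the empirical CDF value, so every palette color ends up
    // used in roughly equal proportion.
    var sorted = (double[])centroids.Clone();
    Array.Sort(sorted);

    var mapped = new double[centroids.Length];
    for (int i = 0; i < centroids.Length; i++)
    {
        int rank = Array.BinarySearch(sorted, centroids[i]);
        if (rank < 0) rank = ~rank; // defensive: insertion point if not found
        mapped[i] = (double)rank / (sorted.Length - 1);
    }
    return mapped;
}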
It is not clear from the picture whether Serato does this, or indeed goes one step further and attempts to segment the signal - it would not be surprising if it did, since knowing the instants at which notes occur in a music signal can help synchronize tracks! The steps would be:
- Compute a Short-Term Fourier Transform (spectrogram) of the signal - using the FFT on short overlapping segments (start with an FFT size of 1024 or 2048 with 44.1kHz audio).
- Compute an onset-detection function. I recommend you look at this paper - the proposed approach is very effective, and even a C++ implementation takes less than 10 lines. You can find an implementation in Yaafe as ComplexDomainOnsetDetection (a sketch of the idea follows this list).
- Detect peaks in the onset-detection function to get the positions of note onsets.
- Compute the spectral centroid over all the time segments delimited by the detected onsets (don't forget to window and/or zero-pad!)
- And don't forget the non-linear map! Note that the gradient effect that appears between each note on the Serato waveform might be artificially generated.
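For illustration, a minimal sketch of a complex-domain onset-detection function in the spirit of the approach referenced above, assuming frames is the STFT of the signal (one complex spectrum per hop); this sketches the general technique, not Yaafe's actual code:

using System.Numerics;

public static double[] ComplexDomainOnsetDetection(Complex[][] frames)
{
    int bins = frames[0].Length;
    var detection = new double[frames.Length];

    for (int n = 2; n < frames.Length; n++)
    {
        double sum = 0.0;
        for (int k = 0; k < bins; k++)
        {
            // Predict this frame's bin: previous magnitude, with the phase
            // advancing at the same rate as between frames n-2 and n-1.
            double prevPhase = frames[n - 1][k].Phase;
            double prevPrevPhase = frames[n - 2][k].Phase;
            double predictedPhase = 2 * prevPhase - prevPrevPhase;
            Complex predicted = Complex.FromPolarCoordinates(
                frames[n - 1][k].Magnitude, predictedPhase);

            // Deviation from the steady-state prediction signals an onset.
            sum += (frames[n][k] - predicted).Magnitude;
        }
        detection[n] = sum;
    }
    return detection;
}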
Once you get this working, a "low-hanging fruit" would be to compute a few other features on each segment (moments, a handful of MFCCs...), run k-means on these feature vectors, and decide on the color of the segment using the cluster index. See section II of Ravelli's paper.
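A rough sketch of that clustering step, assuming each segment has already been reduced to a small feature vector; this is plain k-means with illustrative names, not the paper's exact procedure:

using System;
using System.Linq;

public static int[] KMeans(double[][] features, int k, int iterations = 50)
{
    var rng = new Random(0);
    // Initialize centroids with k randomly chosen feature vectors.
    var centroids = features.OrderBy(_ => rng.Next()).Take(k)
                            .Select(f => (double[])f.Clone()).ToArray();
    var assignment = new int[features.Length];

    for (int iter = 0; iter < iterations; iter++)
    {
        // Assignment step: attach each segment to its nearest centroid.
        for (int i = 0; i < features.Length; i++)
            assignment[i] = Enumerable.Range(0, k)
                .OrderBy(c => Distance(features[i], centroids[c])).First();

        // Update step: move each centroid to the mean of its members.
        for (int c = 0; c < k; c++)
        {
            var members = features.Where((_, i) => assignment[i] == c).ToArray();
            if (members.Length == 0) continue;
            for (int d = 0; d < centroids[c].Length; d++)
                centroids[c][d] = members.Average(m => m[d]);
        }
    }
    // The cluster index then picks each segment's color from the palette.
    return assignment;
}

private static double Distance(double[] a, double[] b) =>
    Math.Sqrt(a.Zip(b, (x, y) => (x - y) * (x - y)).Sum());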