Several simple experiments I made with sms-tools.
How to:
- (Fork and) clone the sms-tools repository from GitHub into a local repository.
- Install sms-tools (and Essentia, if interested) on Ubuntu or Debian. I don't know whether it works on other Linux distributions, but I compiled sms-tools successfully on Windows 7.
- For transformations, open the sms-tools/software/transformations and sms-tools/software/transformations_interface folders.
- Available transformation models: sine, STFT (short-time Fourier transform) morph, harmonic, stochastic, HPS (harmonic plus stochastic), HPS morph.
- Either (a) run the GUI with Python, or (b) write a Python script that calls the transformation models' functions.
- In the GUI or in Python code, first run the analysis to find the best HPS model parameters, then apply the transformation.
- To help determine f0, the harmonics, and several other parameters, we can use "Sonic Visualiser" (free) with its Spectrogram pane and the Aubio plugin (pitch detector, onset detector).
(1) Single-pass.
Source is "speech-female.wav" with duration of 3.2836 sec.
HPS analysis parameters are,
- window type = blackman,
- window size M = 1101,
- FFT size N = 4096,
- threshold t = -90,
- minimum duration of harmonic tracks minSineDur = 0.01,
- minimum f0, minf0 = 120,
- maximum f0, maxf0 = 250,
- error threshold, f0et = 8 (to tolerate more error in the TWM algorithm),
- maximum number of harmonics nH = 8 (a small number of harmonics, to allow more transformation),
- maximum frequency deviation in harmonic tracks harmDevSlope = 0.2 (large enough to allow higher harmonics to deviate more than lower harmonics),
- stochastic decimation/approximation factor stocf = 0.1 (small enough for higher decimation and a more compact representation).
The goal is to change the pitch from female to male by scaling the frequency down, and to reverse the time scaling, creating a totally different speech (an encrypted or alien sound).
HPS transformation's parameters:
[0, 0.5, 1, 0.5]
[0, 0.99, 1, 0.99]
[0, 0, 0.1, 1, 0.2, 0.9, 0.3, 0.8, 0.4, 0.7, 0.5, 0.6, 0.6, 0.5, 0.7, 0.4, 0.8, 0.3, 0.9, 0.2, 1, 0.1]
Output WAVE
Alien speech
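The transformation parameters above are flat lists of (time, value) breakpoints. A minimal sketch of how such a list can be expanded to one value per analysis frame, assuming linear interpolation with the times normalized to the last breakpoint. That is my reading of how sms-tools treats its frequency envelopes, and `expand_envelope` is a hypothetical helper, not part of sms-tools:

```python
import numpy as np

def expand_envelope(pairs, n_frames):
    """Interpolate a flat [t0, v0, t1, v1, ...] list to one value per frame."""
    pairs = np.asarray(pairs, dtype=float)
    times, values = pairs[0::2], pairs[1::2]
    # Rescale the breakpoint times so that they span the whole frame range.
    frame_pos = n_frames * times / times[-1]
    return np.interp(np.arange(n_frames), frame_pos, values)

# A constant envelope: every frame is scaled by 0.5.
freq_scaling = expand_envelope([0, 0.5, 1, 0.5], n_frames=5)
# A linear ramp from 0 to 1 across the frames.
ramp = expand_envelope([0, 0, 1, 1], n_frames=5)
print(freq_scaling)
print(ramp)
```

With this reading, [0, 0.5, 1, 0.5] simply means "scale by 0.5 from start to end", while a longer list describes a curve that changes over time.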
(2) Single-pass.
Source is a piano phrase with a duration of 3.8458 sec.
HPS analysis parameters are,
- window type = blackman,
- window size M = 1101,
- FFT size N = 4096,
- threshold t = -80,
- minimum duration of harmonic tracks minSineDur = 0.01,
- minimum f0, minf0 = 100,
- maximum f0, maxf0 = 300,
- error threshold, f0et = 5 (to tolerate more error in the TWM algorithm),
- maximum number of harmonics nH = 100,
- maximum frequency deviation in harmonic tracks harmDevSlope = 0.01,
- stochastic decimation/approximation factor stocf = 0.2.
HPS transformation's parameters:
[0, 1, 1, 1]
timeScaling = [0, 0, 1, 1]
Output WAVE
Swinging piano
(3) Single-pass.
Source is a violin sound with a duration of 5.0 sec.
HPS analysis parameters are,
- window type = blackman,
- window size M = 661,
- FFT size N = 2048,
- threshold t = -90,
- minimum duration of harmonic tracks minSineDur = 0.01,
- minimum f0, minf0 = 400,
- maximum f0, maxf0 = 1000,
- error threshold, f0et = 5,
- maximum number of harmonics nH = 40,
- maximum frequency deviation in harmonic tracks harmDevSlope = 0.01,
- stochastic decimation/approximation factor stocf = 0.1.
HPS transformation's parameters:
[0, 1, 1, 1]
[0, 0, 1, 1]
Output WAVE
Violin to mosquito
(4) Single-pass.
Source is "speech-female.wav" with duration of 3.2836 sec.
HPS analysis parameters are,
- window type = blackman,
- window size M = 1101,
- FFT size N = 4096,
- threshold t = -90,
- minimum duration of harmonic tracks minSineDur = 0.01,
- minimum f0, minf0 = 120,
- maximum f0, maxf0 = 250,
- error threshold, f0et = 8 (to tolerate more error in the TWM algorithm),
- maximum number of harmonics nH = 8 (a small number of harmonics, to allow more transformation),
- maximum frequency deviation in harmonic tracks harmDevSlope = 0.2 (large enough to allow higher harmonics to deviate more than lower harmonics),
- stochastic decimation/approximation factor stocf = 0.1 (small enough for higher decimation and a more compact representation).
The goal is to change the pitch of every word, up and down alternately, and to stretch or shrink the duration of every word as well. The speech is split into words at these times: "this" at 0.72 s, "is" at 1.05 s, "the" at 1.22 s, "v" at 1.85 s, "of" at 2.25 s, "vendetta" at 3.2836 s.
HPS transformation's parameters:
[0, 0.5, 0.72, 2, 1.05, 0.8, 1.22, 1.4, 1.85, 0.6, 2.25, 1.7, 3.2836, 0.4]
[0, 1, 0.5, 1.1, 1, 0.98]
[0, 0, 0.72, 0.4, 1.05, 1.3, 1.22, 1.4, 1.85, 1.6, 2.25, 1.8, 3.2836, 3.2836]
Output WAVE
Unavailable on the cloud.
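The per-word lists in experiment (4) can be built from the word boundaries rather than typed by hand. A small sketch, reading the first list as the frequency-scaling envelope (an assumption, since the lists are not labeled); `to_breakpoints` is a hypothetical helper:

```python
def to_breakpoints(times, values):
    """Interleave times and values into a flat [t0, v0, t1, v1, ...] list."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    flat = []
    for t, v in zip(times, values):
        flat.extend([t, v])
    return flat

# Start of the file followed by the word boundaries of experiment (4), in seconds.
boundaries = [0, 0.72, 1.05, 1.22, 1.85, 2.25, 3.2836]
# One scaling factor per boundary (values taken from experiment 4).
pitch_factors = [0.5, 2, 0.8, 1.4, 0.6, 1.7, 0.4]

freq_scaling = to_breakpoints(boundaries, pitch_factors)
print(freq_scaling)
```

Keeping the times and the factors in separate lists makes it easier to retune one word without miscounting positions in the flat list.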
(5) Multiple sounds, multiple passes. The sources are all natural sounds:
bass drum = "90150__menegass__bd05.wav"
snare drum 1 = "82238__kevoy__snare-drum.wav"
snare drum 2 = "2103__opm__sn-set4.wav"
cowbell = "22759__franciscopadilla__56-cowbell.wav"
hi-hat 1 = "67210__akosombo__xbhhopen.wav"
hi-hat 2 = "100054__menegass__gui-drum-ch.wav"
The goal is to transform a violin sound of a known frequency into a short song in multiple passes, using the sms-tools HPS transformation functions, and then to add transformed percussion sounds to the resulting song.
Phase one: frequency scaling is determined by a table of MIDI note numbers in the D Dorian scale. Frequency stretching is randomized for each beat. For the percussion sounds, frequency scaling is also randomized for each beat.
Phase two: time scaling is applied to the composed song, following a table of time periods.
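The note table of phase one can be sketched as follows, using the standard equal-temperament conversion from MIDI note number to frequency (A4 = note 69 = 440 Hz). The particular notes listed are only an illustration of one D Dorian octave, not the melody actually used, and `midi_to_hz` is a hypothetical helper:

```python
def midi_to_hz(note, a4=440.0):
    """Equal-tempered frequency in Hz of a MIDI note number (A4 = note 69)."""
    return a4 * 2.0 ** ((note - 69) / 12.0)

# One octave of D Dorian starting at D4 (D E F G A B C D); illustrative only.
Notes = [62, 64, 65, 67, 69, 71, 72, 74]
Freqs = [midi_to_hz(n) for n in Notes]
print([round(f, 2) for f in Freqs])
```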
- Provide a Python list, Notes, holding MIDI note numbers in the D Dorian scale.
- Provide a Python list, Freqs, holding the frequency value of each note in Notes.
- I do not use the transformation GUI; instead I call the functions directly from “sms-tools/software/transformations_interface/”.
- To run the HPS analysis, call the analysis() function.
- To run the HPS transformation and synthesis, call the transformation_synthesis() function.
- The input file is a violin sound with a main frequency of 901.546 Hz. The note frequency was tracked with the “Aubio Note Tracker” in Sonic Visualiser.
- Analysis parameters: window = "blackman", M = 601, N = 2048, t = -90, minSineDur = 0.01, nH = 100, minf0 = 400, maxf0 = 1200, f0et = 5, harmDevSlope = 0.01, stocf = 0.2
- Call “hpsTransformations_function.analysis()” as follows,
import hpsTransformations_function as HTF

# HPS analysis of the input sound; the results feed transformation_synthesis()
inputFile, fs, hfreq, hmag, mYst = HTF.analysis(inputFile,
                                                window, M, N, t, minSineDur,
                                                nH, minf0, maxf0, f0et,
                                                harmDevSlope, stocf)
Note that hpsTransformations_function has been edited to remove all plotting commands and to return the output signal “y” from the transformation_synthesis() function.
- Set up a for loop over all Freqs; in each iteration, apply a frequency-scaling transformation according to the current Freqs[] value.
- The frequency scaling parameter freqScaling is set to [0, scale, 1, scale], where “scale” is the scaling value that mimics the MIDI note frequency.
- At the same time, the frequency stretching parameter freqStretching is set to random values, [0, random.uniform(0.8,0.9), 1, random.uniform(0.9,1.0)], to produce a different timbre for each beat.
- The timbrePreservation parameter is set to 0, to let the timbre change.
- Apply the phase-1 transformation and synthesis to the input sound by calling,
# transformation_synthesis() is modified to return y
y = HTF.transformation_synthesis(inputFile, fs, hfreq, hmag, mYst,
                                 freqScaling, freqStretching,
                                 timbrePreservation, timeScaling)
- Append “y” to a “song” variable in each iteration.
- At this point, the variable “song” contains a piece of song whose melody is determined by the frequencies in Freqs.
- The next step is to add several percussion sounds to the “song”.
- First run the HPS analysis on each percussion sound: bass-drum, snare-drum1, snare-drum2, cowbell, hi-hat1, hi-hat2.
- They are all low-frequency, single-note, natural sounds; because they share these characteristics, we can use the same analysis parameters for all of them.
- HPS analysis parameters: window="blackman", M=801, N=2048, t=-100, minSineDur=0.005, nH=80, minf0=10, maxf0=400, f0et=7, harmDevSlope=0.01, stocf=0.1
- Call “hpsTransformations_function.analysis()” on each percussion sound, and store the returned values in separate variables.
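The melody loop described so far can be sketched as below. Two things here are assumptions: “scale” is taken to be the ratio of the target note frequency to the 901.546 Hz input, and `fake_synthesis` is a stand-in sine generator that replaces the real HTF.transformation_synthesis() call so that the sketch runs without sms-tools:

```python
import random
import numpy as np

BASE_FREQ = 901.546  # main frequency of the violin input, in Hz

def note_parameters(target_freq, rng=random):
    """Build the freqScaling and freqStretching lists for one note."""
    scale = target_freq / BASE_FREQ  # assumed definition of "scale"
    freq_scaling = [0, scale, 1, scale]
    freq_stretching = [0, rng.uniform(0.8, 0.9), 1, rng.uniform(0.9, 1.0)]
    return freq_scaling, freq_stretching

def fake_synthesis(freq_scaling, fs=44100, dur=0.25):
    """Stand-in for HTF.transformation_synthesis(): a plain sine tone."""
    t = np.arange(int(fs * dur)) / fs
    return np.sin(2 * np.pi * BASE_FREQ * freq_scaling[1] * t)

Freqs = [293.66, 440.0, 587.33]  # illustrative note frequencies in Hz
song = np.array([])
for f in Freqs:
    freq_scaling, freq_stretching = note_parameters(f)
    y = fake_synthesis(freq_scaling)  # the real code would call HTF here
    song = np.append(song, y)         # append this note to the song
print(song.size)
```

In the real script the stand-in would be replaced by the modified transformation_synthesis(), which also receives freqStretching, timbrePreservation and timeScaling.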
- Set up a for loop; in each iteration compute a “beat” counter that wraps around every 8 iterations, so that “beat” takes integer values in [0..7]. This can be done with the Python % (modulo) operator.
- When “beat” is 0 or 4, add the transformed bass-drum sound. The bass-drum analysis values are passed to the “hpsTransformations_function.transformation_synthesis()” function. The frequency scaling parameter is set to a random value by calling random.uniform(), so the bass drum has a different frequency in each iteration.
- When “beat” is 2 or 6, add the transformed snare-drum1 sound. Apply the same random transformation as for the bass drum.
- On every “beat” (every iteration), add the transformed snare-drum2 sound. Apply the same random transformation as for the bass drum.
- When “beat” is 3 or 7, add the transformed cowbell sound. Apply the same random transformation as for the bass drum.
- When “beat” equals a random integer in [0..7], add the transformed hi-hat1 sound. Apply the same random transformation as for the bass drum.
- When “beat” is 5, add the transformed hi-hat2 sound. Apply the same random transformation as for the bass drum.
- At this point, the “song” variable holds the complete melody according to the specified MIDI note numbers, with the percussion sounds added. The output of this phase-1 transformation is saved to a WAVE file with the “models.utilFunctions.wavwrite()” function.
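The beat pattern above can be sketched with the modulo operator as follows. The instrument names are just labels, and the random range for the frequency scaling is an assumption, since the text does not give it; the actual mixing into “song” is left out:

```python
import random

def instruments_for_beat(i, rng=random):
    """Return the percussion sounds triggered at loop iteration i."""
    beat = i % 8  # wraps around every 8 iterations: 0..7
    hits = ["snare2"]  # snare drum 2 plays on every beat
    if beat in (0, 4):
        hits.append("bass")
    if beat in (2, 6):
        hits.append("snare1")
    if beat in (3, 7):
        hits.append("cowbell")
    if beat == rng.randint(0, 7):  # hi-hat 1 lands on a random beat
        hits.append("hihat1")
    if beat == 5:
        hits.append("hihat2")
    return hits

def random_freq_scaling(rng=random, low=0.8, high=1.2):
    """Constant frequency-scaling envelope with a random factor.

    The range [low, high] is an assumption; the text does not state it.
    """
    scale = rng.uniform(low, high)
    return [0, scale, 1, scale]

for i in range(8):
    print(i % 8, instruments_for_beat(i))
```

Each returned name would then select the stored analysis values of that sound, which are passed to transformation_synthesis() together with a fresh random_freq_scaling() list.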
- In phase 2, the output sound of phase 1 is analyzed with the same parameters as the input sound. Call “hpsTransformations_function.analysis()” on the phase-1 output song.
- Apply a time-scaling transformation, with the timeScaling parameter set to a list of values such that the period of each beat of the song is scaled by a certain factor. Generate this list with a loop over all beats in the phase-1 song. The first pair is, of course, 0, 0.
- Call “hpsTransformations_function.transformation_synthesis()” on the analysis values.
- The output is the final phase-2 song. Write this song to a WAVE file with the “models.utilFunctions.wavwrite()” function.
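The time-scaling list for phase 2 can be generated along these lines. This is a sketch that assumes each pair maps an input time to an output time, as the lists in experiment (4) suggest; the beat duration and the factors are made-up values, and `build_time_scaling` is a hypothetical helper:

```python
def build_time_scaling(beat_duration, factors):
    """Build a flat [in0, out0, in1, out1, ...] time-scaling list.

    Each beat lasts beat_duration seconds in the input and
    beat_duration * factor seconds in the output.
    """
    pairs = [0.0, 0.0]  # the first pair is always (0, 0)
    in_time = 0.0
    out_time = 0.0
    for factor in factors:
        in_time += beat_duration
        out_time += beat_duration * factor
        pairs.extend([in_time, out_time])
    return pairs

# Four beats of 0.5 s each: unchanged, doubled, halved, unchanged.
time_scaling = build_time_scaling(0.5, [1.0, 2.0, 0.5, 1.0])
print(time_scaling)
```

Accumulating both clocks in the same loop keeps the input and output times monotonically increasing, which an interpolated time map needs.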
Output sound (size = 3.876 MB)
Singing frogs