AI-powered dialogue research

very important to have an extra word at the beginning (ideally a number, as it helps order the audio tracks)
set the xVASynth sample rate to 44100 Hz (check ffmpeg first)
in 2.3.0, male V sounds metallic out of the box; that's just how it is. applying a gate helps somewhat but is not perfect.
recording one's own voice and importing it into xVASynth can help with the phonetics, for better pronunciation.
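
a rough sketch of the numbering and sample-rate notes above, in Python. assumptions, not confirmed by these notes: the number is prepended to each line of text, and any resampling outside xVASynth is done with the ffmpeg CLI (`-i`, `-ar` and `-y` are standard ffmpeg options). the helper names `numbered_prefix`, `build_resample_cmd` and `resample` are made up for illustration.

```python
import shutil
import subprocess


def numbered_prefix(index: int, text: str, width: int = 3) -> str:
    """Prepend a zero-padded number so the resulting tracks sort in order.

    Assumption: the prefix goes in front of the line of text itself.
    """
    return f"{index:0{width}d} {text}"


def build_resample_cmd(src: str, dst: str, rate: int = 44100) -> list[str]:
    """Build an ffmpeg command line that resamples src to `rate` Hz."""
    # -y overwrites dst if it exists, -ar sets the output audio sample rate.
    return ["ffmpeg", "-y", "-i", src, "-ar", str(rate), dst]


def resample(src: str, dst: str, rate: int = 44100) -> None:
    """Run the resample command, checking first that ffmpeg is available."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_resample_cmd(src, dst, rate), check=True)
```

zero-padding matters because plain numbers sort as text: without it, "10 ..." would come before "2 ...".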

it might be worth having a look at ElevenLabs (TBC: train models?).

it might be worth trying other vocoders too.

credits: thanks to bespokecomp on GitHub for helping out