AI-powered dialogs research
-
very important to have an extra word at the beginning (ideally a number, as it helps order the audio tracks)
-
set the xVASynth sample rate to 44100 Hz (check ffmpeg first)
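
a quick way to confirm that an exported file really ends up at 44100 Hz is to read the rate from the WAV header. this is only a sketch: it assumes the export is a plain PCM WAV that Python's standard `wave` module can open, and the helper names (`wav_sample_rate`, `is_target_rate`) are made up for illustration, not part of xVASynth.

```python
import wave

TARGET_RATE = 44100  # Hz, the rate mentioned in the note above


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in a WAV file header."""
    # Assumption: the file is PCM WAV; other encodings may raise wave.Error.
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_target_rate(path, target=TARGET_RATE):
    """True if the file at `path` is already at the target sample rate."""
    return wav_sample_rate(path) == target
```

calling `is_target_rate("some_line.wav")` on an exported file (the file name here is hypothetical) tells whether the setting took effect.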
-
in 2.3.0, male V sounds metallic out of the box; that's just how it is. applying a gate helps somewhat but is not perfect.
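
for reference, the idea behind a gate can be sketched in a few lines. this is a deliberately crude frame-based version, not the gate actually used for these tests: the function name, the threshold and the frame size are all illustrative assumptions, and a real gate (in a DAW or audio editor) adds attack and release smoothing.

```python
def apply_gate(samples, threshold=0.05, frame_size=512):
    """Silence every frame whose peak level stays below `threshold`.

    `samples` is a list of floats in the range [-1.0, 1.0].
    Assumption: a hard on/off gate with no attack/release smoothing.
    """
    gated = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        peak = max(abs(s) for s in frame)
        if peak < threshold:
            # Frame is quieter than the threshold: mute it entirely.
            gated.extend([0.0] * len(frame))
        else:
            # Frame is loud enough: pass it through unchanged.
            gated.extend(frame)
    return gated
```

gating whole frames rather than single samples avoids muting the waveform every time it crosses zero, but it only removes low-level residue between words, which fits the "helps somewhat" result above.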
-
recording one's own voice and importing it into xVASynth can help with the phonetics, for better pronunciation.
it might be worth having a look at ElevenLabs (TBC: train models?).
it might be worth trying other vocoders too.
credits: thanks to bespokecomp on GitHub for helping out