ANY-TO-ANY VOICE CONVERSION WITH F0 AND TIMBRE DISENTANGLEMENT AND NOVEL TIMBRE CONDITIONING

Sudheer Kumar Kovela (NVIDIA); Rafael Valle (NVIDIA); Ambrish Dantrey (NVIDIA); Bryan Catanzaro (NVIDIA)

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
08 Jun 2023

Despite recent advances in voice conversion (VC), real-time one-shot voice conversion with good control over timbre and $F_0$ remains challenging. In this work, we present a VC model based on phonetic posteriorgrams (PPGs) that directly decodes waveforms. We design a speaker-conditioned decoder based on HiFi-GAN (Kong et al., 2020), along with a new discriminator, that produces high-quality audio. Using an $F_0$ prenet and an $F_0$-augmented speaker encoder, we are able to control $F_0$ and timbre independently with high fidelity. Our objective and subjective evaluations show that our method is preferred over other methods in terms of audio quality, timbre similarity, and prosody retention.
