Microsoft can synthesize your voice with just a 3-second clip

January 19, 2023 by matt

Microsoft researchers announced a new text-to-speech AI model called VALL-E that can closely simulate a person’s voice when given a three-second audio sample. Once it learns a specific voice, VALL-E can synthesize audio of that person saying anything, and it attempts to preserve the speaker’s emotional tone and the acoustic environment of the original sample.

The scientists also note that since VALL-E could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker.

You can find audio samples and the paper here.

It sure would make breaking into Werner Brandes's office (from the 1992 movie Sneakers) a lot easier than convincing your friend to record snippets of a really terrible date.
