Microsoft can synthesize your voice with just a 3 second clip
Microsoft researchers announced a new text-to-speech AI model called VALL-E that can closely simulate a person’s voice when given a three-second audio sample. Once it learns a specific voice, VALL-E can synthesize audio of that person saying anything—and do it in a way that attempts to preserve the speaker’s emotional tone and background environmental noise balance.
The scientists also note that since VALL-E could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker.
You can find audio samples and the paper here.
It sure would make breaking into Werner Brandes office a lot easier (1992 movie Sneakers) than convincing your friend to record snippets of a really terrible date.