Getting closer to synthetic people

Microsoft has released a fascinating new framework for generating lifelike talking faces called VASA-1.

Given a single static image and a speech audio clip, VASA-1 is capable of producing lip movements that are synchronized with the audio and capture a large spectrum of facial nuances and natural head motions.

See more here, read the paper here and here.

Getting worried you’ll be replaced by AI yet? If this gets perfected (it’s not there yet, but the results get better each year), it could replace pretty much any ‘talking head’ job.

This could also be used to fool people on conference calls, where poor video quality would leave minor glitches unnoticed or easily dismissed as streaming artifacts.

Just slap the CEO’s face into this, set up a conference call with finance via some very easy phishing, and approve that $1m transfer to your Swiss bank account.

