
Meta is using AI to generate videos from just a few words

Posted on Sep 30, 2022, 10:44:30

Artificial intelligence is getting better and better at generating an image in response to a handful of words, with publicly available AI image generators such as DALL-E 2 and Stable Diffusion. Now, Meta researchers are taking AI a step further: they're using it to concoct videos from a text prompt.

Meta CEO Mark Zuckerberg posted on Facebook on Thursday about the research, called Make-A-Video, with a 20-second clip that compiled several text prompts that Meta researchers used and the resulting (very short) videos. The prompts include “A teddy bear painting a self portrait,” “A spaceship landing on Mars,” “A baby sloth with a knitted hat trying to figure out a laptop,” and “A robot surfing a wave in the ocean.”

The videos for each prompt are just a few seconds long, and they generally show what the prompt suggests (with the exception of the baby sloth, which doesn't look much like the actual creature), in a fairly low-resolution and somewhat jerky style. Even so, the work demonstrates a fresh direction AI research is taking as systems become increasingly good at generating images from words. If the technology is eventually released widely, though, it will raise many of the same concerns sparked by text-to-image systems, such as the risk that it could be used to spread misinformation via video.

A web page for Make-A-Video includes these short clips and others, some of which look fairly realistic, such as a video created in response to the prompt “Clown fish swimming through the coral reef” or one meant to show “A young couple walking in a heavy rain.”

In his Facebook post, Zuckerberg pointed out how tricky it is to generate a moving image from a handful of words.

“It's much harder to generate video than photos because beyond correctly generating each pixel, the system also has to predict how they'll change over time,” he wrote.

A research paper describing the work explains that the project uses a text-to-image AI model to learn how words correspond with pictures, paired with an AI technique known as unsupervised learning, in which algorithms pore over unlabeled data to discern patterns within it; applied to videos, this lets the system determine what realistic motion looks like.
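The paper's actual pipeline is far larger, but the core idea of unsupervised learning on unlabeled video can be sketched in miniature: use the next frame itself as the training target, so no human labels are needed. Everything below (the synthetic "video" of a drifting dot, the linear predictor) is a hypothetical illustration for intuition, not Meta's method.

```python
import numpy as np

# Hypothetical illustration of unsupervised learning from unlabeled video:
# the "supervision" is simply each frame's successor, so no annotation is needed.

# Synthetic "video": a bright dot on an 8x8 grid, drifting one pixel right per
# frame and wrapping around the edge.
def make_video(num_frames=9, size=8):
    frames = np.zeros((num_frames, size, size))
    for t in range(num_frames):
        frames[t, size // 2, t % size] = 1.0
    return frames

video = make_video()
X = video[:-1].reshape(len(video) - 1, -1)  # current frames, flattened
Y = video[1:].reshape(len(video) - 1, -1)   # next frames: the free training signal

# Fit a linear next-frame predictor by least squares: Y ≈ X @ W.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Predict the frame that should follow the last observed one.
pred = (video[-1].reshape(1, -1) @ W).reshape(8, 8)
r, c = divmod(int(pred.reshape(-1).argmax()), 8)
print(r, c)  # predicted dot position: one pixel to the right of where it was
```

The predictor never sees a label like "moving right"; it recovers the motion pattern purely from the order of frames, which is the spirit of how a video model can learn realistic motion from raw footage.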

As with the massive, popular AI systems that generate images from text, the text-to-image model here was trained on internet data, which means it learned “and likely exaggerated social biases, including harmful ones,” the researchers wrote. They did note that they filtered the data for “NSFW content and toxic words,” but because datasets can include many millions of images and pieces of text, it may not be possible to remove all such content.

Zuckerberg wrote that Meta plans to share the Make-A-Video project as a demo in the future.
