Make-A-Video by Meta AI
post by P. · 2022-09-29T17:07:15.664Z
This is a link post for https://makeavideo.studio/
Meta AI (Facebook) created a text-to-video model by taking a diffusion text-to-image model, adding temporal convolutional and attention layers, and fine-tuning it on video data (without text captions). They also use spatial and temporal super-resolution networks. This shows, to the surprise of no one who was paying attention, that our existing, mostly homogeneous architectures can easily be extended to understand, to some extent, the structure of everyday reality. It's not the first text-to-video model, but it's much better than what came before.
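To make "adding temporal convolutional layers" to an image model more concrete, here is a toy NumPy sketch, not Meta's code. It assumes a factorized design: a 2D convolution applied to each frame independently (a stand-in for a pretrained text-to-image layer), followed by a 1D convolution along the time axis at every pixel, with the temporal kernel initialized to the identity. The factorization, the identity initialization, and all function names are illustrative assumptions; there are no channels, and the attention layers and super-resolution networks are not shown.

```python
import numpy as np


def spatial_conv_per_frame(video: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Apply the same 2D kernel to each frame independently.

    Hypothetical stand-in for a pretrained text-to-image conv layer (no channels).
    video: (T, H, W); kernel: (k, k) with odd k; zero padding keeps the shape.
    Computes cross-correlation, as deep-learning conv layers do.
    """
    _, h, w = video.shape
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(video, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros(video.shape, dtype=float)
    for i in range(k):
        for j in range(k):
            out += kernel[i, j] * padded[:, i:i + h, j:j + w]
    return out


def temporal_conv(video: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Apply a 1D kernel along the time axis at every pixel (zero padding in time)."""
    t = video.shape[0]
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(video, ((pad, pad), (0, 0), (0, 0)))
    out = np.zeros(video.shape, dtype=float)
    for s in range(k):
        # kernel[s] weights the frame at offset (s - pad) from each output frame
        out += kernel[s] * padded[s:s + t]
    return out


def identity_temporal_kernel(k: int = 3) -> np.ndarray:
    """Temporal kernel that passes every frame through unchanged."""
    kernel = np.zeros(k)
    kernel[k // 2] = 1.0
    return kernel


def pseudo_3d_layer(video: np.ndarray, spatial_kernel: np.ndarray,
                    temporal_kernel: np.ndarray) -> np.ndarray:
    """Factorized video layer: per-frame spatial conv, then temporal conv."""
    return temporal_conv(spatial_conv_per_frame(video, spatial_kernel), temporal_kernel)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.standard_normal((4, 6, 6))  # 4 frames of 6x6 pixels
    spatial = rng.standard_normal((3, 3))   # "pretrained" image kernel

    # At initialization the video layer matches the image layer frame by frame.
    init_out = pseudo_3d_layer(video, spatial, identity_temporal_kernel(3))
    print(np.allclose(init_out, spatial_conv_per_frame(video, spatial)))

    # With off-center temporal weights (as fine-tuning might learn), frames mix.
    tuned = np.array([0.25, 0.5, 0.25])
    print(np.allclose(pseudo_3d_layer(video, spatial, tuned), init_out))
```

Under these assumptions, the identity initialization means the untrained video model behaves exactly like the image model applied frame by frame, so training on video only has to learn how to mix information across neighboring frames instead of starting from a random layer that would disturb the pretrained spatial features.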
4 comments
comment by P. · 2022-09-29T17:52:09.924Z
Emad from Stability AI (the people behind Stable Diffusion) says that they will make a model better than this.
comment by P. · 2022-09-29T19:03:54.005Z
And here is another text-to-video model, Phenaki: https://phenaki.video/
↑ comment by P. · 2022-09-29T19:39:54.526Z
And a text-to-3D one, DreamFusion, which works by optimizing a differentiable volumetric representation using a 2D diffusion model: https://dreamfusionpaper.github.io/