Google's Imagen uses a larger text encoder
post by Ben Livengood (ben-livengood) · 2022-05-24T21:55:52.221Z
https://imagen.research.google/
Scaling the text encoder lets Imagen spell, count, and assign colors and properties to distinct objects in the image, all things DALL-E 2 was not so good at. Judging from the small set of sample images, it looks about as photorealistic as DALL-E 2. Eyes are still weird.
2 comments
comment by Kayden (kunvar-thaman) · 2022-05-25T05:00:49.866Z
From what I've seen so far, Imagen is more "straightforward" and does a better job of generating an image that matches the text than DALL-E 2. But DALL-E 2 seems to produce prettier images (which makes sense, given it was fine-tuned for aesthetics).
There's a GitHub repo up already, so I hope we'll be able to try an open-source version and test it on the same prompts as DALL-E 2.
↑ reply by Logan Zoellner (logan-zoellner) · 2022-05-25T15:27:06.288Z
It'll be interesting to see Imagen fine-tuned on LAION Aesthetic.