Comment by Gabriel Adriano de Melo (gabriel-adriano-de-melo) on A short introduction to machine learning · 2024-01-19T10:05:46.414Z
The text-to-image capability of DALL-E (specifically DALL-E 2) builds on another model called CLIP, which learned to match images with their captions, that is, to score how well a piece of text describes an image. That training could be thought of as supervised learning, with the caveat that the captions were not labels produced by human annotators for ML purposes; they were image-text pairs scraped from the web. CLIP is only one part of DALL-E 2. The other is a diffusion model, which learns to recover an image from noise. That part is unsupervised (often called self-supervised): we can take any image, add noise to it, and ask the model to recover the original, so the training target comes for free.
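To make the "labels come for free" point concrete, here is a minimal, dependency-free sketch of both training setups. It assumes images and captions have already been turned into embedding vectors (the encoders are omitted), uses a symmetric InfoNCE-style contrastive loss of the kind CLIP is trained with, and uses the DDPM-style noising formula x_t = sqrt(ᾱ) x_0 + sqrt(1 - ᾱ) ε with the added noise as the prediction target. The function names and the temperature of 0.07 are illustrative choices, not taken from the comment or from OpenAI's code.

```python
import math
import random

# --- Part 1: CLIP-style contrastive loss; the "label" is the web pairing itself ---
# Assumption: encoders are omitted; embeddings are given as plain, non-zero lists.

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def cross_entropy(logits, target):
    """Softmax cross-entropy of one row of logits against an integer target index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss: image i should match caption i and no other caption.

    The target for row i is simply i, i.e. the caption scraped alongside that
    image; no separate human annotation step is involved.
    """
    n = len(image_embs)
    sims = [[cosine(image_embs[i], text_embs[j]) / temperature for j in range(n)]
            for i in range(n)]
    # Image -> text direction: each row should pick out its own caption.
    img_to_txt = sum(cross_entropy(sims[i], i) for i in range(n)) / n
    # Text -> image direction: each column should pick out its own image.
    cols = [[sims[i][j] for i in range(n)] for j in range(n)]
    txt_to_img = sum(cross_entropy(cols[j], j) for j in range(n)) / n
    return (img_to_txt + txt_to_img) / 2

# --- Part 2: diffusion training pairs are made by corrupting unlabeled images ---

def make_diffusion_example(x0, alpha_bar, rng):
    """Noise a clean 'image' x0 (a flat list of pixel values) to level alpha_bar.

    Returns (noisy_input, target_noise). A denoising model would be trained to
    predict target_noise from noisy_input, so the target is known by construction.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * p + math.sqrt(1.0 - alpha_bar) * e
          for p, e in zip(x0, eps)]
    return xt, eps
```

For example, `contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])` is close to zero because each image is most similar to its own caption, while swapping the two captions makes the loss large. In both parts the answer the model is graded against is produced mechanically from the raw data, which is why these setups are often described as self-supervised.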