Posts

Comments

Comment by Gabriel Adriano de Melo (gabriel-adriano-de-melo) on A short introduction to machine learning · 2024-01-19T10:05:46.414Z · LW · GW

The text-to-image generation in Dall-E 2 builds on another model called CLIP, which learned to match images with their captions (a contrastive image-text objective, not caption generation as such). This can be thought of as supervised learning, with the caveat that the captions weren't labeled by humans (in the ML sense) but extracted from web data. CLIP is just one part of the Dall-E model; another is the diffusion process, which is trained to recover an image from noise. That part is unsupervised (or self-supervised), since we can just add noise to images and ask the model to recover the original.
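
To make the "add noise and ask it to recover the original" point concrete, here is a minimal sketch of how one training pair could be built with no human labels. The function name `make_denoising_pair`, the `noise_level` parameter, and the square-root blending rule are my own illustrative assumptions, not the actual Dall-E training code.

```python
import numpy as np

def make_denoising_pair(image, noise_level, rng):
    """Build one (noisy_input, target) pair; hypothetical helper for illustration."""
    noise = rng.standard_normal(image.shape)
    # Blend the clean image with Gaussian noise; noise_level is assumed to lie in [0, 1].
    noisy = np.sqrt(1.0 - noise_level) * image + np.sqrt(noise_level) * noise
    # The target is simply the original image, so no labeling is needed.
    return noisy, image

rng = np.random.default_rng(0)
image = rng.random((8, 8))  # random stand-in for a real image
noisy, target = make_denoising_pair(image, noise_level=0.5, rng=rng)
# A model would then be trained to minimise something like mean((model(noisy) - target) ** 2).
```

The "label" here is the untouched image itself, which is why this part of the pipeline needs no annotation at all.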