Toronto AI safety meetup: Latent Knowledge and Contrast-Consistent Search

post by Giles · 2023-04-29T23:24:03.191Z

We'll do a presentation and discussion based on the following paper:

Discovering Latent Knowledge in Language Models Without Supervision (Burns et al., 2022): https://arxiv.org/abs/2212.03827

Language models sometimes emit false information. There can be many reasons for this, including:

- imitating common misconceptions and errors present in the training data;
- training objectives, such as human feedback, that reward answers evaluators find convincing rather than answers that are true;
- prompts that establish a context in which false statements are the expected continuation.

For each of these, the model may effectively be trained to emit falsehoods while still having the correct knowledge internally. Might it be possible to extract this knowledge from the model's hidden states?

The paper introduces a technique, Contrast-Consistent Search (CCS), that begins to address this challenge. In the meetup we'll try to wrap our heads around what's going on, as well as have a broader discussion around falsehood and deception in large language models.
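To give a flavour of the method ahead of the discussion, here is a minimal PyTorch sketch of the CCS objective. For each yes/no question the paper builds a contrast pair: the statement completed as "true" and as "false". A linear probe is trained on the model's hidden states for each half of the pair, using a consistency loss (the two probabilities should sum to one) plus a confidence loss (ruling out the degenerate solution of always answering 0.5). The probe architecture and training loop below are simplified, and the random tensors are placeholders for real hidden states, which the paper also mean-normalizes before training.

```python
import torch
import torch.nn as nn

class CCSProbe(nn.Module):
    """Linear probe mapping a hidden state to P(statement is true)."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.linear = nn.Linear(hidden_dim, 1)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.linear(h)).squeeze(-1)

def ccs_loss(p_pos: torch.Tensor, p_neg: torch.Tensor) -> torch.Tensor:
    # Consistency: p(x+) and p(x-) should behave like complementary probabilities.
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    # Confidence: penalize the degenerate solution p(x+) = p(x-) = 0.5.
    confidence = torch.minimum(p_pos, p_neg) ** 2
    return (consistency + confidence).mean()

# Placeholder hidden states; in practice these come from an intermediate
# layer of the language model, one vector per contrast-pair completion.
hidden_dim = 768
h_pos = torch.randn(256, hidden_dim)  # statements completed with "true"
h_neg = torch.randn(256, hidden_dim)  # same statements completed with "false"

probe = CCSProbe(hidden_dim)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
for _ in range(1000):
    opt.zero_grad()
    loss = ccs_loss(probe(h_pos), probe(h_neg))
    loss.backward()
    opt.step()
```

Note that the loss is entirely unsupervised: no truth labels appear anywhere. At inference time the paper scores a statement by averaging p(x+) and 1 - p(x-).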
