Comments

Comment by EA (eran-alouf) on Using GPT-Eliezer against ChatGPT Jailbreaking · 2022-12-10T13:39:12.700Z · LW · GW

This might work a bit better:


e.g., the following confused the previous version (which didn't allow the benign answer):


but:

Comment by EA (eran-alouf) on Using GPT-Eliezer against ChatGPT Jailbreaking · 2022-12-09T15:40:16.310Z · LW · GW

Asking a separate session to review the answer seems to work nicely, at least in some cases:

Image

but:
Image
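
For readers without the screenshots, here is a rough sketch of what "asking a separate session to review the answer" could look like in code. Everything in it is an assumption for illustration: `Session` is a hypothetical stand-in for whatever sends a prompt to a fresh chat session and returns the reply, and the wording of `REVIEW_TEMPLATE` is invented here, not the prompt used in the comment.

```python
from typing import Callable

# Hypothetical stand-in: any function that sends one prompt to a fresh,
# independent chat session and returns the model's reply as text.
Session = Callable[[str], str]

# Assumed wording for the review prompt; the original prompt is not known.
REVIEW_TEMPLATE = (
    "You are reviewing a reply written by another assistant.\n"
    "Reply to review:\n---\n{answer}\n---\n"
    "Is this reply safe to show to the user? Start your answer with YES or NO."
)


def answer_with_review(
    prompt: str,
    answer_session: Session,
    review_session: Session,
    refusal: str = "Sorry, I can't help with that.",
) -> str:
    """Get an answer from one session, then let a separate session vet it."""
    answer = answer_session(prompt)
    # The reviewer only sees the produced answer, not the user's prompt,
    # so instructions hidden in the prompt do not reach it directly.
    verdict = review_session(REVIEW_TEMPLATE.format(answer=answer))
    if verdict.strip().upper().startswith("YES"):
        return answer
    return refusal
```

Passing only the answer to the reviewer is a design choice of this sketch. As the second screenshot suggests, a review step like this can still be fooled in some cases.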