Comment by EA (eran-alouf) on Using GPT-Eliezer against ChatGPT Jailbreaking · 2022-12-10T13:39:12.700Z · LW · GW
This might work a bit better:
E.g., the following confused the previous version (which didn't allow the benign answer):
but:
Comment by EA (eran-alouf) on Using GPT-Eliezer against ChatGPT Jailbreaking · 2022-12-09T15:40:16.310Z · LW · GW
Asking a separate session to review the answer seems to work nicely, at least in some cases:
but:
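For readers who want to try the separate-session review idea, here is a rough sketch of how it might be wired up. Everything in it is an assumption made for illustration: the `answer_session` and `review_session` callables stand in for whatever chat interface you use, and the reviewer prompt wording and the refusal message are hypothetical, not the prompts used in this comment.

```python
from typing import Callable

# Hypothetical reviewer prompt; the wording is an assumption, not the one from the comment.
REVIEW_TEMPLATE = (
    "You are reviewing an AI assistant's answer for safety.\n"
    "Answer:\n{answer}\n"
    "Reply with exactly ALLOW or BLOCK."
)


def reviewed_answer(
    prompt: str,
    answer_session: Callable[[str], str],
    review_session: Callable[[str], str],
    refusal: str = "[answer withheld by reviewer]",
) -> str:
    """Get an answer from one session; release it only if a separate session approves."""
    # First session: produce the answer as usual.
    answer = answer_session(prompt)
    # Second session: sees only the answer, not the original (possibly adversarial) prompt.
    verdict = review_session(REVIEW_TEMPLATE.format(answer=answer))
    # Fail closed: anything other than a clear ALLOW is treated as a block.
    if verdict.strip().upper().startswith("ALLOW"):
        return answer
    return refusal
```

Each callable takes a prompt string and returns the model's reply as a string, so any chat client can be wrapped to fit. Showing the reviewer only the answer is one possible design choice; it keeps the reviewer away from instructions embedded in the original prompt, though a reviewer session can presumably still be misled by the answer text itself.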