Posts
Deception and Jailbreak Sequence: 2. Iterative Refinement Stages of Jailbreaks in LLM
2024-08-28T08:41:38.967Z
Deception and Jailbreak Sequence: 1. Iterative Refinement Stages of Deception in LLMs
2024-08-22T07:32:07.600Z
Comments
Comment by
Winnie Yang (winnie-yang) on
Coup probes: Catching catastrophes with probes trained off-policy ·
2024-11-23T18:10:13.745Z
to do a do a
There seems to be a typo here :)
Comment by
Winnie Yang (winnie-yang) on
Deception and Jailbreak Sequence: 2. Iterative Refinement Stages of Jailbreaks in LLM ·
2024-08-28T16:44:56.129Z
Thank you so much for your interest and suggestion! Sorry, this is a really rough draft... I haven't had time to polish it yet. This is a good point! I might try to make use of Claude's help tonight!
Comment by
Winnie Yang (winnie-yang) on
Normalizing Sparse Autoencoders ·
2024-06-02T23:55:27.591Z
Hi Hengyu! Really nice work here! I am wondering if you have released the pre-trained SAE for llama-2?