Jessica Rumbelow's Shortform

post by Jessica Rumbelow (jessica-cooper) · 2024-08-09T16:27:52.247Z · LW · GW · 1 comments



comment by Jessica Rumbelow (jessica-cooper) · 2024-08-09T16:27:53.041Z · LW(p) · GW(p)

Attribution can identify when system prompts are affecting behaviour.
Note the diminished overall attribution when a hidden system prompt is responsible for the output (or is something else going on?). Post on the method here [LW · GW].