Thane Ruthenis's Shortform
post by Thane Ruthenis · 2024-09-13T20:52:23.396Z · LW · GW · 4 comments
comment by Thane Ruthenis · 2024-09-13T20:52:24.130Z · LW(p) · GW(p)
On the topic of o1's recent release: wasn't Claude Sonnet 3.5 (the subscription version at least, maybe not the API version) already using hidden CoT? That's the impression I got from it, at least.
The responses don't seem to be produced in constant time. It sometimes literally displays a "thinking deeply" message accompanying an unusually delayed response. Other times, the following pattern plays out:
- I pose it some analysis problem, with a yes/no answer.
- It instantly produces a generic response like "let's evaluate your arguments".
- There's a 1-2 second delay.
- Then it continues, producing a response that starts with "yes" or "no", then outlines the reasoning justifying that yes/no.
That last point is particularly suspicious. As we all know, the power of "let's think step by step" is that LLMs don't commit to their knee-jerk instinctive responses, instead properly thinking through the problem using additional inference compute. Claude Sonnet 3.5 is the previous out-of-the-box SoTA model, competently designed and fine-tuned. So it'd be strange if it were trained to sabotage its own CoTs by "writing down the bottom line first" like this, instead of being taught not to commit to a yes/no before doing the reasoning.
On the other hand, from a user-experience perspective, the LLM immediately giving a yes/no answer followed by the reasoning is certainly more convenient.
From that, plus the minor-but-notable delay, I'd been assuming that it's using some sort of hidden CoT/scratchpad, then summarizes its thoughts from it.
I haven't seen people mention that, though. Is that not the case?
(I suppose it's possible that these delays are on the server side, my requests getting queued up...)
(I'd also maybe noticed a capability gap between the subscription and the API versions of Sonnet 3.5, though I didn't really investigate it and it may be due to the prompt.)
Replies from: habryka4, quetzal_rainbow
↑ comment by habryka (habryka4) · 2024-09-13T20:59:22.446Z · LW(p) · GW(p)
My model was that Claude Sonnet has tool access and sometimes does some tool use behind the scenes (which later gets revealed to the user), but not that it runs a whole CoT behind the scenes. I might be wrong, though.
↑ comment by quetzal_rainbow · 2024-09-13T23:42:48.972Z · LW(p) · GW(p)
I think you heard about this thread (I didn't try to replicate it myself).
Replies from: Thane Ruthenis
↑ comment by Thane Ruthenis · 2024-09-14T00:13:16.778Z · LW(p) · GW(p)
Thanks, that seems relevant! Relatedly, the system prompt indeed explicitly instructs it to use "<antThinking>" tags when creating artefacts. It'd make sense if it's also using these tags to hide parts of its CoT.
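If that's what's happening, the serving side would only need a thin post-processing step: generate with the tags, then strip everything inside them before streaming the reply to the user. A minimal sketch of that filtering (hypothetical: the tag name comes from the leaked system prompt, but the actual serving-side behavior is unknown):

```python
import re

# Hypothetical hidden-reasoning tag from Claude's system prompt.
_THINKING_RE = re.compile(r"<antThinking>.*?</antThinking>", re.DOTALL)

def strip_hidden_thinking(raw_completion: str) -> str:
    """Remove <antThinking>...</antThinking> spans from a raw model
    completion, leaving only the user-visible text."""
    return _THINKING_RE.sub("", raw_completion).strip()

raw = (
    "<antThinking>Yes/no question; weigh the arguments before "
    "committing to an answer.</antThinking>"
    "Yes. Here's the reasoning: ..."
)
print(strip_hidden_thinking(raw))  # → Yes. Here's the reasoning: ...
```

This would also account for the 1-2 second pause: the hidden span is being generated (and consuming tokens) while nothing new appears in the UI.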