shawnghu's Shortform

shawnghu

post by shawnghu · 2025-02-23T05:17:56.080Z · LW · GW · 3 comments

Comments sorted by top scores.

comment by shawnghu · 2025-02-23T05:17:56.076Z · LW(p) · GW(p)

Is anyone else noticing that Claude (Sonnet 3.5 new, the default on claude.ai) is a lot worse at reasoning recently? In the past five days or so its rate of completely elementary reasoning mistakes, which persist despite repeated clarification in different ways, seems to have skyrocketed for me.

Replies from: cubefox

↑ comment by cubefox · 2025-02-23T14:38:04.286Z · LW(p) · GW(p)

Maybe they are preparing to switch from merely encouraging their main model to do CoT (the old technique) to a full RL-based reasoning model. I recently saw this, before the GUI aborted and said the model was over capacity:

Then it wouldn't make sense anymore to have the non-reasoning model attempt to do CoT.

Replies from: weibac

↑ comment by Milan W (weibac) · 2025-02-23T23:07:25.905Z · LW(p) · GW(p)

I have also seen this.
