Posts

Why does Claude Speak Byzantine Music Notation? 2025-03-31T15:13:10.753Z
Can We Predict Persuasiveness Better Than Anthropic? 2024-08-04T14:05:33.668Z
A Visual Task that's Hard for GPT-4o, but Doable for Primary Schoolers 2024-07-26T17:51:28.202Z

Comments

Comment by Lennart Finke (l-f) on Why does Claude Speak Byzantine Music Notation? · 2025-04-01T09:04:15.088Z

The component of ignoring two intervening characters is less mysterious to me. For example, a numbered list like "1. first_token 2. second_token ..." would need this pattern. I am mostly wondering why the specific map from b'\xa1'-b'\xba' to a-z is learned.
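For concreteness, here is a minimal sketch of the map in question, assuming a direct offset from 0xA1 (the dictionary construction is mine, only the byte range and letter range are from the comment):

```python
# Hypothetical reconstruction of the map discussed above: bytes b'\xa1'
# through b'\xba' are 26 consecutive values, lining up exactly with 'a'-'z'.
byte_to_letter = {bytes([0xA1 + i]): chr(ord("a") + i) for i in range(26)}

assert byte_to_letter[b"\xa1"] == "a"
assert byte_to_letter[b"\xba"] == "z"
```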

Comment by Lennart Finke (l-f) on A Visual Task that's Hard for GPT-4o, but Doable for Primary Schoolers · 2025-02-07T15:51:56.071Z

A much appreciated update, thank you!

Comment by Lennart Finke (l-f) on Can We Predict Persuasiveness Better Than Anthropic? · 2024-08-05T14:02:23.695Z

Good point, and I was conflicted about whether to put my thoughts on this at the end of the post. My best guess is that increased persuasion ability looks something like "totalitarian government agents doing solid scaffolding on open-source models to DM people on Facebook": we will see persuasive agents get better, but not know why or how. As stated in the introduction, persuasion detection is dangerous, but it is one of the few capabilities that could also be used defensively (e.g. detecting persuasion in an incoming email -> displaying a warning in the UI and offering to rephrase).
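To make that defensive flow concrete, here is a minimal sketch, with a toy keyword heuristic standing in for a real persuasion classifier (the cue list, names, and threshold are all hypothetical):

```python
# Sketch of the defensive use described above: score an incoming email and
# surface a UI warning plus an offer to rephrase. The scoring function is a
# trivial stand-in; any model returning a value in [0, 1] would slot in here.

PERSUASION_CUES = ("act now", "limited time", "everyone agrees", "you must")

def persuasion_score(text: str) -> float:
    """Toy stand-in: fraction of cue phrases present in the text."""
    text = text.lower()
    return sum(cue in text for cue in PERSUASION_CUES) / len(PERSUASION_CUES)

def warn_if_persuasive(email_body: str, threshold: float = 0.25) -> str | None:
    """Return a warning string for the UI if the email looks persuasive."""
    score = persuasion_score(email_body)
    if score >= threshold:
        return (f"This email scored {score:.2f} for persuasion. "
                "Offer to rephrase it neutrally?")
    return None

print(warn_if_persuasive("Act now! Everyone agrees this limited time offer is for you."))
```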

In conclusion, I definitely agree that we should consider closed-sourcing any improvements on the above baseline and only showing them to safety orgs instead. Some people at AISI whom I talked to while working on persuasion are probably interested in this.

Comment by Lennart Finke (l-f) on Can We Predict Persuasiveness Better Than Anthropic? · 2024-08-05T07:34:00.435Z

Thanks, fixed!

Comment by Lennart Finke (l-f) on A Visual Task that's Hard for GPT-4o, but Doable for Primary Schoolers · 2024-07-28T07:29:18.204Z

Agreed, although that in turn makes me wonder why it does perform a bit better than random. Maybe there is some nondeclarative knowledge about the image, or some blurred position information? I might next test how much vision is the bottleneck here by providing a text representation of the grid, as in Ryan Greenblatt's work on ARC-AGI.
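For reference, a minimal sketch of the kind of text serialization I have in mind, assuming the task grid is available as a 2D array of cell labels (the format and names are my own, not from Greenblatt's setup):

```python
# Serialize a grid to plain text so the model can be queried without its
# vision pathway, isolating whether vision is the bottleneck.

def grid_to_text(grid: list[list[str]]) -> str:
    """Render a 2D grid of single-character cells as space-separated rows."""
    return "\n".join(" ".join(row) for row in grid)

example = [
    ["#", ".", "."],
    [".", "#", "."],
    [".", ".", "#"],
]
print(grid_to_text(example))
# # . .
# . # .
# . . #
```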