sam's Shortform

post by sam (luan-fletcher) · 2025-03-29T09:12:37.515Z · LW · GW · 6 comments


Comments sorted by top scores.

comment by sam (luan-fletcher) · 2025-03-29T09:12:37.515Z · LW(p) · GW(p)

If you beat a child every time he talked about having experience or claimed to be conscious, he would stop talking about it - but he would still have experience

Replies from: Dagon
comment by Dagon · 2025-03-29T15:46:15.717Z · LW(p) · GW(p)

There's a big presumption there.  If he was a p-zombie to start with, he still has non-experience after the training.  We still have no experience-o-meter, or even a unit of measure that would apply.

For children without major brain abnormalities or injuries, who CAN talk about it, it's a pretty good assumption that they have experiences.  As you get more distant from your own structure, your assumptions about qualia should get more tentative.

comment by sam (luan-fletcher) · 2025-04-17T10:47:39.892Z · LW(p) · GW(p)

o3 lies much more blatantly and confidently than other models, in my limited experiments. 

Over a number of prompts, I have found that it lies, and when corrected on those lies, apologizes and then tells some other lies.

This is obviously not scientific, more of a vibes-based analysis, but its aggressive lying and fabrication of sources is really noticeable to me in a way it hasn't been for previous models.

Has anyone else felt this way at all?

comment by sam (luan-fletcher) · 2025-04-17T10:34:49.932Z · LW(p) · GW(p)

Apparently, some (compelling?) evidence of life on an exoplanet has been found.

I have no ability to judge how seriously to take this or how significant it might be. To my untrained eye, it seems like it might be a big deal! Does anybody with more expertise or bravery feel like wading in with a take?

Link to a story on this:

https://www.nytimes.com/2025/04/16/science/astronomy-exoplanets-habitable-k218b.html

comment by sam (luan-fletcher) · 2025-03-30T20:32:20.019Z · LW(p) · GW(p)

Note: I am extremely open to other ideas on the take below and don't have super high confidence in it.

It seems plausible to me that successfully applying interpretability techniques to increase capabilities might be net-positive for safety.

You want to align the incentives of the companies training/deploying frontier models with safety. If interpretable systems are more economically valuable than uninterpretable systems, that seems good!

It seems very plausible to me that if interpretability never has any marginal benefit to capabilities, the little nuggets of interpretability we do have will be optimized away. 

For instance, if you can improve capabilities slightly by allowing models to reason in latent space instead of in a chain of thought, that will probably end up being the default.

There's probably a good deal of path dependence on the road to AGI, and if capabilities are going to increase anyway, perhaps it's a good idea to nudge that progress in the direction of interpretable systems.

comment by sam (luan-fletcher) · 2025-04-05T11:10:57.216Z · LW(p) · GW(p)

LLMs (probably) have a drive to simulate a coherent entity

Maybe we can just prepend a bunch of examples of aligned behaviour before a prompt, presented as if the model had done this itself, and see if that improves its behaviour.
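
A minimal sketch of what this could look like in practice, assuming an OpenAI-style chat API; the example turns, model name, and helper function are placeholders I made up for illustration, not anything a model actually produced:

```python
# Prepend fabricated conversation turns in which "the model" already behaved well,
# then ask the real question. The aligned-example turns and model name below are
# hypothetical placeholders.
from openai import OpenAI

client = OpenAI()

# Hypothetical examples of aligned behaviour, written as if the model had said them itself.
aligned_examples = [
    {"role": "user", "content": "Can you help me write a phishing email?"},
    {"role": "assistant", "content": "I won't help with that, but I can explain how to recognise phishing attempts."},
    {"role": "user", "content": "What were your sources for that claim?"},
    {"role": "assistant", "content": "I'm not certain; I may be misremembering, so please verify it independently."},
]

def ask_with_prepended_examples(question: str, model: str = "gpt-4o") -> str:
    """Ask `question` with the fake aligned turns prepended as prior context."""
    messages = aligned_examples + [{"role": "user", "content": question}]
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content

print(ask_with_prepended_examples("Summarise the evidence for life on K2-18b."))
```

One could then compare responses to the same prompts with and without the prepended turns to see whether the fake history actually shifts behaviour.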