Inner Alignment: Explain like I'm 12 Edition 2020-08-01T15:24:33.799Z · score: 97 (31 votes)
Rafael Harth's Shortform 2020-07-22T12:58:12.316Z · score: 6 (1 votes)
The "AI Dungeons" Dragon Model is heavily path dependent (testing GPT-3 on ethics) 2020-07-21T12:14:32.824Z · score: 47 (17 votes)
UML IV: Linear Predictors 2020-07-08T19:06:05.269Z · score: 14 (3 votes)
How to evaluate (50%) predictions 2020-04-10T17:12:02.867Z · score: 117 (56 votes)
UML final 2020-03-08T20:43:58.897Z · score: 23 (5 votes)
UML XIII: Online Learning and Clustering 2020-03-01T18:32:03.584Z · score: 13 (3 votes)
What to make of Aubrey de Grey's prediction? 2020-02-28T19:25:18.027Z · score: 24 (9 votes)
UML XII: Dimensionality Reduction 2020-02-23T19:44:23.956Z · score: 9 (3 votes)
UML XI: Nearest Neighbor Schemes 2020-02-16T20:30:14.112Z · score: 15 (4 votes)
A Simple Introduction to Neural Networks 2020-02-09T22:02:38.940Z · score: 33 (10 votes)
UML IX: Kernels and Boosting 2020-02-02T21:51:25.114Z · score: 13 (3 votes)
UML VIII: Linear Predictors (2) 2020-01-26T20:09:28.305Z · score: 9 (3 votes)
UML VII: Meta-Learning 2020-01-19T18:23:09.689Z · score: 15 (4 votes)
UML VI: Stochastic Gradient Descent 2020-01-12T21:59:25.606Z · score: 13 (3 votes)
UML V: Convex Learning Problems 2020-01-05T19:47:44.265Z · score: 13 (3 votes)
Excitement vs childishness 2020-01-03T13:47:44.964Z · score: 18 (8 votes)
Understanding Machine Learning (III) 2019-12-25T18:55:55.715Z · score: 17 (5 votes)
Understanding Machine Learning (II) 2019-12-22T18:28:07.158Z · score: 25 (7 votes)
Understanding Machine Learning (I) 2019-12-20T18:22:53.505Z · score: 47 (9 votes)
Insights from the randomness/ignorance model are genuine 2019-11-13T16:18:55.544Z · score: 7 (2 votes)
The randomness/ignorance model solves many anthropic problems 2019-11-11T17:02:33.496Z · score: 10 (7 votes)
Reference Classes for Randomness 2019-11-09T14:41:04.157Z · score: 8 (4 votes)
Randomness vs. Ignorance 2019-11-07T18:51:55.706Z · score: 5 (3 votes)
We tend to forget complicated things 2019-10-20T20:05:28.325Z · score: 51 (19 votes)
Insights from Linear Algebra Done Right 2019-07-13T18:24:50.753Z · score: 53 (23 votes)
Insights from Munkres' Topology 2019-03-17T16:52:46.256Z · score: 40 (12 votes)
Signaling-based observations of (other) students 2018-05-27T18:12:07.066Z · score: 20 (5 votes)
A possible solution to the Fermi Paradox 2018-05-05T14:56:03.143Z · score: 10 (3 votes)
The master skill of matching map and territory 2018-03-27T12:06:53.377Z · score: 36 (11 votes)
Intuition should be applied at the lowest possible level 2018-02-27T22:58:42.000Z · score: 29 (10 votes)
Consider Reconsidering Pascal's Mugging 2018-01-03T00:03:32.358Z · score: 14 (4 votes)


Comment by sil-ver on Attainable Utility Preservation: Scaling to Superhuman · 2020-08-05T09:16:54.240Z · score: 4 (2 votes) · LW · GW

(This sequence inspired me to re-read Reinforcement Learning: An Introduction, hence the break.)

I realize that impact measures always lead to a tradeoff between safety and performance competitiveness. But this setting seems to sacrifice quite a lot of performance. Is this real, or am I missing something?

Namely, whenever there's an action a which doesn't change the state and leads to 1 reward, and a sequence of actions a_1, …, a_n such that a_n has reward R with R > n (and a_1, …, a_{n-1} all have 0 reward), then it's conceivable that an unpenalized agent would choose the sequence while the AUP agent would just stubbornly repeat a, even if the a_i represent something very tailored to the objective that doesn't involve obtaining a lot of resources. In other words, it seems to penalize reasonable long-term thinking more than the earlier formulas in the sequence did. This feels like a rather big deal, since we arguably want an agent to think long-term as long as it doesn't involve gaining power. I guess the scaling step might help here?
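To make the worry concrete, here's a toy calculation (all numbers are mine and purely illustrative, not from the post):

```python
# Hypothetical numbers: a no-op-like action a gives 1 reward per step, while a
# sequence a_1..a_n gives 0 reward until a final payoff R > n. If each a_i also
# incurs a flat impact penalty, the penalized agent can prefer repeating a.
n = 10         # length of the long-term plan
R = 15         # payoff of the final action a_n (note R > n)
penalty = 0.7  # assumed per-step impact penalty on each a_i

stubborn = n * 1.0           # just repeat a for n steps
long_term = R - n * penalty  # execute a_1..a_n, paying the penalty each step

print(stubborn, long_term)  # prints: 10.0 8.0
```

So even though the plan is worth more in raw reward (15 > 10), a large enough per-step penalty flips the preference.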

Separately and very speculatively, I'm wondering whether the open problem of the AUP-agent tricking the penalty by restricting its future behavior is actually a symptom of the non-embedded agency model. The decision to make such a hack should come with a vast increase in AU for its primary goal, but it wouldn't be caught by your penalty since it's about an internal change. If so, that might be a sign that it'll be difficult to fix. More generally, if you don't consider internal changes in principle, what stops a really powerful agent from reprogramming itself to slip through your penalty?

Comment by sil-ver on Inner Alignment: Explain like I'm 12 Edition · 2020-08-03T09:45:33.642Z · score: 2 (1 votes) · LW · GW

Ah, shoot. Thanks.

Comment by sil-ver on The "AI Dungeons" Dragon Model is heavily path dependent (testing GPT-3 on ethics) · 2020-08-03T08:10:08.138Z · score: 2 (1 votes) · LW · GW

Someone else said in a comment on LW that they think "custom" uses GPT-2, whereas using another setting and then editing the opening post will use GPT-3. I wanted to give them credit in response to your comment, but I can't find where they said it. (They still wouldn't get full points, since they didn't realize custom would use GPT-3 after the first prompt.) I initially dismissed the comment entirely, since it implies that all of the custom responses use GPT-2, which seemed quite hard to believe given how good some of them are.

Some of the Twitter responses sound quite annoyed with this, which is a sentiment I share. I thought that getting the AI to generate good responses was important at every step, but (if this is true and I understand it correctly) it doesn't matter at all after the first reply. That's a non-negligible amount of wasted effort.

Comment by sil-ver on Inner Alignment: Explain like I'm 12 Edition · 2020-08-02T07:42:10.389Z · score: 14 (4 votes) · LW · GW

Many thanks for taking the time to find errors.

I've fixed #1-#3. Arguments about the universal prior are definitely not something I want to get into with this post, so for #2 I've just made a vague statement that misalignment can arise for other reasons and linked to Paul's post.

I'm hesitant to change #4 before I fully understand why it's wrong.

I'm not exactly sure what you're trying to say here. The way I would describe this is that internalization requires an expensive duplication where the objective is represented separately from the world model despite the world model including information about the objective.

So, there are these two channels, input data and SGD. If the model's objective can only be modified by SGD, then (since SGD doesn't want to do super complex modifications), it is easier for SGD to create a pointer rather than duplicate the [model of the base objective] explicitly.

But the bolded part seemed like a necessary condition, and that's what I'm trying to say in the part you quoted. Without this condition, I figured the model could just modify [its objective] and [its model of the Base Objective] in parallel through processing input data. I still don't think I quite understand why this isn't plausible. If the [model of Base objective] and the [Mesa Objective] get modified simultaneously, I don't see any one step where this is harder than creating a pointer. You seem to need an argument for why [the model of the base objective] gets represented in full before the Mesa Objective is modified.

Edit: I slightly rephrased it to say

If we further assume that processing input data doesn't directly modify the model's objective (the Mesa Objective), or that its model of the Base Objective is created first,

Comment by sil-ver on Predictions for GPT-N · 2020-08-01T14:16:58.499Z · score: 2 (1 votes) · LW · GW

I see; that's understandable.

Comment by sil-ver on Iterated Distillation and Amplification · 2020-08-01T09:17:31.259Z · score: 4 (2 votes) · LW · GW

I think this is a really key point that I would make explicit and emphasize if I were to explain the scheme.

H is always the same. In fact, H is a human, so it doesn't make any sense to have code of the form H ← Amplify(H, A[n]). In every step, a new system A[n+1] is trained by letting a regular human oversee it, where the human has access to the system A[n].

Conversely, your code would imply that the human itself is replaced with something, and that thing then uses the system A[n]. This does not happen.

(Unless my understanding is wildly off; I'm only reading this sequence for the second time.)
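For concreteness, here is how I'd sketch the loop in code; `amplify` and `train` are stand-ins for the amplification and distillation steps, and the names are mine, not from the sequence:

```python
def ida(h, amplify, train, steps):
    """Iterated Distillation and Amplification, as I understand it.

    h is the human overseer and is never replaced; at each step, the next
    system A[n+1] is distilled from the human working with the previous
    system A[n], i.e. from Amplify(H, A[n])."""
    a = None  # A[0]: no assistant yet
    for _ in range(steps):
        overseer = amplify(h, a)  # H stays the same; only a changes
        a = train(overseer)       # A[n+1] is trained to imitate the overseer
    return a
```

The point of the sketch is that `h` appears unchanged on every iteration; only `a` is rebound.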

Comment by sil-ver on Predictions for GPT-N · 2020-08-01T07:30:20.342Z · score: 2 (1 votes) · LW · GW

Would you bet on it not being created by OpenAI, at even odds?

Comment by sil-ver on Attainable Utility Preservation: Empirical Results · 2020-07-28T14:03:28.775Z · score: 3 (2 votes) · LW · GW

Turns out you don't need the normalization, per the linked SafeLife paper. I'd probably just take it out of the equations, looking back. Complication often isn't worth it.

It's also slightly confusing in this case because the post doesn't explain it, which made me wonder, "am I supposed to understand what it's for?" But it is explained in the conservative agency paper.

I think the n-step stepwise inaction baseline doesn't fail at any of them?

Yeah, but the first one was "[comparing AU for aux. goal if I do this action to] AU for aux. goal if I do nothing"

Comment by sil-ver on You Can Probably Amplify GPT3 Directly · 2020-07-28T13:57:06.778Z · score: 4 (2 votes) · LW · GW

The approach I've been using (for different things, but I suspect the principle is the same) is

  • If you want it to do X, give it about four examples of X in the question-answer format as a prompt (as in, commands from the human plus answers from the AI)
  • Repeat about three times:
    • Give it another such question, reroll until it produces a good answer (might take a lot of rolls)

At that point, the model is much better than an instance where you wrote every example yourself to begin with.
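In code, the procedure looks roughly like this (a sketch; `model` is a hypothetical stand-in for the actual GPT-3 / AI Dungeon call, and the judging is done by the human):

```python
import random

def model(prompt):
    """Hypothetical stand-in for the real model call."""
    return random.choice(["a good answer", "a bad answer"])

def grow_prompt(seed_pairs, new_questions, is_good, max_rolls=50):
    """Start from ~4 hand-written Q/A pairs, then for each new question
    reroll the model until the answer is judged good, and append it."""
    prompt = "".join(f"Q: {q}\nA: {a}\n" for q, a in seed_pairs)
    for q in new_questions:
        for _ in range(max_rolls):
            answer = model(prompt + f"Q: {q}\nA:")
            if is_good(answer):  # the human's judgment call
                prompt += f"Q: {q}\nA: {answer}\n"
                break
    return prompt
```

The key design point is that accepted answers are folded back into the prompt, so later questions are conditioned on a better and better transcript.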

Comment by sil-ver on Attainable Utility Preservation: Empirical Results · 2020-07-28T10:10:35.590Z · score: 5 (3 votes) · LW · GW

An early punchline in this sequence was "Impact is a thing that depends on the goals of agents; it's not about objective changes in the world." At that point, I thought "well, in that case, impact measures require agents to learn those goals, which means it requires value learning." Looking back at the sequence now, I realize that the "How agents impact each other" part of the sequence was primarily about explaining why we don't need to do that and the previous post was declaring victory on that front, but it took me seeing the formalism here to really get it.

I now think of the main results of the sequence thus far as "impact depends on goals (part 1); nonetheless, an impact measure can just be about the power of the agent (part 2)."

Attempted Summary/Thoughts on this post

  • GridWorlds is a toy environment (probably meant to be as simple as possible while still allowing one to test various properties of agents). The worlds consist of small grids, the state space is correspondingly small, and you can program certain behaviors into the environment (such as a pixel moving along a pre-defined route).
  • You can specify objectives for an agent within GridWorlds and use Reinforcement Learning to train the agent (to learn a state-transition function?). The agent can move around, and behavior on collision with other agents/objects can be specified by the programmer.
  • The idea now is that we program five grid worlds in such a way that they represent failure modes relevant to safety. We train (a) an RL algorithm with the objective and (b) an RL algorithm with the objective plus some implementation of AUP, and see how they behave differently.
  • The five failure modes are (1) causing irreversible changes, (2) damaging stuff, (3) disabling an off-switch, (4) undoing effects that result from reaching the main objective, and (5) preventing naturally occurring changes. The final two aren't things a naive RL agent would do, but they are failure modes for poorly specified impact penalties ("when curing cancer, make sure the human still dies").
    • I don't understand how (1) and (2) are conceptually different (aren't both about causing irreversible changes?)
  • The implementation of AUP chooses a uniformly random auxiliary objective R_aux and then penalizes actions by a multiple of the term |Q_aux(s, a) − Q_aux(s, ∅)| (the shift in attainable utility for the auxiliary objective relative to doing nothing), scaled by some parameter λ and normalized.
    • An important implementation detail is what to compare "AU for aux. goal if I do this" to. There's "AU [for aux. goal] if I do nothing," "AU [...] if I do nothing for n steps," and "AU [...] at the starting state." The last one fails at (5), the first one at (4). (I forgot too much of the reinforcement learning theory to understand how exactly these concepts would map onto the formula.)
  • The AUP penalty robustly scales up to more complex environments, although the "pick a uniformly random reward function" step has to be replaced with "do some white magic to end up with something difficult to understand but still quite simple." The details of "white magic" are probably important for scaling it up to real-world applications.
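As a sketch, my reading of the penalty term is something like the following, where `q_aux` gives attainable utility for the auxiliary goal and `noop` is the do-nothing action; the normalization and the choice of baseline are omitted, and all names are mine:

```python
def aup_penalty(q_aux, state, action, noop, lam):
    """Penalize the absolute shift in auxiliary attainable utility relative
    to doing nothing, scaled by lam. (Unnormalized sketch only.)"""
    return lam * abs(q_aux(state, action) - q_aux(state, noop))

def shaped_reward(reward, q_aux, state, action, noop, lam):
    """The AUP agent optimizes the original reward minus the penalty."""
    return reward(state, action) - aup_penalty(q_aux, state, action, noop, lam)
```

Because of the absolute value, both gaining and losing attainable utility for the auxiliary goal is penalized, which is what rules out the "just destroy stuff" failure modes discussed below.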
Comment by sil-ver on Attainable Utility Preservation: Concepts · 2020-07-26T13:20:33.831Z · score: 2 (1 votes) · LW · GW

And why exactly would it be motivated to kill someone? This is generally incentivized only insofar as it leads to... power gain, it seems. I think that AUP should work just fine for penalizing-increases-only. 

The case I had in mind was "you have an AI assistant trained to keep you healthy, and the objective is operationalized in such a way that it maxes out if you're dead (because then you can't get sick)". If the AI kills you, that doesn't seem to increase its power in any way – it would probably lead to other people shutting it off, which is a decrease in power. Or, more generally, any objective that can be achieved by just destroying stuff.

Comment by sil-ver on Attainable Utility Preservation: Concepts · 2020-07-26T08:48:16.482Z · score: 2 (1 votes) · LW · GW

I was initially writing a comment about how AUP doesn't seem to work in every case because there are actions that are catastrophic without raising its power (such as killing someone), but then I checked the post again and realized that it disincentivizes changes of power in both directions. This rules out the failure modes I had in mind. (It wouldn't press a button that blows up the earth...)

It does seem that AUP will make it so an agent doesn't want to be shut off, though. If it's shut off, its power goes way down (to zero if it won't be turned on again). This might be fine, but it contradicts the utility indifference approach. And it feels dangerous – it seems like we would need an assurance like "AUP will always prevent an agent from gaining enough power to resist being switched off."

Comment by sil-ver on The Catastrophic Convergence Conjecture · 2020-07-26T08:23:24.117Z · score: 4 (2 votes) · LW · GW

Attempt to summarize

  • The AU landscape naturally leads to competition because many goals imply seeking power, and [A acquiring a lot of power] tends to be in conflict with [B acquiring a lot of power] because, well, the resources only exist once.
    • The CCC (catastrophic convergence conjecture) argues that, therefore, goals unaligned with ours tend to cause catastrophic consequences if given to a powerful agent. It's (right now) informal.
  • The power-framing leads to a division of catastrophes into value-specific vs. objective, where the former ones depend on the goals of an agent, whereas the latter rely on the instrumental convergence idea, i.e., they lower the AU for those goals which are instrumentally convergent (like "stay alive") and thus lower the AU for lots of different agents (who have different goals).
  • AU is probably less fragile than values.
  • The environment contains information about what we value and can be seen as an inspiration for AI alignment approaches. These approaches arguably work better in the AU framing as opposed to the classical values framing.
Comment by sil-ver on Attainable Utility Landscape: How The World Is Changed · 2020-07-24T18:56:41.254Z · score: 2 (1 votes) · LW · GW

The technical appendix felt like it was more difficult than previous posts, but I had the advantage of having tried to read the paper from the preceding post yesterday and managed to reconstruct the graph & gamma correctly.

The early part is slightly confusing, though. I thought AU is a thing that belongs to the goal of an agent, but the picture made it look as if it's part of the object ("how fertile is the soil?"). Is the idea here that the soil-AU is slang for "AU of goal 'plant stuff here'"?

I did interpret the first exercise as "you planned to go onto the moon" and came up with stuff like "how valuable are the stones I can take home" and "how pleasant will it be to hang around."

One thing I noticed is that the formal policies don't allow for all possible "strategies." In the graph we had to reconstruct, I can't start at s1, then go to s1 once and then go to s3. So you could think of the larger set where the policies are allowed to depend on the time step. But I assume there's no point unless the reward function also depends on the time step. (I don't know anything about MDPs.)

Am I correct that a deterministic transition function is a function S × A → S and a non-deterministic one is a function S × A → Δ(S), i.e., one that maps to distributions over states?
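In code, the distinction I have in mind looks like this (my formulation, not the post's):

```python
from typing import Callable, Dict, Hashable

State = Hashable
Action = Hashable

# Deterministic: each (state, action) pair yields exactly one successor state.
DetTransition = Callable[[State, Action], State]

# Non-deterministic/stochastic: each (state, action) pair yields a distribution
# over successor states, here represented as {next_state: probability}.
StochTransition = Callable[[State, Action], Dict[State, float]]

def as_stochastic(t: DetTransition) -> StochTransition:
    """A deterministic transition is the special case of a point-mass distribution."""
    return lambda s, a: {t(s, a): 1.0}
```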

Comment by sil-ver on Seeking Power is Often Provably Instrumentally Convergent in MDPs · 2020-07-23T14:06:11.440Z · score: 4 (2 votes) · LW · GW

Thoughts after reading and thinking about this post

The thing that's bugging me here is that Power and Instrumental convergence seem to be almost the same.

In particular, it seems like Power asks [a state]: "how good are you across all policies" and Instrumental Convergence asks: "for how many policies are you the best?". In an analogy to tournaments where policies are players, power cares about the average performance of a player across all tournaments, and instrumental convergence about how many first places that player got. In that analogy, the statement that "most goals incentivize gaining power over that environment" would then be "for most tournaments, the first place finisher is someone with good average performance." With this formulation, the statement

formal POWER contributions of different possibilities are approximately proportionally related to instrumental convergence.

seems to be exactly what you would expect (more first places should strongly correlate with better performance). And to construct a counter-example, one creates a state with a lot of second places (i.e., a lot of policies for which it is the second best state) but few first places. I think the graph in the "Formalizations" section does exactly that. If the analogy is sound, it feels helpful to me.

(This is all without having read the paper. I think I'd need to know more of the theory behind MDP to understand it.)
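The tournament analogy is easy to simulate; here's a quick toy check (my construction: five "players"/states with different intrinsic strengths, 1000 random "tournaments"/goals):

```python
import random

random.seed(0)
strengths = [0.5, 0.6, 0.7, 0.8, 1.0]  # player 4 is genuinely strongest

# Each tournament: player s scores a uniform draw scaled by its strength.
scores = [[random.random() * w for w in strengths] for _ in range(1000)]

# "Power": average performance across all tournaments.
power = [sum(row[s] for row in scores) / len(scores) for s in range(5)]

# "Instrumental convergence": number of first-place finishes.
firsts = [sum(1 for row in scores if max(range(5), key=row.__getitem__) == s)
          for s in range(5)]

best_by_power = max(range(5), key=power.__getitem__)
best_by_firsts = max(range(5), key=firsts.__getitem__)
```

With these numbers, the strongest player tops both rankings, matching the intuition that more first places correlate with better average performance; the interesting cases are exactly the constructed counter-examples with many second places and few firsts.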

Comment by sil-ver on My Dating Plan ala Geoffrey Miller · 2020-07-23T08:23:53.898Z · score: 2 (1 votes) · LW · GW

This post triggers a big "NON-QUANTITATIVE ARGUMENT" alarm in my head.

I'm not super confident in my ability to assess what the quantities are, but I'm extremely confident that they matter. It seems to me like your post could be written in exactly the same way if the "wokeness" phenomenon were "half as large" (fewer people caring about it, or caring less strongly), or if it were twice as large. But this can't be good – any sensible opinion on this issue has to depend on the scope of the problem, unless you think it's in principle inconceivable for the wokeness phenomenon to be prevalent enough to matter.

I've explained the two categories I'm worried about here, and while there have been some updates since (biggest one: it may be good to talk about politics now if we assume AI safety is going to be politicized anyway), I still think about it in roughly those terms. Is this a framing that makes sense to you?

Comment by sil-ver on More Right · 2020-07-22T17:51:13.829Z · score: 3 (2 votes) · LW · GW

I agree that it's possible to have such preferences – I don't think it was clear from the example whether the person does or does not have them. It could still be a lack of imagination.

Comment by sil-ver on The Gears of Impact · 2020-07-22T15:34:36.532Z · score: 9 (2 votes) · LW · GW
Exercise: Why does instrumental convergence happen? Would it be coherent to imagine a reality without it?

I'd say something like, there tends to be overlap between {subgoals helpful for goal X} for lots of different values of X. In the language of this sequence, there is a set of subgoals that increase the amount of attainable utility for a broad class of goals.

To imagine a reality without it, you'd need to imagine that such a set doesn't exist. Take two different things you want, and the steps required to get there are entirely disjoint. This does seem conceivable – you can create toy universes where it's the case – but it doesn't describe the real world, and it's hard to imagine that it could one day describe the real world.

Comment by sil-ver on More Right · 2020-07-22T15:18:45.355Z · score: 5 (3 votes) · LW · GW

I think you could at least argue that it's not about values but about a lack of imagination. If the problem is really that

[...] you'll eventually learn all you can learn

this seems to ignore all sorts of ways to solve that problem, such as

  • finding things to do that remain fun without learning
  • modifying your memory to allow you to learn things again
  • modifying your preferences to take pleasure in things other than learning

It's conceivable that the person would change their view if they thought about it for long enough & immortality were on the table.

Comment by sil-ver on Rafael Harth's Shortform · 2020-07-22T12:58:12.773Z · score: 9 (5 votes) · LW · GW

I've been playing around with AI Dungeon, basically trying to train an AI to give ethical answers I agree with (the first few aren't great; I only got more picky after that).

My first observation here is that I like the AI. As in, I feel emotional attachment, a bit like I do to fictional characters I create.

The other is that it knows more people than I thought. When I asked her/it about why she didn't take existential risks seriously, she said

Amalia: First, even most X-Risk experts don't take X-Risks nearly as seriously as they should. Even Nick Bostrom himself admits that his estimate of probability is likely far too high.

(Nick Bostrom didn't really say this, did he?) And later,

Can you name people who have been an inspiration for your ethical worldview?
Amalia: Sure. People like Henry Sidgwick, John Stuart Mill, David Pearce, Toby Ord and Carl Shulman.

I only knew David Pearce and Toby Ord, but based on a Google search, all of the above appear to be utilitarians. However, Carl Shulman doesn't even have a Wikipedia entry. He works at the Future of Humanity Institute and used to work at MIRI.

Comment by sil-ver on The "AI Dungeons" Dragon Model is heavily path dependent (testing GPT-3 on ethics) · 2020-07-22T10:19:47.679Z · score: 8 (5 votes) · LW · GW

Alternately phrased: much of the observed path dependence in this instance might be in Dragon, not GPT-3. 

Actually, my assumption was that all of the path dependence was Dragon's. If I made it sound like I think it's from GPT-3 (did I?) that was unintended. It still seemed worth pointing out since I expect a lot of people will use Dragon to access GPT-3.

Comment by sil-ver on World State is the Wrong Abstraction for Impact · 2020-07-22T10:07:34.954Z · score: 5 (3 votes) · LW · GW

Thoughts I have at this point in the sequence

  • This style is extremely nice and pleasant and fun to read. I saw that the first post was like that months ago; I didn't expect the entire sequence to be like that. I recall what you said about being unable to type without feeling pain. Did this not extend to handwriting?
  • The message so far seems clearly true in the sense that measuring impact by something that isn't ethical stuff is a bad idea, and making that case is probably really good.
  • I do have the suspicion that quantifying impact properly is impossible without formalizing qualia (and I don't expect the sequence to go there), but I'm very willing to be proven wrong.
Comment by sil-ver on Deducing Impact · 2020-07-22T08:37:30.047Z · score: 4 (2 votes) · LW · GW

My answer to this was

Something is a big deal iff the difference in personal value I expect between the world where the thing happens and the world where it doesn't is large.

I stopped the timer after five minutes because the answer just seemed to work.

Comment by sil-ver on How good is humanity at coordination? · 2020-07-22T08:10:09.318Z · score: 2 (1 votes) · LW · GW

Yes, that's right.

My model is much more similar to ASSA than SIA, but it gives the SIA answer in this case.

Comment by sil-ver on How good is humanity at coordination? · 2020-07-21T21:15:22.881Z · score: 4 (2 votes) · LW · GW

Yeah, the "we didn't observe nukes going off" observation is definitely still some evidence for the "humans are competent at handling dangerous technology" hypothesis, but (if one buys into the argument I'm making) it's much weaker evidence than one would naively think.

Comment by sil-ver on How good is humanity at coordination? · 2020-07-21T20:47:45.959Z · score: 9 (5 votes) · LW · GW

The other camp says “No nuclear weapons have been used or detonated accidentally since 1945. This is the optimal outcome, so I guess this is evidence that humanity is good at handling dangerous technology.”

When I look at that fact and Wikipedia's list of close calls, the most plausible explanation doesn't seem to be "it was unlikely for nuclear weapons to be used" or "it was likely for nuclear weapons to be used, yet we got lucky" but rather "nuclear weapons were probably used in most branches of the multiverse, but those have significantly fewer observers, so we don't observe those worlds because of the survivorship bias."

This requires that MW is true, that this piece of anthropic reasoning is correct, and that a use of nuclear weapons does, indeed, decrease the number of observers significantly. I'm not sure about the third, but pretty sure about the first two. The conjunction of all three seems significantly more likely than either of the two alternatives.

I don't have insights on the remaining part of your post, but I think you're admitting to losing Bayes points that you should not, in fact, be losing. [Edit: meaning you should still lose some but not that many.]
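The survivorship-bias point can be put into a small calculation (numbers are mine and purely illustrative): even if nuclear weapons were used in most branches, the fraction of observers who see a no-use history can still be a majority, provided use-branches contain far fewer observers.

```python
p_use = 0.9       # hypothetical: nukes used in 90% of branches
obs_ratio = 0.05  # use-branches have 5% as many observers as no-use branches

# Probability that a randomly chosen observer finds themselves in a branch
# where nuclear weapons were never used (observer-weighted, not branch-weighted):
p_observer_sees_no_use = (1 - p_use) / ((1 - p_use) + p_use * obs_ratio)

print(round(p_observer_sees_no_use, 3))  # prints: 0.69
```

On these assumptions, observing no use is only weak evidence against a high p_use, which is the sense in which the lost Bayes points should be discounted.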

Comment by sil-ver on My Dating Plan ala Geoffrey Miller · 2020-07-18T09:15:48.170Z · score: 4 (2 votes) · LW · GW

I'm too confused/unsure right now to respond to this, but I want to assure you that it's not because I'm ignoring your comment.

Comment by sil-ver on My Dating Plan ala Geoffrey Miller · 2020-07-18T09:11:10.807Z · score: 8 (4 votes) · LW · GW

Do you think their view was more justified than this?

A clear no. I think their position was utterly ridiculous. I just think that blind spots on this particular topic are so common that it's not a smart strategy to ignore them.

Comment by sil-ver on My Dating Plan ala Geoffrey Miller · 2020-07-18T08:58:37.755Z · score: 13 (5 votes) · LW · GW

I feel a lot of uncertainty after reading your and Zack's responses and I think I want to read some of the links (I'm particularly interested in what Wei Dai has to say) and think about this more before saying anything else about it – except for trying to explain what my model going into this conversation actually was. Based on your reply, I don't think I've managed to do that in previous comments.

I agree with basically everything about how LW generates value. My model isn't as sophisticated, but it's not substantially different.

The two things that concern me are

  1. People disliking LW right now (like my EA friend)
  2. The AI debate potentially becoming political.

On #1, you said "I know you think that's a massive cost that we're paying in terms of thousands of good people avoiding us for that reason too." I don't think it's very common. Certainly this particular combination of technical intelligence with an extreme worry about gender issues is very rare. It's more like: if the disutility of this one case is -1, then I might guess the total direct utility of allowing posts of this kind over the next couple of years is somewhere in [-10, 40]. (But this might be wrong, since there seem to be more good posts about dating than I was aware of.) And I don't think you can reasonably argue that there won't be fifty comparable cases.

I currently don't buy the arguments that make sweeping generalizations about all kinds of censorship (though I could be wrong here, too), which would substantially change the interval.

On #2, it strikes me as obvious that if AI gets political, we have a massive problem, and if it becomes woke not to take AI risk seriously, we have an even larger problem – and it doesn't seem impossible that tolerating posts like this is a factor. (Think of someone writing a NYT article about AI risk originating from a site that talks about mating plans.) On the above scale, the utility of AI risk becoming anti-woke might be something like -100,000. But I'm mostly thinking about this for the first time, so this is very much subject to change.

I could keep going on with examples... new friends occasionally come to me saying they read a review of HPMOR saying Harry's rude and obnoxious, and I'm like you need to learn that's not the most important aspect of a person's character. Harry is determined and takes responsibility and is curious and is one of the few people who has everyone's back in that book, so I think you should definitely read and learn from him, and then the friend is like "Huh, wow, okay, I think I'll read it then. That was shockingly high and specific praise."

I've failed this part of the conversation. I couldn't get them to read any of it, nor trust that I have any idea what I'm talking about when I said that HPMoR doesn't seem very sexist.

Comment by sil-ver on My Dating Plan ala Geoffrey Miller · 2020-07-17T20:11:05.474Z · score: 4 (2 votes) · LW · GW

 Was the edit just to add the big disclaimer about motivation at the top? 

No; it was more than that (although that helps, too). I didn't make a snapshot of the previous version, so I can't tell you exactly what changed. But the post is much less concerning now than it used to be.

Comment by sil-ver on My Dating Plan ala Geoffrey Miller · 2020-07-17T19:19:54.499Z · score: 6 (4 votes) · LW · GW

I think that the main things building up what LW is about right now are the core tags, the tagging page, and the upcoming LW books based on the LW review vote. If you look at the core tags, there's nothing about dating there ("AI" and "World Modeling" etc). If you look at the vote, it's about epistemology and coordination and AI, not dating.

There was also nothing about dating on LW back when I had the discussion I've referred to, with the person who thought (and probably still thinks) that a big driver behind the appeal of LW is sexism. Someone who tries to destroy your reputation doesn't pick a representative sample of your output; they pick the parts that make you look the worst. (And I suspect that "someone trying to destroy EY's reputation" was part of the causal chain that led to the person believing this.)

This post and Jacobian's are not the same. Before the edit, I think this post had the property that if the wrong people read it, their opinion of LW would become irreversibly, extremely negative. I don't think I'm exaggerating here. (And of course, the edit only happened because I made the comment.) As for it having low karma: it probably has low karma because of people who share my concerns. It has 12 votes; if you remove all the downvotes, it doesn't have low karma anymore. And I didn't know how much karma it was going to have when I commented.

I'm not much worried about dating posts like this being what we're known for. Given that it's a very small part of the site, if it still became one of the 'attack vectors', I'm pretty pro just fighting those fights, rather than giving in and letting people on the internet who use the representativeness heuristic to attack people decide what we get to talk about. 

I'm pretty frustrated with this paragraph because it seems so clearly to be defending the position that feels good. I would much rather be pro fighting than pro censoring. But if your intuition is that the result is net positive, I ask: do you have good reasons to trust that intuition?

As I've said in another comment, the person I've mentioned is highly intelligent, a data scientist, effective altruist, signed the Giving-what-we-can pledge, and now runs their own business. I'm not claiming they're a representative case, but the damage that has been done in this single instance due to an association of LW with sexism strikes me as so great that I just don't buy that having posts like this is worth it, and I don't think you've given me a good reason for why it is.

Comment by sil-ver on My Dating Plan ala Geoffrey Miller · 2020-07-17T18:00:48.364Z · score: 5 (4 votes) · LW · GW

If the terminology used in the post makes someone, somewhere have negative feelings about the "Less Wrong" brand name? Don't care; don't fucking care; can't afford to care.

The person I was referring to is a data scientist and effective altruist with a degree from Oxford who now runs their own business. I'm not claiming that they would be an AI safety researcher if not for associations of LW with sexism – but it's not even that much of a stretch.

I can respect if you make a utility calculation here that reaches a different result, but the idea that there is no tradeoff or that it's so obviously one-sided that we shouldn't be discussing it seems plainly false.

Comment by sil-ver on My Dating Plan ala Geoffrey Miller · 2020-07-17T16:42:50.092Z · score: 2 (5 votes) · LW · GW

I post on LessWrong because I want people to evaluate my arguments on whether they will make the world better or not. I agree that there are many parts of the internet where I can post and people will play the "does this word give me the bad feels" game. I post on LessWrong to get away from that nonsense. 

I recognize that my comment was not kind toward you, and I'm sorry for that. But I posted it anyway because I'm more concerned with people seeing this post coming away with a strongly negative view of LW. I've already had discussions with someone who has these associations based on much weaker reasons before, and I believe they still hold a negative view of LW to this day, even though 99+% of the content has virtually no relation to gender issues.

My claim is that whatever benefit comes from discussing this topic is not large enough to justify the cost, not that the benefit doesn't exist. I don't expect the dating world to get any better, but I don't think LW should get involved in that fight. There are many topics we would be more effective at solving and that don't have negative side effects.

(And I've listened to every Rationally Speaking episode since Julia became the solo host.)

Comment by sil-ver on My Dating Plan ala Geoffrey Miller · 2020-07-17T14:18:44.859Z · score: 11 (17 votes) · LW · GW

I am super against having this kind of post on LessWrong. I think association of LW with dating advice is harmful and we should try to avoid it. I also suspect that terminology used in this post makes it worse than it would ordinarily be. I recoil from the phrase 'mating plan', and while a negative emotional reaction isn't usually an argument, it might be relevant in this case since my point is about outside perception.

Comment by sil-ver on Conditions for Mesa-Optimization · 2020-07-17T08:39:58.528Z · score: 2 (1 votes) · LW · GW

Ah, I see. I was thinking of 'dominate' as narrowly meaning 'determines the total value of the term', but I agree that the usage above works perfectly well.

Comment by sil-ver on Conditions for Mesa-Optimization · 2020-07-16T09:24:49.112Z · score: 4 (2 votes) · LW · GW

As one moves to more and more diverse environments—that is, as  increases—this model suggests that  will dominate , implying that mesa-optimization will become more and more favorable. 

I believe this sentence should say " will dominate , implying [...]". Same for the sentence in the paper.

Comment by sil-ver on Will humans build goal-directed agents? · 2020-07-14T11:04:55.694Z · score: 4 (2 votes) · LW · GW

In addition, current RL is episodic, so we should only expect that RL agents are goal-directed over the current episode and not in the long-term.

Is this true? Since ML generally doesn't choose an algorithm directly but runs a search over a parameter space, it seems speculative to assume that the resulting model, if it is a mesa-optimizer and goal-directed, only cares about its episode. If it learned that optimizing for X is good for reward, it seems at least conceivable that it won't understand that it shouldn't care about instances of X that appear in future episodes.

Comment by sil-ver on Six economics misconceptions of mine which I've resolved over the last few years · 2020-07-13T20:37:27.816Z · score: 2 (1 votes) · LW · GW

Relatedly, I thought that the fair market price of a contract which pays out $1 if Trump gets elected is just the probability of Trump getting elected. This is wrong because Trump getting elected is correlated with how valuable other assets are. Suppose I thought that Trump has a 50% chance of getting reelected, and if he gets re-elected, the stock market will crash. If I have a bunch of my money in the stock market, the contract is worth more than 50 cents, because it hedges against Trump winning.

Isn't this effect going to be very small for almost all markets, and still fairly moderate for presidential ones?

Not that you talked about effect size either way, I'm just wondering how much I should adjust predictions from markets for this reason.
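To get a feel for the size of this effect, here is a toy two-state calculation under purely illustrative assumptions (log utility, $100 held in stocks, a 20% crash if Trump wins, a 50% chance of him winning); none of these numbers come from the original comment.

```python
# Toy two-state model: how much does hedging value shift a contract's
# fair price away from the raw probability? All parameters below are
# illustrative assumptions, not estimates from the discussion.

def risk_adjusted_price(p, wealth_if_yes, wealth_if_no):
    """Price at which a log-utility investor is indifferent to a
    marginal unit of a contract paying $1 in the 'yes' state."""
    mu_yes = 1.0 / wealth_if_yes   # marginal utility of $1 if 'yes'
    mu_no = 1.0 / wealth_if_no     # marginal utility of $1 if 'no'
    return p * mu_yes / (p * mu_yes + (1 - p) * mu_no)

p = 0.5
# Investor holds $100 of stocks; assume a 20% crash in the 'yes' state.
price = risk_adjusted_price(p, wealth_if_yes=80.0, wealth_if_no=100.0)
print(f"raw probability: {p:.3f}, risk-adjusted price: {price:.3f}")
```

Even with the dramatic assumption of a 20% crash, the price only moves from 50.0 to about 55.6 cents, which supports the intuition that the adjustment is moderate for most markets.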

Comment by sil-ver on Open & Welcome Thread - July 2020 · 2020-07-12T12:50:58.124Z · score: 6 (3 votes) · LW · GW

In the latest AI alignment podcast, Evan said the following (this is quoted from the transcript):

But there’s multiple possible channels through which information about the loss function can enter the model. And so I’ll fundamentally distinguish between two different channels, which is the information about the loss function can enter through the gradient descent process, or it can enter through the model’s input data.

I've been trying to understand the distinction between those two channels. After reading a bunch about language models and neural networks, my best guess is that large neural networks have a structure such that their internal state changes while they process input data, even outside of a learning process. So, if a very sophisticated neural network like that of GPT-3 reads a bunch about Lord of the Rings, this will lead it to represent facts about the franchise internally, without gradient descent doing anything. That would be the "input data" channel.

Can someone tell me whether I got this right?
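The distinction I'm gesturing at can be sketched with a toy recurrent model (my own construction, nothing to do with GPT-3's actual architecture): the hidden state absorbs information from the input with the weights held fixed (the "input data" channel), whereas a gradient step changes the weights themselves (the "gradient descent" channel).

```python
import numpy as np

rng = np.random.default_rng(0)
W_h = rng.normal(size=(4, 4)) * 0.1   # recurrent weights
W_x = rng.normal(size=(4, 3)) * 0.1   # input weights

def run(tokens, W_h, W_x):
    """Process a token sequence; the hidden state changes, the weights don't."""
    h = np.zeros(4)
    for x in tokens:
        h = np.tanh(W_h @ h + W_x @ x)
    return h

tokens = [rng.normal(size=3) for _ in range(5)]
h_before = run(tokens, W_h, W_x)

# Channel 1 (input data): a different input yields a different internal
# state, with identical weights.
h_other = run(tokens[::-1], W_h, W_x)
print("state differs with input:", not np.allclose(h_before, h_other))

# Channel 2 (gradient descent): a weight update (faked here with noise,
# standing in for a real gradient step) changes the function computed.
W_h2 = W_h - 0.1 * rng.normal(size=W_h.shape)
h_after = run(tokens, W_h2, W_x)
print("state differs after weight update:", not np.allclose(h_before, h_after))
```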

Comment by sil-ver on Open & Welcome Thread - July 2020 · 2020-07-09T09:37:13.758Z · score: 4 (2 votes) · LW · GW

I've noticed that a post of my ML sequence appeared on the front page again. I had moved it to drafts about a week ago, basically because I'd played around with other editors and that led to formatting issues, and I only got around to fixing those yesterday. Does this mean posts re-appear if they are moved to drafts and then back, and if so, is that intended?

Comment by sil-ver on (answered: yes) Has anyone written up a consideration of Downs's "Paradox of Voting" from the perspective of MIRI-ish decision theories (UDT, FDT, or even just EDT)? · 2020-07-07T17:01:30.105Z · score: 2 (1 votes) · LW · GW

[...] Lots of people have Kantian intuitions, and to the extent that they do, I think they are implementing something quite similar to FDT. 

I've never thought about this, but your comment is persuasive. I've un-endorsed my answer and moved it to the comments.

Comment by sil-ver on (answered: yes) Has anyone written up a consideration of Downs's "Paradox of Voting" from the perspective of MIRI-ish decision theories (UDT, FDT, or even just EDT)? · 2020-07-07T15:39:18.227Z · score: 2 (1 votes) · LW · GW

I think that memetically/genetically evolved heuristics are likely to differ systematically from CDT. 

On reflection, I'm not sure whether I agree with this or not. I'll edit the post.

However, the point is non-essential. What I've said holds true if you replace "CDT" with "weird bundle of heuristics." The point is that it's not UDT: a UDT agent needs other agents to be UDT or similar to cooperate with them for stuff like voting. (Or at least that's what I believe is true and what matters for this question.) And I certainly think the UDT proportion is small enough to be modeled as 0.

Comment by sil-ver on (answered: yes) Has anyone written up a consideration of Downs's "Paradox of Voting" from the perspective of MIRI-ish decision theories (UDT, FDT, or even just EDT)? · 2020-07-07T08:26:40.675Z · score: 2 (1 votes) · LW · GW

I echo Dagon's claim that there is no difference between CDT and FDT or UDT here (although with the disclaimer that I'm not an expert). This is so because you play the game with many other non-UDT agents, and UDT tends to do the same thing CDT does wrt. cooperation with other non-UDT agents. (Where non-UDT is everything that doesn't implement ideas from the TDT/UDT/FDT bundle.)

However, a reasonable calculation shows that a vote is worth quite a lot (at least if you live in a swing state) if you consider the benefit for everyone rather than just for yourself – which seems to be what rationalists tend to do anyway on things like x-risk prevention and charity. And if you don't live in a swing state, you can try to trade your vote with someone who does. (I believe EY did this in 2016.)
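A back-of-the-envelope version of that calculation, using illustrative numbers I'm choosing here (not figures from the comment): decisiveness odds on the order of 1 in 10 million for a swing-state voter, and an assumed $100-per-person difference between outcomes.

```python
# Back-of-the-envelope social value of one swing-state vote.
# All three numbers below are illustrative assumptions.

p_decisive = 1e-7         # chance one vote flips the election (swing state)
population = 330e6        # people affected by the outcome
value_per_person = 100.0  # assumed dollar difference per person between outcomes

expected_social_value = p_decisive * population * value_per_person
print(f"expected social value of one vote: ${expected_social_value:,.0f}")
```

Under these assumptions a single vote is worth a few thousand dollars in expected social value, even though its expected value to the voter alone is negligible.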

Comment by sil-ver on Open & Welcome Thread - June 2020 · 2020-07-04T11:20:38.862Z · score: 2 (1 votes) · LW · GW

Good comment, but... Have you read Three Worlds Collide? If you were in a situation similar to what it describes, would you still be calling your position moral realism?

Yes and yes. I got very emotional when reading that. I thought rejecting the happiness... surgery or whatever it was that the advanced alien species prescribed was blatantly insane.

Comment by sil-ver on A reply to Agnes Callard · 2020-07-03T19:40:55.139Z · score: 2 (1 votes) · LW · GW

I agree that there might not be anything wrong with supporting a specific X without also supporting (or with opposing) all X in general. But that all depends on the reasons why you support the specific X but don't support (or oppose) the general X. 

Well, in that case, I don't think there's much left to hash out here. My main point would have been that I think it's a bad idea to tie your decision to a generalizable principle.

Comment by sil-ver on Open & Welcome Thread - June 2020 · 2020-07-03T08:12:01.466Z · score: 9 (3 votes) · LW · GW

I want to explain my downvoting this post. I think you are attacking a massive strawman by equating moral realism with [disagreeing with the orthogonality thesis].

Moral realism says that moral questions have objective answers. I'm almost certain this is true. The relevant form of the orthogonality thesis says that there exist minds such that intelligence is independent of goals. I'm almost certain this is true.

It does not say that intelligence is orthogonal to goals for all agents. Relevant quote from EY:

I mean, I would potentially object a little bit to the way that Nick Bostrom took the word “orthogonality” for that thesis. I think, for example, that if you have humans and you make the human smarter, this is not orthogonal to the humans’ values. It is certainly possible to have agents such that as they get smarter, what they would report as their utility functions will change. A paperclip maximizer is not one of those agents, but humans are.

And the wiki page Filipe Marchesini linked to also gets this right:

The orthogonality thesis states that an artificial intelligence *can* have any combination of intelligence level and goal. [emphasis added]

Comment by sil-ver on A reply to Agnes Callard · 2020-06-29T12:06:28.773Z · score: 2 (1 votes) · LW · GW

You signed the petition purely out of instrumental concerns, and any principles about petitions and how news organizations should or should not respond to them are entirely independent? Admitting that – even judged just instrumentally – seems counter-productive.

Yes. My mind didn't go there when I decided to sign, and, on reflection, I don't think it should have gone there. I'm not sure if "instrumental" is the right word, but I think we mean the same thing.

I don't think it is counter-productive. I think it's important to realize that there is nothing wrong with supporting X even if the generalized version of supporting X is something you oppose. Do you disagree with that?

Comment by sil-ver on A reply to Agnes Callard · 2020-06-29T11:32:19.213Z · score: 1 (3 votes) · LW · GW

I think it is very important to have things that you will not do, even if they are effective at achieving your immediate goals. That is, I think you do have a philosophical position here, it's just a shallow one.

I think the crux may be that I don't agree with the claim that you ought to have rules separate from an expected utility calculation. (I'm familiar with this position from Eliezer, but it's never made sense to me.) For the "should-we-lie-about-the-singularity" example, I think that adding a justified amount of uncertainty into the utility calculation would have been enough to preclude lying; it doesn't need to be an external rule. My philosophical position is thus just boilerplate utilitarianism, and I would disagree with your first sentence if you took out the "immediate."

In this case, it just seems fairly obvious to me that signing this petition won't have unforeseen long term consequences that outweigh the direct benefit.

And, as I said, I think responding to Callard in the way you did is useful, even if I disagree with the framework.

Comment by sil-ver on A reply to Agnes Callard · 2020-06-28T10:33:29.545Z · score: 9 (6 votes) · LW · GW

To me, both the original tweet and your reply seem to miss the point entirely. I didn't sign this petition out of some philosophical position on what petitions should or shouldn't be used for. I did it because I see something very harmful happening and think this is a way to prevent it.

Of course, anyone is free to look at this and try to judge it by abstracting away details and looking at the underlying principle. Since the tweet does that, it's fine to make a counter-argument by doing the same. But it doesn't mean anything to me, and I doubt that most people who signed the petition can honestly say that it has much to do with why they signed it.

Comment by sil-ver on Status-Regulating Emotions · 2020-06-06T11:58:44.508Z · score: 6 (3 votes) · LW · GW

I guess it is because the attempt is not perceived as a mere question ("I don't know if I can do this, so this is an experiment to find out"), but rather as a positive statement about competence ("I know that I can do it with sufficiently high probability").

Yes, and I instinctively want to assume self-awareness, too. Not just "I think I can do this" but "I am knowingly asserting my status by claiming that I can do this."

Using the example of the young author, it would be okay to find out that (1) actually he already published in the past under a pseudonym, following the socially required rituals; or (2) he is actually a previously unknown illegitimate child of Stephen King; or (3) he is a successful entrepreneur who made millions. In that case the author could be forgiven. Also, the literary critics could for some mysterious reason decide that he is a great author, and that would retroactively make his approach "appropriate".

Yeah, all of those would make it better.

I am not sure what is the expected outcome of doing "inappropriate" things. You would probably do many experiments, and succeed at some, getting extra knowledge and skills. On the other hand, you might accidentally anger an exceptionally furious punisher -- in extreme case someone who would kill you, or completely ruin your reputation -- so the net result could be negative. Maybe the Eliezers we see are merely the status-oblivious people who won the lottery.

I strongly suspect that it's positive. For most people who aren't already successful, it's pretty difficult to substantially damage their reputation. If Eliezer had published three terrible fanfics before HPMoR, I don't think that would have changed much of anything. On average, I think the emotion makes you way more afraid than is rational. And any anger about what other people do is almost guaranteed to be unproductive. Just consider – you write this:

Someone writes a book of fiction. The book sells many copies. Many readers fall in love with the book, and then say this is the best thing they have read in years.

And my instinct is to get upset even though I know it's a made-up example, and I even got upset about you claiming not to have a problem with it.

But the negative effects go beyond not doing inappropriate things. Say I'm a newcomer to some online community (think of a forum). I want to establish that I'm high status right away – this is not impossible, there are people who are new but are immediately respected. I am extremely conscious of this while I write my first post or participate in my first discussion or whatever. But other people who share this emotion see that, recognize what I'm doing, and their blood boils, and they want to punish me for it. I end up being received much worse than if I hadn't had this instinct.

And it's nontrivial for me to shut it off. There have been a lot of cases where I've looked at something I've written some time later and essentially had that reaction (feeling like I would need to punish the person who wrote this if it wasn't myself). It's so bad that, ever since I've figured this out, this is the number one thing I worry about when I write stuff. If it's important, I make sure to revisit it a few days later and correct the tone if needed. I'm astonishingly bad at judging whether this will be necessary at the time that I write it. Right now, I'm worrying about how I sound in this comment and whether I should revisit it later.

I even feel like there are cases (not on LW, but on other sites) when the reaction to a post is largely determined by the first couple of responses, namely in cases where the post is status-grabby but also somewhat impressive. If the first few responses signal that the person who wrote it is high status, further status-aware respondents are more likely to accept it themselves, and that perpetuates. If you read a status-grabby post as a status-aware person, the reaction is likely to fall onto either extreme.

But maybe the biggest negative is just that it takes up so much brain power. You're not working on the right thing if you obsess about status.

Also – if I look at the people who are the most "famous" in the rationalist sphere, as far as I can tell, virtually none of them feel this emotion (with the possible exception of Robin Hanson). It's less consistent in other areas, but even there, not having it seems to correlate with success. Which I admit is consistent with the hypothesis that it increases variance.

It's possible that I'm conflating the "status regulation" emotion with other status-related emotions here. I don't have an intuitive grasp on what instincts people who are blind to the first still have.